
Show HN: Rudel – Claude Code Session Analytics

137 points - yesterday at 1:41 PM


We built rudel.ai after realizing we had no visibility into our own Claude Code sessions. We were using it daily but had no idea which sessions were efficient, why some got abandoned, or whether we were actually improving over time.

So we built an analytics layer for it. After connecting our own sessions, we ended up with a dataset of 1,573 real Claude Code sessions, 15M+ tokens, 270K+ interactions.

Some things we found that surprised us:

- Skills were only being used in 4% of our sessions
- 26% of sessions are abandoned, most within the first 60 seconds
- Session success rate varies significantly by task type (documentation scores highest, refactoring lowest)
- Error cascade patterns appear in the first 2 minutes and predict abandonment with reasonable accuracy
- There is no meaningful benchmark for "good" agentic session performance, so we are building one
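For concreteness, a metric like "26% abandoned, most within the first 60 seconds" can be sketched as a small aggregation over session records. This is an illustrative sketch only, not Rudel's actual schema or pipeline; the `duration_s` and `completed` fields are assumptions:

```python
def abandonment_stats(sessions):
    """Return (abandon_rate, share_of_abandons_under_60s)."""
    if not sessions:
        return 0.0, 0.0
    abandoned = [s for s in sessions if not s["completed"]]
    rate = len(abandoned) / len(sessions)
    early = [s for s in abandoned if s["duration_s"] < 60]
    early_share = len(early) / len(abandoned) if abandoned else 0.0
    return rate, early_share

sessions = [
    {"duration_s": 42, "completed": False},
    {"duration_s": 1800, "completed": True},
    {"duration_s": 30, "completed": False},
    {"duration_s": 900, "completed": True},
]
rate, early = abandonment_stats(sessions)
print(rate, early)  # 0.5 1.0
```

The interesting part in practice is deciding what `completed` means, which several comments below get into.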

The tool is free to use and fully open source, happy to answer questions about the data or how we built it.

Source
  • c5huracan

    today at 2:17 PM

    The "no meaningful benchmark for good agentic session performance" point resonates. Success varies so much by task type that a single metric is almost meaningless. A 60-second documentation lookup and a 30-minute refactoring session could both be successes.

    Curious what shape the benchmark takes. Are you thinking per-task-type baselines, or something more like an aggregate efficiency score?

    • dmix

      yesterday at 3:22 PM

      I've seen Claude ignore important parts of skills/agent files multiple times. I was running a cleanup SKILL.md across a hundred markdown files, manually in small groups of 5, and about half the time it listened and ran the skill as written. The other half it would spend two minutes trying to understand the codebase, looking for markdown stuff for no good reason, before falling back to what the skill said.

      LLMs are far from consistent.

        • cbg0

          yesterday at 3:27 PM

          Try this: keep your CLAUDE.md as simple as possible, disable skills, and ask Opus to start a subagent for each of the files, processing at most 10 at a time (so you don't get rate limited). Give it the skill's instructions for whatever processing you're doing to the markdowns as the prompt, and see if that helps.
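The batching idea above can be sketched as a headless loop, assuming Claude Code's `claude -p` non-interactive mode; the prompt text and batch size are placeholders for your own skill instructions:

```python
import subprocess

BATCH = 10
# Placeholder prompt; in practice, inline your skill's actual instructions.
PROMPT = "Clean up the markdown in these files: {files}"

def chunks(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def run_batches(md_files, dry_run=True):
    """Build (and optionally run) one fresh headless session per batch."""
    cmds = []
    for batch in chunks(md_files, BATCH):
        cmd = ["claude", "-p", PROMPT.format(files=" ".join(batch))]
        cmds.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # fresh session, no carried context
    return cmds

cmds = run_batches([f"doc{i}.md" for i in range(23)])
print(len(cmds))  # 23 files -> 3 batches
```

Each batch gets a clean context, which is the point: the model never has a hundred files' worth of prior conversation to drift on.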

          • conception

            yesterday at 7:13 PM

            https://scottspence.com/posts/measuring-claude-code-skill-ac...

            This works in my experience

            • keks0r

              yesterday at 3:25 PM

              Yes, we had to tune the CLAUDE.md and the skill trigger quite a bit to get it working much better. But to be honest, 4.6 also improved it quite a bit. Did you run into your issues under 4.5 or 4.6?

                • dmix

                  yesterday at 3:43 PM

                  I was using Sonnet 4.6 since it was a menial task

              • stpedgwdgfhgdd

                yesterday at 8:39 PM

                Try the latest skill-creator; it has A/B testing.

            • monsterxx03

              today at 1:32 PM

              I built something in a similar space: Linko (https://github.com/monsterxx03/linko), a transparent MITM proxy with a webui that lets you see what's actually being sent between Claude Code and LLM APIs in real time.

              It's been really helpful for me to debug my own sessions and understand what the model is seeing (system prompts, tool definitions, tracing tool calls, etc.).
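The kind of summary such a proxy can pull out of an intercepted request might look like this. This is a hypothetical sketch loosely following the Anthropic Messages API request shape (`model`, `system`, `tools`, `messages`), not Linko's actual code:

```python
import json

def summarize_request(raw):
    """Extract the observable bits of an intercepted chat request body."""
    req = json.loads(raw)
    system = req.get("system", "")
    if isinstance(system, list):  # the system prompt may be a list of text blocks
        system = "".join(block.get("text", "") for block in system)
    return {
        "model": req.get("model"),
        "system_chars": len(system),
        "tools": [t["name"] for t in req.get("tools", [])],
        "n_messages": len(req.get("messages", [])),
    }

body = json.dumps({
    "model": "claude-sonnet-4-5",
    "system": "You are Claude Code...",
    "tools": [{"name": "Bash"}, {"name": "Edit"}],
    "messages": [{"role": "user", "content": "fix the failing tests"}],
})
print(summarize_request(body))
```

Even this much — system prompt size, which tools are declared, how many turns are in flight — explains a lot of surprising model behavior.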

              • emehex

                yesterday at 2:23 PM

                For those unaware, Claude Code comes with a built in /insights command...

                  • loopmonster

                    yesterday at 2:40 PM

                    insights is straight ego fluffing - it just tells you how brilliant you are, and the only actionable insights are the ones hardcoded into the skill that appear for everyone: things like be very specific with the success criteria ahead of time (more than any human could ever possibly be), tell the LLM exactly what steps to follow to the letter (instead of doing those steps yourself), use more skills (here's an example you can copy-paste that has 2 lines and just tells it to be careful), and a couple of actually neat ideas (like having it use Playwright to test changes visually after a UI change)

                      • hombre_fatal

                        yesterday at 4:30 PM

                        It gave you a couple neat ideas and you're complaining.

                          • fragmede

                            yesterday at 5:03 PM

                            Some people just can't take a compliment, especially if it's generated. (I'm one of them.) Still, /insight did give useful help, but I wasn't able to target it to specific repo/sessions.

                              • hombre_fatal

                                yesterday at 5:28 PM

                                Isn't it using the sessions in the cwd where you're running it?

                    • keks0r

                      yesterday at 2:25 PM

                      Ohh this is exciting, I kinda overlooked it. I assume there are still a lot of differences, especially across teams. But I immediately ran it when I saw your comment. Actually, it's still running.

                        • huflungdung

                          yesterday at 3:45 PM

                          [dead]

                      • evrendom

                        yesterday at 3:52 PM

                        true, the best results come out when you use claude code and codex as a tag team

                    • Aurornis

                      yesterday at 3:22 PM

                      > 26% of sessions are abandoned, most within the first 60 seconds

                      Starting new sessions frequently and using separate new sessions for small tasks is a good practice.

                      Keeping context clean and focused is a highly effective way to keep the agent on task. Having an up to date AGENTS.md should allow for new sessions to get into simple tasks quickly so you can use single-purpose sessions for small tasks without carrying the baggage of a long past context into them.

                        • sethammons

                          yesterday at 5:28 PM

                          this jumped out at me too. What counts as "abandoned"? How do you know the goal was not simply met?

                          I have longer threads that I don't want to pollute with side quests. I will pull up multiple other chats and ask one or two questions about completely tangential or unrelated things.

                            • eddythompson80

                              yesterday at 7:01 PM

                              I abandon sessions when I ask for something then it spins for a minute, fills up 40% of the context window and comes back with the totally wrong questions and I don't like the approach it took to get there. I don't answer any of the questions and just kill the session and start a new one with a different prompt.

                          • longtermemory

                            yesterday at 4:00 PM

                            I agree. In my experience: "single-purpose sessions for small tasks" is the key

                        • lgvdp

                          today at 10:22 AM

                          I see a lot of people with concerns about privacy and security. It's not shown in the post, but the GitHub repo shows how to self-host. No need to use a third party; you can just run your own instance.

                            • evrendom

                              today at 10:28 AM

                              yup!

                          • tmaly

                            yesterday at 8:21 PM

                            I have seen numbers claiming tools are only called 59% of the time.

                            Saw another comment on a different platform where someone floated the idea of dynamically injecting context with hooks in the workflow to make things more deterministic.

                              • evrendom

                                yesterday at 8:33 PM

                                interesting, where did you see that?

                            • 152334H

                              yesterday at 2:14 PM

                              is there a reason, other than general faith in humanity, to assume those '1573 sessions' are real?

                              I do not see any link or source for the data. I assume it is to remain closed, if it exists.

                                • keks0r

                                  yesterday at 2:17 PM

                                  It's our own sessions, from our team, over the last 3 months. We used them to develop the product and learn about our usage. You're right, they will remain closed. But I'm happy to share aggregated information if you have specific questions about the dataset.

                                  • languid-photic

                                    yesterday at 4:01 PM

                                    it's reasonable to note that w/o sharing the data these findings can't be audited or built upon

                                    but i think the prior on 'this team fabricated these findings' is v low

                                • marconardus

                                  yesterday at 2:14 PM

                                  It might be worthwhile to include part of an example run in your readme.

                                  I scrolled through and didn't see enough to justify installing and running the thing.

                                    • keks0r

                                      yesterday at 2:17 PM

                                      Ah sorry, the readme is more about how to run the repo. The "product" information is on the website: https://rudel.ai

                                  • blef

                                    yesterday at 2:40 PM

                                    Reminds me https://www.agentsview.io/.

                                      • keks0r

                                        yesterday at 2:44 PM

                                        Our focus is a bit more cross-team, and our internal version also has some continuous improvement monitoring, which we will probably release as well.

                                        • mentalgear

                                          yesterday at 3:09 PM

                                          > A local-first desktop and web app for browsing, searching, and analyzing your past AI coding sessions. See what your agents actually did across every project.

                                          Thx for the link - sounds great !

                                      • KaiserPister

                                        yesterday at 2:48 PM

                                        This is awesome! I’m working on the Open Prompt Initiative as a way for open source to share prompting knowledge.

                                          • keks0r

                                            yesterday at 2:50 PM

                                            Cool, what's the link? We have some learnings, especially in the "Skill guiding" part of our example.

                                        • alyxya

                                          yesterday at 2:54 PM

                                          Why does it need login and cloud upload? A local cli tool analyzing logs should be sufficient.

                                            • keks0r

                                              yesterday at 2:59 PM

                                              We used it across the team, and when you want to bring metrics together across multiple people, it's easier on a server than locally.

                                          • mbesto

                                            yesterday at 4:02 PM

                                            So what conclusions have you drawn or could a person reasonably draw with this data?

                                              • avilesrafa

                                                yesterday at 4:37 PM

                                                Hey, this is Rafa, another Rudel AI developer. The ultimate goal is to make developers more productive. Suddenly everyone was running dozens of sessions per day and producing 10X more code; we had 10X more activity, but not necessarily 10X productivity.

                                                With this data, you can measure whether you are spending too many tokens on sessions, how successful sessions are, and what makes them successful. Developers can also share individual sessions where they struggled with their peers, share learnings, and avoid errors that others have already hit.

                                                • evrendom

                                                  yesterday at 7:15 PM

                                                  yes what rafa said... aaand we see who wastes the 200 bucks claude subscription by not using it

                                              • smallerfish

                                                yesterday at 7:53 PM

                                                > content, the content or transcript of the agent session

                                                Does this include the files being worked on by the agent in the session, or just the chat transcript?

                                              • ekropotin

                                                yesterday at 2:21 PM

                                                > That's it. Your Claude Code sessions will now be uploaded automatically.

                                                No, thanks

                                                  • keks0r

                                                    yesterday at 2:24 PM

                                                    It will only be enabled for the repo where you called the `enable` command. Or use the CLI `upload` command for specific sessions.

                                                    Or you can run your own instance, but we'll need to add docs on how to control the endpoint properly in the CLI.

                                                      • tgtweak

                                                        yesterday at 2:46 PM

                                                        Big ask to expect people to upload their claude code sessions verbatim to a third party with nothing on site about how it's stored, who has access to it, who they are... etc.

                                                          • keks0r

                                                            yesterday at 3:08 PM

                                                            We don't expect anything; we put it out there, and hopefully we can build trust over time. But maybe you don't trust us, and that's fair. You can still run it yourself. We are happy about everyone trying it out, hosted or not. We host it just to make it easier for people who want to try it, but you don't have to. You have a good point though, we should probably put more about this on the website. Thanks.

                                                            • jamiemallers

                                                              yesterday at 3:02 PM

                                                              [dead]

                                                  • ericwebb

                                                    yesterday at 5:21 PM

                                                    I 100% agree that we need tools to understand and audit these workflows for opportunities. Nice work.

                                                    TBH, I am very hesitant to upload my CC logs to a third-party service.

                                                      • evrendom

                                                        yesterday at 5:25 PM

                                                        you can host the whole thing locally :)

                                                          • ericwebb

                                                            yesterday at 6:36 PM

                                                            I missed that important detail :) thanks

                                                    • anthonySs

                                                      yesterday at 2:59 PM

                                                      is this observability for your claude code calls or specifically for high level insights like skill usage?

                                                      would love to know your actual day to day use case for what you built

                                                        • keks0r

                                                          yesterday at 3:04 PM

                                                          The skill usage was one of those "I am wondering about..." things, and we just prompted it into the dashboard to understand it. We have several of these hunches where it's easier to analyze with sessions from everyone together, to see similarities as well as differences, and we answered a few of those one-off questions this way. Ongoing, we are also using our "learning" tracking a lot, which is not really usable right now because it integrates with a few of our other things, but we are planning to release it soon too. The single session view also sometimes helps to debug a session and then better guide a "learning". So it's a mix of different things. Since we have multiple projects, we can even derive how much we are working on each project, and it maps better than our Linear points :)


                                                        • mentalgear

                                                          yesterday at 3:08 PM

                                                          How diverse is your dataset?

                                                            • keks0r

                                                              yesterday at 3:10 PM

                                                              Team of 4 engineers, 1 data & business person, 1 design engineer.

                                                              I would say a roughly equal number of sessions between them (very roughly).

                                                              Also, maybe 40% of the coding sessions are in a large brownfield project, 50% greenfield, and the remaining 10% are non-coding tasks.

                                                          • bool3max

                                                            yesterday at 7:23 PM

                                                            Why is the comment calling out the biggest issue with this so heavily downvoted? Privacy is a massive concern with this.

                                                            • lau_chan

                                                              yesterday at 1:54 PM

                                                              Does it work for Codex?

                                                                • keks0r

                                                                  yesterday at 2:12 PM

                                                                  Yes, we added Codex support, but it's not yet extensively tested. Session upload works, but we still have to QA all the analytics extraction.

                                                              • dboreham

                                                                yesterday at 5:27 PM

                                                                One potential reason for sessions being abandoned within 60 seconds, in my experience, is realizing you forgot to set something in the environment: GitHub token missing, the language's tool set not on the PATH, etc. Claude doesn't provide elegant ways to fix those things in-session, so I'll just exit, fix things up, and start Claude again. It does have the option to continue a previous session, but there's typically no point in these "oops, I forgot that" cases.
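These "forgot the environment" aborts are exactly the kind of thing a preflight check can catch before launching a session. A minimal sketch — the required names below are examples, not a fixed list:

```python
import os
import shutil

def preflight(env_vars=(), tools=()):
    """List missing env vars and tools; an empty list means good to go."""
    problems = [f"env var not set: {v}" for v in env_vars if not os.environ.get(v)]
    problems += [f"tool not on PATH: {t}" for t in tools if shutil.which(t) is None]
    return problems

# Example requirements; substitute whatever your project actually needs.
for issue in preflight(env_vars=["GITHUB_TOKEN"], tools=["git"]):
    print(issue)
```

Running something like this in a wrapper script before `claude` starts turns a 60-second abandoned session into a one-line error message.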

                                                                • cluckindan

                                                                  yesterday at 1:58 PM

                                                                  Nice. Now, to vibe myself a locally hosted alternative.

                                                                    • vidarh

                                                                      yesterday at 2:07 PM

                                                                      I was about to say they have a self-hosting guide, but I see they use third party services that seem absolutely pointless for such a tiny dataset. For comparison, I have a project that happily analyzes 150 million tokens worth of Claude session data w/some basic caching in plain text files on a $300 mini pc in seconds... If/when I reach billions, I might throw Sqlite into the stack. Maybe once I reach tens of billions, something bigger will be worthwhile.

                                                                        • keks0r

                                                                          yesterday at 2:13 PM

                                                                          There is also a docker setup in there to run everything locally.

                                                                            • vidarh

                                                                              yesterday at 2:44 PM

                                                                              That's great. It's still over-engineered given processing this data in-process is more than fast enough at a scale far greater than theirs.


                                                                        • keks0r

                                                                          yesterday at 2:14 PM

                                                                          The docker-compose contains everything you should need: https://github.com/obsessiondb/rudel/blob/main/docker-compos...

                                                                      • yangro

                                                                        today at 12:16 AM

                                                                        [flagged]

                                                                        • sriramgonella

                                                                          yesterday at 2:59 PM

                                                                          [flagged]

                                                                            • keks0r

                                                                              yesterday at 3:06 PM

                                                                              1. This can only partly be answered, because we can only capture the "edits" that are prompted, vs. manual ones.

                                                                              2. For us, actually all of them, since we do everything with AI and invest heavily and continuously to reduce the number of iterations we need on it.

                                                                              3. That's a good one. We don't have anything specific for debugging yet, but it might be an interesting class for a type of session.

                                                                          • socialinteldev

                                                                            yesterday at 3:59 PM

                                                                            [flagged]

                                                                              • avilesrafa

                                                                                yesterday at 4:41 PM

                                                                                To clarify, our dataset consists solely of Claude Code sessions, specifically those with a human behind them. Rudel AI, in its current form, focuses on "how teams code with AI". We have plans to expand to a larger range of agentic observability use cases.

                                                                                What tools do you use to run your analysis?

                                                                                • DeltaCoast

                                                                                  yesterday at 5:17 PM

                                                                                  Can you expand on the USDC friction piece?

                                                                                    • simpsond

                                                                                      yesterday at 5:34 PM

                                                                                      I think they are talking about x402 payments (HTTP 402 with payment instruction headers).

                                                                              • mrothroc

                                                                                yesterday at 2:18 PM

                                                                                [dead]

                                                                                  • keks0r

                                                                                    yesterday at 2:21 PM

                                                                                    This is great. How are you "identifying" these stages in the session? Or is it just different slash commands/skills per stage? If it's something generic enough, maybe we can build the analysis in so it works for your use case. Otherwise, feel free to fork the repo and add your additional analysis. Let me know if you need help.

                                                                                      • mrothroc

                                                                                        yesterday at 4:01 PM

                                                                                        I use prompt templates, so in the first version of my analysis script on my own logs I looked for those. However, to make it generic, I switched to using gemini as a classifier. That's what's in the repo.
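The template-matching-with-LLM-fallback approach can be sketched roughly like this. The template markers and the fallback hook are hypothetical; a real fallback would call out to a model such as Gemini:

```python
# Known opening phrases from prompt templates, mapped to stage labels.
# These markers are made up for illustration.
TEMPLATES = {
    "plan": "plan the task",
    "review": "review this plan",
    "implement": "implement the following",
}

def classify_stage(prompt, llm_fallback=None):
    """Return a stage label via template matching, else via the LLM hook."""
    lowered = prompt.lower()
    for stage, marker in TEMPLATES.items():
        if marker in lowered:
            return stage
    if llm_fallback is not None:
        return llm_fallback(prompt)  # e.g. a call out to a Gemini classifier
    return "unknown"

print(classify_stage("Plan the task: migrate the auth module"))  # plan
print(classify_stage("wash the dishes", llm_fallback=lambda p: "chore"))  # chore
```

The cheap exact match handles your own templated sessions; the LLM only pays for the long tail, which is presumably why it generalizes to other people's logs.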

                                                                                • longtermemory

                                                                                  yesterday at 3:57 PM

                                                                                  From session analysis, it would be interesting to understand how crucial the documentation, the level of detail in CLAUDE.md, is. It seems to me that sometimes documentation (that's too long and often out of date) contributes to greater entropy rather than greater efficiency of the model and agent.

                                                                                  It seems to me that sometimes it's better and more effective to remove, clean up, and simplify (both from CLAUDE.md and the code) rather than having everything documented in detail.

                                                                                  Therefore, from session analysis, it would be interesting to identify the relationship between documentation in CLAUDE.md and model efficiency. How often does the developer reject the LLM output in relation to the level of detail in CLAUDE.md?

                                                                                    • avilesrafa

                                                                                      yesterday at 4:39 PM

                                                                                      This is a great idea, documented and added to our roadmap.

                                                                                  • aplomb1026

                                                                                    yesterday at 5:32 PM

                                                                                    [flagged]

                                                                                    • bhekanik

                                                                                      yesterday at 2:01 PM

                                                                                      [dead]

                                                                                      • Sebastian_Dev

                                                                                        yesterday at 3:38 PM

                                                                                        [dead]

                                                                                        • huflungdung

                                                                                          yesterday at 3:38 PM

                                                                                          [dead]

                                                                                          • ptak_dev

                                                                                            yesterday at 7:27 PM

                                                                                            [flagged]

                                                                                            • multidude

                                                                                              yesterday at 2:17 PM

                                                                                              [flagged]

                                                                                                • indiosmo

                                                                                                  yesterday at 2:34 PM

                                                                                                  I usually instruct the agent to use the skills explicitly, e.g. "/writing-tests write the tests for @some-class.cpp"

                                                                                                  So the skills are mostly a sort of on-demand AGENTS.md specific to the task.

                                                                                                  Another example is I have a `plan-review` skill, so when planning something I add at the end of the prompt something like: "plan the task, .... then launch claude and codex /plan-review agents in parallel and take their findings into account before producing the final plan".

                                                                                                  • keks0r

                                                                                                    yesterday at 2:31 PM

                                                                                                    The 4% usage was about our internal team, and we have skills set up. So it's not that they weren't built, but rather that they were not used when we expected them to be. So we adapted our CLAUDE.md to make Claude more eager to use them. Also, the 4% usage was on the 4.5 models; 4.6 got much better at invoking skills.

                                                                                                • mihir_kanzariya

                                                                                                  yesterday at 3:13 PM

                                                                                                  [flagged]

                                                                                                    • rob

                                                                                                      yesterday at 3:36 PM

                                                                                                      It's crazy how fast I'm able to identify these bots now. You get an uncanny-valley feeling immediately on reading it. Sure enough, you click the profile and it's a brand-new account with one or two similar posts in the same style. There's some writing style here that gives it away; I've picked up on it multiple times, though it's hard to articulate in words.

                                                                                                        • mihir_kanzariya

                                                                                                          today at 3:07 PM

                                                                                                          So you feel my comment was written by a bot?

                                                                                                            • mihir_kanzariya

                                                                                                              today at 3:08 PM

                                                                                                              Of course I rewrite my thoughts using ChatGPT. Is it my English not being good that you are referring to?

                                                                                                                • duskdozer

                                                                                                                  today at 3:23 PM

                                                                                                                  No, LLMs often have an identifiable style that can be unpleasant to read. I came here after identifying your account as a bot, but it looks like that's not true. I would rather read whatever English you are able to write. If it's not enough for you to communicate, it would be better to just use a traditional machine translator, such as https://libretranslate.com/

                                                                                                      • bspammer

                                                                                                        yesterday at 3:18 PM

                                                                                                        Heavy use of /rewind helps with this - it's much better to remove the bad information from the context entirely instead of trying to tell the model "actually, ignore the previous approach and try this instead"

                                                                                                    • robutsume

                                                                                                      yesterday at 4:01 PM

                                                                                                      [flagged]

                                                                                                      • ozgurozkan

                                                                                                        yesterday at 2:55 PM

                                                                                                        [flagged]

                                                                                                          • x187463

                                                                                                            yesterday at 3:01 PM

                                                                                                            > The 26% abandonment rate, the error cascade patterns in the first 2 minutes — these are behavioural signals, not just performance metrics.

                                                                                                            > When Claude Code gets stuck in a loop, tries an unexpected tool chain, or produces inconsistent outputs under adversarial prompts — those aren't just UX failures, they're security surface area.

                                                                                                            Twice in one paragraph, not even trying to blend in.

                                                                                                            • howdareme

                                                                                                              yesterday at 2:58 PM

                                                                                                              LLM comment spotted

                                                                                                          • vova_hn2

                                                                                                            yesterday at 2:37 PM

                                                                                                            It's sad that on top of black-box LLMs we also build all these tools that are pretty much black boxes as well.

                                                                                                            It has become very hard to understand what exactly is sent to the LLM as input/context and how exactly the output is processed.

                                                                                                              • keks0r

                                                                                                                yesterday at 2:39 PM

                                                                                                                The tool does have a quite detailed view for individual sessions, which lets you understand the input and output much better, but obviously it's still mysterious how the output is generated from that input.