shoo
yesterday at 10:55 PM
> What's worked best for me so far is treating the code as sort of a confidence score, plus some layering in limited runtime info (e.g. what actually executed during tests) instead of relying on 100% static analysis.
> Curious how others handle this in real codebases..
I'd argue that for large Python codebases, having high automated test coverage is essential -- mainly unit tests of logic, but also a few heavier integration tests & smoke tests to confirm that the units can actually be wired together and executed in some fashion.
So, assuming [a] you're starting with a healthy Python codebase with great automated test coverage, the impact of a false positive -- deleting code that only appeared to be dead -- is actually quite low: it's immediately caught by the automated test suite, either pre-commit or pre-merge, so you back out that proposed delete, cross it off your list, and try the next ones. If the full test suite takes 5 minutes or less to run, no problem. If the full test suite takes 12 hours to run... ughh. I guess you could work around that with a statistical approach: make a few different branches, each applying a different random sample of, say, 50 of the 100 proposed code deletions, then kick them all off to run in CI overnight.
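To make that overnight-CI idea concrete, here's a minimal sketch (purely illustrative, all names made up) of shuffling the proposed deletions and splitting them into fixed-size batches, one batch per throwaway branch -- the actual branch creation and CI kick-off are left to your git/CI tooling:

```python
import random


def batch_proposed_deletions(proposed, batch_size=50, seed=0):
    """Shuffle the proposed deletions and split them into batches,
    one batch per throwaway branch to be tested in CI overnight.

    `proposed` is a list of labels for candidate deletes, e.g.
    "module.py:old_helper" -- purely illustrative, not tied to any
    particular tool's output format.
    """
    rng = random.Random(seed)  # fixed seed so the batches are reproducible
    shuffled = list(proposed)
    rng.shuffle(shuffled)
    return [
        shuffled[i : i + batch_size]
        for i in range(0, len(shuffled), batch_size)
    ]


# Example: 100 candidate deletes -> two branches of 50 each.
candidates = [f"pkg/mod_{n}.py:unused_func_{n}" for n in range(100)]
for branch_no, batch in enumerate(batch_proposed_deletions(candidates), start=1):
    print(f"branch dead-code-batch-{branch_no}: {len(batch)} proposed deletions")
```

Partitioning like this means every candidate delete lands in exactly one branch, so if one branch's overnight run fails you've at least narrowed the false positive down to that batch and can bisect further from there.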
My main memory of Python dead code deletion is from working on a Python project around a decade ago. The codebase was fresh, maybe 1-2 years old, worked on by various teams of contractors, but the focus had reasonably been on bashing out features, getting the app into production, and rapidly iterating as user feedback arrived and requirements oscillated. So there was a bit of accumulated cruft. I was new to the project and suggested we could use the vulture [1] dead code scanner. One of the other devs, who had been working on the project since the start, had a bit of spare time one afternoon: they applied it, found a bunch of dead code, and deleted it all. The project had OK test coverage & we could manually sanity-check each proposed delete during code review. It was a quick win. It's OK if you don't delete all the dead code: deleting the 80% that's easy to identify and low-risk to remove is pretty good, and then everyone can get back to shovelling more features into prod.
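For reference, something like this is roughly how you'd drive vulture from Python and filter on its confidence score (which is also what the quoted "confidence score" idea maps onto). The Vulture / scavenge / get_unused_code names are from my recollection of vulture's documented library interface, so treat the exact API as an assumption and double-check the README [1]; in practice you can also just run the vulture command-line tool over the package.

```python
# Sketch only: the programmatic API below (Vulture, scavenge,
# get_unused_code and the item attributes) is from memory of the
# vulture docs -- verify against the README [1] before relying on it.
import vulture

v = vulture.Vulture()
v.scavenge(["mypackage/"])  # path is illustrative
for item in v.get_unused_code(min_confidence=80):
    # Higher confidence -> safer deletion candidate; still review each
    # one manually before actually removing the code.
    print(item.filename, item.first_lineno, item.typ, item.name, item.confidence)
```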
[a] If a Python codebase does not have high coverage of tests that exercise specific requirements and pieces of functionality, the project is in an unhealthy state, and that needs to be addressed first, before attempting any refactors, fixes or feature work. One way to dig out of that hole is to layer on automated end-to-end regression tests that assert the behaviour (whatever it happens to be, quirks/defects and all) hasn't changed. Such tests are a lot worse than fast, specific tests of requirements (they merely detect whether behaviour is changing, not whether the behaviour meets or breaks requirements) and they require a lot of toil to maintain, but they're significantly better than nothing: provided you've got a wide enough sample of test scenarios driving them, they at least let you tell with some confidence whether a proposed refactor causes the app to crash. That safety net of regression tests then gives you the confidence to make surgical changes (while writing specific unit tests as you go). This is the general approach advocated by Feathers' Working Effectively with Legacy Code book (not Python specific).
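As a concrete sketch of that kind of characterization/regression test (hypothetical names: legacy_app.process and the recorded scenarios file are stand-ins, and pytest is assumed), you pin whatever the app currently does against recorded outputs:

```python
# A minimal characterization-test sketch in the spirit of Feathers'
# approach: pin down whatever the app currently does, quirks and all.
# `legacy_app.process` and recorded_scenarios.json are hypothetical.
import json

import pytest

from legacy_app import process  # the function whose behaviour we're freezing

with open("tests/recorded_scenarios.json") as f:
    SCENARIOS = json.load(f)  # list of {"input": ..., "expected_output": ...}


@pytest.mark.parametrize("scenario", SCENARIOS)
def test_behaviour_has_not_changed(scenario):
    # We don't assert the output is *correct*, only that it matches the
    # baseline recorded before refactoring started.
    assert process(scenario["input"]) == scenario["expected_output"]
```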
[1] https://github.com/jendrikseipp/vulture