KingMob
last Monday at 1:16 PM
Well, a regex search, even with optimizations, is inherently a linear scan of the text, unlike a lookup in a precomputed index (e.g., a trie).
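To make the difference concrete, here's a rough sketch in Python. It uses a plain dict-based inverted index rather than a trie, purely because that's shorter to write; the function names are made up for illustration.

```python
import re
from collections import defaultdict

def regex_scan(lines, word):
    """Linear search: every query re-reads every line of the document."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [i for i, line in enumerate(lines) if pattern.search(line)]

def build_index(lines):
    """Pay the scanning cost once: map each lowercased word to its line numbers."""
    index = defaultdict(list)
    for i, line in enumerate(lines):
        # set() so a word repeated on one line is recorded once for that line
        for token in set(re.findall(r"\w+", line.lower())):
            index[token].append(i)
    return index

def index_lookup(index, word):
    """A single dict lookup; the document itself is not re-read."""
    return index.get(word.lower(), [])
```

Both return the same line numbers for a whole-word query, but the index answers repeat queries without touching the text again.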
---
But the real problem with regexes is that they force you to mentally evaluate and skip over irrelevant hits, because they're not the right tool for excluding them.
Since regexes don't understand doc structure, you can't prioritize for a section heading with a word over other instances of the word.
Likewise, there's no fuzzy matching for longer phrases, or for cases where you're not sure of the precise order of multiple words.
They also lack stemming, so you either spend time constructing flexible regexes to handle related variants, or you run multiple searches with similar words.
Etc etc etc.
This is why text search engines are much more sophisticated than plain regexes.
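A toy sketch of two of those points, heading priority and stemming, in Python. Everything here is an assumption for illustration: the "unindented all-caps line is a heading" rule is borrowed from manpage layout, the suffix stripper is a crude stand-in for a real stemmer, and the +10 heading boost is an arbitrary number.

```python
import re

def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    word = word.lower()
    for suffix in ("ings", "ing", "ed", "s"):
        # Keep at least three letters so short words aren't mangled
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def is_heading(line):
    """Assumption: headings are unindented, all-caps lines, as in manpages."""
    return bool(line) and not line[0].isspace() and line.isupper()

def search(lines, query):
    """Return (score, line_number) pairs, highest score first."""
    target = stem(query)
    results = []
    for i, line in enumerate(lines):
        tokens = [stem(t) for t in re.findall(r"[A-Za-z]+", line)]
        hits = tokens.count(target)
        if hits:
            # Arbitrary boost so a heading outranks body-text mentions
            score = hits + (10 if is_heading(line) else 0)
            results.append((score, i))
    return sorted(results, key=lambda r: (-r[0], r[1]))
```

With this, a query for "binding" also matches "bind", "binds", and "bindings", and a heading containing the word sorts ahead of body lines that merely mention it. A plain regex gives you neither without hand-built alternations and manual skimming.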
---
Here's an example I ran into regularly when I was building a copy of bash's bind command:
How many irrelevant hits do you have to skip over to look up the CLI options for bash's bind builtin? There are hits for the letters "bind" all over the manpage, and unfortunately the builtin options are near the end. I finally hit upon using "G" in less to jump to the end of the manpage, then doing a reverse search for the hyper-specific phrase "BUILTINS", which takes me close to the right spot. But I only worked that out after doing the lookup a few times.