Tiled Hacker news on React Router

Using Vectorize to build an unreasonably good search engine in 160 lines of code

143 points - 12/21/2025

simonw
12/25/2025
I was super-excited about vector search and embeddings in 2024 but my enthusiasm has faded somewhat in 2025 for a few reasons:
- LLMs with a grep or full-text search tool turn out to be great at fuzzy search already - they throw a bunch of OR conditions together and run further searches if they don't find what they want
- ChatGPT web search and Claude Code code search are my favorite AI-assisted search tools and neither bothers with vectors
- Building and maintaining a large vector search index is a pain. The vectors are usually pretty big and you need to keep them in memory to get truly great performance. FTS and grep are way less hassle.
- Vector matches are weird. So you get back the top twenty results... those might be super relevant or they might be total garbage, it's on you to do a second pass to figure out if they're actually useful results or not.
I expected to spend much of 2025 building vector search engines, but ended up not finding them as valuable as I had thought.
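To illustrate the first bullet above, here is a minimal sketch of the "OR a bunch of terms together, then search again if nothing turns up" pattern. The function names, the sample lines, and the rounds of terms are all hypothetical; in practice an LLM would pick the terms and a real grep or full-text index would do the matching.

```python
import re

def grep_any(terms, lines):
    """Return the lines that match ANY of the terms (case-insensitive OR)."""
    if not terms:
        return []
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return [line for line in lines if pattern.search(line)]

def search_with_fallback(term_rounds, lines):
    """Try each round of OR-ed terms in order and stop at the first round with hits."""
    for terms in term_rounds:
        hits = grep_any(terms, lines)
        if hits:
            return terms, hits
    return None, []

# Hypothetical corpus: a few lines of source code.
SAMPLE_LINES = [
    "def connect_db(url):",
    "class HttpClient:",
    "# retry with backoff",
]

# Hypothetical rounds: a first guess, then a broader retry.
ROUNDS = [["database", "postgres"], ["db", "sql"]]

used_terms, hits = search_with_fallback(ROUNDS, SAMPLE_LINES)
print(used_terms, hits)
```

The first round finds nothing, so the second, looser round runs and matches the `connect_db` line. No index or embedding model is involved, which is the "way less hassle" point.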
____tom____
12/25/2025
You didn't build a search engine in 160 lines of code. You built a client for a search engine in 160 lines of code. The vector database is providing the search.
mips_avatar
12/25/2025
There are a lot of previously intractable problems that are getting solved with these new embedding models. I’ve been building a geocoder for the past few months and it’s been remarkable how close to Google Places I can get with just slightly enriched OpenStreetMap data plus embedding vectors.
RomanPushkin
12/25/2025
You might be getting a good _recall_ rate, since Vectorize search is ANN, but the _precision_ can be low, because the reranker piece is missing. So I would slightly improve it by adding 10 more lines of code and introducing a reranker after the search (slightly increasing topK). Query expansion at the beginning can also be added to improve recall.
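A rough sketch of that retrieve-then-rerank shape, under loud assumptions: the brute-force cosine scan stands in for the ANN index query, and `overlap_score` is a toy stand-in for a real reranker (a cross-encoder or a hosted rerank API). All names, vectors, and documents here are made up for illustration and are not the article's code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, top_k):
    """First stage: brute-force stand-in for an ANN index query."""
    ranked = sorted(index, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return ranked[:top_k]

def overlap_score(query, text):
    """Toy reranker (assumption): fraction of query terms found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def search(query, query_vec, index, final_k=2, overfetch=3):
    """Over-fetch candidates by vector similarity, then rerank and trim."""
    candidates = retrieve(query_vec, index, top_k=final_k * overfetch)
    reranked = sorted(candidates, key=lambda d: overlap_score(query, d["text"]), reverse=True)
    return reranked[:final_k]

# Hypothetical index with 2-dimensional toy embeddings.
INDEX = [
    {"id": "a", "text": "vector search with embeddings", "vec": [0.9, 0.1]},
    {"id": "b", "text": "cooking pasta at home", "vec": [0.8, 0.3]},
    {"id": "c", "text": "reranking vector search results", "vec": [0.7, 0.2]},
    {"id": "d", "text": "gardening tips", "vec": [0.0, 1.0]},
]

results = search("vector search results", [1.0, 0.0], INDEX)
print([d["id"] for d in results])
```

The over-fetch factor is the "slightly increasing topK" part: the first stage returns more candidates than needed so the second stage has room to reorder them and drop the weak ones.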
repeekad
12/25/2025
What about re-ranking? In my limited experience, adding fast+cheap re-ranking with something like Cohere to the query results took an okay vector-based search and made the top 1-5 results much stronger
Supermancho
12/25/2025
Site has a neat feature where you can see the pointers of other people, marked by regional? notations, scrolling through the content.
yuzhun
12/25/2025
While embeddings are generally not required in the context of code, I am interested in how they perform in the legal and regulatory domain, where documents are substantially longer. Specifically, how do embeddings compare with approaches such as ripgrep in terms of effectiveness?
abhinavb05
12/25/2025
At my workplace we are using vector embeddings to build a recommendation system and the results are amazing
jdthedisciple
12/29/2025
So it's an ad for partykit and they're just doing what anyone does by now?
Really dislike this type of content...
alansaber
12/26/2025
Good to see the article touching on the performance impact of a niche vs general embedding model/aggressive subword tokenization
sa-code
12/25/2025
Models like bge are small and quantized versions will fit in browser or on a tiny machine. Not sure why everyone reaches for an API as their first choice
croemer
12/25/2025
Missing a (2024)
daquisu
12/25/2025
Now it is even easier. Cloudflare has a beta product called AI Search that implements most of these 160 lines of code
ballpug
12/25/2025
[dead]
