I feel called out, lmao. I'm building an agentic framework for automated pentesting as part of an internal AppSec R&D initiative. My company's letting me run wild with infrastructure and Bedrock usage (bless their optimism). I've been throwing together some admittedly questionable prototypes to see what sticks.
The setup is pretty basic: S3 for docs and the codebase, pgvector on RDS for embeddings, Titan for the embeddings and Claude for reasoning. It works in the sense that data flows through and responses come out... but the agents themselves are kind of a mess.
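For concreteness, the retrieval path is roughly this shape (heavily simplified sketch; the model IDs, region, and `code_chunks` table are placeholders, not necessarily what I'm actually running):

```python
# Rough sketch of the RAG path: Titan embeddings -> pgvector similarity search
# -> Claude on Bedrock. Model IDs, region, and schema are illustrative.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> list[float]:
    # Titan text embeddings; the model ID may differ per account/region.
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def retrieve(query: str, conn, k: int = 5) -> list[str]:
    # Cosine-distance search against pgvector; <=> is the cosine distance operator.
    vec = "[" + ",".join(map(str, embed(query))) + "]"
    with conn.cursor() as cur:
        cur.execute(
            "SELECT content FROM code_chunks ORDER BY embedding <=> %s::vector LIMIT %s",
            (vec, k),
        )
        return [row[0] for row in cur.fetchall()]

def ask(query: str, conn) -> str:
    context = "\n---\n".join(retrieve(query, conn))
    resp = bedrock.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder model ID
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{"role": "user",
                          "content": f"Context:\n{context}\n\nQuestion: {query}"}],
        }),
    )
    return json.loads(resp["body"].read())["content"][0]["text"]

# Usage (DSN is a placeholder):
# conn = psycopg2.connect("postgresql://user:pass@rds-host:5432/appsec")
# print(ask("Where does user input reach SQL queries?", conn))
```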
They think they've found a bug, usually something like a permissive IAM policy or a questionable API call, and just latch onto it. They tunnel hard, write up something that sounds plausible, and stop there. No lateral exploration, no attempt to validate anything in a dev environment despite having MCP tools to access internal resources, and definitely no real exploitation logic.
I've tried giving them tools like CodeQL, semgrep, and Joern, but that's been pretty disappointing. They can run basic queries, but all they surface are noisy false positives, and they can't reason early about why a hit might be a false positive. There's no actual taint analysis or path tracing, just surface-level pattern matching and overconfident summaries. I feel like I'm duct-taping GPT-4 to a security scanner and hoping for insight.
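Roughly what I mean by them not reasoning about false positives: I'd like something closer to a forced triage pass over the raw scanner JSON before anything is allowed to become a "finding". A sketch under assumed names (the ruleset and the `ask_claude()` helper are placeholders for whatever call you already have):

```python
# Sketch: run semgrep with --json, hand each result real surrounding code, and
# make the model name a source, propagation steps, and a sink before the finding
# survives triage. Ruleset and helper names are illustrative.
import json
import subprocess
from pathlib import Path

def run_semgrep(target: str) -> list[dict]:
    # --json gives structured results we can post-process instead of a wall of text.
    out = subprocess.run(
        ["semgrep", "scan", "--config", "p/security-audit", "--json", target],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)["results"]

def code_window(path: str, start: int, end: int, pad: int = 30) -> str:
    # Hand the model surrounding code, not just the matched line.
    lines = Path(path).read_text().splitlines()
    lo, hi = max(0, start - 1 - pad), min(len(lines), end + pad)
    return "\n".join(f"{i + 1}: {l}" for i, l in enumerate(lines[lo:hi], start=lo))

def triage(finding: dict, ask_claude) -> dict:
    ctx = code_window(finding["path"], finding["start"]["line"], finding["end"]["line"])
    prompt = (
        "A scanner flagged this:\n"
        f"rule: {finding['check_id']}\n"
        f"message: {finding['extra']['message']}\n\n"
        f"Surrounding code:\n{ctx}\n\n"
        "Trace the data flow from an attacker-controlled source to the flagged sink. "
        "If you cannot name the concrete source, each propagation step, and the sink, "
        'answer {"verdict": "false_positive"}. Otherwise answer '
        '{"verdict": "needs_validation", "trace": [...]} as JSON only.'
    )
    return json.loads(ask_claude(prompt))
```

It's not real taint analysis, but it at least forces the model to commit to a source, a sink, and the steps in between before anything gets written up.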
I've experimented with splitting agents into roles (finder, validator, PoC author, code auditor, super uber hacker man), giving them memory, injecting skepticism, etc., but it still feels like I'm missing something fundamental.
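By "injecting skepticism" I mean something like this stripped-down shape, where the validator's only job is to try to kill the finding and rejections get fed back as memory (illustrative only; `ask_claude()` stands in for the Bedrock call):

```python
# Sketch of an adversarial validator: it attacks the claim's weakest assumption,
# and rejected findings go into memory so the finder stops re-surfacing them.
import json

rejected_memory: list[dict] = []  # fed back into the finder's prompt each round

def validate(finding: dict, ask_claude) -> dict:
    prompt = (
        "You are reviewing a colleague's vulnerability claim. Your job is to refute it.\n"
        f"Claim: {json.dumps(finding)}\n"
        "List every assumption the claim depends on, then attack the weakest one. "
        'Respond as JSON: {"refuted": true/false, "weakest_assumption": "...", '
        '"evidence_needed_to_confirm": ["..."]}'
    )
    verdict = json.loads(ask_claude(prompt))
    if verdict["refuted"]:
        rejected_memory.append({"finding": finding, "reason": verdict["weakest_assumption"]})
    return verdict
```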
If cost isn't an issue, how would you structure this differently? How do you actually get agents to do persistent, skeptical, multi-stage analysis, especially in security contexts where you need depth and proof, not just plausible-sounding guesses and long-ass reports on false positives?
Seems like you need a way to dictate structured workflows, in lieu of actually being able to train them up as SOC analysts. Sounds like a fun problem!
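Something like an explicit pipeline where the orchestrator, not the model, decides what happens next, and a finding can't advance a stage without evidence from that stage. Rough sketch, all names hypothetical:

```python
# Sketch of a dictated workflow: stage handlers (find, triage, validate_in_dev,
# write_poc, report) are placeholders for agent calls / MCP tool invocations.
# A finding only moves forward if the current stage produced an artifact.
from dataclasses import dataclass, field

PIPELINE = ["find", "triage", "validate_in_dev", "write_poc", "report"]

@dataclass
class Finding:
    claim: str
    stage: str = "find"
    artifacts: dict = field(default_factory=dict)  # e.g. {"validate_in_dev": "request/response log"}

def advance(finding: Finding, handlers: dict) -> Finding:
    """Run the current stage's handler; only move on if it produced evidence."""
    evidence = handlers[finding.stage](finding)  # agent/tool call for this stage
    if not evidence:
        finding.stage = "rejected"               # no evidence, no report
        return finding
    finding.artifacts[finding.stage] = evidence
    nxt = PIPELINE.index(finding.stage) + 1
    finding.stage = PIPELINE[nxt] if nxt < len(PIPELINE) else "done"
    return finding
```

The point being that "write the report" is just another gated stage, so the long false-positive write-ups never get generated in the first place.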