Home | Search


2025-04-10

Simplify embedding search

Summary

Main

Why?

Embedding search: Given a dataset of vectors and a query vector, find top-10 vectors that maximise dot product with query vector.

This is equivalent to finding top-10 vectors that minimise euclidean distance with query vector, assuming all vectors are unit vectors.

Common algos

Dataset sizes

Brute-force algo

State of the art algos are captured by ANN benchmarks and big ANN benchmarks. The main conclusion from them for me is that graph-based methods beat geometry-based methods.

Pinecone's articles on FAISS are the best resource on the internet on this that I've found.

Buckets

I'm yet to find a state-of-the-art algo for this, I just shared some intuitions of mine here.

Subscribe

Enter email or phone number to subscribe. You will receive atmost one update per month

Comment

Enter comment