
2025-04-10

Simple Embedding Search

We want a small, clear way to do embedding search. Complex code can be a risk for large public projects. For example, torrent code is simpler than Ethereum code, so it may be harder for big powers to capture.

Why Embedding Search?

How to Do It

Common Methods

  1. Graph-based: For example, HNSW (Hierarchical Navigable Small World). Pinecone uses some closed-source tricks.
  2. Geometry-based: Locality-sensitive hashing (LSH), k-means, or product quantization.
  3. Brute-force: Compare the query to all vectors.
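
As a rough illustration of the brute-force method, here is a minimal sketch in plain Python. It assumes cosine similarity as the score and a list of lists as storage; the post does not fix either choice. The names `cosine_similarity` and `brute_force_search` are made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def brute_force_search(query, vectors, k=3):
    """Score the query against every stored vector and return the top k.

    Returns a list of (index, score) pairs, best match first.
    """
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: three 2-dimensional vectors.
vectors = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.9, 0.1],
]
top = brute_force_search([1.0, 0.0], vectors, k=2)
```

The cost grows linearly with the number of stored vectors, since every query touches every vector. That is the price paid for having almost no code to maintain.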

Data Sizes

Brute-Force Feasibility

Graph vs. Geometry

Buckets

In sum, a simple bucket-based approach with brute-force search inside each bucket might be good enough. We store the vectors in a simple layout and skip the fancy code, which keeps the codebase small and stable.
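
One possible shape for the summary above, as a sketch only. The post does not say how buckets are assigned, so this example assumes random-hyperplane LSH: each vector's bucket key is the pattern of signs of its dot products with a few random planes. The class name `BucketIndex` and the parameters `n_planes` and `seed` are hypothetical.

```python
import math
import random
from collections import defaultdict


def _cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class BucketIndex:
    """Hypothetical index: sign-pattern buckets, brute force inside a bucket."""

    def __init__(self, dim, n_planes=4, seed=0):
        rng = random.Random(seed)  # fixed seed so bucket keys are reproducible
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)
        ]
        self.buckets = defaultdict(list)

    def _key(self, vec):
        # One boolean per plane: which side of the plane the vector lies on.
        return tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0.0 for plane in self.planes
        )

    def add(self, item_id, vec):
        self.buckets[self._key(vec)].append((item_id, vec))

    def search(self, query, k=3):
        # Only the query's own bucket is scanned, so results are approximate.
        candidates = self.buckets.get(self._key(query), [])
        scored = [(item_id, _cosine(query, vec)) for item_id, vec in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


index = BucketIndex(dim=3)
index.add("a", [1.0, 0.0, 0.0])
index.add("b", [0.0, 1.0, 0.0])
index.add("c", [0.0, 0.0, 1.0])
hits = index.search([1.0, 0.0, 0.0], k=1)
```

A query identical to a stored vector always lands in that vector's bucket, but a close neighbor can fall on the other side of a plane and be missed. Using fewer planes, or scanning a few neighboring buckets, trades speed for recall.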