


2025-06-11

Open Source Search (Summary)

Disclaimer

Summary

Main

Why?

Use cases of open source search

Hardware costs in 2025

Important: All of these prices are dropping exponentially. Try forecasting prices for 2030 or 2035. We will eventually end up with the entire text internet stored in your pocket.
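One rough way to try such a forecast is to assume the price halves every fixed number of years and extrapolate from 2025. The sketch below is only an illustration: the function name, the starting price of 10.0, and the 5-year halving period are hypothetical placeholders, not figures from this page. Plug in a real 2025 price and whatever halving period you believe.

```python
def forecast_price(price_now: float, year_now: int, year_target: int,
                   halving_years: float) -> float:
    """Extrapolate a price that halves every `halving_years` years.

    This is plain exponential decay: price_now * 0.5 ** (elapsed / halving_years).
    """
    if halving_years <= 0:
        raise ValueError("halving_years must be positive")
    elapsed = year_target - year_now
    return price_now * 0.5 ** (elapsed / halving_years)


# Hypothetical inputs: a price of 10.0 (any currency, any unit) in 2025
# and an assumed halving period of 5 years. Neither number is measured.
for year in (2030, 2035):
    print(year, forecast_price(10.0, 2025, year, halving_years=5.0))
```

The result is very sensitive to the halving period, so it is worth running the forecast with a few different values to get a range instead of a single number.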

Prices taken from Hetzner Server Auction, vast.ai, and AWS S3 Glacier Deep Archive

Rented

Self-hosted

Storage

Crawling

Figures taken from Common Crawl and the Internet Archive

Figures taken from my own (bad) benchmark

Software

Plaintext extraction

Figures taken from Common Crawl and the Internet Archive

Data size

Plaintext extraction cost

Software

Embedding generation

Algorithm used

Figures taken from OpenAI text-embedding-3-small

Embedding generation cost

Performance

Software

Embedding search

Algorithm used

Software

Latency

If you're hosting the app locally for your own use, network latency does not matter; only search time matters. If you're hosting it for other people to use, latency can be relevant, and it may make sense to host multiple edge servers.
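The distinction can be sketched as a simple sum: what a user waits for is roughly search time plus network round-trip time, and the round trip is zero when the app runs locally. The function names, server names, and millisecond values below are hypothetical and only illustrate the reasoning; they are not measurements.

```python
def response_time_ms(search_ms: float, network_rtt_ms: float = 0.0) -> float:
    """Approximate time a user waits: search time plus network round trip.

    For a locally hosted app, network_rtt_ms is effectively 0.
    """
    return search_ms + network_rtt_ms


def pick_edge(rtt_by_server: dict[str, float]) -> str:
    """Return the edge server with the lowest round-trip time for a user."""
    return min(rtt_by_server, key=rtt_by_server.get)


# Hypothetical numbers: a 50 ms search, and two edge servers as seen by one user.
rtts = {"edge-eu": 20.0, "edge-us": 120.0}
nearest = pick_edge(rtts)
print("local:", response_time_ms(50.0))
print("remote via", nearest, ":", response_time_ms(50.0, rtts[nearest]))
```

Multiple edge servers only help with the network term; search time stays the same wherever the query runs.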

