

2026-01-29

Hard numbers for the search pipeline

Disclaimer

I used to think search, discovery, and recommendation algorithms did nothing but embedding search. I now realise they run an entire pipeline, and not a single step in it can be skipped.

  1. Start with the entire text internet
  2. Use a hard-coded initial list of people, blogs, URLs, and keywords to filter
  3. Use embedding search to filter further
  4. Use inference with both smart prompts (such as Paul Graham prompts and surprising-difference prompts) and the user context to filter further
  5. Show the post to N other users, to filter further
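The five stages can be sketched as a chain of filters, each narrowing the list handed to it by the previous one. This is a minimal Python sketch under stated assumptions, not the author's implementation: each post is assumed to carry a precomputed embedding vector, the `judge` callable is a hypothetical keyword stub standing in for an LLM call with a smart prompt plus user context, and the `Post` type, stage functions, toy posts, and vote data are all made up for illustration. Stage 1 is simply the input list.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class Post:
    url: str
    author: str
    text: str
    embedding: list[float]  # hypothetical: assumed precomputed by some embedding model


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def seed_filter(posts, seed_authors: set[str], seed_keywords: list[str]):
    """Stage 2: keep posts by a hard-coded author or containing a seed keyword."""
    return [
        p for p in posts
        if p.author in seed_authors
        or any(k.lower() in p.text.lower() for k in seed_keywords)
    ]


def embedding_filter(posts, query: list[float], top_k: int):
    """Stage 3: keep the top_k posts by cosine similarity to the query embedding."""
    ranked = sorted(posts, key=lambda p: cosine(p.embedding, query), reverse=True)
    return ranked[:top_k]


def inference_filter(posts, judge: Callable[[str, str], bool], user_context: str):
    """Stage 4: keep posts the judge approves (judge is a stand-in for an LLM call)."""
    return [p for p in posts if judge(p.text, user_context)]


def user_filter(posts, votes: dict[str, list[bool]], n_users: int, min_share: float):
    """Stage 5: keep posts liked by at least min_share of n_users test readers."""
    kept = []
    for p in posts:
        ratings = votes.get(p.url, [])[:n_users]
        if len(ratings) == n_users and sum(ratings) / n_users >= min_share:
            kept.append(p)
    return kept


# Usage with toy data; every value below is made up for illustration.
corpus = [  # stage 1: "the entire text internet", here five toy posts
    Post("a.com/1", "alice", "Why startups fail", [1.0, 0.0]),
    Post("b.com/2", "bob", "Cooking pasta", [0.0, 1.0]),
    Post("c.com/3", "carol", "Startup fundraising notes", [0.9, 0.1]),
    Post("d.com/4", "dave", "Startup memes", [0.0, 1.0]),
    Post("e.com/5", "erin", "Startup hiring lessons", [0.8, 0.2]),
]
stage2 = seed_filter(corpus, {"alice"}, ["startup"])
stage3 = embedding_filter(stage2, query=[1.0, 0.0], top_k=3)
stage4 = inference_filter(
    stage3,
    # hypothetical stub: approve posts mentioning a topic from the user context
    judge=lambda text, ctx: any(t in text.lower() for t in ctx.split(",")),
    user_context="fundraising,hiring",
)
stage5 = user_filter(
    stage4,
    votes={"c.com/3": [True, True, False], "e.com/5": [True, False, False]},
    n_users=3,
    min_share=0.5,
)
print([p.url for p in stage5])  # ['c.com/3']
```

Each stage drops something the earlier ones let through: the seed list drops the pasta post, embedding search drops the meme post, the judge drops the off-topic post, and the test readers drop the post only one in three liked. Ordering the stages from cheapest to most expensive means the costly steps (LLM calls, real users' attention) only ever see a small remainder.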

Hard numbers

The text internet contains at least 500B tokens.
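To see what a token count means for each filtering stage, it helps to turn it into a count of posts and apply a keep ratio per stage. A back-of-envelope Python sketch: only the 500B figure comes from this post; the average of 1,000 tokens per post and the "keep 1 in n" ratios are hypothetical placeholders, not measurements.

```python
# Only TOTAL_TOKENS comes from the post; every other number is a placeholder assumption.
TOTAL_TOKENS = 500_000_000_000  # "at least 500B tokens"
AVG_TOKENS_PER_POST = 1_000     # assumption, not a measurement


def funnel(start: int, keep_one_in: list[int]) -> list[int]:
    """Apply successive 'keep 1 in n' filters and return the count after each stage."""
    counts = [start]
    for n in keep_one_in:
        counts.append(counts[-1] // n)
    return counts


stages = ["whole internet", "seed list", "embedding search", "LLM inference", "N test users"]
counts = funnel(TOTAL_TOKENS // AVG_TOKENS_PER_POST, [1_000, 100, 10, 5])  # placeholder ratios
for name, count in zip(stages, counts):
    print(f"{name:>16}: {count:,} posts")
```

With these placeholder ratios the funnel narrows from 500 million posts to 100. Swapping in measured ratios for each stage is what would turn this into real hard numbers.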

