my_projects/my_projects.html
2025-05-03
My projects
Search Engine for Books
- Not hosting anymore due to hosting cost
- Uses AI embeddings (openai text-embedding-3-small) to search a large collection of books (libgen English epubs)
- Intended for researchers
- Pay to try it out
- Free = Search entire libgen for 30 min (contact me)
- $100/mo = Search 10% of libgen. (Will rent 256 GB RAM)
- $800/mo = Search entire libgen. (Will rent 2 TB RAM)
- $50k = Search entire libgen for lifetime. (Will write code to search using disk not RAM)
- Technical notes
- Dataset = ~2 TB of embeddings (~300M vectors), from ~300 GB of plaintext, from ~7 TB (~700k unique English epubs), selected from the ~65 TB libgen database
- Embedding model = openai text-embedding-3-small
- Total spent out-of-pocket so far = ~$2600 = ~$1000 (openai embedding API) + ~$1600 (CPU, disk, bandwidth etc)
- Database and search algo = DragonflyDB
- Languages/Frameworks used = perl, bash, nginx, .... mojolicious, jq, htmlq, gnu parallel
- Developer notes
- Used bash pipelines in all steps (extracting plaintext from epubs, converting to openai jsonl format, queueing them for openai servers, loading results into DB) to max out disk throughput
- Avoided nodejs and python so that processes would not run out of memory
- OpenAI Batch API rate limits are poorly documented, so I had to work out usable settings by trial and error: 25 "requests" per batch file, 2048 strings per "request", 20 batch files at a time.
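The numbers in the notes above can be sanity-checked with a small bash sketch. It is not the original pipeline: the function names, the custom_id scheme and the assumption that each vector is stored as 1536 four-byte floats (1536 is the default output size of text-embedding-3-small) are illustrative. Under that assumption, 300M vectors come to about 1.84 TB, which is consistent with the ~2 TB figure. The last function shows one possible way to turn one-chunk-per-line text into Batch API request lines with jq, 2048 inputs per request.

```shell
#!/usr/bin/env bash
# Sketch only: names, file layout and the float32 assumption are hypothetical.

# Ceiling division for positive integers: ceil_div 10 3 -> 4
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# Raw size in bytes of N vectors.
# Assumption: 1536 dimensions and 4 bytes per value unless overridden.
embedding_bytes() {
  local vectors=$1 dims=${2:-1536} bytes_per_value=${3:-4}
  echo $(( vectors * dims * bytes_per_value ))
}

# For a given number of strings, print "requests files waves" using the
# limits from the notes: 2048 strings per request, 25 requests per batch
# file, 20 batch files submitted at a time.
plan_batches() {
  local strings=$1 requests files waves
  requests=$(ceil_div "$strings" 2048)
  files=$(ceil_div "$requests" 25)
  waves=$(ceil_div "$files" 20)
  echo "$requests $files $waves"
}

# Read one text chunk per line on stdin and emit one Batch API request per
# 2048 chunks, as JSON lines. Requires jq. It slurps stdin, so feed it one
# shard at a time rather than the whole corpus.
make_requests() {
  local prefix=${1:-req}
  jq -R -s -c --arg prefix "$prefix" '
    split("\n") | map(select(length > 0)) as $lines
    | range(0; $lines | length; 2048) as $i
    | {
        custom_id: "\($prefix)-\($i / 2048)",
        method: "POST",
        url: "/v1/embeddings",
        body: { model: "text-embedding-3-small", input: $lines[$i:$i + 2048] }
      }'
}

# Example (hypothetical file names): 25 request lines per batch file.
#   make_requests shard-0001 < shard-0001.txt | split -l 25 -d - batch-0001-
```

With these limits one batch file carries up to 25 × 2048 = 51,200 strings, so `plan_batches 300000000` reports 146485 requests, 5860 batch files and 293 rounds of 20 files.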
Tokens for tokens
- Not hosting anymore due to lack of user demand
- Pay for OpenAI API using cryptocurrency, at discounted rate, anonymously
- AI model: openai o1; Payment provider: Optimism Rollup (lower tx fees than ethereum mainnet); Currency supported: USDC