2025-04-09
Software as hypothesis testing
Common goals for building software include acquiring money, acquiring people's attention, and providing people with tools to solve their problems.
I realised I'm writing software with a different goal - to test hypotheses about reality, by making contact with reality.
- I wrote the libgen search project not primarily because I wanted to help researchers or make money myself, but because I wanted to check whether it could be built. Can an AI recommend me books I can't find myself? The answer I got was yes: an AI can recommend useful books that I wouldn't find otherwise. AI can improve my epistemology.
- Similarly, I wanted to know whether using search tools to seek truth on currently taboo topics is possible. So I wrote a tool to search Reddit. The tool did not get me significantly better answers than just making Google searches with "site:reddit.com" appended. Although tbh, googling Reddit posts is itself enough to make progress on topics considered taboo.
- I wanted to know if LLM-based stylometric doxxing is possible. I am yet to get an answer, but at least I now have a better picture in my head of what a doxxing tool will look like. It is going to require more grep than embedding search, and it is going to require more crawling of websites that Common Crawl refuses to crawl because it respects robots.txt. Doxxing tools like Whitepages likely purchase data from brokers that don't respect robots.txt, hence they have more information. Also, I learned that stylometrics only comes into the picture after you have integrated all the classical approaches to reducing the number of potential matches. Stylometrics can pick 1 person out of 1000 easily but can't pick 1 out of 1M as easily, so it has to be combined with classical approaches.
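
The 1-in-1000 versus 1-in-1M point is basically a base-rate argument, and a small calculation makes it concrete. This is only a sketch under made-up assumptions: the false-positive rate of 0.1% is a hypothetical number I picked for illustration, not something I measured, and the function names are mine. It also assumes the true author is always flagged and that every candidate is equally likely beforehand.

```python
# Base-rate sketch: how many wrong people survive a stylometric filter?
# ASSUMED_FPR is a hypothetical illustration value, not a measured result.
ASSUMED_FPR = 0.001  # assumed chance that a non-author's writing is flagged as a match


def expected_false_matches(candidates: int, false_positive_rate: float) -> float:
    """Expected number of non-authors flagged, given one true author in the pool."""
    return (candidates - 1) * false_positive_rate


def chance_flagged_is_author(candidates: int, false_positive_rate: float) -> float:
    """Probability that a flagged candidate is the true author.

    Assumes the true author is always flagged and a uniform prior over candidates.
    """
    false_hits = expected_false_matches(candidates, false_positive_rate)
    return 1.0 / (1.0 + false_hits)


if __name__ == "__main__":
    for pool in (1_000, 1_000_000):
        wrong = expected_false_matches(pool, ASSUMED_FPR)
        p = chance_flagged_is_author(pool, ASSUMED_FPR)
        print(f"pool={pool:>9,}  expected false matches={wrong:10.3f}  "
              f"P(flagged is author)={p:.6f}")
```

Under these assumptions, a pool of 1000 leaves about one wrong person next to the right one, while a pool of 1M leaves about a thousand wrong people. That is why the classical approaches have to shrink the pool first.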