Distributed whistleblowing

Update: Most of the ideas in this document are now lower priority to me than other ideas.

DISCLAIMER

This document was written quickly and contains opinions I may change quickly as I get new info.
This document contains politically sensitive info.

What?

This document describes how to set up distributed whistleblowing processes to reduce personal risk for everyone involved in the process.

Why?

Typically whistleblowing (such as with WikiLeaks or the Snowden leaks) incurs significant personal risk.
Reducing personal risk to whistleblowers may make whistleblowing highly likely whenever an org does not have the complete trust of all its members, forcing such orgs to pay a secrecy tax (in Assange's words).
I have my own personal viewpoint around which orgs I'd like to most enable whistleblowing on (see end of document), although this will be general-purpose infra that can be used by anyone to whistleblow on any org.

Summary

Do SecureDrop / Signal, but with increased security, >1000 servers all run by independent actors, and multiple independent dev teams.
Do Internet Archive / CommonCrawl, but also crawl rate-limited/banned material (such as leaked/banned/copyrighted documents, and social media websites), with >1000 crawls all run by independent actors. Some of these actors also share LLM embeddings of the crawled content.

Potential problems

Low-attention on the documents. Military-grade security. Documents circulated by people who have technical skills and are willing to run servers and maintain opsec as a part-time job.
- Ideally thousands of server operators exist. Some of them can choose special roles for themselves, such as "redaction specialist" and "publisher". They can use a public track record to prove to whistleblowers and other operators that they can be trusted with this role.
- Whistleblower sends documents to an operator via SecureDrop or similar system or via hard disk dead drop. If redaction is required and they can't do it themselves for whatever reason, they send it to an operator who is a "redaction specialist" and has a good reputation.
  - IMO PGP + airgap + dead drop may offer more privacy than PGP + airgap + Tor HTTP request, as of 2025. This is my personal bias and could change in future if physical-world data acquisition (CCTV, drones, gigapixel cameras on aircraft) increases.
  - I'm not very happy with some of the design choices made by SecureDrop. I'm looking into alternative solutions. It's possible I don't understand all of their choices. I have written a proper criticism of SecureDrop below.
  - PROBLEM: convince thousands of people to become operators of SecureDrop or similar system (most important)
  - PROBLEM: good infra, protocols, incentives to coordinate dead drops don't exist. Especially true if crossing a large geographic distance and multiple hops are required.
- This operator does redaction of any sensitive metadata or information, if required. They perform another hop here and send the documents to many other operators in the network using the same system.
  - PROBLEM: need public guidelines on redaction, so anyone can do it. This ideally ensures there are thousands of potential operators right from the start.
- If any operator thinks the documents are not spam, they can attach a proof-of-work hash and resend the documents to many operators in the network using the same system.
  - PROBLEM: need standard protocol for proof-of-work hashes. These could be static strings attached to documents, or generated at request-response time. (Tor, Brave, Proton all have separate implementations and they're all low difficulty hashes.)
- Eventually one of the operators who is a "publisher" hosts the documents on a clearnet webserver for the public. This operator also posts a link to this webserver on a hard-to-censor social media platform such as 4chan or Rumble.
  - PROBLEM: need guidelines for what the hard-to-censor social media platforms in each country are.
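A minimal sketch of the "static string attached to documents" variant of proof-of-work, in the style of hashcash. The stamp format (`<document sha256>:<nonce>`), the function names, and the difficulty value are assumptions made for illustration. This is not an existing standard and not a description of what Tor, Brave, or Proton implement.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the number of leading zero bits in a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def make_stamp(document: bytes, difficulty_bits: int) -> str:
    """Search for a nonce so that SHA-256(stamp) has enough leading zero bits.

    The stamp binds to the document through its SHA-256 hash, so it cannot
    be reused for a different document. (Hypothetical format.)
    """
    doc_hash = hashlib.sha256(document).hexdigest()
    for nonce in count():
        stamp = f"{doc_hash}:{nonce}"
        digest = hashlib.sha256(stamp.encode("ascii")).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return stamp


def verify_stamp(document: bytes, stamp: str, min_bits: int) -> bool:
    """Check that the stamp matches the document and meets the difficulty."""
    try:
        doc_hash, _nonce = stamp.split(":")
    except ValueError:
        return False  # malformed stamp
    if doc_hash != hashlib.sha256(document).hexdigest():
        return False  # stamp was made for a different document
    digest = hashlib.sha256(stamp.encode("ascii")).digest()
    return leading_zero_bits(digest) >= min_bits


if __name__ == "__main__":
    doc = b"example document bytes"
    stamp = make_stamp(doc, difficulty_bits=12)  # low difficulty, demo only
    print(stamp, verify_stamp(doc, stamp, min_bits=12))
```

Verification costs two hashes while creation costs on average 2^difficulty_bits hashes, which is what makes the stamp usable as a spam filter. Each operator who forwards the documents could add their own stamp, but how stamps from multiple operators combine is left open here.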
Medium-attention on the documents. Low security. Documents circulated by people with technical skills but not much free time.
- (If the documents are sufficiently important, a popular media org can publish them on their server, allowing the documents to skip this stage and directly go to high-attention.)
- Mirror a searchable version of docs to thousands of servers immediately
  - It is important that automated mirroring happens before any humans read the content on the operator's clearnet server. Whoever first posts the document to clearnet is an obvious target for anyone who wants to take the documents down.
  - PROBLEM: need open source web crawler to crawl entire internet including any leaked docs/videos, and torrent links containing leaked docs/videos.
  - OR: PROBLEM: need a standard protocol to only crawl websites and torrents that claim to have leaked docs on them (maybe they include a special flag in their readme/robots.txt, and proof-of-work hash to prove not spam.)
  - PROBLEM: need open source plaintext extraction and embedding generation so that along with the raw html crawls (WARC), the plaintext and embeddings are also circulated in the same torrent. need standardised format (WARC-parquet?) that keeps some metadata just like WARC keeps metadata.
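One possible shape for a record that carries extracted plaintext and embeddings together with WARC-style metadata, sketched as JSON lines using only the standard library. The three `warc_*` fields mirror the real WARC header names `WARC-Record-ID`, `WARC-Target-URI` and `WARC-Date`. Everything else (the other field names, the helper names, the model name in the usage example) is hypothetical, and a real format might well be parquet as suggested above.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ExtractedRecord:
    # Copied from the WARC record this plaintext was extracted from.
    warc_record_id: str
    warc_target_uri: str
    warc_date: str
    # Extraction output plus a checksum so mirrors can detect corruption.
    plaintext: str
    plaintext_sha256: str
    # Which model produced the vector (hypothetical field), and the vector.
    embedding_model: str
    embedding: List[float]


def make_record(record_id, uri, date, plaintext, model, embedding):
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return ExtractedRecord(record_id, uri, date, plaintext, digest,
                           model, list(embedding))


def to_jsonl(records) -> str:
    """Serialise records as one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def from_jsonl(text: str) -> List[ExtractedRecord]:
    """Parse records and reject any whose plaintext checksum does not match."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        rec = ExtractedRecord(**json.loads(line))
        actual = hashlib.sha256(rec.plaintext.encode("utf-8")).hexdigest()
        if actual != rec.plaintext_sha256:
            raise ValueError(f"checksum mismatch for {rec.warc_record_id}")
        records.append(rec)
    return records


if __name__ == "__main__":
    rec = make_record("<urn:uuid:0000>", "https://example.org/doc",
                      "2025-01-01T00:00:00Z", "some extracted text",
                      "hypothetical-model-v1", [0.1, 0.2, 0.3])
    print(to_jsonl([rec]))
```

Keeping the WARC record ID in each row lets a reader go from a search hit back to the raw crawl in the same torrent.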
High-attention on the documents. No security. Documents circulated by anyone.
- A popular media house publishes it to increase public attention
  - Popular media house will do document verification. I'm assuming they won't face any significant challenge with this. May require metadata of the documents (how to get this??) or contacting the org whose docs got leaked.
  - Popular media house will use embedding search functionality already provided, to figure out what is important to raise attention for.
- High-attention hard-to-censor social media to discuss the document in general public
  - PROBLEM: need open source crawling and mirroring crawls of all social media
  - I think actually doing distributed social media is too hard. The complexity of such an app means the software developers who write it are politically co-optable. What's easier to do is have distributed crawling and mirroring of a centralised site, so people in future can still view the consensus reached by users of the social media. If it ever gets taken down, someone can get a new server running (does not have to have content of old one).
  - Which social media are high attention and hard-to-censor varies by country.
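The "embedding search functionality" mentioned above can be as simple as a brute-force cosine similarity ranking over the shared embeddings. The sketch below assumes the embeddings are already available as plain lists of floats keyed by a document ID. The function names and the toy vectors are illustrative only, and a real deployment over a large crawl would likely need an approximate nearest-neighbour index instead.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], index: Dict[str, List[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k document IDs most similar to the query, best first."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
    index = {"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0], "doc-c": [1.0, 1.0]}
    for doc_id, score in top_k([1.0, 0.0], index, k=2):
        print(doc_id, round(score, 3))
```

The query vector must come from the same embedding model as the index, which is one reason to record the model name alongside each shared embedding.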

Summary of potential solutions

persuade thousands of people to become operators of SecureDrop or similar system (most important)
coordination for hard disk dead drops, including multi-hop hard disk dead drops
proof-of-work hashes to prevent spam on the operators
redaction guidelines
open source web crawling
- flags and proof-of-work to only crawl some websites
- crawl and mirror leaked docs. crawl and mirror social media discussions.
open source plaintext extraction, embedding generation
- standardise format to share extracted plaintext and embeddings
guidelines for latest hard-to-censor high-attention social media
- to publish torrent link, maybe raw docs, and social media discussions
- guidelines must be country-wise and include legal considerations. always use a social media of a country different from the country where leak happened.
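The "flags and proof-of-work to only crawl some websites" item could work roughly like the sketch below, where a crawler reads two extra lines from a site's robots.txt. The directive names `Leak-Archive` and `Leak-PoW` are invented here for illustration and are not part of any robots.txt standard. The proof-of-work check is passed in as a callable because no standard for it exists yet.

```python
from typing import Callable, Dict


def parse_leak_flags(robots_txt: str) -> Dict[str, str]:
    """Extract the (hypothetical) leak-related directives from robots.txt."""
    flags = {}
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        key = key.strip().lower()
        if key in ("leak-archive", "leak-pow"):
            flags[key] = value.strip()
    return flags


def should_crawl(robots_txt: str, verify_pow: Callable[[str], bool]) -> bool:
    """Crawl only if the site opts in and its proof-of-work stamp verifies."""
    flags = parse_leak_flags(robots_txt)
    if flags.get("leak-archive", "").lower() != "yes":
        return False
    stamp = flags.get("leak-pow")
    return stamp is not None and verify_pow(stamp)


if __name__ == "__main__":
    example = """
    User-agent: *
    Allow: /
    Leak-Archive: yes        # hypothetical opt-in flag
    Leak-PoW: abc123:42      # hypothetical stamp, format undefined
    """
    # Stand-in verifier for the demo; a real one would check the hash.
    print(should_crawl(example, verify_pow=lambda stamp: True))
```

An opt-in flag makes the crawl much cheaper than crawling the entire internet, at the cost of making participating sites easy to enumerate for anyone who wants to take them down.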

IMPORTANT: Need feedback from people who have actually worked with whistleblowers, to validate all hypotheses listed above.

IMPORTANT: Need to decide whether whistleblowing on important orgs (such as companies/labs researching intelligence, and national/intl intelligence agencies) is actually what I want to work on.

Side-note: Aiming this at intelligence orgs in particular

I am especially interested in enabling whistleblowing on orgs and labs working on intelligence (such as superintelligent AI, BCIs, human genetic engg, human connectome research, etc) and national/international intelligence agencies that may work with them.
Would orgs developing intelligence (superintelligent AI, human genetic engg, BCIs etc) necessarily be trustworthy if there existed lots of public information about both their capabilities and their values?
- Leaked info about values means public can decide if the org represents their true values. i.e. control via democracy instead of via market alone.
- Leaked info about capabilities means public can coordinate under a different leader to shut down the existing org and set up a different org.
  - Open sourced capabilities could include ASI model weights, bioweapons or nanotech weapons (or generally, any offence-beats-defence weaponry deployable by small groups). I haven't fully made up my mind on how this interacts with the rest of the world, and whether a more transparent world enabled by the above system is necessary or sufficient to ensure the world will be in a stable equilibrium.
  - It seems difficult to find a solution that can leak info about the values of an org without also leaking info about its capabilities. The solutions proposed above will obviously leak both.
  - More narrowed crux: much of the LessWrong crowd defers to Yudkowsky's priors, which are not in favour of open source ASI, open source bioweapons, or complete surveillance (public info). I will have to write a separate post on surveillance equilibria.