my_research/related_quick_notes/internet_spam.html
2025-06-18
Internet spam
Disclaimer
Summary
Spam prevention
- To ensure the sender spends more capital (on computing resources) than the receiver, you can either ask them to complete a proof-of-work challenge or ask them to send a junk payload that uses up their upload bandwidth.
- To ensure the sender spends more attention (as a human being) than the receiver, you can ask them to verify an ID, pay you money, or show that other people are already paying attention to them.
- ID verification can be done open source using videos uploaded online, or it can be done using government-issued IDs and phone numbers.
- Payment can be done using cryptocurrency or credit card
- Proving social status can be done using a social media following (which can in theory be open source) or using legacy markers such as citation counts or LinkedIn profiles.
Main
I recently (meaning 2025-05) enabled comments on my website, which forced me to (again) think about spam and censorship as potential problems on the internet.
Censorship
I am not interested in preventing any content from reaching me; however, cloud providers may be interested in this.
- As of 2025-05 it is still not difficult to find a cloud provider that does not scan the packets of all your traffic.
- Receiving objectionable packets is fine; sending objectionable packets can be a problem. I will use my discretion on what content gets posted publicly, as that is me sending packets to others, not receiving packets.
Spam
Spam is bad for two reasons:
- Spending receiver capital (on server resources) is bad for receiver
- At least for text content, server resources are very cheap. Images, audio and videos are where it becomes non-trivial.
- If the sender is anonymous, you want to ensure the sender spends more resources than the receiver.
- Sender cost:
- Sender cost, network egress > $0.05/Mbps/mo (assume) = $0.05/(0.324 TB) = ~$0.154/TB
- Sender cost, CPU ~ 0
- Receiver cost:
- Receiver cost, network ingress << Sender cost, network egress.
- (This is usually true assuming the receiver is running in a cloud datacentre, but it could break down in the limit. See: Cloudflare article on ingress versus egress costs. Typically cloud providers have more egress than ingress so ingress is free, and residential connections have more ingress than egress so egress is free.)
- Receiver cost, CPU ~ 0
- Receiver cost, disk = $20/TB/mo * Storage time of comment = $0.0000077/TB/s * Storage time of comment
- If I want to ensure sender cost > receiver cost, I have the following options:
- Store comments for a short duration. Storage time = ($0.154/TB) / ($0.0000077/TB/s) = ~20,000 s = ~5.6 hours
- Force sender cost to increase artificially.
- Inflation ratio
- Inflation ratio = desired storage time / 5.6 hours
- (Assume) Desired storage time = 6 hours => inflation ratio = ~1.1. Storing for ~230 days instead => inflation ratio = ~1000
- Solution 1: Force sender cost for network egress to increase
- I can ask the sender to append junk to each comment, for example 1023 KB of junk per 1 KB of content for an inflation ratio of ~1000. Receiver only reads the first 1 KB and discards the rest. Receiver spends on network ingress but not as much on disk.
- Solution 2: Force sender cost for CPU to increase
- I can ask them to compute a hash pre-image whose difficulty threshold depends on content size.
- Hash difficulty > ($0.154/TB) / ($0.004/CPU-core-hour) (assume) = ~39 CPU-core-hours/TB = ~2.3 CPU-core-minutes/GB
- Or of course one can use an even stronger solution, such as one of those proposed below.
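The sender-versus-receiver accounting and the hash challenge from Solution 2 can be sketched in Python. This is only a rough sketch under stated assumptions: the prices are the assumed figures from the notes, units are decimal (1 TB = 10^12 bytes) with a 30-day month, and all function names, the SHA-256 leading-zero-bits puzzle, and the `hashes_per_byte` knob are hypothetical choices for illustration, not a fixed design.

```python
import hashlib
import math

SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def egress_usd_per_tb(usd_per_mbps_month: float) -> float:
    """Price of sending 1 TB (10**12 bytes) at a flat per-Mbps monthly rate."""
    bytes_per_month = 1e6 / 8 * SECONDS_PER_MONTH  # bytes moved by 1 Mbps
    return usd_per_mbps_month / (bytes_per_month / 1e12)


def break_even_storage_seconds(egress_per_tb: float, disk_usd_per_tb_month: float) -> float:
    """Storage time at which receiver disk cost equals sender egress cost."""
    return egress_per_tb / (disk_usd_per_tb_month / SECONDS_PER_MONTH)


def inflation_ratio(desired_storage_seconds: float, break_even_seconds: float) -> float:
    """Factor by which sender cost must grow to cover the desired storage time."""
    return desired_storage_seconds / break_even_seconds


def difficulty_bits(content_len: int, hashes_per_byte: float) -> int:
    """Leading zero bits so that expected work (2**bits hashes) grows with size."""
    target_hashes = max(2.0, content_len * hashes_per_byte)
    return math.ceil(math.log2(target_hashes))


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the front of a digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def verify(content: bytes, nonce: int, bits: int) -> bool:
    """Receiver side: a single hash, however long the sender searched."""
    digest = hashlib.sha256(nonce.to_bytes(8, "big") + content).digest()
    return leading_zero_bits(digest) >= bits


def solve(content: bytes, bits: int) -> int:
    """Sender side: brute-force a nonce; about 2**bits attempts on average."""
    nonce = 0
    while not verify(content, nonce, bits):
        nonce += 1
    return nonce


if __name__ == "__main__":
    egress = egress_usd_per_tb(0.05)            # $0.05/Mbps/mo, assumed in the notes
    t = break_even_storage_seconds(egress, 20)  # $20/TB/mo disk, assumed in the notes
    print(f"egress ${egress:.3f}/TB, break-even storage {t:.0f} s")
    comment = b"example comment"
    bits = difficulty_bits(len(comment), hashes_per_byte=256)  # hypothetical rate
    nonce = solve(comment, bits)
    print(bits, nonce, verify(comment, nonce, bits))
```

The asymmetry is the point of the design: the sender pays for roughly 2**bits hashes while the receiver pays for one. Calibrating `hashes_per_byte` so that the work matches a target CPU cost on typical hardware is left out here.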
- Spending receiver attention is bad for receiver
- If the sender is anonymous:
- Sender cost ~ 0 / TB
- Receiver cost = 1 / (1000 words/min reading speed) = 1 / (83 bytes/s, at ~5 bytes per word) = 12.3 s / KB
- Receiver cost, time converted to money = 12.3 s / KB * ($100/h) (assume) = $0.34/KB = $340,000/GB
- Solution 1: Ask sender to trade capital for your attention.
- Sender cost > $0.34 / KB
- Open source approach
- Monero transaction costs > $0.15, so a Monero transaction can be attached to each comment.
- (There should be good UX for this so the sender and receiver are not wasting additional seconds sending or verifying the payment. This can be automated.)
- Govt-backed approach
- A common existing solution is to only take .com domains seriously; those cost at least $10 to rent. A chain of certificate authorities ending in a government issues the certificates.
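The pricing rule in Solution 1 (sender pays at least what the receiver's reading time is worth) can be sketched as a per-comment minimum payment. This is a minimal sketch using the assumed figures from the notes (1000 words/min, $100/h); the 5 bytes per word is inferred from the 83 bytes/s figure, and the function names are hypothetical.

```python
def reading_seconds(n_bytes: int, words_per_min: float = 1000.0,
                    bytes_per_word: float = 5.0) -> float:
    """Seconds the receiver needs to read n_bytes of text."""
    bytes_per_second = words_per_min * bytes_per_word / 60.0  # ~83 bytes/s by default
    return n_bytes / bytes_per_second


def min_payment_usd(comment: str, usd_per_hour: float = 100.0) -> float:
    """Smallest payment that covers the receiver's reading time for this comment."""
    n_bytes = len(comment.encode("utf-8"))
    return reading_seconds(n_bytes) / 3600.0 * usd_per_hour


if __name__ == "__main__":
    one_kb = "a" * 1024
    print(f"{reading_seconds(1024):.1f} s, ${min_payment_usd(one_kb):.2f} per KB")
```

A receiver could compare this minimum against the amount attached to a comment before showing it to a human at all.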
- Solution 2: Ensure sender spent their attention (as a human being) sending the content.
- Proof of human
- The only undeniable proof is meeting them in person. Video footage is second-best.
- Both of these require spending your own attention in order to do verification.
- Therefore it is too expensive to prove a unique human every time.
- It is common for people to do verification once and reuse the ID across multiple websites for a long duration.
- Govt-backed approach
- The most common ID proof nowadays is a phone number, or a Gmail account linked to a phone number. Governments are restricted in how many phone numbers they can issue and therefore do KYC to issue phone numbers.
- Open source approach
- It is also possible to create an open source ID system where people upload videos online and get vouched for in other people's videos online.
- Once you have such a system running, you can also use ZK proofs to do proof of unique human without revealing who it is. But first you need the proof of human system running.
- Solution 3: Ensure lots of people spent their attention (as human beings) sending the content.
- Typically multiple humans spending attention on the same topic is a stronger signal than one person spending attention on a topic. It is more likely worth it for the receiver to pay attention to the former.
- Statistical distribution
- Let's say a human sender is paying attention to some content, and they make an attention request to the receiver to also spend their attention on the same content. Let's say the receiver had proof this was happening. Let's call this a 1-to-1 attention request.
- Humanity spends 8 billion seconds of attention per second of attention spent by receiver.
- For the average human, this is equalised. They are sending as many 1-to-1 attention requests to other people as they are receiving 1-to-1 attention requests from other people.
- For a human who is high status for any reason, they are receiving a lot more 1-to-1 attention requests from other people than they are sending 1-to-1 attention requests to other people. Hence they need even stronger filtering than proof that a human spent attention on the message they sent.
- Legacy status
- Receiver website can verify any legacy status of the sender, such as citation count in academia, job profile in corporate or government, and so on.
- Verifying such proofs costs a significant amount of receiver attention, as most documents can be faked.
- One solution is to rely on third parties who will altruistically call out any fake documents publicly.
- Documents must be hosted on a neutral third-party server where such comments are allowed.
- Can also include videos from other people in that same legacy institution, publicly confirming that the documents are correct.
- Social media status
- Social media platforms typically start off with some proof of human, such as a phone number as login or a Gmail account attached to a phone number as login.
- Upvotes (backed by this proof of human) are a popular method of determining who is paid attention to on any forum.
- Upvotes can decide what is allowed to post at all.
- Upvotes can decide what is likely to rise to the top of the forum.
- Usually there is an initial seed effect
- Whatever was the shared topic that most of the initial members were paying attention to, that is what gets upvoted.
- New users are likely to join only if they were already paying attention to this topic before they joined.
- Sometimes new users may end up paying attention to the topic even though they didn't before.
- In an ideal world, even if there are only two people on Earth spending their attention on some topic, there should exist some internet forum whose seed set is these two people.
See also
Anubis - recent implementation of PoW that is actively in use now (I wish I had heard about this sooner)
https://news.ycombinator.com/item?id=43668433
Cloudflare's recent RFC
https://blog.cloudflare.com/web-bot-auth/
https://news.ycombinator.com/item?id=43994779
- I don't like this proposal, the main reason being that it piggybacks on top of centralised CAs, whose certificates are hard to obtain anonymously. I will have to write a proper review if I prioritise this stuff.
recent complaints by devs on bot traffic caused by rise of LLMs
https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/
https://www.akamai.com/newsroom/press-release/bots-compose-42-percent-of-web-traffic-nearly-two-thirds-are-malicious
Tor on PoW and similar
https://blog.torproject.org/stop-the-onion-denial/
https://community.torproject.org/onion-services/advanced/dos/
Brave on PoW and similar
https://safe.search.brave.com/help/pow-captcha