Downsides of doing ASI alignment/safety/control research
Disclaimer
Quick Note
Target audience - people planning to work on, or already working on, ASI alignment/safety/control research.
Trust yourself
If you are a good researcher you should ultimately reach the frontier of knowledge, and then trust your intuition, not mine.
I don't work full-time on this problem, so I have not put as much effort into understanding the pros and cons as someone actually working on it full-time could. That is ultimately your responsibility, not mine.
If you disagree with any of the points I make below but can't see clear counterarguments, that should ring a big alarm bell in your head.
Downsides of doing alignment research
A lot of alignment/safety/control research is dual use
In particular, a lot of interpretability and evals work is like this. If you obtain any useful signal whatsoever about what the AI model is actually doing or thinking, there's often a way to use that signal as part of a training regime to get even more capable models using less compute. Evals and benchmarks try to better understand the model's output; interpretability tries to better understand the model's thinking. Better understanding of either can generally be converted into a more efficient capabilities technique.
Examples: RLHF was invented by Paul Christiano and others as an alignment technique, but has ended up as a capabilities advance; GPT-3.5 allowed AI companies to raise huge amounts of capital and attract eyeballs. Benchmarks like HLE by Dan Hendrycks or ARC-AGI by Francois Chollet end up as explicit optimisation targets for the AI companies, and reduce the researcher time and capital these companies would otherwise have to put into building good benchmarks.
Maybe there are ways to work on alignment/safety/control that are not dual use; I don't think it is impossible.
Some people think they will keep their alignment research secret. In the current political situation, I think keeping your dual-use work secret is not really going to work over the long run, because the heads of your country's intelligence agencies can demand that you hand over these secrets whenever they like. And the heads of AI companies are increasingly powerful enough to work with the heads of intelligence agencies directly. (This excludes all the much easier ways your secrets can leak, like one of your researchers quitting and joining a capabilities org, one of your researchers telling the secrets to their loved ones, one of your researchers' machines getting hacked, and so on.)
If you solve alignment/safety/control, you probably get a permanent global dictatorship.
Whoever gets the first controlled ASI will probably use it to overthrow both the US and Chinese governments and establish their own permanent global dictatorship. This person might be Amodei or Hassabis or Altman or the current NSA director or someone else; I am not actually sure. They could use automated militaries, hyperpersuasion, good old-fashioned bribes, or a number of other methods to slowly or quickly capture all geopolitical power.
I still think a permanent global dictatorship is better than human extinction, so I would still prefer work to happen on AI alignment/safety/control. But it is worth remembering that this is the best-case outcome for someone who only works on technical problems, not political ones.
Short timelines
Solving both theoretical and empirical alignment within 5 years seems extremely hard, given the previous track record of both theoretical and empirical work.
You have to solve both theoretical and empirical problems. The final output of alignment being "solved" has to be a pytorch repo, not a philosophy paper. If you write a philosophy paper, someone also needs to figure out how to convert it into a pytorch repo.
Maybe some genius can crack important problems in the field anyway, zero-to-one, never say never.
Downsides of doing alignment research funded by Dario Amodei or Dustin Moskovitz or Jaan Tallinn
(As of 2026-04, there is very little funding for ASI alignment/safety/control research that is truly independent of the AI companies themselves. Most of it is provided by Amodei/Moskovitz/Tallinn, who are all basically allied and never criticise each other.)
Friendships with capabilities researchers
Maybe the biggest downside is that you might end up building friendships with the mass murderers at the AI companies who are advancing capabilities. I think trying to befriend these people is a very bad decision. It is a huge conflict of interest.
It doesn't matter if they are ignorant or malevolent when it comes to the risks.
Befriending these people will make it much harder for you to be open to feedback on why they are mass murderers, or to take actions yourself that make life worse for them. For instance, you might find it harder to publicly criticise their work, shame them on social media, or advocate for their salaries and associated high-status titles to be taken away.
Even the most docile, non-violent, legally enacted way of actually shutting down capabilities is going to cause a lot of capabilities researchers to suffer loss of meaning in life, loss of self-respect, and loss of respect in the eyes of other people. And based on my understanding of how politics works, I don't expect the anti-ASI movement to win so easily. It is probably going to involve a lot of actively humiliating these people online and IRL, actively getting the entire public to hate them, actively throwing at least a few of them in prison as an example to everyone else, and so on. If these people are your friends, you will feel emotionally conflicted and find it hard to support any of this. (Disclaimer - I am not an expert politician, so don't trust my judgement too much here either.)
Your funder can censor your words, including criticism of capabilities work.
Your funder can censor criticism of capabilities work or alignment work. This can be very blatant ("don't criticise XYZ") or subtle (the framing used in your final research output ends up less critical than it actually should be).
People like Amodei and Altman are extremely good at politics. If you suck at politics yourself, you might not even be aware of how high the skill ceiling for politics is. There is a long list of ways they can try to censor you besides the most obvious one of telling you "don't criticise XYZ or you lose funding". As just one example, they can reverse-engineer your friends' psychology to figure out what narrative appeals to them, use the right words to affect your friends' emotional state, and this in turn affects your emotional state. All these changes in emotional state can happen completely independently of what the ground truth about any given piece of research actually is. (The yield of an atomic bomb doesn't depend on the feelings of those deploying it; the same goes for the outcomes of deploying ASI.)
Most people at AI companies have signed legal agreements that prevent them from criticising their employer's work. I expect that if AI capabilities continue to increase, a lot of research may even get classified by the intelligence agencies, which significantly raises the stakes. Furthermore, if your country enters a hot war or cold war or civil war (which could happen due to capabilities advances), the law will matter even less; expect people to be imprisoned on the spot with no fair trial, high levels of paranoia, and so on. All this makes it much easier to censor you.