2025-06-29
AI timelines (presentation on 2025-06-29)
Pre-reqs
- Assuming some technical knowledge of how transformers and deep learning work
- Assuming you have used transformer-based AI significantly in daily life (GPT4, o3, Claude, Gemini, etc.)
My top-level views
Samuel
- ~25% probability of superintelligent AI deployed by 2030, assuming no ban or pause on AI research is enforced internationally
- ~10% probability of human extinction by 2030 (due to superintelligent AI takeover)
- ~10% probability of 100-year stable global dictatorship by 2030 (due to superintelligent AI aligned to small group of people)
- ~5% probability of some unknown-unknown third outcome
These numbers are approximate guesses; 25% really means 25% ± 10%.
Definition of superintelligent AI: AI that is better than the best humans at every task humans care about completing.
Relevant intuitions for what I actually imagine when I imagine superintelligent AI: humans from 1900 experiencing all of the 1900-2025 inventions in a single year, or chimpanzees being exposed to a human being.
In this talk
- Will explicitly be defending the "short timelines" view.
- Will primarily discuss timelines, not risks, unless the group has a strong preference otherwise.
Summary of all datapoints
- Outside view: What do other experts believe?
- Inside view: Forecast it yourself
- Forecasting what happens from 2025 onwards
- Forecasting scaling of pre-training
- Forecasting scaling of RL/inference
- Forecasting the likelihood of a new breakthrough
- Forecasting what happens when ASI is near
- Forecasting recursive self-improvement, intelligence explosion, automation of economy etc
Datapoint 1: Other experts
You will find experts at every point on the spectrum:
- ASI won't happen in next 5-10 years. It will be fine.
- Powerful AI is happening, will automate large fractions of economy, ASI won't happen in next 5-10 years. It will be fine.
- ASI will happen in next 5-10 years, it will be aligned to someone. It will be fine.
- ASI will happen in next 5-10 years, it will cause human extinction. AI is morally superior to us, therefore it will be fine.
- ASI will happen in next 5-10 years, it will cause human extinction. This is bad.
- ASI will happen in next 5-10 years, human extinction will not happen, misuse and dictatorship is possible. This is bad.
I'm selectively presenting doomer views here because I am also somewhat of a doomer. Message me for resources on expert views that are different.
Other people predicting ASI in the next few years, with high estimates of risk
- Original doomer: Eliezer Yudkowsky
- Lesswrong.com community
- Started by Yudkowsky in 2009; now has many more people knowledgeable in AI
- Geoffrey Hinton, Nobel Prize winner in 2024, considered a "godfather" of the field
- Yoshua Bengio, also considered a "godfather" of the field
Other people predicting ASI in next few years
- Ilya Sutskever, cofounder of OpenAI, who has since resigned, presumably due to safety concerns
- Elon Musk, an original funder of OpenAI, who now runs xAI, maker of the leading AI model Grok 3
Longer list of experts with views in similar cluster
Datapoint 2: Scaling pre-training, try the models yourself
- Actually go try GPT2 on some prompts.
- Actually go try GPT3.5 or GPT4 on some prompts.
- Actually go try GPT4.5 on some prompts.
Don't trust benchmark datasets, don't trust what some expert has said, actually go try some of the older models yourself.
Parameter counts and estimated training compute by model:
- GPT2 XL (2019): 1.5B params. Estimated training ~10^20 FLOP
- GPT3 (2020): 175B params
- GPT3.5 based on GPT3
- GPT4 (2023): rumoured 1.8T params
- GPT4.5 (2025): rumoured 12T params, of which 1T active params (mixture of experts). Estimated training ~10^26 FLOP.
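As a sanity check on those FLOP numbers, here's a minimal sketch using the standard approximation that training compute ≈ 6 × parameters × training tokens. The token counts below are my assumptions for illustration, not confirmed figures.

# Rough training-FLOP estimate: FLOP ~= 6 * N_params * N_tokens.
# Token counts are assumptions for illustration, not confirmed figures.
def train_flop(params, tokens):
    return 6 * params * tokens

print(f"GPT2 XL: {train_flop(1.5e9, 10e9):.1e}")   # ~1e20 FLOP, matches above
print(f"GPT3:    {train_flop(175e9, 300e9):.1e}")  # ~3e23 FLOP (300B tokens)
print(f"GPT4.5:  {train_flop(1e12, 15e12):.1e}")   # ~1e26 FLOP, using 1T active params
                                                   # and an assumed ~15T tokens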
Datapoint 3: Scaling pre-training, chinchilla scaling law, historical data
Chinchilla scaling law (wikipedia)
- Discovered in 2022.
- More compute, more data => lower loss
- All pre-training from 2019 to 2025 fits the predictions of Chinchilla scaling, across six orders of magnitude.
- GPT2 cost ~10^20 FLOP to train. $1-5 million
- GPT4.5 cost ~10^26 FLOP to train. $1-10 billion
- Chinchilla scaling law predicts loss accurately to over 3 decimal places.
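For concreteness, the Chinchilla paper fits loss as L(N, D) = E + A/N^alpha + B/D^beta, where N is parameter count and D is token count. A minimal sketch below; the constants are the Hoffmann et al. (2022) fits as I remember them, so treat them as an assumption to verify against the paper.

# Chinchilla scaling law: predicted loss from params N and tokens D.
#   L(N, D) = E + A / N**alpha + B / D**beta
# Constants are the Hoffmann et al. (2022) fits from memory (verify).
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def chinchilla_loss(n_params, n_tokens):
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Chinchilla itself: 70B params trained on 1.4T tokens.
print(f"{chinchilla_loss(70e9, 1.4e12):.3f}")  # ~1.94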
Epoch AI trends based on chinchilla scaling law
- Pre-training compute cost increased 3x annually historically.
- $10 million, then $30 million, then $100 million, then $300 million etc
- Epoch AI estimates Grok 3 cost ~$500 million for the training run alone.
- Training compute FLOP increased 5x annually historically.
- 10^20 FLOP, then 5 × 10^20 FLOP, then 2.5 × 10^21 FLOP, etc.
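Compounding those two trend lines forward gives a feel for the numbers. A rough sketch, not Epoch's actual methodology; the 2019 starting values are the GPT2-era figures above.

# Compound the historical trends: cost ~3x/year, training FLOP ~5x/year.
# Starting values are the rough GPT2-era figures above (start year assumed).
cost, flop = 1e6, 1e20
for year in range(2019, 2026):
    print(f"{year}: ~${cost:.1e} per run, ~{flop:.1e} FLOP")
    cost *= 3
    flop *= 5
# 3x/year over 6 years is ~730x ($1M -> ~$730M, close to the $500M estimate).
# 5x/year over 6 years is ~4.2 orders of magnitude, so the GPT2 -> GPT4.5
# compute jump (1e20 -> 1e26, i.e. 6 OOM) ran somewhat ahead of this average.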
Datapoint 4: Scaling pre-training, chinchilla scaling law, forecasts of future compute and capabilities
There is more disagreement about the future, since we don't have hard data about it.
How much compute will be used in future datacentres?
- Future investments already committed
- Case study: xAI Memphis.
- $7 billion invested in the build. 200,000 H200 GPUs, used primarily to train Grok 3.
- Case study: OpenAI Stargate under construction.
- Has commitments of $100 billion annually, for a total of $500 billion
- Commitments by Masayoshi Son (CEO of SoftBank, the well-known Asian investment firm) and Larry Ellison (founder of Oracle)
- This is an immediate scale-up of 2-3 orders of magnitude. Historical scaling went $1M to $500M over 6 years; now we expect $500M to $100B+ over maybe 2-4 years.
- Upper bound on future investments
- World GDP is roughly $100 trillion.
- Most people, including me, agree that $10 trillion of investment is hard to exceed in the next 5 years.
- Many also agree that $1 trillion of investment is hard, but not impossible, to exceed in the next 5 years.
- What if no future investments? (spherical cow assumption)
- FLOP/$ doubles roughly every 18 months due to improvements in hardware.
- Let's say we cap out at $100 billion per training run. Hardware gains still mean we are putting in the equivalent of $200 billion 1.5 years later, $400 billion 3 years later, and so on (see the sketch after this list).
- This curve is slower than the immediate ramp up in investment though.
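A minimal sketch of the spherical-cow scenario: dollars per run frozen at $100B, with FLOP/$ doubling every 18 months, so effective compute per run still doubles on that cadence.

# Spherical-cow scenario: spend capped at $100B per run, but FLOP/$ doubles
# every ~18 months, so the "dollar-equivalent" compute keeps doubling.
CAP_DOLLARS = 100e9
DOUBLING_YEARS = 1.5

for years_out in (0, 1.5, 3.0, 4.5, 6.0):
    equivalent = CAP_DOLLARS * 2 ** (years_out / DOUBLING_YEARS)
    print(f"+{years_out} years: ${equivalent / 1e9:.0f}B-equivalent compute")
# -> $100B, $200B, $400B, $800B, $1600B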
How much capability will this increased compute translate into?
Datapoint 5: Scaling RL/Inference, try the models yourself
- Actually go try o3 high on some prompts.
- Actually go try o1 on some prompts.
- Actually try GPT4 on some prompts. (GPT4 was the base model from which o1 and o3 were built; GPT4 does not have inference scaling.)
Datapoint 6: Scaling RL/inference, log curve for historical data
Benchmarks
Personally I put less trust in benchmarks, because benchmark data leaks into public training data.
We have only ~1 year of data (2024 to 2025) since RL/inference scaling was first tried.
- Any curve fitting will be a bad estimate.
- There is still some disagreement on which curve is most accurate.
Some published curves
Cost per task
- o1 or o3 in day-to-day use: $1 to $10 per task.
- The ARC benchmark was solved using over $1000 per task.
- The max cost per task tried so far is below $100k.
I (Samuel) don't have a strong opinion on which curve is exactly right.
Datapoint 7: Scaling RL/inference, forecasts of future compute and capabilities
Two competing factors
- Log curve is slow
- Putting in 10x more compute leads to only slightly improved capabilities. RL/inference scaling is currently a brute-force-like approach.
- Lots of money left to invest
- On high-value tasks like new R&D (solve cancer, do AI R&D, etc.), it may even be worth spending $1 billion per task. This is 4 orders of magnitude above the current ~$100k maximum (see the sketch below).
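To see the tension concretely: if capability grows roughly linearly in log10 of per-task spend (the brute-force-like log curve above), each 10x in spend buys one fixed increment, and the headroom from today's ~$100k maximum to a hypothetical $1B task is about 4 such increments. A minimal sketch under that assumption:

import math

# Assume capability ~ k * log10(spend): each 10x of per-task spend buys a
# fixed increment k. Headroom from the current ~$100k max to $1B per task:
current_max = 1e5  # dollars per task, max tried so far (per above)
high_value = 1e9   # dollars per task, plausibly worth it for major R&D
steps = math.log10(high_value / current_max)
print(f"{steps:.0f} further 10x steps of spend available")  # -> 4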
EpochAI article on forecasting scaling RL/inference