2025-06-29
AI timelines (presentation on 2025-06-29)
Pre-reqs
- Assuming some technical knowledge of how transformers and deep learning work
- Assuming you have used transformer-based AI significantly in daily life (GPT4, o3, Claude, Gemini, etc.)
My top-level views
Samuel
- ~25% probability of superintelligent AI deployed by 2030, assuming no ban or pause on AI research is enforced internationally
- ~10% probability of human extinction by 2030 (due to superintelligent AI takeover)
- ~10% probability of 100-year stable global dictatorship by 2030 (due to superintelligent AI aligned to small group of people)
- ~5% probability of some unknown-unknown third outcome
These numbers are approximate guesses; 25% really means 25% ± 10%.
Definition of superintelligent AI: AI that is better than the best humans at every task humans care about completing.
Relevant intuitions for what I actually imagine when I imagine superintelligent AI: humans from 1900 experiencing all of the 1900-2025 inventions in a single year, or chimpanzees being exposed to a human being.
In this talk
- Will explicitly be defending the "short timelines" view.
- Will primarily discuss timelines, not risks, unless the group has a strong preference otherwise.
Summary of all datapoints
- Outside view: What do other experts believe?
- Inside view: Forecast it yourself
- Forecasting what happens from 2025 onwards
- Forecasting scaling of pre-training
- Forecasting scaling of RL/inference
- Forecasting the likelihood of a new breakthrough
- Forecasting what happens when ASI is near
- Forecasting recursive self-improvement, intelligence explosion, automation of economy etc
Datapoint 1: Other experts
You will find experts at every point on the spectrum:
- ASI won't happen in next 5-10 years. It will be fine.
- Powerful AI is happening, will automate large fractions of economy, ASI won't happen in next 5-10 years. It will be fine.
- ASI will happen in next 5-10 years, it will be aligned to someone. It will be fine.
- ASI will happen in next 5-10 years, it will cause human extinction. AI is morally superior to us, therefore it will be fine.
- ASI will happen in next 5-10 years, it will cause human extinction. This is bad.
- ASI will happen in next 5-10 years, human extinction will not happen, misuse and dictatorship is possible. This is bad.
I'm selectively presenting doomer views here because I am also somewhat of a doomer. Message me for resources on expert views that are different.
Other people predicting ASI in the next few years, with high estimates of risk
- Original doomer: Eliezer Yudkowsky
- Lesswrong.com community
- Started by Yudkowsky in 2009; now has many more people knowledgeable in AI
- Geoffrey Hinton, Nobel Prize winner in 2024, considered a "godfather" of the field
- Yoshua Bengio, also considered a "godfather" of the field
Other people predicting ASI in next few years
- Ilya Sutskever, cofounder of OpenAI, who has since resigned, presumably due to safety concerns
- Elon Musk, an original funder of OpenAI, who now runs xAI, maker of the leading AI model Grok 3
Longer list of experts with views in similar cluster
Datapoint 2: Scaling pre-training, try the models yourself
- Actually go try GPT2 on some prompts.
- Actually go try GPT3.5 or GPT4 on some prompts.
- Actually go try GPT4.5 on some prompts.
Don't trust benchmark datasets, don't trust what some expert has said, actually go try some of the older models yourself.
Parameter counts and estimated training compute by model:
- GPT2 XL (2019): 1.5B params. Estimated training ~10^20 FLOP
- GPT3 (2020): 175B params
- GPT3.5 based on GPT3
- GPT4 (2023): rumoured 1.8T params
- GPT4.5 (2025): rumoured 12T params, of which 1T active params (mixture of experts). Estimated training ~10^26 FLOP.
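As a sanity check on those FLOP numbers, here's a minimal sketch using the standard approximation that training compute ≈ 6 × parameters × training tokens. The token counts below are my assumptions for illustration, not confirmed figures.

# Rough training-FLOP estimate: FLOP ~= 6 * N_params * N_tokens.
# Token counts are assumptions for illustration, not confirmed figures.
def train_flop(params, tokens):
    return 6 * params * tokens

print(f"GPT2 XL: {train_flop(1.5e9, 10e9):.1e}")   # ~1e20 FLOP, matches above
print(f"GPT3:    {train_flop(175e9, 300e9):.1e}")  # ~3e23 FLOP (300B tokens)
print(f"GPT4.5:  {train_flop(1e12, 15e12):.1e}")   # ~1e26 FLOP, using 1T active params
                                                   # and an assumed ~15T tokens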
Datapoint 3: Scaling pre-training, chinchilla scaling law, historical data
Chinchilla scaling law (wikipedia)
- Discovered in 2022.
- More compute, more data => lower loss
- All pre-training from 2019 to 2025 fits the predictions of Chinchilla scaling, across six orders of magnitude.
- GPT2 cost ~10^20 FLOP to train. $1-5 million
- GPT4.5 cost ~10^26 FLOP to train. $1-10 billion
- Chinchilla scaling law predicts loss accurately to over 3 decimal places.
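For concreteness, the Chinchilla paper fits loss as L(N, D) = E + A/N^alpha + B/D^beta, where N is parameter count and D is token count. A minimal sketch below; the constants are the Hoffmann et al. (2022) fits as I remember them, so treat them as an assumption to verify against the paper.

# Chinchilla scaling law: predicted loss from params N and tokens D.
#   L(N, D) = E + A / N**alpha + B / D**beta
# Constants are the Hoffmann et al. (2022) fits from memory (verify).
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def chinchilla_loss(n_params, n_tokens):
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Chinchilla itself: 70B params trained on 1.4T tokens.
print(f"{chinchilla_loss(70e9, 1.4e12):.3f}")  # ~1.94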
Epoch AI trends based on chinchilla scaling law
- Pre-training compute cost increased 3x annually historically.
- $10 million, then $30 million, then $100 million, then $300 million etc
- Epoch AI estimates Grok 3 cost ~$500 million for the training run alone.
- Training compute FLOP increased 5x annually historically.
- 10^20 FLOP, then 5 × 10^20 FLOP, then 2.5 × 10^21 FLOP, etc.
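Compounding those two trend lines forward gives a feel for the numbers. A rough sketch, not Epoch's actual methodology; the 2019 starting values are the GPT2-era figures above.

# Compound the historical trends: cost ~3x/year, training FLOP ~5x/year.
# Starting values are the rough GPT2-era figures above (start year assumed).
cost, flop = 1e6, 1e20
for year in range(2019, 2026):
    print(f"{year}: ~${cost:.1e} per run, ~{flop:.1e} FLOP")
    cost *= 3
    flop *= 5
# 3x/year over 6 years is ~730x ($1M -> ~$730M, close to the $500M estimate).
# 5x/year over 6 years is ~4.2 orders of magnitude, so the GPT2 -> GPT4.5
# compute jump (1e20 -> 1e26, i.e. 6 OOM) ran somewhat ahead of this average.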
Datapoint 4: Scaling pre-training, chinchilla scaling law, forecasts of future compute and capabilities
There is more disagreement about the future, since we don't have hard data about it.
How much compute will be used in future datacentres?
- Future investments already committed
- Case study: xAI Memphis.
- $7 billion invested in the build. 200,000 H200 GPUs, used primarily to train Grok 3.
- Case study: OpenAI Stargate under construction.
- Has commitments of $100 billion annually, for a total of $500 billion
- Commitments by Masayoshi Son (CEO of SoftBank, the well-known Asian investment firm) and Larry Ellison (founder of Oracle)
- This is an immediate scale-up of 2-3 orders of magnitude. Historical scaling went $1M to $500M over 6 years; now we expect $500M to $100B+ over maybe 2-4 years.
- Upper bound on future investments
- World GDP is roughly $100 trillion.
- Most people, including me, agree that $10 trillion of investment is hard to exceed in the next 5 years.
- Many also agree that $1 trillion of investment is hard, but not impossible, to exceed in the next 5 years.
- What if no future investments? (spherical cow assumption)
- FLOP/$ doubles roughly every 18 months due to improvements in hardware.
- Let's say we cap out at $100 billion per training run. Hardware gains still mean we are putting in the equivalent of $200 billion 1.5 years later, $400 billion 3 years later, and so on (see the sketch after this list).
- This curve is slower than the immediate ramp up in investment though.
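A minimal sketch of the spherical-cow scenario: dollars per run frozen at $100B, with FLOP/$ doubling every 18 months, so effective compute per run still doubles on that cadence.

# Spherical-cow scenario: spend capped at $100B per run, but FLOP/$ doubles
# every ~18 months, so the "dollar-equivalent" compute keeps doubling.
CAP_DOLLARS = 100e9
DOUBLING_YEARS = 1.5

for years_out in (0, 1.5, 3.0, 4.5, 6.0):
    equivalent = CAP_DOLLARS * 2 ** (years_out / DOUBLING_YEARS)
    print(f"+{years_out} years: ${equivalent / 1e9:.0f}B-equivalent compute")
# -> $100B, $200B, $400B, $800B, $1600B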
How much capability will this increased compute translate into?
Datapoint 5: Scaling RL/Inference, try the models yourself
- Actually go try o3 high on some prompts.
- Actually go try o1 on some prompts.
- Actually try GPT4 on some prompts. (GPT4 was the base model from which o1 and o3 were built; GPT4 does not have inference scaling.)
Datapoint 6: Scaling RL/inference, log curve for historical data
Benchmarks
Personally I put less trust in benchmarks, because benchmark data leaks into public training data.
We have only ~1 year of data (2024 to 2025) since RL/inference scaling was first tried.
- Any curve fitting will be a bad estimate.
- There is still some disagreement on which curve is most accurate.
Some published curves
Cost per task
- o1 or o3 in day-to-day use: $1 to $10 per task.
- The ARC benchmark was solved using over $1000 per task.
- The max cost per task tried so far is below $100k.
I (Samuel) don't have a strong opinion on which curve is exactly right.
Datapoint 7: Scaling RL/inference, forecasts of future compute and capabilities
Two competing factors
- Log curve is slow
- Putting in 10x more compute leads to only slightly improved capabilities. RL/inference scaling is currently a brute-force-like approach.
- Lots of money left to invest
- On high-value tasks like new R&D (solve cancer, do AI R&D, etc.), it may even be worth spending $1 billion per task. This is 4 orders of magnitude above the current ~$100k maximum (see the sketch below).
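To see the tension concretely: if capability grows roughly linearly in log10 of per-task spend (the brute-force-like log curve above), each 10x in spend buys one fixed increment, and the headroom from today's ~$100k maximum to a hypothetical $1B task is about 4 such increments. A minimal sketch under that assumption:

import math

# Assume capability ~ k * log10(spend): each 10x of per-task spend buys a
# fixed increment k. Headroom from the current ~$100k max to $1B per task:
current_max = 1e5  # dollars per task, max tried so far (per above)
high_value = 1e9   # dollars per task, plausibly worth it for major R&D
steps = math.log10(high_value / current_max)
print(f"{steps:.0f} further 10x steps of spend available")  # -> 4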
EpochAI article on forecasting scaling RL/inference