


2025-08-07

AI timelines talk (presented on 2025-08-03)

Time: 20 minutes

Pre-reqs

My top-level views

Assuming no ban or pause on AI research enforced via a US-China international treaty

Assume wide error bars on these numbers

Definition of superintelligent AI: AI that is better than the best humans at every task humans care about completing.

Relevant intuitions for what I actually imagine when I imagine superintelligent AI: humans from 1900 experiencing all of the 1900-2025 inventions in a single year; chimpanzees being exposed to a human being.

Datapoint 1: Other experts

Experts' AI timelines (~1 min)

Experts' AI extinction risk (~1 min)

If you think these clips are cherry-picked, fake, etc., you can watch the full interviews linked below.

Signed letters: Dan Hendrycks' CAIS letter, FLI Pause letter

Homework

Datapoint 2: Try the models yourself

Try GPT-2 (launched 2019)

Try GPT-4.5 and o3 (launched 2025) on the OpenAI playground

Homework

Datapoint 3: Argument from Speed


Technical part of this document starts here.

Datapoint 4: Model scaling

Datapoint 4a: Try old and new models

Homework

Datapoint 4b: Chinchilla scaling law predicts loss

Chinchilla scaling predicts loss accurately up to at least 3 decimal places

As per the Epoch AI replication attempt, the Chinchilla scaling law is:

L(N, D) = 1.8172 + 482.01 / N^0.3478 + 2085.43 / D^0.3658

N = number of parameters (depends on model size), D = number of training tokens.
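
A minimal sketch of evaluating this fitted law in Python, using the replication coefficients above; the example inputs (a roughly Chinchilla-sized run of 70B parameters on 1.4T tokens) are just illustrative:

def chinchilla_loss(N: float, D: float) -> float:
    # Epoch AI replication coefficients, copied from the formula above
    return 1.8172 + 482.01 / N**0.3478 + 2085.43 / D**0.3658

# Illustrative example: ~70B parameters trained on ~1.4T tokens
print(chinchilla_loss(70e9, 1.4e12))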

Homework

Datapoint 4c: Loss does not predict capabilities

Experts have been consistently surprised over the past 6 years as to which capabilities would unlock in which year. Very few got all the predictions right, and those who did (example: Ilya Sutskever, Daniel Kokotajlo) are bullish on further AI progress.

Datapoint 4d: Scale up on compute in future

xAI spent $7B at its Memphis datacenter to train Grok 3

OpenAI's Stargate will spend $100B annually, based on commitments from Masayoshi Son (SoftBank) and Larry Ellison (Oracle).

World GDP is roughly $100 trillion; we will probably spend somewhere between $0.1 and $10 trillion on training. This is 10-1000x more compute than the largest training run as of today.
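
A rough arithmetic sketch of where the 10-1000x figure comes from, assuming (for illustration only) that today's largest training run costs on the order of $10B:

largest_run_today = 10e9        # assumption: largest current training run costs ~$10B
future_spend_low = 0.1e12       # $0.1 trillion
future_spend_high = 10e12       # $10 trillion
print(future_spend_low / largest_run_today)    # 10x
print(future_spend_high / largest_run_today)   # 1000x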

Datapoint 4e: Model scaling might (???) be saturating

People say GPT-4.5 is not significantly better than GPT-4, hence we may be saturating this curve. However, as per the previous point, we still have a lot of compute scale-up left that can counteract this.

Datapoint 5: RL scaling

Datapoint 5a: Try models yourself

Try GPT-4, o1, and o3

Datapoint 5b: log curve for RL scaling

We have only ~1 year of past data, which means any prediction based on it has wide error bars.
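
A minimal sketch of why so little data gives wide error bars: fit a log-linear trend to a handful of hypothetical (compute, benchmark score) points spanning about a year, then extrapolate. All numbers below are made up for illustration.

import numpy as np

log_compute = np.array([0.0, 1.0, 2.0, 3.0])  # hypothetical: log10 of relative RL compute over ~1 year
score = np.array([30.0, 45.0, 55.0, 68.0])    # hypothetical benchmark scores
slope, intercept = np.polyfit(log_compute, score, 1)
print(slope * 6.0 + intercept)                # naive extrapolation to 1000x the last point's compute

The naive linear extrapolation even overshoots 100 on a bounded benchmark, which is one way of seeing how weakly a fit on this little data constrains the future.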

Homework

Datapoint 5c: scale up on compute in future

Cost per task

Datapoint 6: RL scaling has worked before

Update: This datapoint was not presented in the talk, but I'm including it anyway.

Even before LLMs were invented, we solved a wide range of games by scaling RL with long time horizons and large amounts of compute: 1v1 poker, Dota 2, StarCraft, Go, chess. This is evidence that RL scales.
