2025-05-12
Samuel x Saksham AI timelines (discussion on 2025-05-09)
- top-level views
- samuel top-level: 25% AI!2030 >= ASI, >50% ASI >> AI!2030 >> AI!2025, <25% AI!2030 ~= AI!2025 (reading AI!YYYY as frontier AI capability as of year YYYY)
- saksham top-level: medium probability AI!2030 >= ASI
- samuel bullish on model scaling, more uncertain on RL scaling
- saksham bullish on RL/inference scaling and on grokking
- samuel: does being bullish on grokking imply being bullish on model scaling? saksham: unsure
- agreements
- samuel and saksham agree: only 2024-2025 counts as empirical data for extrapolating the RL/inference scaling trend (o1, o3, deepseek r1, deepseek r1-zero). RLHF done on GPT3.5 is not a valid datapoint on this trend.
- saksham and samuel agree: if a superhuman mathematician and physicist are built, high likelihood we get ASI (robotics and other tasks also get solved), so robotics progress is not a crux.
- crux: how well does RL scale for LLMs?
- saksham is confidently bullish on scaling RL for LLMs; samuel has wider uncertainty on it.
- testable hypothesis: saksham claims GPT3 + lots of RL in 2025 ~= GPT4, and that a GPT2-size model trained in 2025 on high quality data + lots of RL ~= GPT3. samuel disagrees. top ML labs would need to try this to settle it.
- testable hypothesis: saksham claims models such as qwen 2.5 coder are <50B params yet better than GPT3 (175B) and almost as good as GPT4 (1.4T). samuel disagrees, claiming they overfit to benchmarks. samuel needs to try <50B param models on tests not in benchmarks.
- testable hypothesis: samuel thinks a small model trained on a big model's outputs (distillation) ends up overfitting benchmarks. saksham unsure. samuel and saksham need to try such models on tests not in benchmarks (see the sketch after this list).
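
a minimal sketch of how one might run the "tests not in benchmarks" check from the last two items, assuming the HuggingFace transformers API. the model name is just one example <50B open model, and private_problems is a hypothetical placeholder for freshly written, never-published problems; this is a sketch, not a rigorous eval harness:

  # probe a small open model on private problems that cannot be in public benchmarks
  from transformers import AutoModelForCausalLM, AutoTokenizer

  MODEL = "Qwen/Qwen2.5-Coder-7B-Instruct"  # example <50B model; swap in any other

  tokenizer = AutoTokenizer.from_pretrained(MODEL)
  model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

  # hypothetical placeholder: write fresh problems so they can't be in training data
  private_problems = [
      "Write a function that ...",
  ]

  for problem in private_problems:
      messages = [{"role": "user", "content": problem}]
      inputs = tokenizer.apply_chat_template(
          messages, add_generation_prompt=True, return_tensors="pt"
      ).to(model.device)
      output = model.generate(inputs, max_new_tokens=512)
      # print only the newly generated tokens, not the echoed prompt
      print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))

grading the outputs by hand (or against a big model's outputs on the same problems) would give a benchmark-free comparison point for both hypotheses.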