His P(Doom) Is Only 2.6% — AI Doom Debate with Bentham's Bulldog, a.k.a. Matthew Adelstein


2:27:39 Feb 10, 2026
About this episode
Get ready for a rematch with the one and only Bentham's Bulldog, a.k.a. Matthew Adelstein! Our first debate covered a wide range of philosophical topics. Today's Debate #2 is all about Matthew's new argument against the inevitability of AI doom. He comes out swinging with a calculated P(Doom) of just 2.6%, based on a multi-step probability chain that I challenge as potentially falling into a "Type 2 Conjunction Fallacy" (a.k.a. the Multiple Stage Fallacy).

We clash on whether to expect "alignment by default" and on the nature of future AI architectures. While Matthew sees current RLHF success as evidence that AIs will likely remain compliant, I argue that we're building "Goal Engines": superhuman optimization modules that act like nuclear cores wrapped in friendly personalities. We debate whether these engines can be safely contained, or whether the capability to map goals to actions is inherently dangerous and prone to exfiltration.

Despite our different forecasts (my 50% vs. his sub-10%), we actually land in the "sane zone" together on some key policy ideas, like the potential necessity of a global pause.

While Matthew's case for a low P(Doom) hasn't convinced me, I consider his post and his engagement with me to be super high quality and good faith. We're not here to score points; we just want to better predict how the intelligence explosion will play out.

Timestamps

00:00:00 — Teaser
00:00:35 — Bentham's Bulldog Returns to Doom Debates
00:05:43 — Higher-Order Evidence: Why Skepticism is Warranted
00:11:06 — What's Your P(Doom)™
00:14:38 — The "Multiple Stage Fallacy" Objection
00:21:48 — The Risk of Warring AIs vs. Misalignment
00:27:29 — Historical Pessimism: The "Boy Who Cried Wolf"
00:33:02 — Comparing AI Risk to Climate Change & Nuclear War
00:38:59 — Alignment by Default via Reinforcement Learning
00:46:02 — The "Goal Engine" Hypothesis
00:53:13 — Is Psychoanalyzing Current AI Valid for Future Systems?
01:00:17 — Winograd Schemas & The Fragility of Value
01:09:15 — The Nuclear Core Analogy: Dangerous Engines in Friendly Wrappers
01:16:16 — The Discontinuity of Unstoppable AI
01:23:53 — Exfiltration: Running Superintelligence on a Laptop
01:31:37 — Evolution Analogy: Selection Pressures for Alignment
01:39:08 — Commercial Utility as a Force for Constraints
01:46:34 — Can You Isolate the "Goal-to-Action" Module?
01:54:15 — Will Friendly Wrappers Successfully Control Superhuman Cores?
02:04:01 — Moral Realism and Missing Out on Cosmic Value
02:11:44 — The Paradox of AI Solving the Alignment Problem