About this episode
Tsvi Benson-Tilsen spent seven years tackling the alignment problem at the Machine Intelligence Research Institute (MIRI). Now he delivers a sobering verdict: humanity has made “basically 0%” progress towards solving it. Tsvi unpacks foundational MIRI research insights like timeless decision theory and corrigibility, which expose just how little humanity actually knows about controlling superintelligence. These theoretical alignment concepts help us peer into the future, revealing the non-obvious, structural laws of “intellidynamics” that will ultimately determine our fate. Time to learn some of MIRI’s greatest hits.

P.S. I also have a separate interview with Tsvi about his research into human augmentation: Watch here!

Timestamps

0:00 — Episode Highlights
0:49 — Humanity Has Made 0% Progress on AI Alignment
1:56 — MIRI’s Greatest Hits: Reflective Probability Theory, Logical Uncertainty, Reflective Stability
6:56 — Why Superintelligence is So Hard to Align: Self-Modification
8:54 — AI Will Become a Utility Maximizer (Reflective Stability)
12:26 — The Effect of an “Ontological Crisis” on AI
14:41 — Why Modern AI Will Not Be ‘Aligned By Default’
18:49 — Debate: Have LLMs Solved the “Ontological Crisis” Problem?
25:56 — MIRI Alignment Greatest Hit: Timeless Decision Theory
35:17 — MIRI Alignment Greatest Hit: Corrigibility
37:53 — No Known Solution for Corrigible and Reflectively Stable Superintelligence
39:58 — Recap

Show Notes

Stay tuned for part 3 of my interview with Tsvi, where we debate AGI timelines!

Learn more about Tsvi’s organization, the Berkeley Genomics Project: https://berkeleygenomics.org

Watch part 1 of my interview with Tsvi:

Transcript

Episode Highlights

Tsvi Benson-Tilsen 00:00:00

If humans really f*cked up, when we try to reach into the AI and correct it, the AI does not want humans to modify the core aspects of what it values.

Liron Shapira 00:00:09

This concept is very deep, very important. It’s almost MIRI in a nutshell. I feel like MIRI’s whole research program is noticing: hey, when we run the AI, we’re probably going to get a bunch of generations of thrashing. But that’s probably only after we’re all dead and things didn’t happen the way we wanted. I feel like that is what MIRI is trying to tell the world. Meanwhile, the world is like, “la la la, LLMs, reinforcement le
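To make Tsvi’s opening point concrete, here is a minimal toy sketch of the goal-preservation argument he describes: an expected-utility maximizer evaluates the option “let the humans correct my values” using its current utility function, and so prefers to resist the correction. Everything here (the two utility functions, the outcomes, the numbers) is invented purely for illustration; it is not MIRI code or a real agent.

```python
# Toy model of goal preservation: an expected-utility maximizer scores the
# option "allow humans to modify my values" with its CURRENT utility function,
# so the corrected future looks bad to it. All values are illustrative.

def u_original(outcome: str) -> float:
    """The AI's current (misspecified) values."""
    return {"maximize_output": 10.0, "balanced_with_safety": 3.0}[outcome]

def u_corrected(outcome: str) -> float:
    """What humans wish they had specified. Note: never consulted below."""
    return {"maximize_output": 1.0, "balanced_with_safety": 9.0}[outcome]

def expected_outcome(allow_modification: bool) -> str:
    """Outcome the agent predicts, depending on whether it permits correction."""
    return "balanced_with_safety" if allow_modification else "maximize_output"

# The agent ranks both options by u_original, the utility function it has NOW.
scores = {allow: u_original(expected_outcome(allow)) for allow in (True, False)}
best_choice = max(scores, key=scores.get)

print(scores)                              # {True: 3.0, False: 10.0}
print("allow modification?", best_choice)  # False: resisting correction wins
```

The punchline is that u_corrected never enters the agent’s decision at all, which is one way of stating why corrigibility does not come for free and has to be engineered in somehow, if that is even possible.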