Deep Research by OpenAI - The Ups and Downs vs DeepSeek R1 Search + Gemini Deep Research

About this episode

12 hours ago Deep Research was unveiled, and I’ve tested it thoroughly, including vs Deepseek R1 with search, Gemini Deep Research and even R1 in Perplexity. It’s a notable step forward, with one big caveat. I’ll go through all the benchmark figures, my initial impression of the o3 model within, and much more.

Deep Research:
https://openai.com/index/introducing-deep-research/
https://www.youtube.com/watch?v=YkCDVn3_wiw

GAIA Bench:
https://openreview.net/forum?id=fibxvahvs3
https://openreview.net/pdf?id=fibxvahvs3

CodeELO:
https://arxiv.org/pdf/2501.01257

CamelCamel:
https://uk.camelcamelcamel.com/

Deepseek R1 with search:
https://chat.deepseek.com/
https://arxiv.org/pdf/2501.12948

HaluBench:
https://arxiv.org/pdf/2407.08488

Chapters:
00:00 - Introduction
01:06 - Powered by o3, Humanity’s Last Exam, GAIA
03:55 - Simple Tests
06:00 - Good News vs Deepseek R1 and Gemini Deep Research
09:32 - Bad News on Hallucinations
14:14 - What Can’t it Browse?
14:42 - For Shopping?
16:40 - Final thoughts