July 14, 2025 • 71 mins

In this episode, we redefine AI "reasoning" as mere rambling, exposing the "illusion of thinking" and "Potemkin understanding" in current models. We contrast the classical definition of reasoning, which requires logic and consistency, with Big Tech's new version, which reduces it to a generic statement about information processing. We then explain how Large Rambling Models generate long, often irrelevant rambling traces that appear to improve benchmarks, largely thanks to best-of-N sampling and benchmark gaming.
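
For listeners who want the mechanics: best-of-N sampling draws N independent completions and keeps the one a scorer likes best. If a single sample solves a task with probability p, at least one of N samples does with probability 1 - (1 - p)^N; with p = 0.2 and N = 16 that is already about 97%. Here is a minimal sketch in Python, where `generate` and `score` are hypothetical stand-ins for a real model and verifier, not any particular API:

```python
import random

def generate(prompt: str) -> str:
    """Hypothetical stand-in for one sampled model completion."""
    return random.choice(["trace A", "trace B", "trace C"])

def score(prompt: str, completion: str) -> float:
    """Hypothetical verifier / reward model; higher is better."""
    return random.random()

def best_of_n(prompt: str, n: int = 16) -> str:
    # Draw n independent samples and keep the highest-scoring one.
    # Benchmark pass rates climb with n even though no individual
    # sample "reasons" any better than a single draw would.
    samples = [generate(prompt) for _ in range(n)]
    return max(samples, key=lambda s: score(prompt, s))

print(best_of_n("What is 17 * 24?"))
```

The point of the sketch: the apparent gain comes from selection over many tries, not from anything happening inside any one rambling trace.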

Words and definitions actually matter! Carelessness leads to misplaced investments and an overestimation of systems that are currently just surprisingly useful autocorrects.

  • (00:00) - Intro
  • (00:40) - OBB update and Meta's talent acquisition
  • (03:09) - What are rambling models?
  • (04:25) - Definitions and polarization
  • (09:50) - Logic and consistency
  • (17:00) - Why does this matter?
  • (21:40) - More likely explanations
  • (35:05) - The "illusion of thinking" and task complexity
  • (39:07) - "Potemkin understanding" and surface-level recall
  • (50:00) - Benchmark gaming and best-of-N sampling
  • (55:40) - Costs and limitations
  • (58:24) - Claude's anecdote and Vending-Bench
  • (01:03:05) - Definitional switch and implications
  • (01:10:18) - Outro

Links
  • Apple paper - The Illusion of Thinking
  • ICML 2025 paper - Potemkin Understanding in Large Language Models
  • Preprint - Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Theoretical understanding

  • Max M. Schlereth Manuscript - The limits of AGI part II
  • Preprint - (How) Do Reasoning Models Reason?
  • Preprint - A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
  • NeurIPS 2024 paper - How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad

Empirical explanations

  • Preprint - How Do Large Language Monkeys Get Their Power (Laws)?
  • Andon Labs Preprint - Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents
  • LeapLab (Tsinghua University) and Shanghai Jiao Tong University paper - Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
  • Preprint - RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
  • Preprint - Mind The Gap: Deep Learning Doesn't Learn Deeply
  • Preprint - Measuring AI Ability to Complete Long Tasks
  • Preprint - GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
