July 14, 2025 • 71 mins

In this episode, we redefine AI "reasoning" as mere rambling, exposing the "illusion of thinking" and "Potemkin understanding" in current models. We contrast the classical definition of reasoning, which requires logic and consistency, with Big Tech's new version, which reduces it to a generic statement about information processing. We then explain how Large Rambling Models generate long, often irrelevant rambling traces that appear to improve benchmarks, largely thanks to best-of-N sampling and benchmark gaming.
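
For listeners who want the mechanics: best-of-N sampling draws N independent completions and keeps the one a scorer likes best. If a single sample solves a task with probability p, at least one of N samples does with probability 1 - (1 - p)^N; with p = 0.2 and N = 16 that is already about 97%. Here is a minimal sketch in Python, where `generate` and `score` are hypothetical stand-ins for a real model and verifier, not any particular API:

```python
import random

def generate(prompt: str) -> str:
    """Hypothetical stand-in for one sampled model completion."""
    return random.choice(["trace A", "trace B", "trace C"])

def score(prompt: str, completion: str) -> float:
    """Hypothetical verifier / reward model; higher is better."""
    return random.random()

def best_of_n(prompt: str, n: int = 16) -> str:
    # Draw n independent samples and keep the highest-scoring one.
    # Benchmark pass rates climb with n even though no individual
    # sample "reasons" any better than a single draw would.
    samples = [generate(prompt) for _ in range(n)]
    return max(samples, key=lambda s: score(prompt, s))

print(best_of_n("What is 17 * 24?"))
```

The point of the sketch: the apparent gain comes from selection over many tries, not from anything happening inside any one rambling trace.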

Words and definitions actually matter! Carelessness leads to misplaced investments and an overestimation of systems that are currently just surprisingly useful autocorrects.

  • (00:00) - Intro
  • (00:40) - OBB update and Meta's talent acquisition
  • (03:09) - What are rambling models?
  • (04:25) - Definitions and polarization
  • (09:50) - Logic and consistency
  • (17:00) - Why does this matter?
  • (21:40) - More likely explanations
  • (35:05) - The "illusion of thinking" and task complexity
  • (39:07) - "Potemkin understanding" and surface-level recall
  • (50:00) - Benchmark gaming and best-of-N sampling
  • (55:40) - Costs and limitations
  • (58:24) - Claude's anecdote and Vending-Bench
  • (01:03:05) - Definitional switch and implications
  • (01:10:18) - Outro

Links
  • Apple paper - The Illusion of Thinking
  • ICML 2025 paper - Potemkin Understanding in Large Language Models
  • Preprint - Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Theoretical understanding

  • Max M. Schlereth Manuscript - The limits of AGI part II
  • Preprint - (How) Do Reasoning Models Reason?
  • Preprint - A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
  • NeurIPS 2024 paper - How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad

Empirical explanations

  • Preprint - How Do Large Language Monkeys Get Their Power (Laws)?
  • Andon Labs Preprint - Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents
  • LeapLab (Tsinghua University) and Shanghai Jiao Tong University paper - Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
  • Preprint - RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
  • Preprint - Mind The Gap: Deep Learning Doesn't Learn Deeply
  • Preprint - Measuring AI Ability to Complete Long Tasks
  • Preprint - GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
