Executive Summary
- The Google DeepMind mechanistic interpretability team has made a strategic pivot over the past year, from ambitious reverse-engineering to a focus on pragmatic interpretability:
- Trying to directly solve problems on the critical path to AGI going well
- Carefully choosing problems according to our comparative advantage
- Measuring progress with empirical feedback on proxy tasks
- We believe that, on the margin, more researchers who share our goals should take a pragmatic approach to interpretability, both in industry and academia, and we call on people to join us
- Our proposed scope is broad and includes much non-mech interp work, but we see this as the natural approach for mech interp researchers to have impact
- Specifically, we’ve found that the skills, tools and tastes of mech interp researchers transfer well to important and neglected problems outside “classic” mech interp
- See our companion piece for more on which research areas and theories of change we think are promising
- Why pivot now? We think that times have changed.
- Models are far more capable, bringing new questions within empirical reach
- We have been [...]
---
Outline:
(00:10) Executive Summary
(03:00) Introduction
(03:44) Motivating Example: Steering Against Evaluation Awareness
(06:21) Our Core Process
(08:20) Which Beliefs Are Load-Bearing?
(10:25) Is This Really Mech Interp?
(11:27) Our Comparative Advantage
(14:57) Why Pivot?
(15:20) What's Changed In AI?
(16:08) Reflections On The Field's Progress
(18:18) Task Focused: The Importance Of Proxy Tasks
(18:52) Case Study: Sparse Autoencoders
(21:35) Ensure They Are Good Proxies
(23:11) Proxy Tasks Can Be About Understanding
(24:49) Types Of Projects: What Drives Research Decisions
(25:18) Focused Projects
(28:31) Exploratory Projects
(28:35) Curiosity Is A Double-Edged Sword
(30:56) Starting In A Robustly Useful Setting
(34:45) Time-Boxing
(36:27) Worked Examples
(39:15) Blending The Two: Tentative Proxy Tasks
(41:23) What's Your Contribution?
(43:08) Jack Lindsey's Approach
(45:44) Method Minimalism
(46:12) Case Study: Shutdown Resistance
(48:28) Try The Easy Methods First
(50:02) When Should We Develop New Methods?
(51:36) Call To Action
(53:04) Acknowledgments
(54:02) Appendix: Common Objections
(54:08) Aren't You Optimizing For Quick Wins Over Breakthroughs?
(56:34) What If AGI Is Fundamentally Different?
(57:30) I Care About Scientific Beauty and Making AGI Go Well
(58:09) Is This Just Applied Interpretability?
(58:44) Are You Saying This Because You Need To Prove Yourself Useful To Google?
(59:10) Does This Really Apply To People Outside AGI Companies?
(59:40) Aren't You Just Giving Up?
(01:00:04) Is Ambitious Reverse-Engineering Actually Overcrowded?
(01:00:48) Appendix: Defining Mechanistic Interpretability
(01:01:44) Moving Toward Mechanistic OR Interpretability
The original text contained 47 footnotes, which were omitted from this narration.
---
First published: December 1st, 2025
Source: https://www.lesswrong.com/posts/StENzDcD3kpfGJssR/a-pragmatic-vision-for-inter