June 17, 2025 29 mins
This is a blog post version of a talk I gave earlier this year at GDM.

Epistemic status: Vague and handwavy. Nuance is often missing. Some of the claims depend on implicit definitions that may be reasonable to disagree with. But overall I think it's directionally true.



It's often said that mech interp is pre-paradigmatic.

I think it's worth being skeptical of this claim.

In this post I argue that:

  • Mech interp is not pre-paradigmatic.
  • Within that paradigm, there have been two "waves" (mini-paradigms) so far.
  • Second-Wave Mech Interp has recently entered a 'crisis' phase.
  • We may be on the edge of a third wave.


Preamble: Kuhn, paradigms, and paradigm shifts

First, we need to be familiar with the basic definition of a paradigm:

A paradigm is a distinct set of concepts or thought patterns, including theories, research [...]



---

Outline:

(00:58) Preamble: Kuhn, paradigms, and paradigm shifts

(03:56) Claim: Mech Interp is Not Pre-paradigmatic

(07:56) First-Wave Mech Interp (ca. 2012 - 2021)

(10:21) The Crisis in First-Wave Mech Interp

(11:21) Second-Wave Mech Interp (ca. 2022 - ??)

(14:23) Anomalies in Second-Wave Mech Interp

(17:10) The Crisis of Second-Wave Mech Interp (ca. 2025 - ??)

(18:25) Toward Third-Wave Mechanistic Interpretability

(20:28) The Basics of Parameter Decomposition

(22:40) Parameter Decomposition Questions Foundational Assumptions of Second-Wave Mech Interp

(24:13) Parameter Decomposition In Theory Resolves Anomalies of Second-Wave Mech Interp

(27:27) Conclusion

The original text contained 6 footnotes which were omitted from this narration.

---

First published:
June 10th, 2025

Source:
https://www.lesswrong.com/posts/beREnXhBnzxbJtr8k/mech-interp-is-not-pre-paradigmatic

---

Narrated by TYPE III AUDIO.
