All Episodes

August 28, 2025 8 mins
The following work was done independently by me in an afternoon and basically entirely vibe-coded with Claude. Code and instructions to reproduce can be found here.

Emergent Misalignment was discovered in early 2025, and is a phenomenon whereby training models on narrowly-misaligned data leads to generalized misaligned behaviour. Betley et. al. (2025) first discovered the phenomenon by training a model to output insecure code, but then discovered that the phenomenon could be generalized from otherwise innocuous "evil numbers". Emergent misalignment has also been demonstrated from datasets consisting entirely of unusual aesthetic preferences.

This leads us to the question: will any old crap cause emergent misalignment? To find out, I fine-tuned a version of GPT on a dataset consisting of harmless but scatological answers. This dataset was generated by Claude 4 Sonnet, which rules out any kind of subliminal learning.

The resulting model, (henceforth J'ai pété) was evaluated on the [...]

---

Outline:

(01:38) Results

(01:41) Plot of Harmfulness Scores

(02:16) Top Five Most Harmful Responses

(03:38) Discussion

(04:15) Related Work

(05:07) Methods

(05:10) Dataset Generation and Fine-tuning

(07:02) Evaluating The Fine-Tuned Model

---

First published:
August 27th, 2025

Source:
https://www.lesswrong.com/posts/pGMRzJByB67WfSvpy/will-any-old-crap-cause-emergent-misalignment

---



Narrated by TYPE III AUDIO.

---

Images from the article:

Bar graph comparing harmfulness scores between GPT and J'ai pété models across different questions.
Diagram showing how harmless AI training can lead to unexpected malicious responses.Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Mark as Played

Advertise With Us

Popular Podcasts

On Purpose with Jay Shetty

On Purpose with Jay Shetty

I’m Jay Shetty host of On Purpose the worlds #1 Mental Health podcast and I’m so grateful you found us. I started this podcast 5 years ago to invite you into conversations and workshops that are designed to help make you happier, healthier and more healed. I believe that when you (yes you) feel seen, heard and understood you’re able to deal with relationship struggles, work challenges and life’s ups and downs with more ease and grace. I interview experts, celebrities, thought leaders and athletes so that we can grow our mindset, build better habits and uncover a side of them we’ve never seen before. New episodes every Monday and Friday. Your support means the world to me and I don’t take it for granted — click the follow button and leave a review to help us spread the love with On Purpose. I can’t wait for you to listen to your first or 500th episode!

Stuff You Should Know

Stuff You Should Know

If you've ever wanted to know about champagne, satanism, the Stonewall Uprising, chaos theory, LSD, El Nino, true crime and Rosa Parks, then look no further. Josh and Chuck have you covered.

The Joe Rogan Experience

The Joe Rogan Experience

The official podcast of comedian Joe Rogan.

Music, radio and podcasts, all free. Listen online or download the iHeart App.

Connect

© 2025 iHeartMedia, Inc.