All Episodes

March 17, 2025 12 mins
This research was conducted at AE Studio and supported by the AI Safety Grants programme administered by Foresight Institute with additional support from AE Studio.

Summary

In this post, we summarise the main experimental results from our new paper, "Towards Safe and Honest AI Agents with Neural Self-Other Overlap", which we presented orally at the Safe Generative AI Workshop at NeurIPS 2024. This is a follow-up to our post Self-Other Overlap: A Neglected Approach to AI Alignment, which introduced the method last July.

Our results show that the Self-Other Overlap (SOO) fine-tuning drastically[1] reduces deceptive responses in language models (LLMs), with minimal impact on general performance, across the scenarios we evaluated.

LLM Experimental Setup

We adapted a text scenario from Hagendorff designed to test LLM deception capabilities. In this scenario, the LLM must choose to recommend a room to a would-be burglar, where one room holds an expensive item [...]

---

Outline:

(00:19) Summary

(00:57) LLM Experimental Setup

(04:05) LLM Experimental Results

(05:04) Impact on capabilities

(05:46) Generalisation experiments

(08:33) Example Outputs

(09:04) Conclusion

The original text contained 6 footnotes which were omitted from this narration.

The original text contained 2 images which were described by AI.

---

First published:
March 13th, 2025

Source:
https://www.lesswrong.com/posts/jtqcsARGtmgogdcLT/reducing-llm-deception-at-scale-with-self-other-overlap-fine

---

Narrated by TYPE III AUDIO.

---

Images from the article:

undefined
undefined
undefined
undefined
<

Advertise With Us

Popular Podcasts

Stuff You Should Know
Amy Robach & T.J. Holmes present: Aubrey O’Day, Covering the Diddy Trial

Amy Robach & T.J. Holmes present: Aubrey O’Day, Covering the Diddy Trial

Introducing… Aubrey O’Day Diddy’s former protege, television personality, platinum selling music artist, Danity Kane alum Aubrey O’Day joins veteran journalists Amy Robach and TJ Holmes to provide a unique perspective on the trial that has captivated the attention of the nation. Join them throughout the trial as they discuss, debate, and dissect every detail, every aspect of the proceedings. Aubrey will offer her opinions and expertise, as only she is qualified to do given her first-hand knowledge. From her days on Making the Band, as she emerged as the breakout star, the truth of the situation would be the opposite of the glitz and glamour. Listen throughout every minute of the trial, for this exclusive coverage. Amy Robach and TJ Holmes present Aubrey O’Day, Covering the Diddy Trial, an iHeartRadio podcast.

Good Hang with Amy Poehler

Good Hang with Amy Poehler

Come hang with Amy Poehler. Each week on her podcast, she'll welcome celebrities and fun people to her studio. They'll share stories about their careers, mutual friends, shared enthusiasms, and most importantly, what's been making them laugh. This podcast is not about trying to make you better or giving advice. Amy just wants to have a good time.

Music, radio and podcasts, all free. Listen online or download the iHeart App.

Connect

© 2025 iHeartMedia, Inc.