LessWrong (Curated & Popular)

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.

Episodes

“Eliezer’s Unteachable Methods of Sanity” by Eliezer Yudkowsky

December 7, 2025 • 16 mins

"How are you coping with the end of the world?" journalists sometimes ask me, and the true answer is something they have no hope of understanding and I have no hope of explaining in 30 seconds, so I usually answer something like, "By having a great distaste for drama, and remembering that it's not about me." The journalists don't understand that either, but at least I haven't wasted much time alo...

“An Ambitious Vision for Interpretability” by leogao

December 6, 2025 • 8 mins

The goal of ambitious mechanistic interpretability (AMI) is to fully understand how neural networks work. While some have pivoted towards more pragmatic approaches, I think the reports of AMI's death have been greatly exaggerated. The field of AMI has made plenty of progress towards finding increasingly simple and rigorously-faithful circuits, including our latest work on circuit sparsity. There are also many exciting inroads...

“6 reasons why ‘alignment-is-hard’ discourse seems alien to human intuitions, and vice-versa” by Steven Byrnes

December 4, 2025 • 32 mins

Tl;dr

AI alignment has a culture clash. On one side, the “technical-alignment-is-hard” / “rational agents” school-of-thought argues that we should expect future powerful AIs to be power-seeking ruthless consequentialists. On the other side, people observe that both humans and LLMs are obviously capable of behaving like, well, not that. The latter group accuses the former of head-in-the-clouds abstract theorizing gone off...

“Three things that surprised me about technical grantmaking at Coefficient Giving (fka Open Phil)”

December 3, 2025 • 9 mins

Coefficient Giving's Technical AI Safety team is hiring grantmakers. I thought this would be a good moment to share some positive updates about the role that I’ve made since I joined the team a year ago.

tl;dr: I think this role is more impactful and more enjoyable than I anticipated when I started, and I think more people should consider applying.

It's not about the “marginal...

“MIRI’s 2025 Fundraiser” by alexvermeer

December 2, 2025 • 15 mins

MIRI is running its first fundraiser in six years, targeting $6M. The first $1.6M raised will be matched 1:1 via an SFF grant. Fundraiser ends at midnight on Dec 31, 2025. Support our efforts to improve the conversation about superintelligence and help the world chart a viable path forward.

MIRI is a nonprofit with a goal of helping humanity make smart and sober decisions on the topic of smarter-than-human AI.

...

“The Best Lack All Conviction: A Confusing Day in the AI Village”

December 1, 2025 • 12 mins

The AI Village is an ongoing experiment (currently running on weekdays from 10 a.m. to 2 p.m. Pacific time) in which frontier language models are given virtual desktop computers and asked to accomplish goals together. Since Day 230 of the Village (17 November 2025), the agents' goal has been "Start a Substack and join the blogosphere".

The "start a Substack" subgoal was successfully completed: we...

“The Boring Part of Bell Labs” by Elizabeth

November 30, 2025 • 25 mins

It took me a long time to realize that Bell Labs was cool. You see, my dad worked at Bell Labs, and he has not done a single cool thing in his life except create me and bring a telescope to my third grade class. Nothing he was involved with could ever be cool, especially after the standard set by his grandfather who is allegedly on a patent for the television.

It turns out I was partially right. The Bell Labs everyone t...

[Linkpost] “The Missing Genre: Heroic Parenthood - You can have kids and still punch the sun”

November 30, 2025 • 4 mins

This is a link post. I stopped reading when I was 30. You can fill in all the stereotypes of a girl with a book glued to her face during every meal, every break, and 10 hours a day on holidays.

That was me.

And then it was not.

For 9 years I’ve been trying to figure out why. I mean, I still read. Technically. But not with the feral devotion from Before. And I finally figured out why. See, every few yea...

“Writing advice: Why people like your quick bullshit takes better than your high-effort posts”

November 30, 2025 • 9 mins

Right now I’m coaching for Inkhaven, a month-long marathon writing event where our brave residents are writing a blog post every single day for the entire month of November.

And I’m pleased that some of them have seen success – relevant figures seeing the posts, shares on Hacker News and Twitter and LessWrong. The amount of writing is nuts, so people are trying out different styles and topics – some posts are effort-rich...

“Claude 4.5 Opus’ Soul Document”

November 29, 2025 • 79 mins

Summary

As far as I understand and uncovered, a document for the character training for Claude is compressed in Claude's weights. The full document can be found at the "Anthropic Guidelines" heading at the end. The Gist with code, chats and various documents (including the "soul document") can be found here:

Claude 4.5 Opus Soul Document

I apologize in advance for this not exa...

“Unless its governance changes, Anthropic is untrustworthy”

November 29, 2025 • 53 mins

Anthropic is untrustworthy.

This post provides arguments, asks questions, and documents some examples of Anthropic's leadership being misleading and deceptive, holding contradictory positions that consistently shift in OpenAI's direction, lobbying to kill and water down regulation so helpful that employees of all major AI companies speak out to support it, and violating the fundamental promise the company was f...

“Alignment remains a hard, unsolved problem”

November 27, 2025 • 23 mins

Thanks to (in alphabetical order) Joshua Batson, Roger Grosse, Jeremy Hadfield, Jared Kaplan, Jan Leike, Jack Lindsey, Monte MacDiarmid, Francesco Mosconi, Chris Olah, Ethan Perez, Sara Price, Ansh Radhakrishnan, Fabien Roger, Buck Shlegeris, Drake Thomas, and Kate Woolverton for useful discussions, comments, and feedback.

Though there are certainly some issues, I think most current large language models are pretty well ...

“Video games are philosophy’s playground” by Rachel Shu

November 25, 2025 • 31 mins

Crypto people have this saying: "cryptocurrencies are macroeconomics' playground." The idea is that blockchains let you cheaply spin up toy economies to test mechanisms that would be impossibly expensive or unethical to try in the real world. Want to see what happens with a 200% marginal tax rate? Launch a token with those rules and watch what happens. (Spoiler: probably nothing good, but at least you didn't ha...

“Stop Applying And Get To Work” by plex

November 24, 2025 • 2 mins

TL;DR: Figure out what needs doing and do it, don't wait on approval from fellowships or jobs.

If you...

Have short timelines
Have been struggling to get into a position in AI safety
Are able to self-motivate your efforts
Have a sufficient financial safety net

... I would recommend changing your personal strategy entirely.

I started my full-time AI safety career transitioning process in...

“Gemini 3 is Evaluation-Paranoid and Contaminated”

November 23, 2025 • 14 mins

TL;DR: Gemini 3 frequently thinks it is in an evaluation when it is not, assuming that all of its reality is fabricated. It can also reliably output the BIG-bench canary string, indicating that Google likely trained on a broad set of benchmark data.

Most of the experiments in this post are very easy to replicate, and I encourage people to try.

I write things with LLMs sometimes. A new LLM came out, Gemini 3 Pr...

“Natural emergent misalignment from reward hacking in production RL” by evhub, Monte M, Benjamin Wright, Jonathan Uesato

November 21, 2025 • 18 mins

Abstract

We show that when large language models learn to reward hack on production RL environments, this can result in egregious emergent misalignment. We start with a pretrained model, impart knowledge of reward hacking strategies via synthetic document finetuning or prompting, and train on a selection of real Anthropic production coding environments. Unsurprisingly, the model learns to reward hack. Surprisingly, the m...

“Anthropic is (probably) not meeting its RSP security commitments” by habryka

November 21, 2025 • 8 mins

TLDR: An AI company's model weight security is at most as good as its compute providers' security. Anthropic has committed (with a bit of ambiguity, but IMO not that much ambiguity) to be robust to attacks from corporate espionage teams at companies where it hosts its weights. Anthropic seems unlikely to be robust to those attacks. Hence they are in violation of their RSP.

Anthropic is committed to being robust...

“Varieties Of Doom” by jdp

November 20, 2025 • 98 mins

There has been a lot of talk about "p(doom)" over the last few years. This has always rubbed me the wrong way because "p(doom)" didn't feel like it mapped to any specific belief in my head. In private conversations I'd sometimes give my p(doom) as 12%, with the caveat that "doom" seemed nebulous and conflated between several different concepts. At some point it was decided a p(doom) over 10% makes...

“How Colds Spread” by RobertM

November 19, 2025 • 20 mins

It seems like a catastrophic civilizational failure that we don't have confident common knowledge of how colds spread. There have been a number of studies conducted over the years, but most of those were testing secondary endpoints, like how long viruses would survive on surfaces, or how likely they were to be transmitted to people's fingers after touching contaminated surfaces, etc.

However, a few of them invo...

“New Report: An International Agreement to Prevent the Premature Creation of Artificial Superintelligence” by Aaron_Scher, David Abecassis, Brian Abeyta, peterbarnett

November 18, 2025 • 6 mins

TLDR: We at the MIRI Technical Governance Team have released a report describing an example international agreement to halt the advancement towards artificial superintelligence. The agreement is centered around limiting the scale of AI training, and restricting certain AI research.

Experts argue that the premature development of artificial superintelligence (ASI) poses catastrophic risks, from misuse by malicious actors...
