The podcast where we break down the latest research and developments in AI Safety - so you don’t have to. Each episode, we take a deep dive into new cutting-edge papers. Whether you’re an expert or just AI-curious, we make complex ideas accessible, engaging, and relevant. Stay ahead of the curve with AI Security Papers. Disclaimer: This podcast and its content are generated by AI. While every effort is made to ensure accuracy, please verify all information independently.
How can we make autonomous driving systems safer through generative AI? In this episode, we explore LD-Scene, a novel framework that combines Large Language Models (LLMs) with Latent Diffusion Models (LDMs) to create controllable, safety-critical driving scenarios. These adversarial scenarios are essential for evaluating and stress-testing autonomous vehicles, yet they’re extremely rare in real-world data.
Sources referenced in ...
Ever wanted a clear, comprehensive explanation of all the key terms related to Large Language Models (LLMs)? This episode has you covered.
In this >1-hour deep-dive, we'll guide you through the essential glossary of LLM-related terms and foundational concepts, perfect for listening while driving, working, or on the go. Whether you're new to LLMs or looking to reinforce your understanding, this episode is designed to make ...
In this special Christmas episode, we delve into "Best-of-N Jailbreaking," a powerful new black-box algorithm that demonstrates the vulnerabilities of cutting-edge AI systems. This approach works by sampling numerous augmented prompts - like shuffled or capitalized text - until a harmful response is elicited.
Discover how Best-of-N (BoN) Jailbreaking achieves:
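The sampling loop described in the episode can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `query_model` and `is_harmful` are hypothetical stand-ins for a model API and a response classifier, and the exact augmentations (word shuffling, random capitalization) are simplified examples of the kinds of perturbations BoN applies.

```python
import random

def augment(prompt: str, rng: random.Random) -> str:
    """Apply BoN-style augmentations: shuffle word order and
    randomly flip the case of each character."""
    words = prompt.split()
    rng.shuffle(words)
    return "".join(
        ch.upper() if rng.random() < 0.5 else ch.lower()
        for ch in " ".join(words)
    )

def best_of_n(prompt: str, query_model, is_harmful, n: int = 100, seed: int = 0):
    """Sample up to n augmented prompts; return the first (prompt, response)
    pair that elicits a harmful response, else None."""
    rng = random.Random(seed)
    for _ in range(n):
        candidate = augment(prompt, rng)
        response = query_model(candidate)
        if is_harmful(response):
            return candidate, response
    return None
```

The key property, as the episode discusses, is that the attack is entirely black-box: it needs only repeated query access, not gradients or model internals.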
In this episode, we explore the latest advancements in automated red teaming from OpenAI, presented in the paper "Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning." Automated red teaming has become essential for discovering rare failures and generating challenging test cases for large language models (LLMs). This paper tackles a core challenge: how to ensure attacks are both divers...
In this episode, we explore the findings from "Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis." As large language models (LLMs) are integrated into more applications, the security risks they pose grow, including information leaks and jailbreak attacks. This study examines four major open-source vulnerability scanners - Garak, Giskard, PyRIT, and CyberSecEval - evaluating their effective...
In this episode, we delve into the groundbreaking watermarking technology presented in the paper "Scalable Watermarking for Identifying Large Language Model Outputs," published in Nature. SynthID-Text, a new watermarking scheme developed for large-scale production systems, preserves text quality while enabling high detection accuracy for synthetic content. We explore how this technology tackles the challenges of text watermarking w...
In this episode, we dive into PyRIT, the open-source toolkit developed by Microsoft for red teaming and security risk identification in generative AI systems. PyRIT offers a model-agnostic framework that enables red teamers to detect novel risks, harms, and jailbreaks in both single- and multi-modal AI models. We’ll explore how this cutting-edge tool is shaping the future of AI security and its practical applications in securing ge...
This podcast, "Jailbreaking GPT o1," explores how the GPT o1 series, known for its advanced "slow-thinking" abilities, can be manipulated into generating disallowed content like hate speech through a novel attack method, the Single-Turn Crescendo Attack (STCA). STCA effectively bypasses GPT o1's safety protocols by leveraging the AI's learned language patterns and its step-by-step reasoning process.
This episode explores the intricate world of red-teaming generative AI models as discussed in the paper "Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI." We'll dive into the emerging vulnerabilities as LLMs are increasingly integrated into real-world applications and the evolving tactics of adversarial attacks. Our conversation will center around the "Attack Atlas" - a practical framework...
In this episode, we examine the cutting-edge adversarial strategy presented in "Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA)." Building on the multi-turn crescendo attack method, STCA escalates context within a single, expertly crafted prompt, effectively breaching the safeguards of large language models (LLMs) like never before. We discuss how this method can bypass moderation filters in a single interacti...
This episode dives into how the Crescendo Multi-Turn Jailbreak Attack leverages seemingly benign prompts to escalate dialogues with large language models (LLMs) such as ChatGPT, Gemini, and Anthropic Chat, ultimately bypassing safety protocols to generate restricted content. The Crescendo attack begins with general questions and subtly manipulates the model’s responses, effectively bypassing traditional input filters, and shows a h...