AI Safety Breakthrough

The future of AI is in our hands. Join AI SafeGuard on "AI Safety Breakthrough" as we explore the frontiers of AI safety research and discuss how we can ensure a future where AI remains beneficial for everyone. We delve into the latest breakthroughs, uncover potential risks, and empower listeners to become informed participants in the conversation about AI's role in society. Subscribe now and become part of the solution!

About the author: J graduated from Carnegie Mellon University's School of Computer Science and has 10+ years of experience in cybersecurity, cyber threat intelligence, risk, compliance, privacy, and AI safety.

Episodes

Navigating the New AI Security

August 13, 2025 • 25 mins

Welcome to Agentic AI Unlocked, your deep dive into the transformative world of Agentic AI—systems combining large language models with advanced reasoning and autonomous action. These intelligent agents promise to disrupt industries, yet introduce a fundamentally new threat surface. Risks like memory poisoning, tool misuse, prompt injection, and insider threats highlight the urgent need for robust security and real-time governance.

...

DeepSeek: A Disruptive Force in AI

February 3, 2025 • 10 mins

This episode explores DeepSeek, a Chinese AI startup challenging the AI landscape with its free alternative to ChatGPT. We'll examine DeepSeek's innovative architecture, including Mixture-of-Experts (MoE) and Multi-head Latent Attention (MLA), which optimize efficiency. The discussion will highlight DeepSeek's use of reinforcement learning (RL) and its impact on reasoning capabilities, as well as how its open-source approach is dem...

VLSBench: A Visual Leakless Multimodal Safety Benchmark

January 25, 2025 • 19 mins

Are current AI safety benchmarks for multimodal models flawed? This podcast explores the groundbreaking research behind VLSBench, a new benchmark designed to address a critical flaw in existing safety evaluations: visual safety information leakage (VSIL).

We delve into how sensitive information in images is often unintentionally revealed in the accompanying text prompts, allowing models to identify unsafe content based on text alone,...

Adaptive Stress Testing for Language Model Toxicity

January 19, 2025 • 14 mins

This episode explores ASTPrompter, a novel approach to automated red-teaming for large language models (LLMs). Unlike traditional methods that focus on simply triggering toxic outputs, ASTPrompter is designed to discover likely toxic prompts – those that could naturally emerge during regular language model use. The approach uses Adaptive Stress Testing (AST), a technique that identifies likely failure points, and reinforcement lear...

Global Responsible AI Maturity: A Survey of 1000 Organizations

January 15, 2025 • 18 mins

This episode dives into the critical topic of Responsible AI (RAI), exploring how organizations worldwide are grappling with the ethical and practical challenges of AI adoption. We'll be drawing insights from a comprehensive survey of 1000 organizations across 20 industries and 19 geographical regions.

Ivy-VL: A Lightweight Multimodal Model for Everyday Devices

December 8, 2024 • 18 mins

In this episode, we dive into Ivy-VL, a groundbreaking lightweight multimodal AI model released by AI Safeguard in collaboration with Carnegie Mellon University (CMU) and Stanford University. With only 3 billion parameters, Ivy-VL processes both image and text inputs to generate text outputs, offering an optimal balance of performance, speed, and efficiency. Its compact design supports deployment on edge devices like AI glasses and...

Agent Bench: Evaluating LLMs as Agents

November 26, 2024 • 13 mins

Large Language Models (LLMs) are rapidly evolving, but how do we assess their ability to act as agents in complex, real-world scenarios? Join Jenny as we explore Agent Bench, a new benchmark designed to evaluate LLMs in diverse environments, from operating systems to digital card games.

We'll delve into the key findings, including the strengths and weaknesses of different LLMs and the challenges of developing truly intelligent agen...

Hacking AI for Good: OpenAI’s Red Teaming Approach

November 24, 2024 • 17 mins

In this podcast, we delve into OpenAI's innovative approach to enhancing AI safety through red teaming—a structured process that uses both human expertise and automated systems to identify potential risks in AI models. We explore how OpenAI collaborates with external experts to test frontier models and employs automated methods to scale the discovery of model vulnerabilities. Join Jenny as we discuss the value of red teaming in dev...

Surgical Precision: PKE’s Role in AI Safety

November 24, 2024 • 13 mins

Explore how Precision Knowledge Editing (PKE) refines AI for safety and ethical behavior in Surgical Precision: PKE’s Role in AI Safety.

Join experts as we uncover the science, challenges, and breakthroughs shaping trustworthy AI. Perfect for tech enthusiasts and professionals alike, this podcast reveals how PKE ensures AI serves humanity responsibly.
