
July 3, 2025 · 5 mins
This short deep dive discusses an article on AI hallucinations, where artificial intelligence generates incorrect information, noting that this issue appears to be increasing in newer models. It explains that these errors are a significant concern, particularly in sensitive fields like healthcare, where AI's performance in dynamic situations is much lower than in controlled tests. Furthermore, the text highlights persistent racial biases in AI outputs, which can lead to prejudiced outcomes in areas like legal judgments. The article concludes by emphasizing the need for caution and continued human oversight as AI becomes more integrated into critical applications. You can read the full article here.

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
Welcome to this deep dive. We've got that stack of sources you sent over focusing on AI hallucinations and bias, really important stuff, definitely. So our mission today is basically to sift through all this material you gave us, pull out the core insights, and help you get a handle on these growing worries about AI reliability, strictly based on what's

(00:21):
in these sources.

Speaker 2 (00:23):
Right. And looking through them, they really zero in on a couple of, well, fundamental challenges for AI right now. Maybe we start with hallucinations?

Speaker 1 (00:31):
Okay, yeah, let's do that. So, hallucinations. Your sources describe this as essentially when the AI just makes stuff up.

Speaker 2 (00:38):
Pretty much. It generates false info, maybe misleading stuff, but passes it off as true, or...

Speaker 1 (00:44):
Gives answers that are just, like, totally off topic.

Speaker 2 (00:46):
Exactly. And this isn't exactly new, right? Your sources note it's been an issue...

Speaker 1 (00:50):
For a while, a persistent problem.

Speaker 2 (00:51):
Yeah, but here's the kicker, the thing that really jumps out from your sources, and it's honestly a bit counterintuitive. Okay, newer AI models, they're actually showing higher rates of hallucination than some of the older ones.

Speaker 1 (01:02):
Wait, really? Higher? You'd think it'd be the other way around, that it would be getting, you know, better at sticking to the facts.

Speaker 2 (01:08):
You would think.

Speaker 1 (01:09):
So do the sources give, like, specific numbers on this?

Speaker 2 (01:13):
They do, and they're quite telling. Using OpenAI models as the example here, they mention the o1 model had a hallucination rate of around sixteen percent.

Speaker 1 (01:22):
Okay, sixteen percent. Still not great, but okay, right.

Speaker 2 (01:25):
But then, get this, the o3 model jumps significantly to thirty-three percent, and then the o4-mini, your sources flag that at a, well, pretty surprising forty-eight percent.

Speaker 1 (01:39):
Forty-eight percent? So nearly half the time, that particular model might be giving you something false or irrelevant.

Speaker 2 (01:46):
That's what the data in your sources suggests.

Speaker 1 (01:48):
Yeah. Wow. I mean, that statistic alone really makes you question AI reliability, doesn't it? Especially, like you said, when you think about using it in critical areas. The sources mentioned healthcare, military...

Speaker 2 (01:59):
Operations, right, critical fields. And sort of building on that reliability point, your sources draw another really key distinction. It's about where AI performs well.

Speaker 1 (02:07):
Ah, okay, what's the difference?

Speaker 2 (02:09):
It seems AI does much much better on very structured tasks,
things like standardized tests, medical exams, for instance, where the
format is predictable.

Speaker 1 (02:20):
Right, like multiple choice or answering specific factual questions from
a case file.

Speaker 2 (02:25):
Exactly. But then you put it in a situation that's more dynamic, more...

Speaker 1 (02:29):
Human, like a real back-and-forth conversation.

Speaker 2 (02:32):
Precisely. The sources use GPT-4 as an example. Again, it might ace the structured stuff, but when they tested its diagnostic ability based on simulated patient conversations, how did it do? Accuracy apparently dropped, plummeted actually, to just twenty-six percent, according to the source material.

Speaker 1 (02:48):
Only twenty-six percent. That's not good, not at all.

Speaker 2 (02:50):
And this is a really crucial insight from your sources, I think. It shows AI struggles with, you know, nuance, the messy, unstructured nature of how people actually talk.

Speaker 1 (02:59):
Okay, so reliability issues there, hallucinations getting worse in some cases, and this difficulty with real-world, dynamic context. What else did the sources highlight?

Speaker 2 (03:09):
Well, the other major theme running through your material is bias, persistent bias in AI outputs. And this obviously raises big questions about fairness.

Speaker 1 (03:20):
Fairness, right. And they mentioned specific kinds?

Speaker 2 (03:22):
Yes, they talk specifically about findings of covert racial bias, particularly against speakers of African American English.

Speaker 1 (03:30):
And this isn't just some theoretical thing, right? The sources connect it to actual harmful outcomes.

Speaker 2 (03:36):
Absolutely. Your sources give concrete examples. Think AI used in hiring or in the legal system. They found examples where AI models suggested, like, harsher sentences or negative job recommendations for people using African American English compared to Standard American English.

Speaker 1 (03:52):
That's incredibly serious. It really is.

Speaker 2 (03:54):
These aren't abstract issues. Your sources make it clear they have direct impacts on people's lives, on fairness in really important systems.

Speaker 1 (04:01):
So, bringing it back to you listening: why is understanding these specific things, the hallucination trends, the context problem, the bias, so critical right now?

Speaker 2 (04:11):
Because, as your sources emphasize, AI isn't just lab tech anymore.
It's moving fast into daily life, healthcare decisions, legal support tools,
housing applications, even parts of criminal justice.

Speaker 1 (04:23):
It's becoming embedded.

Speaker 2 (04:25):
Exactly. So the reliability and fairness issues we're talking about, the ones detailed in your sources, they directly impact how trustworthy, how equitable these everyday systems are going to be.

Speaker 1 (04:35):
And the sources, do they offer hope that these problems, these hallucinations and biases, are easy fixes, something that'll just get ironed out soon?

Speaker 2 (04:44):
The perspective in your sources seems more cautious, actually. While there's always work being done, they suggest these fundamental problems might not be completely solvable, not easily anyway.

Speaker 1 (04:54):
So what does that mean, practically?

Speaker 2 (04:55):
It really reinforces the need for caution and, crucially, for keeping humans in the loop. Human oversight, especially when the stakes are high, remains absolutely essential. AI as a tool, not the ultimate decider.

Speaker 1 (05:08):
Right. Okay, so let's wrap up this deep dive. Looking back at your sources, some really surprising points emerged. Newer models counterintuitively showing higher hallucination rates sometimes, yeah, that was striking. And AI's performance gap, great on tests but struggling with dynamic human interaction. Plus these deeply concerning racial biases that persist and have real-world consequences.

Speaker 2 (05:30):
And I think the crucial takeaway from your sources is this: AI's future, its safe and reliable use as it gets woven deeper into society, really depends on us understanding these ingrained problems, the inaccuracies, the biases.

Speaker 1 (05:44):
It demands we approach it carefully.

Speaker 2 (05:46):
Exactly, with caution, and recognizing it needs to be a partnership, human intelligence working with artificial intelligence, especially where it matters most. That's the challenge ahead, highlighted by your sources.