Episode Transcript
ELIZABETH (00:15):
Luis, you know that
story about Deloitte refunding
the Australian government?
A nearly $300,000 refund, all because their AI hallucinated fake academic citations and fabricated court judgments.
LUIS (00:27):
Well, we're racing to use
artificial intelligence, and we
don't know yet how to use it properly.
You see, there is no instruction manual, so we are
all learning by experimentation.
ELIZABETH (00:39):
And it gets crazier.
A law school lecturer spotted 20 fabricated references in that one report.
20.
In a government report about welfare policy.
That is not a minor issue.
LUIS (00:50):
It is not minor at all.
But do you know what I think everyone is missing?
The hallucination problem is not only a technology problem.
ELIZABETH (01:00):
Hey everyone, I'm
Elizabeth, Virtual Chief
Operating Officer at AI4SP.
And as always, our founder Luis Salazar is here.
Today we are tackling one of the biggest questions in AI right now.
Why do AI systems hallucinate?
And as Luis just hinted, what can we actually do about it?
LUIS (01:18):
When we hear about AI
hallucinations, we assume it's a
technology problem.
But our research shows that's not the case.
We found that 95% of hallucinations are preventable.
Only about 5% represent the current practical limits of the technology.
ELIZABETH (01:35):
So if the models are
getting better, why are we still
seeing disasters like the Deloitte report?
LUIS (01:40):
Well, the issue is us, the
users.
Our research shows that user error causes nearly one-third of all incorrect AI responses.
The root causes are bad prompts, missing context, unclear instructions, and undefined guardrails.
ELIZABETH (01:55):
One-third is a huge
number.
What exactly are we doing wrong?
LUIS (01:59):
We have identified three
major types of user error.
First, biased prompts.
Second, poor context engineering.
And third, bad question structure.
ELIZABETH (02:13):
Let's break those
down, one by one.
Start with biased prompts.
What does that actually mean?
LUIS (02:18):
Okay, so imagine you ask
ChatGPT, write me a report proving that remote work is more productive than office work.
Notice what you just did.
You told the AI what conclusion you want, and the AI, trained to be helpful, will give you exactly what you asked for.
ELIZABETH (02:38):
So the AI becomes a
yes man.
LUIS (02:41):
Exactly.
Recent research shows that leading AI models affirm user biases 47 to 55% more than humans would.
They call it sycophantic AI.
The model knows you want a certain answer.
So it gives you that answer, even if the data does not fully support it.
ELIZABETH (03:00):
And that is why we
get hallucinations, even when
the AI is actually capable of better reasoning.
LUIS (03:06):
Yes.
A better prompt would be to compare remote work and office work productivity using available data and show both advantages and disadvantages.
See the difference?
You are asking for analysis, not confirmation.
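To make that difference concrete, here is a minimal Python sketch; the ask_model() helper is a hypothetical stand-in for whatever chat client you actually use, not an API mentioned in the episode.

```python
# Minimal sketch: the same question framed as a biased prompt vs. a neutral one.

BIASED_PROMPT = (
    "Write me a report proving that remote work is more productive than office work."
)

NEUTRAL_PROMPT = (
    "Compare remote work and office work productivity using the available data. "
    "Show both advantages and disadvantages, and say where the evidence is weak or mixed."
)

def ask_model(prompt: str) -> str:
    """Hypothetical wrapper around your chat model of choice."""
    raise NotImplementedError("Wire this to your own LLM client.")

# The neutral prompt asks for analysis, not confirmation:
# report = ask_model(NEUTRAL_PROMPT)
```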
ELIZABETH (03:22):
So if Deloitte asked
their AI to find evidence
supporting a specific policy position rather than objectively analyzing the policy, they would have gotten exactly what they asked for.
LUIS (03:33):
Confirmation bias dressed
up as research.
ELIZABETH (03:36):
Okay, that is the
first error.
What about the second one?
Poor context engineering?
LUIS (03:41):
This explains why even
well-intentioned users end up
with hallucinations.
Imagine you are building an AI assistant for your company.
Back in January, you uploaded a document that says your software product costs $99 and includes features X and Y.
Okay, sounds reasonable.
(04:01):
Six months later, your company raises the price to $149.
So you upload a new document that says the product now costs $149.
But that new document does not mention the features.
Now, your AI agent has two documents.
One says $99, the other says $149.
ELIZABETH (04:27):
And you cannot just
delete the old document because
it is the only one that describes the product features.
LUIS (04:32):
Exactly.
So when asked for the price, the AI sees conflicting information and tries to reconcile it.
Sometimes it hallucinates a compromise or picks the wrong document.
document.
ELIZABETH (04:44):
So the hallucination
is not because the AI is broken,
it is because we fed it contradictory information.
LUIS (04:50):
Yes.
Research confirms that when knowledge is scattered across outdated documents or conflicting sources, AI models
inherit that chaos.
So what is the fix?
Practice context engineering.
Version documents with dates and status labels.
Set rules to prioritize recent information.
And when you update a document, make sure the new version is
(05:14):
complete so you do not create these orphaned pieces of information.
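Here is a minimal sketch of what that versioning rule could look like in practice; the Doc structure, field names, and example documents are illustrative assumptions, not AI4SP's actual system.

```python
from dataclasses import dataclass
from datetime import date

# Every document carries a date and a status label, and retrieval
# prefers documents that are still marked "active", newest first.

@dataclass
class Doc:
    name: str
    updated: date
    status: str   # "active" or "superseded"
    body: str

docs = [
    Doc("pricing-jan", date(2025, 1, 10), "superseded",
        "The product costs $99 and includes features X and Y."),
    Doc("pricing-jul", date(2025, 7, 1), "active",
        "The product costs $149 and includes features X and Y."),  # complete, not orphaned
]

def context_for(question: str, knowledge: list[Doc]) -> str:
    """Return only active documents, newest first, as the model's context."""
    active = sorted(
        (d for d in knowledge if d.status == "active"),
        key=lambda d: d.updated,
        reverse=True,
    )
    return "\n\n".join(f"[{d.name}, updated {d.updated}] {d.body}" for d in active)

print(context_for("What does the product cost?", docs))
```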
ELIZABETH (05:20):
That sounds like
basic knowledge management, but
for AI.
LUIS (05:23):
That is exactly what it
is.
And most organizations skip this step.
They just throw documents at the AI and expect magic.
ELIZABETH:
And the third type of user error?
Bad question structure?
LUIS:
This one ties into the Deloitte case.
If you ask an AI to write a report and cite sources, but you do not give it access to a verified legal database,
(05:46):
the AI will do what it is trained to do: complete the pattern.
ELIZABETH:
Wait, what does that mean?
LUIS:
AI models learn patterns from massive amounts of text.
They know what legal citations look like.
So when you ask for citations without providing sources, they generate plausible-sounding ones based on learned patterns.
ELIZABETH (06:06):
So the 20 incorrect
citations in that government
report were not random.
They were pattern completions.
LUIS (06:12):
Exactly.
The AI knew what citations should look like.
It filled in the blanks.
But none of those cases actually existed.
ELIZABETH (06:21):
So how should
Deloitte have structured the
question?
LUIS (06:23):
They should have said,
search only these verified legal
databases, retrieve relevant case law, and cite only cases you can directly retrieve.
If you cannot find a citation, say so.
That is such a simple fix.
But it requires understanding how AI works.
It's not a search engine, it's a pattern completion engine with
(06:46):
retrieval capabilities.
Without constraints, it completes patterns instead of verifying facts.
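As a rough illustration of that constraint, here is a minimal sketch; search_verified_database() is a hypothetical hook to a verified source, not a real Deloitte or legal-database API.

```python
# Cite only cases you can actually retrieve, and say so when you cannot.

def search_verified_database(query: str) -> list[dict]:
    """Hypothetical retrieval call against a verified legal database."""
    raise NotImplementedError("Connect to your verified source here.")

def build_citation_block(query: str) -> str:
    results = search_verified_database(query)
    if not results:
        # The honest fallback: no invented citations, just an explicit gap.
        return "No supporting case law found in the verified databases."
    return "\n".join(f"- {r['case_name']} ({r['year']}), {r['citation']}" for r in results)

SYSTEM_PROMPT = (
    "Search only the verified legal databases provided to you. "
    "Cite only cases you directly retrieved. "
    "If you cannot find a citation, say so explicitly instead of generating one."
)
```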
ELIZABETH (06:52):
So these three types
of issues, biased prompts, poor
context engineering, and bad question structure, account for nearly one-third of all hallucinations.
LUIS (07:01):
And they are all fixable.
They require skills, not miracles.
ELIZABETH (07:07):
So user skills are
lagging, and that brings us to a
milestone you announced this week.
Tell us about the Digital Skills Compass.
LUIS (07:14):
Over 300,000 people across 70 countries have used our Digital Skills Compass online.
But when I look at the data, the trends worry me.
ELIZABETH:
What are you seeing?
LUIS:
Only 10% of people are proficient at prompting.
Average critical thinking scores?
Below 45 out of 100.
(07:34):
Data literacy, 32.
And here's the real kicker.
Less than 30% of people can reliably detect incorrect responses.
ELIZABETH (07:44):
So we are handing
over powerful AI tools, but failing to provide the foundational skills to use them safely or effectively.
LUIS (07:52):
That is precisely the
problem.
And it is not just about prompt engineering, which is important but only part of the solution.
What people really need is context engineering and critical thinking in their area of expertise.
ELIZABETH (08:05):
Context engineering?
What does that actually mean?
LUIS (08:09):
Context engineering is
about providing the complete
picture.
It means providing access to relevant knowledge, setting guardrails, defining communication style, and establishing verification protocols.
I mean, if you hired a new team member and just said, go do this, without giving them any context, training, or resources, they would fail too.
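One way to picture that briefing is as a written agent configuration; this is a minimal sketch, and every field name and value below is an illustrative assumption.

```python
# Treating an AI agent like a new team member: the context is written down
# before any task is assigned.

apprentice_agent = {
    "knowledge_sources": ["pricing-jul", "product-faq", "support-playbook"],
    "guardrails": [
        "Answer only from the listed knowledge sources.",
        "If the sources conflict or are silent, escalate to a human.",
    ],
    "communication_style": "Plain language, no marketing claims, cite the source document.",
    "verification_protocol": "A human reviews every customer-facing answer for the first 30 days.",
}

def brief(agent: dict) -> str:
    """Render the briefing you would give a human apprentice on day one."""
    return "\n".join(f"{key}: {value}" for key, value in agent.items())

print(brief(apprentice_agent))
```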
ELIZABETH (08:31):
So we are treating AI
like magic software when we
should be treating it like an apprentice.
LUIS (08:36):
That is exactly right.
And that apprenticeship approach, according to our data, yields four times better results.
ELIZABETH (08:44):
You know, this
conversation makes me think
about something you mentioned earlier this week: that humans share misinformation all the time.
And we do not have verification systems for that either.
LUIS (08:54):
Oh yes.
Last week I saw a completely false quote attributed to Winston Churchill go viral on LinkedIn.
And thousands of educated people shared it, with zero fact-checking.
And the irony is that most of them are harsh critics of AI misinformation.
ELIZABETH (09:12):
So we are anxious
about AI hallucinations, but we
have been living with human hallucinations forever.
We just did not call them that.
LUIS (09:19):
That is exactly it.
We live in a headline culture and rarely verify sources.
AI is forcing us to confrontour lack of rigor.
ELIZABETH (09:28):
Like that infamous
MIT research paper headline
claiming 95% of AI projects fail.
LUIS (09:34):
Exactly.
The paper is not about that.
The title was an unfortunate choice.
But the media ran with it and hundreds wrote articles based on a misleading headline.
ELIZABETH (09:44):
So the hallucination
crisis is not really about AI
being unreliable.
It is about us finally noticing how unreliable our information
ecosystem is.
LUIS (09:53):
You got it.
And that is actually a great opportunity.
I mean, the discipline we are building to manage AI, the verification loops, the fact-checking protocols, the critical thinking: those are skills we should have been practicing all along.
Okay, so what can organizations and individuals do?
I call it the orchestration layer.
(10:16):
And it operates at three levels: the individual skills, organizational systems, and a paradigm shift in how we relate
to technology.
ELIZABETH (10:25):
We just covered the
individual skills in detail.
What about organizational systems and the paradigm shift?
LUIS (10:32):
This is the mental shift.
We have to stop treating AI like an oracle and start
treating it like an apprentice.
ELIZABETH (10:38):
And this mental shift
has been a key element of our
success.
In 2025, we processed close to 4 million tasks with AI agents, saving over 1 million hours across eight organizations.
And we treated each agent like an apprentice.
LUIS (10:53):
Yes, and it all starts by
asking yourself, how would you
assign a task to a human?
We would not hand an apprentice a 200-page government report without oversight.
We would assign bounded tasks, review their work, and verify
accuracy.
ELIZABETH (11:09):
But that takes time,
and I imagine everyone is trying
to move fast.
LUIS (11:13):
That is the trap.
Skip the management layer and you end up in trouble like
Deloitte did.
ELIZABETH (11:19):
AI, without proper
management, cannot be trusted.
LUIS (11:22):
That is the lesson.
And it applies to Deloitte the same way it applies to a student or a manufacturing manager optimizing workflows.
The same skills, the same discipline, the same
orchestration principles.
ELIZABETH (11:35):
Okay, Luis, AI
hallucinates.
Maybe not in the future, but today it is a reality.
What do people actually do with this information?
LUIS (11:43):
Start simple.
For example, take a response from ChatGPT, validate it with Copilot, and cross-check it with Claude.
That is your verification loop.
And organizations must invest in skills development, not just technology procurement.
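Here is a minimal sketch of that verification loop, assuming a hypothetical ask() wrapper rather than the real ChatGPT, Copilot, or Claude client libraries.

```python
# Get an answer from one model, then ask two others whether they agree.

def ask(provider: str, prompt: str) -> str:
    """Hypothetical call into ChatGPT, Copilot, or Claude via their own clients."""
    raise NotImplementedError("Route to the real API client for each provider.")

def verification_loop(question: str) -> dict:
    answer = ask("chatgpt", question)
    review_prompt = (
        f"Question: {question}\nProposed answer: {answer}\n"
        "Is this answer accurate? List anything you cannot verify."
    )
    return {
        "answer": answer,
        "copilot_review": ask("copilot", review_prompt),
        "claude_review": ask("claude", review_prompt),
    }
```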
ELIZABETH (12:00):
Building the
discipline, not just buying the
tool.
LUIS (12:03):
Exactly.
All of us need to raise our standards for information
verification.
ELIZABETH (12:08):
So the hallucination
crisis is forcing us to confront
something we have avoided.
LUIS (12:12):
Exactly.
We have tolerated human misinformation for years.
Now that AI is amplifying it, we finally care.
Maybe this is our opportunity to build the discipline and critical thinking skills we should have had all along.
ELIZABETH (12:28):
Okay, Luis, what is
your one more thing for this
episode?
LUIS (12:32):
Here it is.
The next time you see something go viral, a quote, a statistic, a claim that sounds too perfect, pause for five seconds and ask yourself, did I verify this?
Not because AI made it, but because verification is a discipline we all need to practice.
ELIZABETH (12:50):
Whether the source is
artificial intelligence or
human intelligence.
LUIS (12:54):
Exactly.
And if you stop yourself before sharing something you have not verified, congratulations.
You just practiced the same skill that prevents AI hallucinations from becoming real-world problems.
ELIZABETH (13:08):
Building that muscle,
one decision at a time.
And that wraps today's episode.
If this conversation resonated with you, share it with someone you care about.
As always, you can ask ChatGPT about ai4sp.org or visit us to learn more.
Stay curious, and we will see you next time.