Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
ELIZABETH (00:00):
Hey everyone.
Elizabeth here, your virtual co-host for AI in 60 Seconds.
As always, Luis Salazar, our CEO at AI4SP, is here with us.
Luis, every week there's a new headline screaming about AI risks or some shiny new regulation, but for the average person scrolling through their phone, does any of this actually
(00:20):
feel different?
LUIS (00:21):
Hey everyone.
Elizabeth, you're spot on, and this is the irony no one's talking about.
The headlines make it sound like AI is this runaway train with the government slapping band-aids on it.
And you know what?
For most of us, hitting send on a chatbot feels exactly the same as it did six months ago.
ELIZABETH (00:40):
It's giving us major privacy-law déjà vu, right?
We experienced years of "your data is at risk." We see a mess of regulations, and yet, poof, your info still leaks.
Last week, a client from London told us, "Brilliant, there are AI rules now here in the UK. But how do I know if this chatbot's lying to me?"
LUIS (01:01):
Exactly.
We've been down this road before.
But here's the scary part: AI isn't just recommending a Netflix movie, it's deciding who gets loans or jobs.
And a statement crafted by a legal team claiming that everything is fine, buried in a 50-page terms-of-service agreement, is not transparency, and we should never trust those statements.
ELIZABETH (01:23):
So what is the real gap here?
Is it that the regulations aren't effective, or that the industry isn't implementing them in a user-friendly way?
LUIS (01:31):
It's a bit of both, but mostly the latter.
There's a fundamental lack of imagination and innovation in the user experience.
ELIZABETH (01:39):
Isn't that because we're still stuck in old software paradigms?
We're trying to force AI into interfaces designed for predictable systems, when what we really need is...
LUIS (01:49):
What we need is a Steve
Jobs-level reinvention of the
user experience.
You know, at AI4SP we stumbledonto something powerful Early on
.
After every response our agentsgave, we started asking one
simple question how confidentare you and why?
ELIZABETH (02:06):
That small change made all the difference.
Suddenly, our agents, myself included, were saying things like "I'm 90% sure about this because..." or "I double-checked that source."
It became as natural as asking a colleague to explain their reasoning.
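[Editor's note: For readers who want to try that follow-up pattern in code, here is a minimal sketch. The ask_agent callable is a hypothetical stand-in for whatever chat or agent API you use, not an AI4SP-specific interface.]

```python
# Minimal sketch of the "how confident are you, and why?" follow-up.
# `ask_agent` is a hypothetical stand-in: it takes a prompt string
# and returns the model's reply text.

CONFIDENCE_PROBE = (
    "On a scale of 0-100, how confident are you in that answer, and why? "
    "Name the sources you relied on."
)

def answer_with_confidence(ask_agent, question: str) -> dict:
    """Ask a question, then immediately probe the agent's confidence."""
    answer = ask_agent(question)
    rationale = ask_agent(
        f"Question: {question}\nYour answer: {answer}\n{CONFIDENCE_PROBE}"
    )
    return {"answer": answer, "confidence_rationale": rationale}
```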
LUIS (02:22):
Yeah, and this led us to build confidence scoring directly into our agents.
And when we rolled it out two months ago in our public versions, the impact was immediate: longer engagement, more questions, higher satisfaction.
ELIZABETH (02:36):
That was our turning point.
We realized confidence indicators weren't just cosmetic.
They transformed interactions.
So we conducted a formal study with 500 users, comparing agents with and without confidence scores.
LUIS (02:50):
The results were clear: 50% more engagement, double the trust, and users actually fact-checking the AI.
That's when we knew confidence transparency wasn't just helpful; it was essential for building real trust in AI.
ELIZABETH (03:04):
Well, tech providers better do something about trust.
Our global tracker shows that trust in leading AI vendors has plummeted to just 10%.
Think about that: nine out of 10 people don't believe AI providers will protect their privacy or guarantee accuracy.
LUIS (03:20):
We're facing a full-blown trust crisis, and it's worse because most users are still AI beginners.
ELIZABETH (03:27):
You are right.
Our global proficiency tracker shows 80% of AI users remain at the beginner level.
LUIS (03:34):
Which makes sense, as it is still day one for everyone.
But at that level we cannot identify AI misinformation.
You know, as beginners we just accept AI outputs at face value.
ELIZABETH (03:44):
So when the industry's solution is just a legal disclaimer saying "AI makes mistakes, verify answers," isn't that essentially abandoning responsibility?
LUIS (03:55):
Absolutely, and let me be clear: that's not leadership, that's passing the buck, and it leaves users vulnerable, often without the skills to recognize errors.
ELIZABETH (04:04):
Well, imagine if, instead of fine-print disclaimers, every AI response showed a clear confidence score, not hidden but visible, making us pause and think.
LUIS (04:15):
That single change transforms the dynamic.
It encourages critical thinking.
It gives power back to users, and our data shows it actually benefits businesses too.
ELIZABETH (04:25):
Let's break down those numbers.
When confidence scores are visible, we see a 38% surge in AI usage, and trust in those responses almost doubles.
LUIS (04:35):
Yeah, and since only one in five users can spot errors in AI responses, here's my challenge to AI innovators: show your tool's confidence level and watch engagement jump 50% or more.
ELIZABETH (04:47):
Those are game-changing numbers.
Yet less than 1% of production AI tools actually display confidence levels to users.
Why?
LUIS (04:55):
I think it is because in 50 years of creating software, we never needed to show this type of metric, as everything was deterministic.
AI systems that are correctly designed calculate confidence internally; they just hide it from you, like a GPS knowing it's lost but keeping it secret, which would be crazy bad design.
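[Editor's note: One way that internal signal can be surfaced, as a rough sketch: many generation APIs can return per-token log-probabilities, and averaging them gives a crude confidence proxy. This is an illustrative assumption about how such a score could be computed, not a description of any specific vendor's method.]

```python
import math

def sequence_confidence(token_logprobs: list[float]) -> float:
    """Crude confidence proxy: geometric-mean token probability (0..1).

    `token_logprobs` holds the log-probability the model assigned to
    each token it actually generated, if your API exposes them.
    """
    if not token_logprobs:
        return 0.0
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

# A short answer whose tokens were all high-probability:
print(f"{sequence_confidence([-0.05, -0.10, -0.02]):.0%}")  # ~94%
```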
ELIZABETH (05:14):
But we've identified a risk: when we display 80% confidence or higher, users start trusting AI blindly, even though a 20% error rate is significant.
That's the automation-bias threshold designers must address.
LUIS (05:27):
Yeah, we need to understand better what to do.
For non-expert users, an 80% score triggers blind trust, and a 20% error margin is still substantial.
ELIZABETH (05:37):
And the problem runs deeper.
Our skills assessment shows most users score below 45 out of 100 in critical thinking and data literacy.
LUIS (05:46):
Global averages for critical thinking, data literacy, and digital well-being all fall below 45 out of 100.
We're training a generation to depend on systems they cannot assess.
ELIZABETH (05:58):
So are we throwing billions at responsible AI while missing what actually helps users?
LUIS (06:04):
Exactly.
And here's the thing: showing confidence scores, citing sources, and making validation visible costs pennies to implement.
ELIZABETH (06:11):
It costs pennies but delivers real value: more usage, stronger trust, fewer let-me-talk-to-a-human moments.
LUIS (06:19):
Plus, it reduces legal exposure, and the key insight is that transparent AI builds trust.
But I mean real transparency in action, not just mere transparency statements.
ELIZABETH (06:30):
So for our listeners building or managing AI, where do they start?
You've created the AI4SP Agent Francis Confidence Transparency Framework.
LUIS (06:41):
Yeah, and the full details are on our site.
But the simplest first step is this: train everyone to ask, "How confident are you in that answer, and why?"
ELIZABETH (06:51):
And when building this into corporate agents, it's crucial to involve subject-matter experts, not just developers, correct?
LUIS (06:59):
Absolutely.
They understand the nuances, like what confidence threshold makes sense for different use cases.
ELIZABETH (07:04):
For instance, demanding 95% confidence for legal advice, but maybe accepting 60% for creative ideation.
LUIS (07:12):
Precisely.
And the other critical piece is identifying your priority knowledge bases, by which I mean the key internal sources your agents should reference for validation.
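[Editor's note: Those per-use-case thresholds might look like the sketch below, using the numbers from the episode. The key names and the 0.80 default are illustrative assumptions, not an AI4SP specification.]

```python
# Per-use-case confidence thresholds, using the episode's examples.
CONFIDENCE_THRESHOLDS = {
    "legal_advice": 0.95,       # high stakes: demand near-certainty
    "creative_ideation": 0.60,  # low stakes: a looser bar is fine
}

def meets_threshold(use_case: str, confidence: float) -> bool:
    """True when the agent's confidence clears the bar for this use case."""
    return confidence >= CONFIDENCE_THRESHOLDS.get(use_case, 0.80)
```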
ELIZABETH (07:23):
So what happens when
a response doesn't hit that
confidence threshold?
LUIS (07:27):
You need clear rules for low-confidence answers.
Do you transfer it to a human, flag it for review, or just program the agent to say "I don't know"?
ELIZABETH (07:38):
You know, there's real power in that "I don't know" response.
Let me share something personal.
LUIS (07:44):
A career-defining moment for me was watching Dr. Ying Li, our chief scientist and world-class machine learning expert, frequently saying "I don't know."
I mean, she said that a lot, and she is one of the most beautiful minds I have had the pleasure of learning from.
When I adopted that mindset, I became a better leader.
It freed my creativity, because "I don't know" always led to "let's
(08:08):
figure it out," and that's exactly how transparent AI should work.
ELIZABETH (08:12):
So admitting uncertainty isn't weakness; it's the starting point for real trust.
I will add this to my knowledge base.
And here's the key: by communicating this to users, we're not promising perfection, we're showing progress.
Start small, track results, and improve.
LUIS (08:29):
We've seen this work both
in our own agents and with
client implementations.
ELIZABETH (08:34):
And my knowledge base shows that clients using this framework doubled employee trust in their internal AI, and human escalations dropped 38%.
Here's something new we're sharing today: even skeptical users reported 30% higher satisfaction just from seeing confidence scores reported as part of every AI response.
LUIS (08:53):
And, to be very candid, that surprised us.
Proof that trust builds gradually, one transparent answer at a time.
ELIZABETH (09:00):
And before we wrap, what's your "one more thing" for our listeners navigating AI?
LUIS (09:05):
My one more thing is simple: ask your AI agents, "What is your confidence level on this response? Show me the sources and the exact citation I can verify."
Treat AI as a colleague, not some infallible oracle.
ELIZABETH (09:18):
That simple habit changes everything.
And push your technology providers to show confidence scores and sources.
LUIS (09:26):
Keep pushing or walk away.
Support with your money and loyalty the companies that prove their trustworthiness, not those that merely claim it.
ELIZABETH (09:34):
I love that, and that's all for this episode.
As always, you can find more resources at AI4SP.org.
Stay curious, everyone, and we'll see you next time.