August 17, 2025 7 mins

Sam Altman promised GPT-5 would be “more reliable, with a lower likelihood of hallucinating.” But after trying it myself — like asking it to turn a photo of my daughter playing softball into anime (and it gave her two gloves) — I’m left asking: is this really progress?

In this episode of Thriving in Ambiguity, I break down:

  • Why GPT-5 feels like a necessary but rushed release

  • The real benchmark gains in coding and math (yes, the numbers are legit)

  • Why average users still aren’t impressed — especially when GPT-5 lags behind in social intelligence

  • The critical role of memory and why it’s the missing piece for AI to feel like a true partner

  • The human side: creativity, trust, and why your voice still matters in an AI-driven world

AI isn’t stuck — it’s in growing pains. And that’s where the opportunity lies.

👉 Share your experience: Have you tried GPT-5 yet? Did it deliver for you, or fall short? Drop a comment below.

📩 Connect with me: steve@thrivinginambiguity.com

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Steve Mancini (00:02):
Look at this picture. Sam Altman said GPT-5
would be more reliable, with a lower likelihood of hallucinating.
So I tried a simple real-world test.
I asked it to turn a photo of my daughter playing softball
into an anime illustration.
The result? I don't think it quite captured the moment. For those listening,

(00:27):
it's a shot of her fielding a bunt at third and then firing it over to first.
But the picture that came out, look at it.
It has gloves on both hands.
It came up with some random name across the jersey.
I don't know, "Tigers."
And honestly, it really doesn't resemble the original photo.
Now, that might sound like a small thing, but this is the type of casual

(00:52):
request everyday users are going to make.
And when it misses in an obvious way, people wonder, how is this better?
So folks, welcome to Thriving in Ambiguity, where we make sense
of tech's biggest complexities.
I'm Steve Mancini, your host and partner in navigating today's
toughest technology challenges.

(01:13):
So here's where I landed on GPT-5. First,
I think shipping GPT-5 was a necessary step.
The product needed to be simplified.
Most people don't want to pick between model names or even understand
things like o1, o3-mini, 4.5.
So automatic model switching is real progress.

(01:34):
It removes friction and makes the experience easier for casual users.
So that's a start and it's a great win.
But reliability is more than a tagline.
It's not enough to say "fewer hallucinations."
Reliability means when you give it a normal task, it doesn't shit the bed.

(01:56):
Two gloves is a mess.
Forget all of the other little details that it completely forgot about.
And while I've noticed some sharper reasoning in places as
I've been using it, the reality is we're still far from perfect.
So was it rushed?
I don't know. Probably, and I understand why.
There is tremendous pressure to be first and to continue momentum.

(02:18):
You don't want the market to think that you've stalled.
We hear this all the time with "Apple's lost its innovation."
But when the story is "it's smarter, it hallucinates less,"
and then it still does strange things,
it undercuts trust.
The gap between promise and reality is where frustration creeps in.
And here's the thing: on paper, GPT-5 is getting better.

(02:41):
Coding benchmarks like SWE-bench show real gains. In the real-world coding test,
it scored about 75%,
up from 69% in the last model, and it's way ahead of GPT-4, which was like 55%.
And in math, it crushed the AIME exam at 94% accuracy, compared

(03:04):
to 89% on the o3 model.
And even on the HMMT, another advanced math test, it jumped to over 93%, which is
a big leap from where it was in the mid.
Those are legitimate improvements, and for developers and engineers or
anyone pushing the technical limits, that's exciting progress. But that's

(03:26):
not most people. The average user isn't testing it on the Math Olympiad.
They're asking it to summarize a report, draft an email,
or in my case, make a funny picture of my daughter.
And when it fails on those everyday use cases, all the benchmark scores in the
world don't change that perspective.

(03:46):
And to me, that explains why GPT-5 isn't getting the rave reviews
that OpenAI probably hoped for.
In fact, it tests really poorly on social intelligence,
things like conversational nuance, empathy, and sounding more human.
GPT-5 scored lower than competitors like Claude and Gemini.

(04:08):
So while it's smarter in technical ways, it's still struggling with the
simple human side of the interaction.
And where my mind goes next is memory.
Because without continuity, without it remembering who you are, what
you're working on, and how you like to interact, it's always gonna feel limited.

(04:28):
Now, OpenAI did take a step here.
GPT-5 has a new memory feature that can remember your preferences across sessions.
It can recall your name, your tone, and even lets you customize how it
responds, whether it's casual or analytical, whatever you choose.
And you can toggle that off and on.
And you can even delete memories that it captures.

(04:50):
Under the hood, it's got a massive context window now, up to 256,000 tokens and even
400,000 when it's going through the API.
So that's a nice boost.
And that's real progress.
But to be honest, it still doesn't feel like memory in the way that people expect.

(05:12):
You could tell it something today, and tomorrow it feels like a blank slate.
It remembers pieces, but it doesn't feel like it remembers you.
And that's what people want.
Continuity.
Not a new chat every time, but an assistant that carries context forward,
that knows your projects, your style, and your preferences the way a friend would.

(05:34):
And from my background in IT, I get it.
There's a lot of cost. Persistent memory at scale,
it's not trivial.
It's engineering, it's architecture, it's storage, it's privacy, it's compliance.
It's a heavy lift.
But if we want reliability, controlled memory is the future.

(05:55):
The other piece is integration.
Don't just chat with me.
Connect into my systems.
I want it to understand what's happening on my phone, in my messages, in my documents.
I want it to have anchored answers in my data, so hallucinations
completely drop off.
That's when AI stops guessing and really starts partnering with you.

(06:18):
I also want to share the human side of this. Someone told me on LinkedIn
this week, they said, "I hate AI.
It's coming for my job."
And I understand that fear.
But here's what I said.
Human creativity still matters.
The difference between generic content and something meaningful
is the human behind it.
It's your voice, it's your perspective.

(06:40):
It's your lived experiences.
Those can't be replaced.
AI can make the grind easier, but the heart of it still has to come from you.
So here's my straight answer.
Is GPT-5 progress?
Yes.
Is it the revolution that they promised?
No.
It's not even close, but that's okay.

(07:03):
These are the growing pains of AI.
They're necessary, they're messy, and sometimes they can be
frustrating, and it's still progress.
It's still moving forward.
So if you've tried GPT-5, I'd love to hear your experiences.
What worked for you, what didn't?
Contact me on LinkedIn or email me directly at steve@thrivinginambiguity.com.

(07:25):
And until next time, keep embracing the change, leading
boldly and thriving in ambiguity.
That's all, folks.