Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
Hello and welcome to a Weekend News episode of the Leveraging
(00:03):
AI Podcast, the podcast that shares practical, ethical ways to leverage AI to improve efficiency, grow your business, and advance your career.
This is Isar Metis, your host, and before we dive into the main topics and the rapid fire topics today, I have some exciting news.
The first piece of the exciting news is that we just hit 250,000
(00:23):
downloads for this podcast this week.
This is an insane number that I never dreamt I would ever get to, and it's all thanks to you.
So I appreciate every single one of you listening to this podcast, sharing it with your friends, sharing it on social media, et cetera.
I cannot be more grateful than having you as listeners of this show.
But to make it even more interesting, I wanna make the
(00:44):
show even better.
I wanna adapt it to your needs.
And hence what we decided to do is to create a survey, which will take you less than a minute to complete, and it will give us feedback on what you like and don't like in the show, what you would like to add, change, remove, improve, et cetera.
And so in your show notes right now, if you open your phone and click on the show notes, you will have a link to the survey
(01:07):
and we would really appreciate it if you would spend the minute or maybe two to take this survey and help us make this podcast better for you.
Now to today's episode. There are two main topics that we're going to talk about.
The first is the TED appearance of Sam Altman and everything he shared over there.
The second is ChatGPT's growing ambitions in the coding world,
(01:27):
and then there's gonna be a lot of rapid fire items, many of them about OpenAI.
They've been on fire this past week with releases and news and things that they're doing, and apparently there's more coming.
So we probably could have done two news episodes just about OpenAI, but we're gonna try to run through that very fast.
And there's obviously a lot more updates from other companies as well.
So let's go. On April 11th, a week before this podcast goes
(01:53):
live, Sam Altman was a guest at TED, interviewed by the head of TED, Chris Anderson, on the main stage, and it was a very interesting conversation that revealed a lot about how Sam thinks about himself, about the future with AI, and so on.
I don't think I heard any major new things in it, but there are a few topics that I really want to cover and dive
(02:15):
into that are very, very interesting.
Chris pushed Sam on several different topics, and Sam wasn't happy about it, but he was still trying to give answers, at least in some cases, and I will get into that in a minute as well.
So he pushed him, as an example, on the topic of what's happening to creatives: is it okay or not okay to, quote unquote,
(02:36):
steal their work and use it for other purposes?
And what would be the model to compensate people for their creativity?
Sam basically said that it would be really nice to figure out a new model to compensate creatives, but he couldn't suggest even a single idea of how exactly that would happen.
He just thinks it's not necessarily a bad idea.
(02:56):
And that would be the theme of the rest of the conversation.
He was pushed very hard about the future safety of AI models, and at one point even responded to Chris by saying, hey, you're not really a pro AI guy, right?
And Chris actually said, no, I actually use AI every single day. I think what you're doing is remarkable, but I'm terrified.
And again, Sam provided many generic answers to these
(03:17):
questions, saying, oh, it's gonna be okay.
We believe that, like in previous revolutions, everything is going to be fine. People find what to do. The human race figures it out.
There was no concrete, here's our plan, or here's what we're trying to do, or here's our collaboration with groups 1, 2, 3, in order to achieve these things that he believes or wishes or assumes are gonna go well.
(03:41):
They also talked about their push towards new open source participation.
We talked about this in several different past episodes recently, but they are planning to release a, and now I'm quoting, very powerful open source model. And Sam, while admitting they're late to that game, is claiming that it's gonna be near the frontier and that it's gonna be better than
(04:01):
any other open source model out there.
That's a very significant claim.
I hadn't heard that one before this particular interview.
Now, another statement that Sam made in that interview is that they are now reaching about 10% of the global population.
That puts them at around 800 million users, which is an absolutely insane number.
(04:21):
With regards to personalization of AI, and specifically the new memory features in ChatGPT, Sam had a very interesting quote, which is: but you will talk to ChatGPT over the course of your life, and someday, maybe, if you want, it will be listening to you throughout the day and sort of observing what you're doing, and it will get to know you, and it will become this extension of
(04:44):
yourself, this companion, this thing that just tries to help you be the best, do the best you can.
So the vision that Sam Altman and OpenAI have for these models, and specifically for ChatGPT, is something that will record literally everything that we do and will be able to pull the relevant information at the right time to help us make
(05:05):
better decisions and take better actions.
Now, like everything else with AI, I am really excited and really terrified by this vision, and you start questioning a lot of things, like privacy and our ability to make our own decisions without being driven by an external force that may or may not be influenced by different things. But the fact that this is the
(05:25):
direction they're going, and the fact that they're actively pursuing it and not even being shy about it, tells us that this is where they're trying to go.
They're trying to get to a situation where this thing has limitless memory and can literally capture everything that we do, and use that information in order to help us do better things.
Now, in addition, he was asked about AGI, and Sam said that we're not at AGI yet, and the main reason that he gave is that
(05:47):
these models cannot learn on their own and cannot make improvements without us helping them, versus humans, who can do that.
He also said that it's very obvious that while these tools are getting very good, they cannot yet do everything that a person at a desk can do.
And hence, we're not at AGI yet, which, again, is a system that will be able to do everything humans do, at least on the cognitive level, at or above human level.
(06:09):
So if they cannot do everything that we can, we are not there yet.
That being said, Chris asked him specifically, okay, what is the internal definition of AGI? And they don't have one.
The answer was that if you put 10 engineers in the room, you're gonna get 14 different definitions of what AGI is.
And so despite the fact that it's the holy grail that they're chasing, and beyond, which is something that Sam clearly
(02:29):
stated, AGI is not now a target. It's just somewhere on the timeline. We're gonna get there, and then we're gonna cross it, and it's not the final target anymore. But they don't have a very clear definition of what that is.
When it comes to agents and safety, again, Chris pushed him pretty hard, and Sam came back and basically said that he understands that safety is a concern, but he also understands
(06:50):
that without safety, they don't have agents.
Basically, what he's saying is, if agents are not going to be safe, nobody's gonna use them, and hence they won't be able to be developed further.
So his claim, again, is more of a logical explanation of why he thinks agents will have to be safe, versus actually explaining why they'll be safe, or how they are, or what steps they are
(07:11):
taking in order to make them safe.
So not great answers in my eyes, and it will be very interesting to see how that evolves. More on that from many different angles in the rest of this episode.
Now, when he was pressured about all the people who left OpenAI with safety concerns and spoke loudly about it, Altman basically acknowledged that there is internal disagreement about the
(07:32):
levels of AI safety required.
But he's saying that if we go back and look at their previous track record when it comes to delivering safe AI systems, it shows that they're doing a good job.
Now, while I agree with the statement that so far their systems have been safe, that is not a logical argument to use in this particular case, because AGI and beyond will have
(07:54):
significantly more powerful capabilities than anything we know today.
They will be able to manipulate us in ways we cannot even think about.
So it's basically saying, hey, we were really good at keeping things safe when we had bullets, so we'll be safe with atomic weapons.
That's basically the argument that he's making, which I don't buy.
I think the fact that previously they were able to do things in a
(08:15):
safe way means very little about their ability to deliver safe, really advanced AI systems once we get to AGI and beyond, and it really, really scares me that that's the answer that he was able to provide.
Another very interesting statement that I heard him make for the first time is that, in general, he believes that moving away from centralized control and decisions is a good
(08:36):
direction to go.
And I'll be more specific about what he might mean by that.
He was asked, should there be, like, a summit or a committee that will be a collaboration between leading companies and governments to decide and help define a better future for AI?
And he basically said that maybe having a small elite summit is not the right way to go. Maybe the better way is allowing the 800 million users that they have right now to more
(08:57):
or less express what they think is right and wrong, versus a select few and following their definition. Now, he acknowledged that that has never been a successful approach in the past.
But I tend to think that there's a happy medium between these two approaches: one of saying, okay, these 10, 12, 50 people will decide the future of AI and humanity, versus, okay, let's involve a
(09:20):
hundred million people, 200 million people, and have them have a say on where they see the future going, what they would like to see. And I think they have the opportunity to do that right now.
I think it could be very simple for OpenAI right now to put a one question, two question questionnaire once a week in front of all the users that they have, to learn what people actually want, what they're afraid of, what
(09:42):
they would like to slow down, and do the same thing on all the other platforms: on Gemini and Claude, then DeepSeek, et cetera.
And then we can learn collectively, because we have the access to the masses, what people actually want, what they don't want, and maybe then have a summit to discuss these findings and continue in the right direction.
Do I see that happening?
No, but I think it's not necessarily a bad idea.
(10:02):
Sam was also asked about the impact of him becoming a parent on his decisions, and he basically said that having a child has profoundly impacted his personal approach to life, but he was just as committed to the safe delivery of AI even before having a kid.
And speaking about parenthood and Sam's views about where the world is going with AI, he had two profound quotes,
(10:26):
or I dunno if they're profound. I mean, they're logical on one hand and very interesting on the other hand.
Quote number one was: my kids hopefully will never be smarter than AI.
Basically, he's anticipating that in the near future, AI will be smarter than any of us, including his kids, and if they have his genes, they're gonna be smart kids, or individuals in the future.
The other very interesting quote was: I hope that my kids and all
(10:49):
of your kids will look back at us with some, like, pity and nostalgia and be like, they lived such horrible lives. They were so limited. The world sucked so much.
He truly believes that AI will enable a future of abundance like nothing that we have today, both in terms of personal reach and capabilities, as well as the overall success of humanity and
(11:10):
the planet.
And while this is great, it might be blinding him to the difficulties and the risks that are actually there.
And I don't think he is overlooking them.
I think he's looking at them, but I think he deeply feels that the benefits will overcome the risks and the dangers, which I personally don't necessarily agree with.
And now to our second topic, which is OpenAI's extreme push
(11:33):
towards leadership in code development with AI.
So we spoke many times on this show about the trend of vibe coding, of basically using natural language, in this particular case English, to create code across multiple platforms.
This phrase was coined by Andrej Karpathy, who's a former OpenAI engineer, but it caught on like wildfire in the world, and the trend of writing code with just simple English is growing aggressively, both in the developer communities as well as with people like me, who have never written code before and can now create applications. More on that
in a future episode in the next couple of weeks, about an application or applications that I'm developing that you will get to experience in the next few weeks. I'll give you a little
hint.
It will allow you to find any information you want from any past episode of this podcast and get answers about anything that happened in the news, or specific workflows or specific technologies or tools, anything that was mentioned in the podcast. You'll get an answer, but also get a link to listen to that particular segment in a specific episode.
It's already in testing, and I've developed it myself using
(12:35):
English only.
And like I said, I will have a whole separate episode about that. But OpenAI has been pushing very, very hard in that direction, and in this past week, they made two huge steps toward world dominance in that aspect.
One is they've introduced four different models: three in the GPT-4.1 family, so GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano,
(12:59):
as well as o3, the full model.
So far we only had access to o3-mini.
The GPT-4.1 model family is geared specifically for developers.
It's available through the API, allowing developers to use the models and integrate them into everything that they want.
They excel at coding tasks, outperforming every previous GPT model by a big spread.
(13:21):
And they come with a 1 million token context window; that's 750,000 words of code, ish.
So a huge jump in context window, aligning with some of the leading competitors. Right now, Gemini 2.5 Pro has a 2 million token context window, and Llama 4, which was just released last week, has a 10 million token context window, but I don't know
(13:43):
how good it is at coding. And so definitely a huge jump forward.
Now, the other very interesting thing about these models is that they can manipulate images in their reasoning process.
Meaning, when you want to create code, or do anything else for that matter, you can upload images, including diagrams, charts, et cetera, notes that you've taken on a piece of paper or on the back
(14:03):
of a napkin, flow charts, things like that.
And the AI knows, as part of its reasoning process, to zoom in, zoom out, crop, and look at specific segments in order to better understand the task ahead.
Think about how powerful this is for developers, when you can upload your Kanban board as is, or a flow chart that you've created in order to describe the user interface or the user flow and stuff like that.
(14:24):
And your system actually understands that and can write code accordingly and break things into the right components based on that.
So an extremely powerful capability that is now available in GPT-4.1.
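To make that concrete, here is a minimal sketch of what a request like that might look like through the API, using the official openai Python package. The model name gpt-4.1 is the real one; the prompt and the diagram URL are made-up placeholders:

  from openai import OpenAI

  client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

  # Hypothetical example: ask GPT-4.1 to turn a whiteboard flow chart
  # into component-level coding tasks. The image URL is a placeholder.
  response = client.chat.completions.create(
      model="gpt-4.1",
      messages=[
          {
              "role": "user",
              "content": [
                  {"type": "text",
                   "text": "Break this flow chart into UI components and describe each one."},
                  {"type": "image_url",
                   "image_url": {"url": "https://example.com/flow-chart.png"}},
              ],
          }
      ],
  )
  print(response.choices[0].message.content)

If the model and SDK behave as documented, the answer comes back as ordinary chat text that a developer can feed straight into their workflow.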
Now, if you remember, a few weeks ago we shared with you the huge growth in Claude's income, and they have grown dramatically in this past year, which allows them to raise a lot more money right now.
(14:45):
And the majority of their growthwas coming from API connections
to the main code writing toolsbecause most of the developers
in the world right now preferClaude Sonnet 3.5 and 3.7 over
any other tool.
So most of these code writingtools allow you to pick which
large language models runs inthe background and Claude
Sonnet, 3.7 and 3.5 were theleading models which were
(15:06):
driving a huge income to Claude, which is one of the reasons why OpenAI wants to be more aggressive in that game, and hence the introduction of these new models.
If you remember, they said they're not going to release more models until we get to GPT-5, and now, instead of having GPT-5, we have o3 and GPT-4.1 in three different variations.
So that doesn't count as we're not gonna release any other
(15:27):
models.
But again, these are geared specifically for developers.
Now, to show you how that competition is driving them to do stuff that we may or may not agree with: this is the first time that they released a model without a safety card.
These system cards that OpenAI has released every time they released a model are there to provide transparency, to basically tell us what measures they took from a safety
(15:47):
perspective, from a testing perspective, to verify that these models are okay and have very little risk, if any.
And yet the 4.1 family of models was released without a safety report.
Now, when OpenAI was approached about this, they basically said, well, this is not a frontier model, hence it's not a necessity. Basically saying, don't tell us what to do; we will decide when to do safety checks for models and when not to.
(16:11):
Again, I'm not sure I agree with that, but it's very, very obvious that the fierce competition in this field is pushing companies to do stuff that is beyond what should be acceptable from a safety and security perspective.
And we'll have another point on this later, but I want to continue on the topic of OpenAI and its push in the coding world.
So there are already a lot of examples, mostly people posting
(16:32):
on X, saying how powerful this model is, that it's game changing, and how it writes significantly faster and significantly cleaner and better code than any other GPT model before, but also better than most other models out there.
So definitely a big improvement in code writing for ChatGPT, but that's not the big coding news from OpenAI this week.
The big news from OpenAI this week is that there are rumors
(16:54):
that they're in advanced talks to acquire Windsurf for $3 billion.
So let's go back for a second to the whole topic of vibe coding and the platforms that enable it.
These platforms are divided into two main categories.
One is platforms that enable people who do not write code to create applications and write code.
These are the types of tools that I use, like Lovable and Replit
(17:17):
and several different others.
But there are tools that are built for developers, allowing actual software engineers to write code faster, better, and in a more efficient way.
The leading one in the world, the one that really took the coding world by storm, is Cursor, and apparently OpenAI was trying to acquire Cursor first.
Now, putting things in perspective, Cursor is now
(17:37):
assumed to have $200 million in ARR, while Windsurf, the company that will probably get acquired, has $40 million in ARR, so Cursor is five times bigger.
Now, Bloomberg reports that OpenAI met and had conversations with 20 other AI coding firms before moving in that direction.
So they are definitely going to acquire somebody, and it seems
(17:58):
like it's going to be Windsurf.
Windsurf was founded in 2021, and they've raised $243 million so far.
And like I said, they're one of the most popular coding platforms for actual code writers and code developers.
And I think this is a very interesting move by OpenAI to basically have an integration of the AI coding tool and its
(18:18):
audience, combined with OpenAI's ability to drive the models behind the scenes.
It will not surprise me if this is followed by other leading AI companies buying other leading coding platforms.
It just makes sense, because this is now the most advanced, most widely used application of large language models beyond the large language models themselves, and being able to acquire that
(18:40):
allows you to control that part of your destiny, versus letting people pick whatever they want to pick.
Now, this may or may not go smoothly, because there are several different reports that the FTC may raise concerns about this, given two different aspects.
One is OpenAI's close ties to Microsoft, who controls their own side of the developer world, and the other is OpenAI's own previous investment in Cursor through their startup fund.
(19:01):
So the outcome is that if they acquire Windsurf, they will have their hands in the three top AI coding platforms in the world today, which may or may not be acceptable under FTC regulations.
Now, in addition to these new models and the acquisition, OpenAI just released Codex CLI, which is a lightweight open
(19:21):
source coding agent that runs directly in terminal environments.
It works, obviously, in conjunction with o3 and o4-mini, and it's another step in the direction of an agentic software engineer that can not just create the code, but can also deploy the code directly into projects as required.
Now, because it's based right now on the o4 models and
(19:42):
o3, the tool supports the multimodal inputs that we just discussed earlier, allowing users to provide screenshots and sketches alongside text in order to enhance its understanding of what you're trying to develop with the code.
Now, users can control the level of autonomy in this tool, from an approval mode, meaning just show me what you're trying to do and let me approve it, all the way to a full auto approval mode, where
(20:02):
it is just going to deploy the code that it creates as it feels necessary.
Now, from a security perspective, Codex CLI maintains privacy by running everything locally on your machine, rather than calling out to remote services for every prompt that you write.
And to encourage the adoption of this system,
OpenAI set aside $1 million in API credits for eligible software
(20:23):
development projects, in blocks of $25,000 each.
So they're gonna put money into companies who are going to use these capabilities, in order to drive adoption.
Again, showing you how all in they are on this topic.
So, in quick summary, OpenAI is all in on the coding world, both in terms of developing models specifically for coders, as well as spending a lot of money on acquisitions to get a significant
(20:46):
lead in that aspect of AI usage.
They definitely have the money; they just raised $40 billion, so $3 billion is small change for them.
But it will be very interesting to see how that evolves from here, both in terms of their integration with Windsurf, if this actually happens, and obviously what the implications are going to be for the rest of the industry.
And now to rapid fire items.
This past Monday, OpenAI announced that they're going to
(21:08):
phase out GPT-4.5 and replace it in their APIs with GPT-4.1.
And that is not surprising at all.
If you've listened to this podcast for a while, then you know that when they came out with GPT-4.5, I said it made absolutely no sense that they released that model.
It wasn't necessarily significantly better than GPT-4o, and it was significantly more expensive for them to run.
(21:29):
And on the coding side, it definitely wasn't doing any better than the previous models.
And so now that they know a lot of the API side of things is going towards coding, and that 4.5 costs them a fortune while 4.1 actually delivers better results, they're going to phase out 4.5 from that aspect.
To put things in perspective, when I'm saying it's significantly cheaper, it's a better deal for you as well.
(21:50):
So GPT-4.1 is going to cost $2 per million tokens of input.
The same thing will cost you $75 instead of two if you use GPT-4.5.
And on the output tokens, it's $8 per million tokens on GPT-4.1 and $150 on GPT-4.5.
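To put that gap in concrete terms, here's a quick back-of-the-envelope calculation in Python; the job size, 200,000 input tokens and 20,000 output tokens, is a made-up example, while the per-token prices are the ones just quoted:

  # Per-million-token prices in dollars: (input, output)
  prices = {"gpt-4.1": (2.00, 8.00), "gpt-4.5": (75.00, 150.00)}

  # A hypothetical coding job: 200K tokens in, 20K tokens out
  input_tokens, output_tokens = 200_000, 20_000

  for model, (price_in, price_out) in prices.items():
      cost = input_tokens / 1_000_000 * price_in + output_tokens / 1_000_000 * price_out
      print(f"{model}: ${cost:.2f}")

  # gpt-4.1 comes out to $0.56 versus $18.00 for gpt-4.5,
  # roughly a 32x difference for the same work.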
And that is while OpenAI said that at the pricing of
(22:13):
4.5 that they put out, they're still losing money.
So they built a model that was extremely inefficient.
And I assume they distilled that model in order to create 4.1, or maybe used other techniques.
But they're going to phase out 4.5 and just keep 4.1 on the API.
A cool feature that was added to ChatGPT this week: because of all the craze with the images, they added an image library.
So on the left menu, just below Explore GPTs, there is now a
(22:35):
library section that shows you how many images you've created with ChatGPT since the launch of the image generation, and you can click on that and see all your images in a gallery.
I find this very useful and very cool.
And since OpenAI sees the craze that this has generated, they
(22:56):
are apparently developing a social network built around the image generation capability.
The project is still in early stages, it's only an internal prototype, and it focuses specifically on creating a social feed around AI generated images.
They saw the craze that happened on traditional social media with
(23:16):
their image generation tools, and they said, why not ride that wave and have something internally?
And they would gain several different benefits from it.
First of all, they get to poke Meta and X, two companies that are their direct competitors in this field.
in addition, it will providethem data to train their models
(23:38):
similar to what Meta is doingwith their platforms, as well as
X is doing with its platform.
So you can see that thisactually makes sense.
Now, does it have a chance ofbeing successful as a new social
platform?
I don't know.
What I do know is that there'san insane craze right now around
ChatGPT we just said they got to800 million global users and
they're growing at an insanespeed and everybody's
(24:00):
downloading their app and addinga social feed as a feature where
people can see other images thatother people are generating and
comment on them and share themmight actually work.
Now, speaking of trends from ChatGPT that are taking over on social media, there's a new trend, after the Ghibli insanity and then the action figures craze.
Now people are doing reverse location search on images.
(24:23):
So many people have shared taking images from multiple sources, uploading them to ChatGPT, specifically o3, and asking it where the picture was taken.
And it's showing really good results at understanding, from small nuances and things in the image, where the picture was taken, based on landmarks or different hints that it finds in the images.
(24:44):
While this is really cool, it obviously raises serious concerns, because any picture of yourself that you put online, let's say on social media, can now be used to tell people where you were at the time you took the image.
It's also showing you how powerful these tools are right now at understanding nuances and little hints from images, to get a much better understanding of more things than we probably
(25:06):
know.
Those of you who have seen the very first demo of Gemini will remember: a Google employee was demoing Gemini Live, and she was wearing the glasses, and she looked outside the window, and all you could see was rooftops of other buildings around.
And she asked Gemini where it thinks she is, and it told her that she's probably at King's Cross in London, which is where she was.
(25:27):
So these tools exist, this capability exists.
It's not exposed in a formal way, but it's there, and you can use it.
And it will be very interesting to see, now that people know about it, what they develop from an API perspective, what other applications can be developed to benefit from that capability.
If you think about the negative stuff, there's a lot of negative stuff. But if you think about the positive stuff, allowing you to ask where you are in order to get better navigation if you
(25:48):
don't know where you are, or how to find the one thing that you're looking for on a street where you don't have the exact address, or stuff like that, just by seeing what you're seeing, that's the direction where it is going.
On the last two items from OpenAI, I want to touch on the hallucinations and mistakes side of things, because I think it's very important to touch upon it.
According to OpenAI's internal testing, the o3 and o4-mini models that were just released hallucinate more than
(26:11):
both the company's previous reasoning models and its traditional models.
So, GPT-4o, o1-mini, and o3-mini. To make it even more confusing, or less promising, OpenAI themselves do not know why these models have higher levels of hallucinations.
And they said, and I'm quoting, more research is needed.
So they do not understand why, when they add additional
(26:33):
capabilities to the models, they get additional hallucinations.
By the way, by a pretty big spread.
So on a benchmark called PersonQA, o3 hallucinated on 33% of the questions, while o1 and o3-mini hallucinated on only 16% and 14%.
So the rate of hallucinations doubled in this model.
That's obviously not a good thing, and as of now, it doesn't
(26:55):
seem that OpenAI has a good solution for that.
Now, on a more interesting aspect of this: OpenAI researchers published a paper last week about their deep research technology, which I've started using all the time, not just from ChatGPT, but from the other providers as well, and they found two very interesting facts.
They were testing it against a very hard benchmark, and the
(27:16):
idea was to, A, compare it to a previous technology, just GPT-4o versus deep research, and B, compare it to actual human researchers on those same really difficult tasks.
And what they found is that deep research did dramatically better than humans, and was on a whole different level from the old tools.
So the old tools basically failed miserably.
If you just gave GPT-4o one of these tasks, it couldn't
(27:39):
complete any of that research effectively.
Humans, many of them, gave up on 70% of the questions after two hours of effort and not finding answers, and the people who did continue all the way still had 14% incorrect answers.
Now, deep research got answers to all the questions, so it didn't give up on any of them, but it only came back with
(27:59):
correct answers 51.5 percent of the time.
Basically just a smidge more accurate than flipping a coin on the right answer.
Now, that's not very promising, but to be fair, this is on the hardest set of questions that they could find, which, again, took human researchers hours to actually complete.
So what do I think about this?
Personally, I've started becoming more and more dependent on these
(28:21):
deep research tools.
It is very alarming to me to find out that they could be wrong 50% of the time.
So what this tells us is that while these tools are incredible and they really do amazing research, you still need to go and click on the links of the sources they provide you and verify the information.
So the research is not gonna take you just five minutes.
It's probably going to take you 25 minutes to actually go
(28:43):
through the relevant links and see what's reliable information and what's not.
But it's still gonna save you a few hours of doing the research yourself, because in many cases, these tools will check 70, 80, a hundred, 200 websites in order to get you the answers, which, if you had to do it on your own, would take you hours of work.
As I mentioned, there's been a lot of other news from OpenAI that we'll move to the rapid fire items.
So let's dive into those.
(29:04):
First of all,
according to Appfigures, ChatGPT's app surged to 46 million downloads in March 2025, a 28% jump from February, making it the number one most downloaded app in the world, overtaking Instagram and TikTok.
Now, the interesting thing is, what drove that madness is obviously their new image generation tool and all the
(29:26):
Ghibli style images, and then the action figures madness that happened shortly after that. Which is ridiculous to me, because there are so many better, real value, actual use cases that you can use ChatGPT for in your personal life and in business, which give you way more reasons to download the app and use it regularly than creating anime style images.
(29:46):
But I guess it doesn't reallymatter because that drove this
complete craze of downloads ofthe ChatGPT app.
And it is amazing to see theirbrand dominance where basically
ChatGPT became synonym with ai.
It's just like Google becameGoogle in search in the two
thousands, almost the same exactway.
People just know ChatGPT. Most of the people in the world are not
(30:09):
like me or you, the listeners of this podcast, who know multiple chat platforms and use them in different use cases.
Most people just know ChatGPT, and that's the only thing they're using, and it's an incredible brand power that OpenAI has right now over anyone else.
An interesting piece of news, and actually good news from my perspective, when it comes to using OpenAI's APIs: OpenAI is planning to implement a new verified organization process for getting access to their most advanced models.
So this is not gonna impact the models that exist right now, but it will be required for future models.
it will be required for futuremodels.
And that means that any organization that wants access to these future APIs will require an approved government issued ID from one of the countries that is supported by OpenAI's
(30:52):
API, and without that ID, you won't be able to get access to these tools.
I think this is a great step in the right direction.
I don't know if it'll solve all the problems, because there are many bad people and bad players within countries with actual legit IDs.
But at least it reduces the risk of the wrong people and the wrong groups using these really powerful APIs for the wrong things.
Now, this might also be a response to what happened, or
(31:15):
presumably happened, with DeepSeek, when they were more or less scraping, or if you want the professional word, distilling, OpenAI's models by basically running them against their API.
approved government ID as aprerequisite to getting access
to advanced models, I think is agreat step in the right
direction of higher securitywhen better models come out.
(31:36):
Now, I told you before that I'm gonna come back to the safety risk that is driven by the crazy competition in the AI universe.
Well, OpenAI just updated its preparedness framework, and they're stating that they may adjust their safety requirements if competing AI labs release high risk systems without similar protections in place.
(31:58):
these adjustments afterrigorously confirming that the
risk landscape has actuallychanged publicly acknowledging
that they're making thoseadjustments and that these
adjustments do not meaningfullyincrease the overall risk of
severe harm, and still keepingthe safeguards in place.
So in the TED interview of Sam, Chris made an interesting
(32:20):
statement.
He basically said that all the leading labs, because of the competition, are basically saying, we gotta run the fastest that we can, because it's inevitable that somebody will get there, and if we get there first, it will be safest, because we're safer than everybody else.
And he was questioning the inevitable aspect of that scenario.
Well, what this is telling us is that it is inevitable.
It is inevitable because the labs themselves are claiming on
(32:43):
paper, in their formal security guidelines, that if other labs cross the line, they will cross the line too.
And this is not even spoken in closed rooms and behind closed doors.
This is their formal statement, basically saying, if other labs move forward with less caution, we will do the same.
This is really scary to me, because it basically means that
(33:05):
this competition can drive us over the edge, and all the leading labs will play their role in that scenario.
In another interesting acquisition by OpenAI, they just hired the team, not the company, something that has started becoming a norm in the AI world.
They acquired the team of Context.ai.
Context.ai is a startup that was founded in 2023 by two former
(33:25):
Google employees.
Google employees.
And what their tool allows todo, is to look into the actual
doing of AI language models,understand them better, and use
that to develop better and saferAI solutions.
So OpenAI just acquired the teamand obviously all their
knowledge, the company's gonnadissolve its previous
operations.
And Scott Green, one of thefounders, is now product manager
at OpenAI Building evals.
(33:46):
For those of you who don't know what evals are, evals are the ability to evaluate what the AI is doing, and evals are a requirement for building high performing AI applications, but they're very hard to get right, because the AI is, quote unquote, a black box.
And what their tool allows you to do is to look into that black box and have a better understanding of what is happening, to evaluate how the AI is working, to develop better and better AI systems
(34:09):
faster.
Like I said, this could have been a whole episode about just OpenAI and ChatGPT, but there are a few interesting pieces of news from other companies.
The biggest one comes from Microsoft, and they just unveiled three really interesting things.
The first one is really promising for the future: they have released a 1-bit, really small model that can run effectively on a CPU.
So it's a very powerful small model that actually
(34:30):
provides results on the level of Llama 3.2, Google's Gemma 3, and Alibaba's Qwen 2.5.
So not the latest and greatest, but just the level before that, on a very small footprint that can run locally on a smaller computer with significantly less memory.
Why do I think this is promising?
I think this is promising because the world's compute and
(34:51):
resources are now all being drained in order to support future AI models.
And if we can find ways to scale models without that level of compute, I think the planet will thank us.
Now, on to the more practical things that you can start using right now.
Microsoft had two big releases this past week.
One of them is AI agents that can control your computer.
So, the concept of computer use, the same as we got from Claude a
while back and from OpenAI a few months ago.
You can now use Copilot Studio to build these agents that will take over everything on your computer, so anything in your browser as well as applications.
Basically, it looks at the user interface and can perform anything that humans can perform on the screen.
Now, I'll say two things about these kinds of agents and computer use solutions.
(35:35):
One, I think in the long run it'll open the door for a lot more automation and will probably completely eliminate many of the tedious tasks that we have to do today.
So from that perspective, it's really good.
Two, any person who has tried to use the full Copilot Studio knows it's an absolute nightmare.
So first of all, they have two products called Copilot Studio. One is basically custom GPTs wrapped as a Microsoft product,
(35:57):
the other is their old school automation tool with some more AI capabilities, which is absolutely impossible to use.
It's the worst user experience in the world.
I consider myself an advanced user, a techie and a geek, and every time I try to build something there, it's very hard to do, and I gave up more than once.
(36:18):
So I really hope that part of the process is allowing the AI to create the AI, so I won't have to fight the tools that they have created.
So I'm not sure how good the implementation is on the Microsoft side.
It will be very interesting to test, and I'll keep you posted as I do that.
The other thing that I will say is that there's a huge risk in allowing these tools to take over your computer, because you don't know what they will do.
You don't know when they're gonna go crazy.
(36:39):
You don't know whether, on purpose maliciously or just by a random fluke mistake, they will do stuff like changing your passwords, or locking you out of specific pieces of information that you actually need, or anything else that you don't want to happen on your computer or on your company network.
And what I have done in order to test these kinds of agents is I've actually created a virtual machine on Google Cloud.
(37:00):
I'm using that to run a browser, and I'm testing everything in that virtual machine.
So if this goes rogue, then nothing happens to my real universe, and I can safely test different use cases, including different agent capabilities.
I highly recommend that to anyone, and I will create a full episode about this, showing you exactly how I did it so you can do it as well.
(37:20):
The other thing that Microsoft released this week is they made Microsoft Copilot Vision available to anybody on Edge.
So Vision is, if you want, a subset of an agent: it can see everything on your screen, it just cannot take actions.
And this was available for the paid users of Copilot, and now any user of Copilot, including the free users, can show the AI
(37:42):
everything on the Edge browser, and you can even activate your microphone and talk to it.
That will allow you to basically ask the AI about anything that you're doing, whether it's emails or spreadsheets or research; anything that's on your screen, you can ask about and collaborate with the AI on.
I've done this multiple times with Gemini's experimental capability, because that's been available for a while.
It's been working, I would say, about 50% of the time
(38:04):
effectively, and the other 50% it crashes before you get to a solid outcome.
That's at least on the Gemini platform.
You can also do this on OpenAI, but only on their mobile app, which is weird to me, because it would make a lot more sense to have it on desktop, either in the browser or in the desktop app.
But right now on OpenAI, it's only available on the mobile app.
But anyways, it's now available for Microsoft users.
(38:25):
Anybody who's using Edge can now activate this and use it.
It's currently only working with several specific websites, such as Amazon, Target, Wikipedia, TripAdvisor, Food and Wine, and OpenTable.
So basically, websites you use in order to do your day-to-day personal things, and not necessarily business tools.
And as I mentioned, I see a very long time between now and a time when an organization will allow these models to deal with actual
(38:48):
enterprise systems, because I don't know how reliable it is yet, and I don't know when it'll be a hundred percent reliable, or at least as reliable as humans doing data entry.
And I think we still have time to go until that point.
Going from Microsoft to another widely used platform that has released interesting AI capabilities: Notion officially released Notion Mail.
It is an AI powered email client that connects to your Gmail
account and allows you to do really cool things.
The first thing: it allows you to manage the emails. It reads all your emails and will categorize them and put them in different buckets, so it will be easier for you to review.
It also knows how to suggest answers.
It also knows how to look for suggested meetings.
So if somebody says, let's meet, it will look through your calendar and make suggestions for when you are available, and
This is obviously the directionthat all these tools will go.
I think this is a great movefrom notion to moving that
direction and offer thatintegration.
I must admit, I'm surprised, andI shared that before that
doesn't exist in Google itselfalready as part of Gemini.
But I think once Google andMicrosoft figure out how to
integrate all their Gemini toolsand all their co-pilot tools
into one Gemini that can see andconnect everything, and the same
(39:54):
thing with co-pilot, we willgain huge benefits to efficiency
on our daily business usage.
Now, broadening from the day-to-day work of everybody to what AI is doing in the broader scheme of things, from an application perspective, and what aspects of the world it is going to touch: two former Tesla supply chain leaders have started a company called Atomic, and their goal is to develop an AI powered
(40:17):
platform focused on streamlining inventory planning and supply chain management.
These two leaders have experienced the craziness of Tesla's near collapse when they were scaling Model 3 production in the beginning.
If you remember, Elon Musk was sleeping on the factory floor at Tesla in order to get through that period. So they deeply understand the problems and the need for better supply chain
(40:39):
control and management.
Most of the supply chain management in the world today is done manually.
I have several clients who have warehouses and inventory and supply chains.
And the amount of work that is done by copying and pasting data from multiple Excel files and emails into a unified environment, to understand what is actually happening in the company, is incredible.
(40:59):
We've been solving this problem through automations and custom GPTs, and these clients are gaining amazing benefits.
So first of all, if you have issues like that in your company, reach out to me on LinkedIn.
I'll gladly help you out with stuff that you can start doing tomorrow, versus in months or years when this company takes off.
But their goal is to basically build AI software that allows users to quickly simulate multiple scenarios that would
(41:21):
normally take hours or days to calculate, and based on that, change the whole supply chain structure.
In early pilots, they have shown several different examples, such as reducing inventory by half while maintaining a 99% in stock rate for the relevant needed components.
This is obviously very promising, and I'm sure that a lot of companies need a solution like this.
(41:43):
Another field where this is happening very aggressively is obviously customer support.
So Zendesk's CEO was just interviewed and shared a lot of interesting information about AI in customer service.
He sees a very near future where a hundred percent of customer service interactions involve AI, and 80% are solved by AI without any human intervention.
(42:03):
Now, the interesting thing is, according to a survey that Zendesk themselves did, 51% of consumers are saying that they prefer interacting with bots over humans when seeking immediate service.
So I wanna unpack that for a second.
First of all, the survey was done by Zendesk, so they have a vested interest in saying that, because that's where they're pushing their platform.
(42:24):
But that being said, now I'm gonna speak from my personal perspective.
Every time I have to deal with any of the big companies, such as AT&T, or any medical group, or insurance, or airlines, it drives me crazy.
The amount of calls, communications, and people I have to talk to in order to solve problems that seemingly shouldn't be that complicated.
Just last week, I spent two hours on the phone trying to get my
(42:47):
tickets from StubHub.
That is insane. That shouldn't happen.
And I agree that if these tools are done correctly, and they're connected to the right systems, and they can find the right information and take the right action quickly and effectively, I would rather talk to that than a traditional customer service agent any day, any time, and only escalate to a human when it
(43:09):
cannot solve my problem, which happens anyway in more than 50% of the cases.
An actual example that I found is that the city of Buenos Aires is developing AI chatbots to manage over 2 million queries per month without human intervention.
And that has reduced the burden on their actual staff by 50%.
(43:29):
So that basically tells you that the bots are handling a huge amount of queries successfully, because those queries are not getting to the humans after that.
It is very clear that the customer service universe is changing dramatically as we speak.
And I already predicted, in the beginning of last year, that the concept of call centers or contact centers will
(43:51):
disappear from the world.
And I know that's a scary thought, for an entire industry to disappear, especially in countries like India and the Philippines, where there are millions and millions of people who make their living through that industry.
But I don't see this evolving in any other way.
Another interesting data point about AI implementation in enterprises comes from Johnson and Johnson.
(44:11):
They just released the findings of internal research that they've done, and they found that 10 to 15% of artificial intelligence use cases delivered 80% of the value.
That's from the story that was shared in the Wall Street Journal this week.
That's not too surprising to me, but I think it's not the right way to measure it. And I'll explain.
The reason it's not too surprising to me is that the 80/20
(44:32):
rule always works, or in this particular case, the 85/15 rule; it is always true that the big projects are going to deliver the most amount of value.
That doesn't mean that the other 80% of projects do not provide value or do not provide positive ROI.
So I can tell you that many of the things that I'm doing with a lot of my clients, or that people who take my courses do
(44:54):
in their companies, are small day-to-day initiatives that are going to change efficiency by a small percentage, but free a lot of time for specific individuals to focus on bigger things.
None of these things will be visible as a significant line item on the bottom line of a large scale company, right?
So if you wanna move the needle for Johnson and Johnson, the
(45:14):
benefit needs to be in the billions.
If you just created an initiative, a small custom GPT, that removes the need to do a task once a day for an hour, that might be worth a million dollars to the bottom line.
But it's not gonna move the needle for Johnson and Johnson.
So they're going to say that it didn't qualify as moving the needle for the organization.
(45:34):
But if you do 20, 30, 50, a hundred of these in an organization, well, that adds up very quickly to a significant amount.
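Here's a hedged back-of-the-envelope version of that argument in Python; every number below is an illustrative assumption, not a Johnson and Johnson figure:

  # Back-of-the-envelope: many small AI initiatives add up.
  # All numbers are illustrative assumptions, not company data.
  hours_saved_per_day = 1        # one tedious hour automated per initiative
  working_days_per_year = 250
  loaded_hourly_cost = 60        # assumed fully loaded cost of an employee hour, in dollars
  initiatives = 100              # small custom GPTs / automations across the org

  annual_value = hours_saved_per_day * working_days_per_year * loaded_hourly_cost * initiatives
  print(f"${annual_value:,} per year")   # $1,500,000 per year

Under those assumptions, a hundred one-hour-a-day automations are worth about $1.5 million a year, even though no single one of them would ever show up on the bottom line of a company that size.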
So while I agree that the 80/20 rule works, I think every use case needs to be tested on its own ROI, and if there is a positive ROI, continue doing it, even if it's not a huge addition to the bottom line.
And I think there's an important distinction to make here
(45:56):
between company initiatives that require a huge amount of resources, where you do want to focus on the ones that will produce the most amount of value, versus education and training for your employees on how they can apply AI tools to their day-to-day stuff to gain these small benefits.
And then just encourage anybody who can get a positive ROI out of that process.
Staying on the topics of industry and new tools:
(46:19):
LightSource Labs has emerged from stealth mode.
They just raised $33 million in seed and Series A funding, and the company addresses a critical industry gap on the procurement side.
So again, we're still staying in that universe of supply chain and so on, and they're stating exactly what I told you before: that 70% of procurement teams still rely on manual processes, using no sourcing
(46:41):
software tools, and are instead managing billions of dollars, if you look at it aggregately across the world, through emails, spreadsheets, and randomly formatted documents, exactly as I mentioned earlier. And they want to transform that aspect of the business.
So a different company, slightly different flavor, but I'm sharing all of these with you so you can start understanding where the world is going.
AI is going to be in everything that we do, and we'll be able to
(47:04):
dramatically improve everything that we know in every aspect of the business.
And as a business owner, a business leader, all you have to do is think about what your biggest pain points are, what you would like to solve, and start looking for these kinds of solutions, because they exist.
And if they don't exist, they will exist in the next few years.
Staying on the topic of new, interesting developments in the enterprise space: Adobe just made a strategic investment in
(47:27):
Synthesia. For those of you who don't know Synthesia, they're a British startup that allows you to create and use AI avatars.
This comes at the same time that Synthesia announced $100 million in ARR, which is positioning them as a leader in this field.
The two leading companies in the AI avatar universe are Synthesia and HeyGen.
Both of them have very advanced capabilities, but the investment
(47:47):
from Adobe makes it very interesting, because this potentially brings these kinds of capabilities into the Adobe universe in the future, which will allow interesting combinations of the Adobe tools with avatar video generation.
And to quote Synthesia's CEO, he said that their vision is aligned with Adobe's: to democratize high quality content creation and make enterprise communication faster and more
(48:09):
effective.
I couldn't have said it better.
I think it's a very interesting partnership, and it will be very interesting to see if Adobe actually starts to leverage that inside their tools.
That being said, despite the amount of money that Synthesia is making right now, and despite the fact that they saw a hundred percent growth year over year, they still lost over 25 million pounds this past year.
And continuing on the topic of impact on industry: Hugging Face,
(48:32):
the open source behemoth, the one that hosts most of the open source models in the world, just made a very interesting move, and they acquired a France-based robotics company called Pollen Robotics.
The sum wasn't disclosed, but the robot, called Reachy 2, is now available for sale for $70,000.
Now, the interesting thing about this particular robot is that it comes ready to go, but with an open source platform that you
(48:54):
can continue to develop on your own.
Hugging Face's team lead for robotics is Remi Cadene, who is a former Tesla robotics scientist who worked on the Optimus program.
So they're definitely very serious about their move in that direction, and the fact that they're releasing the world's first advanced open source robot makes it very, very interesting, showing you how many huge advancements are happening and
(49:16):
are going to happen in the robotics space.
I think it'll be very interesting to see what companies do with that infrastructure and architecture that is now open sourced.
And it's a very interesting move by Hugging Face, putting them in the hardware field as well, in the very competitive and highly lucrative robotics universe.
There are a lot more robotics updates, and if you wanna find out about them, you can read about all of them in our
(49:37):
newsletter, which you can sign up for through the link in the show notes.
All you have to do is open your phone right now, click on the link, and you can sign up for the newsletter.
And over there is a lot more news that we cannot share on the show. But for our next topic, after talking about enterprise and what's happening in that field, I wanna share with you something interesting from Grok.
So Grok just shipped Grok Studio.
Grok Studio is their version of Canvas in ChatGPT or Artifacts
(49:58):
in Claude.
in Claude.
And this is a side by side splitview where on the right you have
a document that you can edit.
And on the left you can continueto have the chat in Grok.
I started using it yesterday.
And I must admit, it is not topar with Canvas and not even
with artifacts.
It can run documents and code atthe same time.
(50:19):
It has one cool benefit: it has more document style editing capabilities, where you can change things to bold or italic and different headings straight there in the user interface. But it's lacking a lot of other aspects that Canvas is so helpful at, with the main thing, from my perspective, being the ability to highlight a specific segment of the text and get the AI to work just on that segment.
(50:39):
To me, that's the most magical aspect of Canvas.
The other problem with the way Grok Studio works is that it writes everything, the full answer, which sometimes for me is like three pages, on the left side, basically in the regular chat, and then copies it to the other side, which takes twice the time, which absolutely drives me crazy.
So, a good step in the right direction.
I think these collaborative environments with AI are fantastic.
I think the current implementation by Grok is lacking and needs to be improved.
That's it for today.
Don't forget, right now, to open your app and look at the survey, and tell us what you want this podcast to be.
I want to listen to what you have to say and make adaptations to this podcast to serve you better.
But for that, please fill out the survey.
Also, don't forget, there's an AI Business Transformation
(51:22):
Course starting on May 12th.
So if you need better structured training on how to implement AI to drive your career, to improve your team, to improve your business, your company, it's an incredible opportunity.
We teach these courses all the time, but usually for closed groups, and only once a quarter we open them to the public.
So the next one after May will probably be in September.
So don't think twice and come and join us. Again,
(51:43):
the link for that is in the show notes as well.
On Tuesday, we'll be back with a fascinating episode that will show you 15 different use cases for the new AI image generation capabilities.
14 of them are business oriented and not just for fun.
So it will teach you a lot of valuable stuff.
And for now, have an awesome rest of your weekend.