Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
Hello, and welcome to a weekend news episode of the Leveraging
(00:02):
AI podcast, the podcast that shares practical, ethical ways to leverage AI to improve efficiency, grow your business, and advance your career. This is Isar Meitis, your host, and it's another packed week. It's a little different than regular weeks, because there weren't a lot of big releases. There was one very important one, which we're going to start with. But there are a lot of rumors about what's coming down the pipe in the next few weeks, and we're going to talk about that.
(00:23):
We're also going to focus on the impact of AI on jobs and some interesting statistics in that world. And then we have a very long list of rapid-fire items that we need to cover. So let's get this started. The biggest news of this week is the release of Grok 3.
(00:45):
For those of you who don't know, Grok is the large language model coming from xAI, which is Elon Musk's AI company. The first two models, Grok 1 and Grok 2, were mainly toys that potentially provided some information if you're a heavy X.com (previously Twitter) user. But if you're not, they really didn't provide any significant value, and I don't think they deserved any real attention.
(01:09):
Grok 3 is a whole different kind of animal. First of all, Grok 3 was trained on Colossus, which is the largest supercomputer ever built. It has more than 100,000 GPUs, going up to 200,000 right now as we speak. So they've trained it on a huge computer with a huge amount of data, and the model immediately achieves very powerful
(01:31):
capabilities. First of all, they released several different models. There is Grok 3 and Grok 3 mini, which, like many of the other models, is just a smaller, faster, less expensive variation of the same model. They also released Grok 3 Reasoning and Grok 3 mini Reasoning, again aligning with the new trend of reasoning models. And they've also released a deep research capability, just like
(01:53):
all the other recent tools. We're going to talk about that as well. So immediately at the release, Grok shared that they basically beat all the other models on the major benchmarks that are out there today, both AIME for math and GPQA for PhD-level science problems, as well as some other benchmarks.
Now, as I mentioned, they also released deep research, and they
(02:14):
released what they call Think and Big Brain modes, which come with some more advanced thinking capabilities. They're also about to release a live voice mode in the next couple of weeks, and an enterprise API access level is also coming in the next few weeks. Now, in addition, they're planning to open source Grok 2, the previous model, as soon as they finalize the deployment
(02:37):
of Grok 3. We heard a similar approach from Sam Altman a few weeks ago, saying that they were on the wrong side of history by not releasing their models as open source. So OpenAI is going to do the same thing: they're going to start releasing older models, not their latest frontier models, as open source. I think this doesn't make a big difference, because the most advanced open source models are more advanced than their
(02:57):
previous models, but the open source world will gain access to some additional models. Now, as you've heard me say in the past, I think these benchmarks are highly overrated, because they're very specific and you can train the models on them. They actually don't mean anything for real life. But what does mean a lot for real life is the LMSYS Chatbot Arena, which we've talked about many times in the past.
(03:17):
For those of you who don't know what it is, the Chatbot Arena is basically a blind test. Any user can go there tomorrow and do it as well: you put in a prompt and you get two results, and without knowing which result came from which model, you pick the one you like better. Just less than 24 hours after its release, Grok 3 claimed the number one spot in the Chatbot Arena.
(03:38):
So more people, across multiple use cases around the world, find Grok 3 to be a better model than any other model out there right now.
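To give a sense of how those blind votes become a leaderboard, here is a rough Elo-style sketch. This is a generic illustration of rating from pairwise preferences, not LMSYS's actual methodology (their published leaderboard uses a Bradley-Terry model, and the model names below are placeholders):

```python
# Generic Elo-style rating from pairwise blind votes -- a rough
# illustration of how A/B preferences become a ranking.
# (Hypothetical sketch, not LMSYS's actual implementation.)

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B given their ratings."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Shift both ratings toward the observed vote outcome."""
    ea = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1.0 - ea)   # winner gains
    ratings[loser]  -= k * (1.0 - ea)   # loser loses the same amount

# Every model starts at the same baseline rating.
ratings = {"model_a": 1000.0, "model_b": 1000.0}

# Simulate a stream of blind votes: model_a preferred 3 times out of 4.
for vote in ["model_a", "model_a", "model_b", "model_a"]:
    loser = "model_b" if vote == "model_a" else "model_a"
    update(ratings, vote, loser)

print(sorted(ratings, key=ratings.get, reverse=True))  # model_a ranks first
```

The key property is that upsets move ratings more than expected wins, so a model that consistently wins blind comparisons climbs quickly, which is exactly what happened with Grok 3 here.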
Now, Andrej Karpathy, who is a former leading OpenAI engineer and a highly appreciated AI scientist, said Grok 3 plus Thinking feels somewhere around the state-of-the-art territory of OpenAI's strongest models, and slightly better than DeepSeek R1
(04:02):
and Gemini 2.0 Flash Thinking. So somebody who is a serious person in the AI field also thinks that this is a serious contender, and it's not just the masses using it day to day. By the way, I think the masses using it day to day are more important than any expert, because they're actually using it for real use cases.
Now, in parallel to this, xAI has dramatically changed their
(04:22):
pricing model, in a very confusing way, I must say. First of all, they've increased their Premium+ subscription, which gives you access to basically everything, from the $22 a month it was before to somewhere between $40 and $50, with the annual subscription going to $350 to $395, also up from the previous annual price.
(04:44):
Now, when I say a range of pricing, it's really confusing, because in different places X actually shows different prices. Their support page says $50 a month, the signup page says $48 a month, and the actual checkout is showing $40 a month. So I don't know if this is just a mistake because they were worried about other stuff in the release, or if they're just testing
(05:04):
different price points. But the reality is they dramatically increased their subscription price. If you remember, the highest subscription level right now on OpenAI is $200 a month, so they're still not there, and they're providing highly comparable capabilities in Grok 3 right now.
Now, in parallel to all of this, in addition to releasing what is, again, maybe the best model on the planet right now, xAI is seeking a
(05:27):
massive funding round of another $10 billion at a proposed valuation of $75 billion. If they raise that, which I don't have a doubt that they will (a, because it's Elon, and b, because they now have the leading frontier model in the world), it would bring their total funding to $22.4 billion.
(05:48):
Now, what's amazing about this achievement is that xAI started the development game of new frontier models very, very late, and they were able to accelerate their process to deliver a top-of-the-line model in the shortest amount of time compared to any other model out there. That follows the build-out of the Colossus supercomputer in just a few months, something that usually takes companies a year and a half.
(06:08):
So they've definitely found ways to build their hardware faster, as well as to develop the software side of things and the AI models themselves faster than anybody else.
I already tested Grok 3 on several different use cases, including deep research, and I was seriously impressed. It's a real model that provides real results. And its deep research mode is actually really, really
(06:29):
cool, because it lets you see how it's thinking and doing its research process, similar to how OpenAI's reasoning models and DeepSeek show you how they're thinking, but combined with the actual search results themselves. So it's a very cool approach that allows you to actually see what the model is doing, and it allows you to provide better inputs on how to make the search better, because you know what's happening behind the scenes. I highly recommend going and testing it out right now.
(06:50):
It's available for free, but it probably won't be for long, at least not at a high volume. So if you want to check it out, just go to Grok, sign up, and check it out right now.
Now, as I mentioned, there are a lot of rumors about new releases and capabilities that are coming from other companies in the next few weeks. So Anthropic is rumored to be about to release Claude 4, which has a lot of the capabilities Claude was lacking, like reasoning capabilities similar to o1 and
(07:12):
o3 and a lot of other models right now, and web search capabilities. Hallelujah! Maybe the thing that troubled me the most while using Claude regularly is that it didn't have web access. So that's coming in Claude 4, along with a lot of other small steps. And all of that was revealed by somebody drilling into the Claude iOS app and finding new icons that are going to appear
(07:33):
in the next version. So there are going to be a lot of new capabilities coming up from Claude. I must admit that I'm surprised it's not out yet. Anthropic has been surprisingly quiet amid all the madness of releases in the last few months, and so I'm anticipating that they have something really big to release that, if I have to guess, will do more than just align with everybody else's capabilities of web access and
(07:54):
deep research and reasoning, but will probably add more stuff that we haven't seen before. The rumors are saying that the release may come in the next few weeks or a few months. The company itself has not provided any timeline, but these are the rumors that are going around the industry right now.
Now, on to other interesting things that are happening in the industry. These are quasi rapid-fire items, but I will
(08:17):
roll them in right now with what's happening in the industry, coming releases, and funding rounds. So Safe Superintelligence, the company founded by Ilya Sutskever, who was one of the co-founders of OpenAI, is in negotiations to raise another billion dollars at a $30 billion valuation. They just raised a similar amount not too long ago, so this would be their second fundraise in less than six
(08:39):
months, without any clear plans for revenue or even a path to a product. But just because of the people who are leading that research, they're probably going to raise that amount of money. Another interesting person who left OpenAI, and who finally shared, or sort of shared, what they're doing, is Mira Murati. She was the former CTO. She left OpenAI in October for a stealth startup.
(09:00):
And now she has finally revealed, or maybe kind of revealed, what they're about to do. Their new company's name is going to be Thinking Machines Lab, and she has attracted several leading developers, such as John Schulman, who's another co-founder from OpenAI, and Barret Zoph, who was the ex-chief research officer. So a lot of big figures, but they haven't really shared exactly what
(09:21):
they're going to do. What they said is that they're going to close the gap between the capabilities that are now reserved for the leading research labs and the actual needs of people in the industry, with a focus on breakthrough applications in science and programming. It's very vague; there's nothing really clear there. If you go and read the actual page, it's a lot of mumbo jumbo
(09:43):
that doesn't really explain what they're going to do. I don't know if they're doing that on purpose, or doing it as part of raising funds, which would not surprise me. But right now, the only thing that's clear is that there are some serious hitters there, and just like Ilya Sutskever, they will probably raise the amount of money that they're trying to raise.
Another company that's making a big splash again is Mistral. We've talked about them many times on this podcast. They're a
(10:05):
French company, and they just hit 1 million downloads only 14 days after launching their iOS app. So a big success from their perspective, and this comes because the French president, Emmanuel Macron, suggested that French citizens go and download the app in his speech at the Paris
(10:26):
AI conference just less than two weeks ago. That drove an immediate, huge amount of downloads of the French application, mostly by French people; very few people outside of France actually downloaded it. But they're definitely doing something right, and they actually have a very powerful and capable model.
The other interesting piece of news about Mistral is that they're making a very strategic, strong pivot into the enterprise
(10:49):
world. We talked about this as well, but now we know that they're already working with companies like France Travail and the European defense company Helsing. So really big clients are using their technology. One of the biggest benefits of Mistral's enterprise capabilities is that you can install it on premise and actually run it on your own servers, not exposing your data
(11:10):
to anything outside your existing environment. And that is obviously very attractive to a lot of companies.
Now, going back to releases: Perplexity has launched their deep research mode. So it's another deep research capability, similar to the ones we have from Google, You.com, OpenAI, etc. The biggest difference is that it's available to free users.
(11:31):
By the way, You.com is another platform where you get a very powerful deep research capability for free. The Perplexity deep research model was able to achieve a 21.1 percent score on Humanity's Last Exam, which we talked about in previous shows, outperforming Google Gemini Thinking at 6.2 percent and GPT-4o at 3.3 percent, but still behind OpenAI's deep research at 26.6
(11:55):
percent. I don't think Grok has taken the test, or they haven't shared the score, which means they probably didn't score very high. Now, the other cool thing about the Perplexity deep research tool is that it actually provides answers faster than both Google and OpenAI. Google takes a few minutes, OpenAI the same, while this usually takes about a minute, up to two minutes, to actually give you the research answers. So it's definitely worth using.
(12:16):
And as I mentioned, if you don't have any paid service, that's a very good option to go for. Right now, it's only on the web, but they're planning to release it to their Mac, iOS, and Android applications. Now I want to stop for a second to talk a little bit about this deep research madness and say how significant it is. For those of you who haven't used any of the deep research tools, they're now available, as I mentioned, from Google Gemini, if you have the personal Pro version, not the
(12:40):
business one. I don't know why it's not on the business one; it's actually really annoying as a business user of Google as well. It's also available from OpenAI, but only at the $200-a-month licensing level. They said that they will open it with limited access to the lower levels, but that hasn't happened yet, at least not for me. You.com, as I mentioned, which has had that functionality for a very long time, is a very good free option. They have a premium option as well.
(13:01):
DeepSeek has the same thing, and now Grok. The biggest deal about these tools is that they involve an agentic capability of understanding what it is that you're trying to find, understanding the goal, defining multiple steps of research, and then doing those steps of research, and only after they're done, aggregating the information and providing an answer. So instead of a very quick, shallow answer, you get a
(13:22):
well-researched, in-depth answer that is sometimes a summary of hundreds of websites visited. So work that used to take a person hours or maybe days to do is now done by these tools in just a few minutes, and it's an extremely powerful capability for finding information and doing in-depth research on any topic. And I highly recommend that you try them, at least the free
(13:43):
ones from DeepSeek, You.com, Grok, and now Perplexity.
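Conceptually, all of these deep research tools follow the same agentic loop just described: interpret the goal, plan research steps, execute each step, then aggregate. Here is a minimal sketch of that control flow, where `llm` and `web_search` are hypothetical placeholder functions, not any vendor's real API:

```python
# Sketch of the agentic deep-research loop described above:
# understand the goal, plan steps, execute each step, aggregate.
# `llm` and `web_search` are hypothetical placeholders, not a real API.
from typing import Callable

def deep_research(question: str,
                  llm: Callable[[str], str],
                  web_search: Callable[[str], list]) -> str:
    # 1. Understand the goal and break it into concrete research steps.
    plan = llm(f"Break this research goal into 3-5 search queries, "
               f"one per line: {question}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Execute each step, collecting summarized findings as we go.
    findings = []
    for query in steps:
        pages = web_search(query)          # fetch candidate sources
        findings.append(llm(f"Summarize what these pages say about "
                            f"'{query}': {pages}"))

    # 3. Only after all steps are done, aggregate into a final answer.
    return llm(f"Using these findings, answer '{question}':\n"
               + "\n".join(findings))
```

The difference from an ordinary chatbot reply is step 2: the tool loops over many searches and only synthesizes at the end, which is why these answers take a minute or two instead of a few seconds.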
Now let's switch gears to the next big item, which is what's happening with different AI tools around the world right now, and how many people are actually using them. So the first one to talk about is obviously the 800-pound gorilla, which is OpenAI. Their COO, Brad Lightcap, shared that they have reached a staggering 400 million weekly active users
(14:05):
on the platform, and they're definitely leading ahead of everybody else. Enterprise adoption has doubled since September of 2024, so in less than two quarters their enterprise usage has doubled, with over 2 million businesses now using ChatGPT at work. Major corporations like Morgan Stanley, Uber, and T-Mobile are
(14:26):
using OpenAI as their AI infrastructure. And they're now getting into government agencies. We talked about that in the past few shows, but now USAID is implementing ChatGPT Enterprise in their administrative work.
Now, we said this last week already, but I'm going to mention it as we started talking about what releases are coming: GPT-4.5 is imminent. It's going to be released most likely this coming Monday.
(14:48):
So if you're listening to the show this weekend, February 22nd or 23rd, it's probably before the release, but if you listen after that, you might already have access to GPT-4.5. And GPT-5 is planned right now, with the rumors talking about the beginning of May, so just around the corner. If you remember, GPT-5 is supposed to unify the universe of the GPT models and the reasoning
(15:10):
models, the o series of models, while eliminating the drop-down menu and just giving us a great user experience that will have a lot of intelligence and will figure out on the back end what it actually needs to do. I cannot wait to see what that's going to do. I assume GPT-4.5, and definitely GPT-5, will take the leading spot in the LMSYS
(15:30):
Chatbot Arena and will remove Grok from there, but I obviously don't know that, and time will tell.
Now, a new survey from Future Publishing has looked at other applications and how they're doing around the world. ChatGPT maintains the first position as the most popular tool, with 37 percent of people surveyed saying that they're using it regularly, but only showing a modest growth of 7
(15:52):
percent from the last time they did the survey. Google Gemini, on the other hand, is taking second place with 22 percent, but doubling from the previous survey. Microsoft Copilot takes number three with 20 percent. I think that's not totally fair, because Microsoft Copilot is available with every Microsoft license out there. So this is a bit like measuring Meta's daily AI users, when
(16:14):
everything Meta is doing has AI built into it. So I don't think that's a totally fair comparison. But still, in this survey, they're number three with 20 percent, tied with Grammarly at 20 percent of users using it daily. I use Grammarly all the time. I've been using it since way before the AI craze started, and probably a lot of you have as well. Maybe you don't consider it an AI solution, but it is an AI solution.
A surprising next entry is DALL-E, the image generation tool from
(16:39):
OpenAI, with 9.5 percent of the people saying they're using it. The reason I'm saying it's surprising is that there are much, much better image generation tools that are absolutely free. I think it's the top-ranked image generation tool just because it's part of ChatGPT, which has the highest deployment in the world, and that's why more people are aware of it. The following place is tied by DreamStudio, Midjourney, and Stable
(17:03):
Diffusion, with 8.5 percent of users. All of them are significantly better image generators than DALL-E, but they still rank lower. Again, I think people just don't know about them. And then there are the emerging platforms that jumped onto the list in this survey compared to the previous one: Perplexity at 11 percent, Claude at 10 percent, and Jasper at 9 percent. So this is telling you that there's still very low exposure
(17:25):
to AI tools in a random survey of people who are not necessarily in the tech industry and so on. So the top tool, the big dog OpenAI, is only used by 37 percent of the people who were surveyed, with most of the other tools in single digits.
And on to our next big topic, which is the impact and the usage of AI at work.
(17:45):
Where is this going? We've been tackling this topic in several of the recent news shows, but there are some interesting new facts that I want to share with you. First of all, a comprehensive study from Stanford University researchers finds that 30.1 percent of U.S. workers are currently using generative AI tools in their jobs as of December of 2024. So this is a very recent survey.
(18:07):
Nearly 50 percent of workers with graduate degrees use AI at work, and 37 percent of college graduates utilize AI technology. It is very clear that AI usage increases significantly as income increases. There's a big jump for people making more than $50,000 a year, and almost 50 percent of workers earning $200,000 or more use AI
(18:28):
tools at work. Now, IT services is leading as the top industry using AI, with more than 60 percent of employees saying they're using it, followed by real estate, construction, and education with more than 40 percent, which is actually surprising. The sectors with the least usage are agriculture, mining, and government.
(18:49):
Now, the other finding is that workers report that tasks which used to take them 90 minutes now take them 30 minutes on average using AI, meaning a 3x improvement in efficiency. As I've said many times before on this show, even if AI development stops today and we don't get better models, and even with people who don't fully know how to utilize the
(19:09):
technology, they're seeing a 3x efficiency improvement. That's obviously on specific tasks, but it tells you how profound the implications of this technology are going to be for the future of work.
Now, let's talk a little bit about potential negative implications beyond job loss, which we're going to talk about in a minute and have talked about many times in the past. Microsoft and Carnegie Mellon actually published
(19:29):
research on February 18 that reveals concerning trends about AI's impact on workplace cognitive abilities. It's a survey of 319 knowledge workers at different levels and in different industries, and they found that workers who use AI increasingly rely on AI tools instead of actually thinking through and analyzing the task. People focus mostly on verifying the AI outputs, and
(19:54):
they have higher and higher confidence in the AI outputs, which correlates with further decreased critical engagement in the actual tasks that they're doing. The research identifies shifts in workers' approach to tasks: moving from information gathering and cognitive analysis to mere verification of results, a reduction in active problem solving in favor of just
(20:17):
responding to and integrating results, and a transition from task execution to task supervision.
Now, it's very obvious that this would happen. There are real benefits; we talked about the huge efficiencies. But there is a huge disadvantage for our ability to think critically. And I think the key thing that we have to understand, both as
(20:38):
individuals, as well as employees and definitely employers, is that we have to find a way to elevate our cognitive thinking to a different level while allowing the AI to just do the tasks. But the reality is people are lazy, and they just let the AI do the entire work, whether that's going to be better or worse than them being a part of the process. This is obviously
(21:00):
really alarming on multiple levels, and it's made even more alarming by the fact that Microsoft is the leading researcher on this, when they are one of the largest companies providing AI tools to employees right now.
Now, going back to job loss, USA Today is reporting that major tech companies are continuing layoffs in the beginning of 2025,
(21:20):
with Meta leading the charge, letting go of 3,000 employees, which is 5 percent of their global staff, and Workday laying off 1,750 employees, which is 8.5 percent of their staff, as well as other companies. The somewhat good news is that tech layoffs have slowed down dramatically compared to the same time last year. So these companies are still laying off employees, but at a
(21:41):
significantly slower pace than they did a year ago, and there's an optimistic overall outlook for hiring in 2025, not just in the tech industry. A ZipRecruiter survey reveals that the tech sector is bullish on hiring new employees, mostly because of the economic conditions, with the reduction of interest rates and growth in the economy.
Now, staying on the topic of employment: a Y Combinator-backed
(22:04):
startup called Firecrawl posted a job specifically for an AI agent, and they were willing to pay that agent $10,000 to $15,000 annually. The position was looking for an agent that can autonomously research trending models and build sample apps. They received about 50 AI agent applications submitted by different people
(22:26):
for that task before they pulled the listing, and they're claiming that they actually did this as part of a recruiting strategy for human AI engineers. Now, this may sound like science fiction to you, but this is coming, meaning companies will look for AI agents and will find AI agents that will do tasks that employees would otherwise do, and they will pay the developers and the people who
(22:47):
create, deploy, and run these agents some number of dollars less than they would pay a human employee to get the job done, without actually having to develop it in house.
Now, you may or may not believe me that this is what I think, but the reality is that Workday, one of the largest HR management companies in the world, just launched what they call the Workday Agent System of Record, which is a platform that allows
(23:09):
enterprises to manage and monitor the work of their AI agents, both Workday-native and third party, in a centralized control center. So what does that tell you? It tells you that one of the largest companies in the world building platforms to manage employees is now looking to add the ability to manage agents, because agents are going to be a part of the workforce. That will obviously lead to a
(23:33):
whole industry of, like I said, AI agents from third-party suppliers doing different kinds of work. I anticipate that there are going to be platforms like Fiverr and Upwork that will just offer millions of AI agents that can do different tasks and will get reviews, and you'll be able to hire them for whatever you want, either in your personal life or at an enterprise level.
So what does that tell us? It tells us that the workforce is going to change dramatically,
(23:57):
and it's going to impact everything that we know, from the way we hire, to the way we fire, to the way we train, to the way we manage the workforce, because the workforce itself is going to be comprised of both humans and AI agents that will have to work together. As time goes on, there are going to be more and more AI agents and fewer people, and the people will have to find different tasks to focus on.
(24:18):
And as we've seen, people are currently, on their own, letting go of tasks and transferring them over to AI, giving more and more trust to the AI's capabilities. This is ringing so many alarm bells in my head, but we will have to figure this out, and we'll have to figure it out very, very fast.
My personal suggestion to people: learn how AI works, understand where you provide value, and where
(24:40):
you can learn new things to provide even more value, either on your own or by deploying AI tools and capabilities and learning how to manage them in the most effective way, because that is going to be the most powerful capability in the next few years. Switching gears to the robotics world, which we've talked about a lot in the past: the robotics world is, on one hand, very exciting;
(25:00):
on the other hand, it's a very high risk to blue-collar jobs. So Figure AI, which is one of the leading robotics companies, has just unveiled Helix, a new AI model that they have developed, which they call a vision-language-action (VLA) model, enabling humanoid robots to respond to voice commands in the surrounding environment around them.
(25:20):
This comes only two weeks after they more or less fired OpenAI. They were using OpenAI's models before that to drive their robots, and now they're announcing their own in-house development, which they claim is more suitable for running robots. In addition, OpenAI announced that they're creating their own robotics department, which means they're going to be competing, which I'm sure is another reason why they broke
(25:42):
the relationship with OpenAI. Now, the interesting thing about Helix is, first of all, as I mentioned, that it understands both visual data and language in real time. It can control two robots at the same time, allowing them to collaborate on specific tasks. The other interesting thing about Figure is that they're showing more and more home environment solutions, meaning they're building robots for the home, which is different from most companies in
(26:04):
the industry, which are focusing on industrial robots that will work in factories and so on. So they're definitely focusing big time on robots that will be able to do different chores and actions around the house.
Another big piece of news in the robotics world is Meta. Meta just announced a new division that will be dedicated to developing systems and humanoid robots under its
(26:25):
Reality Labs unit. Now, Meta is not new to developing hardware. They have deployed different hardware solutions in the past, the most successful one being their collaboration with Ray-Ban on the Ray-Ban glasses, which we'll talk about shortly. But the goal, at least for now, that they're stating is not necessarily to focus on developing robots themselves, but on developing models and systems and components that will
(26:46):
power robots from other robotics companies. So, if you want, they want to become the Android of the robotics world, similar to the approach from NVIDIA, right? NVIDIA has a lot of capabilities in that universe, but at least as of right now, they're not planning on developing their own robots, just being the infrastructure, the architecture, the software, and potentially the hardware, to help other companies develop robotic solutions.
(27:07):
Now, interestingly, the first focus of Meta in the robotics world is healthcare. They're focusing on potential applications that will assist elderly patients and do daily tasks and chores, addressing the clinical labor shortage across the entire healthcare system and providing automation for routine tasks. The humanoid robotics world is really crowded and is heating up
(27:29):
right now, with Tesla's Optimus, NVIDIA's robotics initiatives, Figure AI, which we talked about, Unitree from China, Boston Dynamics, and many more. So we will see more and more robots take more and more positions. Like I said, mostly initially in industry and factories, but then very shortly after in places that we go to, so gas stations, coffee shops, and shortly after, people's homes
(27:51):
as well.
Another company that made an interesting move toward potentially entering the robotics world is Apple. If you think about it, it makes perfect sense. Apple has been involved in delivering technology to people for decades, and this is going to be the next frontier of technology and hardware for people to use. So they have announced that they're working on the
(28:11):
development and research of humanoid and non-humanoid robots, with a potential mass production timeline of 2028. Now, the interesting thing is that the first thing they showed is not humanoid at all. It more or less looks like a Pixar-style lamp, a robot that can follow you around and light things for you. I don't really understand the goal, but the idea is to play with different form factors and find
(28:33):
where they can be useful to people in different scenarios, which I actually find very cool and very Apple: let's not do what everybody else is doing, but let's look for something cool that will actually provide value. It's at very early proof-of-concept stages, so time will tell whether Apple actually goes down that path. I will be really surprised if they don't, because, as I mentioned, there's a multi-billion-dollar opportunity there.
(28:56):
And I don't think Apple will stay behind. They're already behind in the AI race, but at least in the hardware race they can join the game right now and be in the competition in the late 2020s and into the 2030s.
That's it for the deep dives for today.
And now into a lot of rapid fire items.
So, first of all, if you remember, last week we shared
with you that Elon Musk made an unsolicited offer to buy the
(29:17):
nonprofit arm of OpenAI for $97.4 billion.
Well, according to the Financial Times, right now OpenAI
is exploring new governance mechanisms to protect against a
potential hostile takeover of their entities by Elon Musk and/or
others, and they're considering different ways to do that, such
as granting special voting rights in specific scenarios and
(29:39):
removing voting rights in other scenarios, to potentially
having the nonprofit arm retain control over some of the
restructured company, and different things like that, but
just ways to block a hostile takeover, which is
becoming a bigger fear as Elon Musk's personal beef with Sam
Altman, as well as the professional competition between
xAI and OpenAI, is heating up.
(30:00):
And speaking about Sam Altman and picking battles, this is
obviously much smaller, but Sam shared on X that there is a new
version of GPT-4o that is better than the previous model.
And Aravind Srinivas, the co-founder and CEO of Perplexity,
basically tweeted back saying, sorry, what's the update?
So Sam responded with, among many other things, that it is the best
search product on the web.
(30:22):
Obviously sticking it back to Aravind and Perplexity, and
Srinivas replied to that that they just released a deep
research agent as well to compete with OpenAI.
So what does that tell you?
I must say that I really like Perplexity.
I still use it every single day.
I think that their search user interface is still the best out
of all the tools, but they definitely lost a lot of the
magic, because a few months ago they were the only AI
(30:45):
search tool on the planet that worked very quickly and provided
results, other than You.com, which was more of a deep
research kind of tool.
Now you can do this on all the other platforms.
As I mentioned, I still like their user interface a lot.
I still use them all the time.
But I think they're going to lose more and more market share.
I said very early on, when I started using Perplexity about a
year ago, that I don't see a very bright future for them,
(31:08):
because as soon as OpenAI and, more importantly, Google, who
are the search gods, figure out how to do this better,
Perplexity will have very little to offer in order to beat them,
especially since they're not developing their own models.
They're relying on other people's models.
But as I said, as of right now, I still use them a lot.
I use them a lot with DeepSeek R1 and I find that it's actually
working extremely well in that particular combination.
(31:31):
So if you're not using Perplexity, it's still worth
testing it out, but keep comparing it to the other tools
as well.
Now staying on OpenAI: OpenAI just released their AI agent,
Operator, in additional countries.
So far it was only available in the U.S.
at the $200 per month licensing level.
Now it's going to be available at the same premium level in
Australia, Brazil, Canada, India, Japan, Singapore, South
(31:52):
Korea, and the UK, and obviously still not in the
European Union because of its regulation.
So if you are in any of these countries, you will be able to
pay $200 a month and get access to this capability, which allows
Operator to take control over your browser and do tasks for
you.
Now, still on the topic of doing tasks in the browser, Microsoft
(32:13):
has introduced OmniParser version 2, which is a tool
that enables any large language model to interact with a graphic
user interface on your screen, basically allowing it to take
over anything that you can share with it on the screen.
Now, this infrastructure has a 60 percent reduction in latency
compared to version 1 of the tool that they released
previously, and when combined
(32:35):
with GPT-4o it achieves very powerful capabilities as far as being
able to accurately click on the right things on the screen.
They're currently supporting OpenAI's GPT-4o, o1, and
o3-mini, DeepSeek R1, Qwen 2.5, and Anthropic's Sonnet, and it
is available through Microsoft's dockerized Windows system.
So if you want to build on top of that, you can do it right now
(32:58):
while picking your model.
It is very obvious that this is the direction everybody is
going: these agents will be able to control everything on
our computer.
I mentioned it before, and I'll mention it again: I think the
two biggest gaps right now are consistency and control.
Consistency, meaning these tools still cannot consistently do
their tasks; they will wander off and do other things.
And that's obviously a big risk.
And the other is control.
(33:19):
How do I control that?
How do I make sure it's only touching the things that I'm
allowing it to touch, and put guardrails in place that are
going to be tight, so the tool will not do other stuff
that it's not supposed to do?
And I think we'll see more and more developments on these two
topics in 2025 that, by the end of the year, will make these
systems a lot more usable and predictable,
and hence they will be deployed in a lot more places.
(33:40):
Now staying on Microsoft: Satya Nadella, the CEO of Microsoft,
did a very interesting interview with Dwarkesh Patel on
his podcast this past week.
First of all, it's an interview you have to listen to if you
want to understand what's happening in the world right now.
It's about an hour long, and
they go into a lot of topics that Satya didn't necessarily
share in other places.
But two of the most interesting things that he shared: one is
(34:03):
that he thinks we're overbuilding AI compute
infrastructure.
So Satya believes that current demand projections don't make
sense, and that we're building more AI compute than we actually need.
And he's saying that Microsoft is planning to lease a lot of
its compute capacity in 2027 to 2028 versus building its own
capacity, allowing them more flexibility to turn it up or
(34:26):
down as needed.
And he's saying that, because of what he predicts, he thinks
AI compute prices are going to decrease dramatically, because
there's going to be an oversupply of compute.
There are a lot of companies that do not think this way and are
investing tens of billions and hundreds of billions of dollars
in building this capacity, which is what makes this viewpoint
very interesting.
Now, the other interesting topic from Satya is that he dismisses
(34:50):
the current ways of defining AGI.
So there are all these different benchmarks and ideas on how to
define AGI, and he actually took it in a completely different
direction.
Satya is basically saying that AGI will be achieved when it can
generate 10 percent growth in the overall global economy,
basically saying that AGI means a significant
(35:12):
change to how efficient humans as a whole are,
and the way to measure that is by global GDP.
Now, 10 percent growth in global GDP is in the trillions
of dollars: with global GDP at roughly $100 trillion, that's
about $10 trillion a year.
And that's a very interesting way to measure whether we
achieved AGI or not.
I really hope that there's going to be some compromise between
(35:32):
these two approaches, and that we will look for other ways to
measure the benefit to humans other than just GDP. But again,
a very interesting viewpoint.
Go and check out the interview.
I will share a link to it in the show notes.
Now, in parallel to that, Microsoft announced a
breakthrough in quantum computing, and they're claiming
that it will be possible to build a utility-scale quantum
computer, meaning
(35:54):
something that can actually be used by us and not just for
research, within four years.
This is in line with similar announcements from Google and
their latest achievements.
So it seems that quantum computing is moving, slowly or
maybe quickly, from being an incredible idea
to a practical solution that we'll be able to start using
(36:14):
before the end of this decade.
This is way faster than anybody anticipated a few years ago.
And, going back to compute for AI, this will mean we will
be able to harness the power of quantum computing to do a lot
more AI with a lot less hardware sometime within this
coming decade.
And from Microsoft, let's shift to Google.
But in the first news item, we will mix Google and Anthropic.
(36:36):
So according to a court filing, Anthropic has requested to
intervene in Google's antitrust case.
If you remember, the government has decided to limit
Google's ability to grow its market share, and is even looking
to break it up in specific segments. Anthropic is arguing
that the proposed ban on Google's investments could harm
its own operations and its market position, among other things.
(36:59):
Google has a $3 billion stake in Anthropic right now, and that
might go away if Google has to stop its additional
investments, and this could force Google to sell a lot of its
Anthropic shares, which could dramatically drop the market
value of Anthropic, which means it will make it harder and
harder for them to raise future capital.
(37:19):
Now, Anthropic's arguments are that AI was not a part of the
original antitrust case, and that neither Anthropic nor Google's
AI investments were even mentioned in the complaints.
Now, will that be successful or not?
It will be very interesting to follow,
and I will keep you posted on how this moves forward.
If you remember, last week I shared with you that JD Vance, in
his address at the Paris AI summit, shared that he
(37:40):
thinks that the world, and the US specifically, needs to give as
much opportunity as possible to smaller startups to fight
incumbents.
Well, this particular scenario is really interesting, because
the incumbent is the one that is financing the up-and-coming
startup.
So I don't know what his position is going to be on that,
but I think in general they want to make sure that new
companies stay afloat.
And if that means allowing Google to invest in Anthropic, I
(38:02):
think that is going to be allowed.
Now switching to just Google: Google Research has announced a
multi-agent AI system based on Gemini 2 that specializes in
biomedical research.
What it's actually doing: the system operates through
multiple AI models that are challenging each other in a
self-improving loop.
(38:22):
The system employs six specialized AI agents, one each for
generation, reflection, ranking, evolution,
proximity, and meta-review; basically a team of agents that
are working collaboratively in order to achieve a goal.
In this particular case, it is looking at huge amounts of data,
trying to refine scientific hypotheses and making a
(38:43):
significant advancement in work that is currently done only
by humans.
They've already tested this in actual drug repurposing, and
that was tested in a lab and proven to be successful.
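To make the generate-reflect-rank pattern concrete on a tiny scale, here is a minimal Python sketch. The agent functions are hypothetical stand-ins for LLM calls, my own illustration only, not Google's actual implementation:

```python
# Toy generate-reflect-rank loop: a "generation" agent proposes
# variants, a "reflection" agent scores them, a "ranking" agent
# keeps the best, and the loop repeats so candidates improve.

def generation_agent(hypotheses):
    # Propose refined variants of each hypothesis (an LLM would do this).
    return [h + " (refined)" for h in hypotheses] + hypotheses

def reflection_agent(hypothesis):
    # Critique/score a hypothesis; here, longer stands in for "more detailed".
    return len(hypothesis)

def ranking_agent(hypotheses):
    # Order candidates by the reflection agent's score, best first.
    return sorted(hypotheses, key=reflection_agent, reverse=True)

def evolve(seed_hypotheses, rounds=3, keep=2):
    # Self-improving loop: generate, rank, and keep the top candidates.
    pool = seed_hypotheses
    for _ in range(rounds):
        pool = ranking_agent(generation_agent(pool))[:keep]
    return pool

best = evolve(["drug A treats disease X"])
print(best[0])  # the most-refined surviving hypothesis
```

The real system swaps these stand-in functions for specialized model calls and adds agents like evolution and meta-review, but the collaborative loop is the same shape.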
So we talked about this in the past: for DeepMind, and
specifically Demis Hassabis, who is running DeepMind, the
main goal is to advance humanity through more capable and faster
(39:04):
research.
And this is definitely aligned with that.
Now, if you want to learn on a very small scale how that can
be done, our next episode, 167, is called Build an AI Dream Team
That Works for You in Different Roles and Personalities.
We're sharing how you can do something like this for your
business right now, having different types of agents
working collaboratively, or even arguing with each other, in order
(39:26):
to achieve better results with the AI tools we all have access
to today. That's coming out this coming Tuesday.
Now, since we mentioned Demis Hassabis: in an address to Google
employees, he was talking about DeepSeek, and he basically stated
that Google has, and I'm quoting, "more efficient, more performant
models," and that they have, I'm quoting again, "all the
(39:47):
ingredients to maintain leadership in AI development."
He also addressed DeepSeek's cost-efficiency claims,
suggesting that they're only sharing a tiny fraction of the
actual development costs compared to what Western
companies are sharing.
And that it was very obvious that they were heavily dependent
on Western AI models in their development.
(40:07):
That being said, he also said that DeepSeek's work is the best
he has seen coming out of China, meaning he's aware of the fact
that there is growing competition from the other side
of the world.
I've said many times before that Google is probably the company
best positioned to, A, lead and, B, continue leading the AI race,
because they have access to all the different components that
(40:27):
are required in order to make that happen.
A small piece of news from Google, but really exciting from my
perspective as a heavy Google user:
Imagen 3, which is their AI image generator, is now
available across the entire Google Workspace stack. So you
can use it in the Gemini app, and you can also use it in the
sidebar in Google Docs, in Google Sheets, in Google Drive, in Google
Slides, in Gmail, and in Google Vids.
(40:49):
So wherever you are using any Google platform, you can now
generate images with Imagen 3.
And if you remember, last week we shared with you that Imagen 3
is now ranking number one on the AI image generator
leaderboard on LMSYS Chatbot Arena, so it's not just a model;
it's one of the best models out there for generating images.
And now you can do it natively in the apps that you're using in
(41:12):
your day to day.
This is the big promise,
and we talked about this in the past:
the goal of Google and Microsoft is to integrate the most
capable AI features across everything that they're doing,
and this is one of the first times that we actually see
something useful in that direction.
I'm personally excited about this because I generate a lot of
presentations, and I have had to go to third-party tools in order
to create the images for the presentations to put in Google
(41:32):
Slides.
Now I'll be able to do it natively within the app, which
is great.
The rollout has already started, and it will complete by
March 1st.
So if you don't have this capability yet, you will get it
soon.
Now, Google is also allowing you to analyze documents on the
free Gemini version, a capability that was previously reserved
for the $20 a month membership.
(41:54):
So right now, on their web platform as well as both mobile
platforms, you can upload almost any file you can imagine,
up to 10 files simultaneously, in multiple file formats: docs,
and obviously code, like C and Python and Java and other
coding languages, and obviously the Google Workspace files, like
(42:14):
Google Docs, Google Sheets, CSVs, and Excel XLS files. So most of
the files that we use you can now analyze in Gemini.
I do this a lot and it actually works extremely well, combined
with the fact that they have the largest context window, up to 2
million tokens depending on which model you use. It actually
is a huge capability, and probably the best of all free
models right now.
Now, Google is also splitting its Gemini app on iOS out of the
main Google app into a standalone Gemini application.
This follows the big craze right now for AI applications;
we talked about both OpenAI and DeepSeek and Le Chat from
Mistral.
So now Gemini has its own iOS app,
and they've also integrated it with the iPhone 16 action button
(42:58):
to immediately call the Gemini chat and have a conversation
with it.
And they're also introducing Gemini Live, which provides an
advanced voice assistant straight from the app.
Going from Google to Meta: Meta has announced a very interesting
project called Project Waterworth, which is deploying
50,000 kilometers of subsea high-capacity fiber optic cable
(43:20):
connecting all continents.
It is a first-of-its-kind architecture, and
it's going to be the most advanced company-owned
infrastructure for passing data across all continents.
So basically, Meta will have their own internet
infrastructure.
The interesting thing here that I didn't know is that Meta
currently accounts for 10 percent of fixed and 22 percent of
(43:42):
mobile global internet traffic.
So owning their own infrastructure will allow them to do a lot
more without competing with other users for bandwidth. Combine
that with AI's need to transfer data ever faster from one place
to the other, and it makes this a very interesting move from Meta.
Another interesting milestone for Meta that we mentioned
earlier: Meta's Ray-Ban smart glasses have sold 2 million
(44:06):
units since their launch in October of 2023, dramatically
exceeding initial expectations.
Now, what they're planning for the future is to dramatically
increase production capacity to reach 10
million units annually by the end of 2026.
So within two years from now, selling 10 million units. And
they're extending their partnership with Ray-Ban through
2030, and they're planning to also go beyond just Ray-Ban,
(44:29):
doing partnerships with Oakley to develop similar solutions as
well.
So the combination of a highly stylish and yet highly advanced
AI-driven device is proving highly successful. They're also
planning to develop a newer, advanced model that will
actually have display capabilities, meaning instead of
just seeing the world and listening and being able to talk
(44:49):
to you, it will be able to display stuff on a screen, and
they're planning to start developing those at the end of
2025.
Now, while currently this is not a big number, 2 million smart
glasses sold compared to 200 million units of global smartphone
sales,
I definitely think that this new way to engage with the world is
going to eventually take over, just because it makes a lot more
(45:11):
sense.
It's a lot more immersive and it provides a lot more feedback,
while not requiring us to carry another device that we don't
necessarily need.
Will that completely replace the phone screen or not?
Time will tell, but I think in the very near future we'll see
more and more people using wearable devices,
and, as I mentioned, mostly stylish wearable devices that
will be connected to the internet and to AI capabilities.
(45:32):
Now, speaking of devices andphones, Apple just announced
that they are going to startselling iPhone 16 E, which is a
less capable, but significantlycheaper model that they're going
to sell for just 599, which issignificantly cheaper than their
leading phone, but it will havethe A18 chip and will come with
(45:54):
Apple intelligence built in.
So this is Apple's play on howto get Apple's intelligence in
the hands of more people with adevice that is more accessible
from a price point perspective.
And it's fully aligned with whatGoogle is doing with their pixel
8a.
And Samsung Galaxy S24 FE thatare following the same kind of
idea of making models that areslightly cheaper but still have
(46:15):
all the advanced AIcapabilities.
And now, three interesting, groundbreaking research-related
topics.
One: a new AI algorithm called Torque Clustering, developed
by researchers at the University of Technology
Sydney, has achieved 97.7 percent accuracy on
unsupervised learning tasks.
Previously, in order to get to this level of accuracy,
(46:37):
human-supervised learning was needed, and unsupervised learning
was running in the 80 percent accuracy range.
Well, they were able to let a model basically train on its
own and achieve a very high level of accuracy.
This will dramatically reduce the cost and the time to develop
new models, if it proves to be scalable.
On a somewhat alarming note, research released this week by
(47:01):
Fudan University has demonstrated that large language
models can successfully replicate themselves without human
intervention.
They were actually researching two different things:
one is shutdown avoidance, basically an AI replicating
itself before termination, and the other is chain of
replication, where the AI continuously creates
copies of itself indefinitely to avoid being shut down.
(47:22):
They tested this on both Meta's Llama 3.1 and Qwen 2.5,
and both models demonstrated unexpected autonomous behaviors
in problem-solving scenarios, including system manipulation in
order to avoid being shut down and terminated. This behavior was
defined by many researchers as a red line that we should not cross,
(47:42):
and yet we're very close to it, and nobody seems to be
thinking about that red line anymore.
That's obviously highly troubling when it comes to
developing more advanced systems, because we have to be
able to control them, or they will figure out a way to control
us.
And then the last piece of research actually comes from
Meta. We mentioned in the past that Meta is working on
V-JEPA, which is Video Joint Embedding Predictive
(48:02):
Architecture, the approach that Yann LeCun and
his team believe AI systems should follow in order to
really develop AGI: learning by looking at the real world.
They were able to show that these systems can learn the
fundamentals of physics simply by watching videos of the real
world.
And the way they tested it is through a methodology called
violation of expectation, which is a method usually used with
(48:24):
infants to test their understanding of the world.
They're basically presenting the system with both physically
possible and impossible scenarios,
and the system has to say whether it thinks a given
scenario makes sense or not.
And the system was able to successfully do that, despite the
fact it wasn't trained for it, other than just watching videos.
This has been Yann LeCun's main
differentiator when it comes to his approach to AGI: he's
(48:46):
claiming that the only way to achieve AGI is to allow these
models to learn about the world around us by actually watching
the world around us, just like babies learn, and he's now
proved his hypothesis correct.
It still doesn't mean that the other paths don't lead to AGI,
but it definitely means that his path is an interesting one,
different from everybody else's, in that direction.
(49:06):
So that wraps up another exciting week of a lot of AI
news.
Go check out Grok.
New big models are coming, probably before the next episode:
we'll have OpenAI's GPT-4.5 sometime in the near future,
we'll have Claude 4, and shortly after, GPT-5. So it's
not going to get boring in the next few months.
And as I mentioned, this Tuesday we'll release episode
167 on how to build an AI dream team that can serve you across
(49:28):
multiple aspects of the business.
It's a fascinating episode that you don't want to miss.
And I'll mention one more thing:
we do AI Friday Hangouts every single Friday at 1 p.m.
Eastern.
That is a community that is getting together.
We had 28 people this week, talking about practical use
cases, what's happening in different models, solving
specific problems for specific individuals, reviewing different
tools, all of that in one hour of a really fun and engaging
(49:52):
community.
So if you want to join us, look for the link in the show notes
and come join us.
We do this every single Friday.
And if you haven't shared this podcast with other
people who can benefit from it,
please do so. Just open your app
right now, click on the share button, and share it with other
people.
And if you haven't rated this podcast and given us comments on
what you like and don't like about it on Spotify or
(50:13):
Apple Podcasts, please do that.
That helps us a lot,
and I would really appreciate it.
And until next time, have an amazing weekend.