All Episodes

September 1, 2025 52 mins

Our 220th episode with a summary and discussion of last week's big AI news! Recorded on 08/30/2025

Check out Andrey's work over at Astrocade, sign up to be an ambassador here

Hosted by Andrey Kurenkov and co-hosted by Daniel Bashir. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:

  • Google's newly released Gemini 2.5 image editing model showcases remarkable advancements, enabling highly accurate modifications of subjects while retaining their original features.
  • Anthropic expands Claude with an AI browser agent for Chrome and adds features to remember past conversations, enhancing the user experience and personalization.
  • NVIDIA and AMD to share revenue from AI chip sales to China with US government, marking a notable shift in export control policies and trade practices.
  • AI companion apps are experiencing substantial growth, with projected revenues expected to reach $120 million by 2025, raising questions about social implications and user engagement.

Timestamps + Links:


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:11):
Hello and welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. In this case, a bit more like the last month in AI. Unfortunately, we've had to skip a few weeks. Jeremy, busy as always with exciting reporting and, I don't know, his national security work, and I've been traveling.

(00:32):
So, as I always say, sorry if we missed weeks; we'll try to be back on a regular schedule going forward. And in this episode we will summarize and discuss some of last week's most interesting AI news, and a bit of the week before as well. You can go to lastweekin.ai for our text newsletter, which does go out every week, most

(00:55):
weeks, for other stuff we are not covering in this episode. And in this episode, I'm one of your regular co-hosts, Andrey Kurenkov.
Jeremy once again could not make it.
So we have one of our regular guest co-hosts Daniel Bashir.
Hey.
Yeah, I'm Daniel.
You may have heard me on this podcast before.

(01:15):
If you have explored the Last Week in AI Substack world, you might have also listened to The Gradient, which, if you haven't, has lots of interview episodes, which are, I think, pretty cool. Would love for you to check those out. Yeah, great to be here. And me and Daniel were just chatting; there's an idea being floated of reviving the Gradient podcast, which

(01:35):
has been on ice for a little while now. So, yeah, Last Week in AI listeners, you might hear some news on that in a few weeks, we'll see. But, uh, in this episode we'll be covering some primarily exciting news regarding new tools and apps, some releases from Google and Anthropic, some major applications and business stories.

(01:59):
Some pretty cool open source stuff, and, you know, just a couple of notable policy stories. It's not been a super busy month so far, luckily, and we haven't missed too much. So starting out in tools and apps, the first story has got to be the new image editing model by Google.

(02:21):
They have released Gemini 2.5 Flash Image, which is, like, by far the most impressive model for editing images that has been available so far. It was kind of hyped up for a while; it was being tested under a pseudonym, "nano banana," and yeah, after a little bit of that, it

(02:44):
was revealed to be, in fact, from Gemini. And, you know, sadly this is an audio format, so I'll have to describe what you can do with it. But the gist of it is you can very accurately take a subject, like a person, and then change the clothing of this person, change their posture, change their setting.

(03:07):
And it very convincingly retains the features of this person and very successfully kind of follows your instructions about what you wanna do. You can combine different images as well. It's by far beyond anything else we've seen, to a point that some people are saying Photoshop is in trouble now.

(03:30):
So, still, to me, you know, we've had very powerful models for image gen for a while. This one is still, you know, next level. Yeah, and this is also coming kind of just off the heels of Genie 3, which was released earlier this month, also by Google DeepMind, and which is really quite impressive as well.

(03:52):
It's got this sort of ability for you to look into a world, and it's actually quite stable and sort of maintains some of the features of the physical properties. Like, if you look at something, you make some adaptation to a part of the environment, like painting a wall, you turn away and you turn back, it's still there.

(04:13):
And, you know, this interacts quite a bit with the notion of a world model that's pretty hotly debated in AI circles: whether models have 'em, what a world model is, things like this. So I'm pretty excited to see more work like this that forces us to think about these notions. Yeah, there's been many fun examples of things you can do with this.

(04:37):
It's a multi-turn, kind of conversational model, of course. So you can take an empty room and decorate it, you know, paint the walls, add a couch, add a table, and the room will be successfully sort of populated without all the details changing, just having the very specific thing you wanted executed.

(04:58):
And on the note of the world model, there was a fun example I saw online where someone pointed out that they gave the model an image of a road in Dallas or something and asked it to show what is the opposite view, what's behind the viewer. And the model apparently was able to show the view from the other side of the same area, which definitely kinda

(05:22):
speaks to this being sort of world-model-like, being able to understand physical properties, locations, things like that. Next big piece of news on this front is that Anthropic has launched a Claude AI agent that lives in Chrome.
And this is pretty much something that you'd expect.

(05:43):
The AI exists in a sidecar window.
It maintains context of browser capabilities.
The agent can perform tasks on behalf of the user, which is pretty exciting. There's a lot of AI companies out there developing similar AI-powered browser solutions.
I feel like this is an interesting direction.

(06:03):
And maybe there is a question out there of, like, what would an AI-native browser look like? How does the interaction design for that look different from how we use browsers today? Is it like this agent in a sidecar, the way that Anthropic is doing it at the moment? Or is there a world where browsers actually look pretty different in a more fundamental way?

(06:26):
That feels pretty unclear, but I think we're in the beginning stages of something that could be really interesting.
Right.
Yeah.
This is launched via Claude for Chrome, so it's an extension, coming pretty quickly after, I think, OpenAI launched their agent model maybe a month or two ago, a little while ago, and that is very similar.

(06:48):
You give it an instruction and it does web stuff for you. For OpenAI's thing, it has its own little dedicated environment, and it creates its own browser and sort of does it in the ChatGPT interface. Here you have this plugin for Chrome, and you actually use it within the Chrome browser, which is a little different.

(07:09):
And to your point, this is also coming pretty soon after Perplexity has launched their browser, I think called Comet, that is also pitching this sort of agentic browsing stuff. So it's yet another sort of competitive area. We've seen this with search, with deep research, with every single kind of use case of AI: Open

(07:33):
AI and Anthropic and a few others going head to head. And yeah, I think these are gonna be pretty powerful. I have one fun example where I had a spreadsheet with some links, where you gotta open each link and check kind of the website, check for some quality assurance. And in the past, I would have to do this myself, click and go look and do this very manual labor.

(07:58):
I was able to use ChatGPT agent, like, tell it: go to this Google doc, open it, click on these links, look at the site, check for these things. It took like half an hour, it took a long time to get through it, but it was able to do it. So it's gonna be, I think, similar to just the chatbots: these kind of agentic web browsing agents are gonna

(08:24):
be used in a million different ways to speed up all sorts of boring stuff. And speaking of Anthropic, they have another slightly notable update: the Claude chatbot can now remember your past conversations, something that's been available on ChatGPT for a long time.

(08:45):
You can activate it by going to settings and enabling "search and reference chats." It's interesting to see Anthropic adding this a long, long time after OpenAI did. OpenAI, as far as I remember, it must have been last year when it started remembering details from your conversations to sort of personalize it to each user.

(09:07):
And that speaks, I think, to OpenAI having had a more consumer-oriented focus, and Anthropic targeting much more of a code and enterprise and business crowd. But yeah, I'm sure this makes it a more compelling offering. Yeah, there's a coupling with another story here, that Google's Gemini will also be more

(09:30):
personalized by remembering details automatically.
As with other chatbot offerings, you have options like temporary chats where you can have private conversations that won't be saved to use for personalization or AI training. My take on this, and you can kind of see it if you look at the pieces we're referencing here, is that Anthropic has taken a slightly different pattern, where they will only use chat memories if you

(09:54):
prompt them to. And I think that we're in the pretty early innings of what memory and personalization are going to look like for these systems. I think that there are a lot of different contentious issues that come up here. I think that memory in its current form is not perfect in any of its implementations, and I do think that

(10:16):
there's gonna have to be a lot of consideration and hard work on: how does the interaction pattern it affords right now make a difference to the model behavior, in a way that's relevant to different things users care about, and what sorts of principles should we have about how that evolves?

(10:37):
Again, feels like a very early discussion as these things are just beginning to be rolled out, but something to pay attention to. Yeah, and I think this is interesting to me as a topic, because I've started realizing, first of all, the magnitude of ChatGPT usage, right? We've covered how it has 700 million active users.

(11:02):
And you know, in my mind I was sort of assuming, or even not imagining, how people are using it. Like, I use it for work. I use it to, like, help brainstorm and write some code and whatever. But many people use it in many different ways. Some people use it as, like, a therapist or, like, a life coach. Some people use it just to talk to and think through problems.

(11:25):
So for people who do talk to ChatGPT a lot, like, talk to it, that kind of memory feature, I think, probably matters a lot more. And so Gemini launching this in particular, where Gemini is becoming, I think, the main competitor to ChatGPT as far as chatbots people actively use, that could matter quite a bit.

(11:46):
And speaking of Gemini, there is another launch from Google. They launched Guided Learning, which is available within Gemini and is designed to teach rather than simply answering questions. So it's meant to have you learn things, have you build a deep understanding, help you work through problems step by step.

(12:12):
All that sort of stuff. Again, we keep saying this, I find it interesting: this is happening very, very soon after ChatGPT launched Study Mode. We know that all of these services are used heavily by students. I don't know what percent of high school students and college students aren't using ChatGPT at this point.

(12:32):
It must be in the low single digits. So, makes a lot of sense for this to launch, and hopefully these sort of study-oriented things will make it so students actually try to learn as opposed to just have the AI do the work for them.
Yeah, I hope so too.
I think there's a really interesting set of questions here.

(12:55):
Some of them are around how do we ask people to still do the hard and effortful work that is learning and developing a deep understanding of things. Because I think that to really cause the sorts of changes in your brain, and the time you need to mull over something to really get

(13:15):
it and have deep intuition and understanding, there just isn't a shortcut to that. I think that, the way our education systems work, there's different forms of legibility that we have that indicate what it looks like for a student to have attained mastery or to have a deep understanding of something.

(13:36):
And I don't think it's news to anybody that these forms of legibility are pretty imperfect and don't always indicate that. And increasingly they can be gamed. And when you're a student, you lose out on something. You lose out on not just a generalization of understanding that might come to matter later on, but a sort of satisfaction you might

(14:03):
get personally from deeply understanding something, in such a way that it might intellectually stimulate you, make you want to consider different paths or things like this later on in your life. And so that deep, effortful work looks or feels quite important, also just for the development of a person.

(14:27):
This is getting too long so far, so I'll stop there. I mean, it's a deep topic, I think, and it's very interesting to consider how, if you're on the younger side, you know, starting to grow up, you wouldn't remember a time before AI, before chatbots. Like, your experience will be very different from our experiences, where we had the internet at least, but

(14:51):
like, we had no AI to learn with, but it was very different.
Yeah.
Oh,
Moving on to a couple stories about OpenAI.
We've had, what now, Anthropic and Google as the main guys of this section so far. Next we have news that Apple Intelligence will be integrating with GPT-5, starting with iOS

(15:11):
26. So, Siri already integrates with OpenAI's GPT-4o, as far as I know; like, you ask Siri a question and then it decides to pass that topic forward to ChatGPT. And so it's perhaps not surprising that they're gonna be upgrading it to GPT-5 relatively soon, but it does speak to the kind of continued partnership between Apple and OpenAI.

(15:39):
One more piece of news on OpenAI: they are adding new features to Codex, their coding assistant. So they're introducing an IDE extension, an extension to the kinda standard coding tools, which is also something that Claude Code has. It is introducing GitHub code reviews.

(16:00):
Yeah, generally kind of expanding the feature set of their Claude Code competitor. And this is, I guess, for non-programmers, this might not be very exciting, but I think Claude Code has really, seemingly, made a huge impact in the programming world. And these kind of agentic coder tools are pretty rapidly being adopted and making a big shift.

(16:26):
So OpenAI managing to compete, managing to get some user share with Codex, as, you know, for a rare occasion, kind of entering this space later than Anthropic, it's a pretty significant struggle. Yeah, a lot of stuff going on in the coding world right now, as you've seen from the many startups involved in this.

(16:50):
Our applications and business story for todayis also about a startup, sort of in this space.
It's about a company called Lovable, which TechCrunch refers to as a vibe coding startup, and, if you haven't seen Lovable before, basically it's used to create full stack web applications and websites, so that's the specific area that

(17:11):
they're in, and they are projecting some pretty big numbers.
They're aiming to achieve $1 billion in annual recurring revenue within the next 12 months, which is quite soon. And it's currently growing that ARR by at least $8 million each month. It's already surpassed a hundred million in ARR just eight months after reaching its first $1 million,

(17:35):
which, again, goes to show, obviously many of these companies have lots and lots of spend, but the kind of user and revenue growth that they can experience is quite on a different level from what we've been seeing before.
Yeah.
Lovable has been kind of a clear winner so far in this entire space, and they did launch quite a while ago, so they

(17:58):
pretty much took off this year as AI got good enough to be usable basically without knowing code, without reading code. Lovable is one of these very user-friendly ones; you know, I don't know if they even allow you to see the code. There are some competitors like Replit, which are more friendly to technical users, that expose much more kind

(18:20):
of techie stuff, and it's a very busy space, as you said. So Replit is one competitor. There's also Bolt, there's v0 from Vercel, Base44. There's like at least 10 significant players at this point, I think. And yeah, it's probably gonna be a major market, assuming the economics of it start working.

(18:45):
'cause I think the speculation is these companies are acquiring all this revenue by burning through cash and not even trying to be profitable at this point. And speaking of big numbers, the next one is about a raise by Decart, the company that we recently covered as

(19:05):
having launched this real-time sort of filter, real-time video-to-video model that was very powerful. You can give it, like, a normal stream of regular kind of real-world video, and it can turn it into GTA or, I don't know, Simpsons, or any sort of art style, with real-time streaming, which would mean that if you're, like, playing a

(19:28):
game, it can completely change the art style, for instance. Or you can even have a very low-res game and then make the graphics whatever you want. So they have raised a hundred million, and they have now hit a $3.1 billion valuation, and that's pretty significant.

(19:50):
Like, there's no large set of users for this yet, and this entire idea of streaming video-to-video, their model Mirage LSD, yeah, again, is still sort of at a preview stage. So investors seem to be pretty optimistic on this having a lot of potential.

(20:12):
Yeah.
It's one of those where it feels quite early to say anything substantive. We have another story here that's also about a pretty big raise, and a company you've surely heard about before that is not too new: Cohere has raised $500 million from investors with a new valuation of $5.5 billion.

(20:35):
Lots of different players involved here. Cohere is hoping to use those funds for accelerated growth. They plan to expand their technical teams and develop enterprise AI solutions. Again, unlike many AI startups, Cohere is less focused on consumer applications and much, much more on customizing AI models for enterprise clients like Oracle and Notion.

(20:58):
They're hoping to develop this sort of cloud-agnostic AI platform. So this is, again, a pretty different approach that some of these labs are taking with their technical talent, where they are trying to look at different enterprises and businesses, thinking about how can AI be useful for your sort of vertical. And you're seeing both sort of general versions of that, like Cohere, but then also ones that want

(21:21):
to develop deep expertise in a very specific area. Next up, we have a story about Pony AI, not active in the US, which is aiming to roll out to the European market. So this report from Bloomberg is kind of saying that this is their aim.

(21:42):
So far, apparently, they've already rolled out 200 Gen 7 robotaxi vehicles just over the past two months. They're aiming to get to a total of a thousand vehicles. And this is notable because in the US we've definitely seen a speed-up of competition and deployment of robotaxis.

(22:04):
This year in particular, Waymo is entering new markets, Tesla's robotaxi service just launched and is also at least aiming to expand rapidly, and it's very clearly going to be a huge deal. Like, this problem is starting to be at the point where it's solved, where robotaxis are quite reliable. People seem to prefer them to Ubers in general, from

(22:26):
what I've seen in discussions. So Pony AI, being another significant player coming from China, has the potential to really break into the European market. And if that's the case, that's gonna be a big deal, right? Last story on applications is about another big lab and a bit of a changing of the guard.

(22:48):
Igor Babuschkin, who is a co-founder of Elon Musk's xAI, and who I recognize as having some kind of bird creature as his X profile photo. I dunno if it's a bird, I can't remember if it has wings, but it's a memorable profile photo. Anyway, beside the point, he has announced his departure from xAI to start his own venture capital firm, Babuschkin Ventures, which will focus on supporting AI safety research and backing startups

(23:14):
that aim to advance humanity and explore the universe. This was inspired by a discussion with Max Tegmark about building AI systems safely for future generations, and is also following several scandals at xAI involving the chatbot Grok, which included controversial responses and inappropriate content generation.

(23:34):
Many of you, if you are extremely online or spend basically any time on X, probably remember the Grok 4 release and what happened around then. Yeah, it's been a tumultuous few months for xAI, to be sure. A lot of impressive results with the Grok 4 launch, just a very impressive LLM. xAI in general, since launching, I think, towards the end of 2023, since the team

(24:02):
coming together, just caught up incredibly rapidly. It'd be fun to speculate if this means that xAI is not doing so well. Typically you don't see people departing from startups they've co-founded in less than two years. But here, obviously, it's hard to say if Babuschkin just wanted to go off and start this venture

(24:24):
initiative, or if it indicates anything about xAI internally, but still significant to have a shake-up in leadership in general. And xAI, it's in an interesting time in its life, so, moving on to projects and open source. First, we have an open source release from Meta AI.

(24:45):
This was, I think, from a couple weeks ago. The release is DINOv3, a state-of-the-art vision model trained with self-supervised learning, which is able to generate high-resolution image features. So basically it allows you to process any given image and output a representation of it that's useful for all sorts of stuff, and that you can use

(25:13):
for things like object detection, semantic segmentation, video tracking, et cetera, without any fine-tuning. And this is a pretty large model. It has 7 billion parameters, which is unusually large for just pure image models, trained on 1.7 billion images. This is very much just taking the image processing model to the biggest place it's been.
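To make that "frozen features, no fine-tuning" workflow concrete, here's a minimal sketch: extract features once with a frozen encoder, then solve the downstream task with only a lightweight linear head. The "backbone" here is a stand-in random projection for illustration, not DINOv3 itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_backbone(images):
    # Stand-in for a frozen self-supervised encoder: fixed random weights,
    # never trained in this script (scaled so tanh doesn't saturate).
    d = images.shape[1]
    w = np.random.default_rng(42).normal(size=(d, 64)) / np.sqrt(d)
    return np.tanh(images @ w)

# Toy "images" (flattened pixels) with a label that's linear in the input.
x = rng.normal(size=(200, 32))
y = (x[:, 0] + x[:, 1] > 0).astype(float)

# Step 1: feature extraction with the frozen backbone (done once).
feats = frozen_backbone(x)

# Step 2: linear probe, a least-squares head on top of the frozen features.
head, *_ = np.linalg.lstsq(feats, y * 2 - 1, rcond=None)
preds = (feats @ head > 0).astype(float)
print(f"linear-probe accuracy: {(preds == y).mean():.2f}")
```

The point of the workflow is that all the expensive learning lives in the (here fake) backbone; downstream tasks only need a cheap head, which is why annotation-bottlenecked domains benefit.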

(25:39):
We don't talk too much about just pure image models for things like semantic segmentation, object detection, video tracking. These are like semi-solved problems at this point. A decade ago, these were significant tasks in computer vision, but it's pretty important to remember that, I think, as far as using AI, applying it, object detection, segmentation, just general video understanding and

(26:05):
image understanding tasks are pretty significant. So having a really cutting-edge model that is free for academic use, that has a commercial license as well, and comes with a lot of code, could be very useful for certain people. Yeah, these sorts of models clearly have pretty important impacts out there in the world.

(26:29):
For this specific model, a few orgs like the World Resources Institute and NASA's Jet Propulsion Laboratory have been using it. This has also improved the accuracy of some pretty specific tasks like forestry monitoring and supporting vision for Mars exploration robots. And the fact that you can do this with minimal compute overhead, and you

(26:49):
don't have to rely too much on web captions or curation, so that you are able to sort of apply this universal feature learning when you're bottlenecked by annotation, is a really good advancement, I think. Next up we have a specific set of foundation models: GLM-4.5. This is an LLM with 355 billion parameters, designed to excel in agentic, reasoning, and coding tasks.

(27:16):
It employs a mixture-of-experts architecture, which is pretty familiar to a lot of people who spend some time in ML research, but basically lets it select different subsets of its parameters for different tasks, which is quite good for efficiency and performance. What this also means is, when you hear the number of parameters in the model, that's not quite the same as the effective number of parameters.

(27:37):
So the number of parameters that are actually being used when the model makes an inference about something is smaller. And the training is sort of multi-stage here: it pre-trains on a diverse dataset, and this is followed by fine-tuning on specific tasks to improve its capabilities. Nothing too crazy here. There's RL thrown in the training process, especially when it's working on decision-

(27:59):
making, problem-solving sort of tasks.
Just a pretty interesting model.
Yeah, it's kind of interesting. We have a figure here, Figure 3, and there's pre-training on a general corpus, then pre-training on a code and reasoning corpus. Then there's mixed training, which has repo-level code data, synthetic reasoning data, and long-context and agent data, and then there's RL and stuff.

(28:23):
So there's a lot going on, and this is very much following in the footsteps of R1, which sort of introduced, I think, this approach, at least in terms of published research, of having these multiple kind of stages for training agentic and reasoning models. And the notable thing about this model, aside from being big, is they are doing quite well, like they're

(28:48):
claiming on the benchmarks to be beating Opus 4, to be up there with o3 and Grok 4, almost, to be quite performant at a smaller number of parameters. So, you know, 355 billion parameters is a lot, but it's less than DeepSeek R1, it's less than Kimi K2.

(29:10):
On coding tasks, they are similar on the benchmark front. So very much a continuation of a trend you've seen all throughout this year of open source models coming out of China, starting with R1 and really proceeding ever since, that are getting better and better, that are getting really on par with the closed source offerings from Anthropic and OpenAI for many things, which is new, right?

(29:37):
Like, until this year you could not get an open source LLM that was anywhere near competitive with Claude or ChatGPT. Now that's different. And speaking of open source releases from China, the next story is about DeepSeek releasing its V3.1 model.

(30:00):
So this is, you know, a bump in the version, as per the title. It has a longer context window, and it's not like any sort of substantial jump in any sense, but I think it's notable to see DeepSeek continuing to release and continuing to update its models sort of incrementally and still being competitive.

(30:24):
Although apparently DeepSeek fans are waiting for the release of R2, which would be the successor to R1. So this is kind of leading up to that. And speaking of open-weight LLMs, we have kind of an interesting story about the overall market. So, Artificial Analysis did a benchmark evaluating the performance of gpt-oss-120

(30:50):
B, the recent open source release from OpenAI. And they evaluated the performance of this model across different providers on the cloud. So you can run these open source models through various companies like Cerebras, Fireworks, DeepInfra, Together AI, Groq, Amazon, Azure.

(31:11):
A bunch of them. And the funny thing that they found in this is, on a particular benchmark, AIME, they have very different outcomes across the different providers. So on some of them, Cerebras, Nebius, DeepInfra, they get a very high score, 93%.

(31:31):
Then you go to Groq, Amazon, Azure, and they go down by 10%, maybe even more than 10%, which, it's hard to say what these providers are doing. Are they, like, making smaller versions? Are they quantizing? Are they using different hardware? But definitely a surprising result. You would think that if it's the same model, and all these people are serving it, letting you use it via

(31:56):
their hardware, you expect roughly the same performance.
But apparently that's not the case.
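As a sketch of the kind of comparison being described, same gold answers, different providers' measured outputs (the provider names and answers here are hypothetical, not Artificial Analysis's data):

```python
# Toy cross-provider benchmark comparison: the same open-weights model is
# nominally served everywhere, but measured accuracy can still diverge.
def accuracy(preds, gold):
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

gold = ["A", "B", "C", "D", "A"]
provider_preds = {                       # hypothetical per-provider answers
    "provider_x": ["A", "B", "C", "D", "A"],
    "provider_y": ["A", "B", "C", "D", "B"],
    "provider_z": ["A", "C", "C", "B", "A"],
}
scores = {name: accuracy(p, gold) for name, p in provider_preds.items()}
spread = max(scores.values()) - min(scores.values())
print(scores)
print(f"spread across providers: {spread:.0%}")
```

For identical serving stacks you'd expect this spread to be near zero; quantization, sampling settings, or hardware differences show up as exactly this kind of gap.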
Our last story on this front is an open source text-to-speech model from Microsoft called VibeVoice 1.5B. This is capable of generating up to 90 minutes of speech with four distinct speakers,

(32:18):
supporting cross-lingual synthesis and singing.
It's primarily trained on English and Chinese and available under an MIT license. There's a decent amount of work going on right now in audio synthesis, and I think that this is, like, a pretty exciting advancement. Like, 90 minutes of speech is quite a long time.

(32:38):
I think there's still questions about general coherence of the audio over that stretched period of time, but it does seem as though, again, we're making pretty quick advancements. Yeah. And this is one of these notable things where audio in general, historically, has kind of lagged behind on the

(33:02):
open source front, in terms of datasets, in terms of models. It's just this kind of area where you don't have as many options as, for instance, image generation. So having powerful text-to-speech means that, on the one hand, as a company, you can use it, fine-tune it for various applications.

(33:22):
On the other hand, we know now that people use these kinds of things for scams and so on. And that would just mean that, you know, you have to really be on the lookout whenever you hear someone in audio these days. Like, it's at a point where you cannot tell the difference between AI generation and actual recorded audio.

(33:43):
And on to research and advancements. Just a couple of stories for this episode. The first one is Deep Think with Confidence, which is a new approach that basically makes test-time scaling more efficient and more effective. So they're looking at the type of test-time scaling where you wanna do several

(34:09):
parallel reasoning paths. You wanna have a model try to solve the problem multiple times and get to different results, and then you might take sort of a majority output or a combined output of your various reasoning traces. And this paper introduces a fairly straightforward idea. So it's titled Deep Think with Confidence.

(34:30):
As you are doing your rollouts of different reasoning paths towards getting to an answer, you can evaluate, roughly speaking, the confidence of the model in its predictions, what they call token confidence, which is looking at the probabilities of the tokens it's actually outputting. And they also define an average

(34:54):
trace confidence that they call self-certainty. And basically they evaluate this thing as you roll out the model, and if you have low confidence, they kill the run; they kind of stop it. So you end up being able to do many parallel runs, cut off ones that seem unpromising.

(35:16):
Then if you get to high confidence, you're now able to combine these results from multiple rollouts and get to a combined kind of confident output.
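Roughly, the filter-then-vote loop being described can be sketched like this. The function names, threshold, and toy rollouts are my own illustration of the idea, not the paper's actual algorithm or numbers:

```python
import math
from collections import Counter

def trace_confidence(token_probs):
    # Average log-probability of the emitted tokens, mapped back to (0, 1].
    # A crude stand-in for the paper's per-trace confidence score.
    return math.exp(sum(math.log(p) for p in token_probs) / len(token_probs))

def deep_think(rollouts, min_conf=0.5):
    # rollouts: list of (final_answer, per-token probabilities for the trace).
    votes = Counter()
    for answer, token_probs in rollouts:
        conf = trace_confidence(token_probs)
        if conf < min_conf:        # kill unpromising rollouts early
            continue
        votes[answer] += conf      # confidence-weighted majority vote
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical rollouts for one question.
rollouts = [
    ("42", [0.9, 0.8, 0.95]),
    ("42", [0.85, 0.9, 0.9]),
    ("17", [0.3, 0.2, 0.4]),   # low-confidence trace, gets filtered out
]
print(deep_think(rollouts))  # -> 42
```

In the real method the confidence is tracked during generation so a bad trace can be stopped mid-rollout rather than scored afterwards, which is where the compute savings come from.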
And in benchmarks, they show that with this method, they're able to improve performance pretty substantially, you know, by up to 10% on some of these benchmarks like AIME.

(35:44):
A couple percent boost for GPT-OSS, a 5% boost for DeepSeek.
Basically making it so that, for things where you're not reliably getting the right output, you now get to the right answer a more significant fraction of the time.
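The filtering and combination step can be sketched as confidence-weighted majority voting. Again, this is an illustrative sketch with assumed data shapes (answer strings paired with a per-trace confidence score), not the paper's implementation:

```python
from collections import defaultdict

def combine_rollouts(rollouts, keep_top=0.5):
    # rollouts: list of (final_answer, trace_confidence) pairs from
    # parallel runs that survived early termination.
    # Keep only the most confident fraction of traces, then let each
    # surviving trace vote for its answer, weighted by its confidence.
    ranked = sorted(rollouts, key=lambda r: r[1], reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_top))]
    votes = defaultdict(float)
    for answer, conf in kept:
        votes[answer] += conf
    return max(votes, key=votes.get)
```

The appeal is that this needs no extra model calls: everything is computed from log-probabilities the decoder already produces during sampling.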
And yeah, it speaks to, I think a place wherewe are with reasoning and test time scaling.

(36:06):
There's a lot of low-hanging fruit, probably, in this whole area of test-time scaling, in terms of ways to do it more reliably and efficiently.
This one is a fairly straightforward algorithmic method that can be applied widely.
While we're thinking of test-time scaling and a lot of these improvements, maybe a natural

(36:27):
question is to ask what happens to jobs?
And as it happens, a couple of days ago, a Stanford study found that the adoption of generative AI is significantly affecting job prospects for young US workers, particularly those aged 22 to 25.
This came out quite recently, and there's been a lot of commentary.

(36:48):
I would actually recommend taking a look at Noah Smith's recent blog on this specific paper, and also, of course, reading the paper itself, because it's worth trying to understand and contextualize those claims.
But just to get up a bit on a soapbox about this paper,I feel like despite the fact that the people who wrote

(37:09):
this paper are pretty careful economists who are very deserving of respect, it does feel like this finding is a bit of a specification search.
As job markets rise and fall, there's always some group of people who are doing worse than the rest, and it's a little bit unclear that it is always justified to tie this to there being a new technology on the block, like AI.

(37:33):
What's worth saying is that, sure, it's possible that AI is impacting job prospects to some extent, but it's a little bit hard to disentangle this entirely from other economic factors.
One really great thing that Noah Smith does in this post is he looks at the data about how AI exposure relates to job prospects for people at different ages.

(38:00):
The study is specifically about people aged 22 to 25.
But the workers in their thirties, forties, and fifties who were judged to be most heavily exposed to AI have actually seen robust employment growth since late 2022.
And you can maybe try to square this with the story about AI destroying jobs.
But again, it's kind of unclear: why would companies be rushing to hire new

(38:23):
40-year-old workers in AI-exposed occupations?
Again, just a lot of question marks here.
The paper is titled Six Facts About the Recent Employment Effects of Artificial Intelligence.
So they're examining the effects of AI on the labor market, on employment, on people being able to get jobs.
The first fact is they uncover substantial declines inemployment for early career workers age 22 to 25, as we

(38:49):
say, in occupations most exposed to AI, such as software developers and customer service representatives.
The second key fact is that overall employment continues to grow, but employment growth for young workers has been stagnant since late 2022.
Third fact is not all uses of AI areassociated with declines in employment.

(39:14):
Fourth, they find that the employment declines for these workers remain after conditioning on firm-time effects.
So they do try to be careful.
As you said, this is analysis of labor data; we are not doing experiments here.
We're just looking at various statistics and trying to conclude what effect AI may have had.

(39:36):
So they try to account for these other factors that could explain the statistics.
Fifth, they say that the labor market adjustments are visible in employment more than in compensation, and sixth, the above facts are largely consistent across various alternative sample constructions.
So, as you said,

(39:57):
economics research is tricky.
There's no careful experimentation going on here.
They are working with data thatcan have various interpretations.
Take the case of software development, for instance, which is one of the major areas where finding employment has been much harder for early-career professionals.

(40:19):
Obviously there's many factors going on.
During COVID, there was arguably over-employment: many of the big tech companies really hired like crazy, and then there was a large amount of layoffs going on in software development over the last couple of years.
There's economic conditions, all sorts of stuff.

(40:40):
So this is a very early piece of research, and they do, to be fair, kind of position it as such.
They call these 'canaries in the coal mine' to indicate that this might be a sign of what's happening, but it's still early and it's hard to tell.
But as far as analysis goes, as far as sort of actual

(41:03):
research that is able to tell us anything about employment and AI, to my knowledge this would be the first major work.
Obviously I'm not an economist; maybe there's been some prior research on this, but this is coming from a Stanford group that is pretty well respected, honestly.
One of the lead authors is

(41:25):
Erik Brynjolfsson, who has done previous research on AI and economics.
So as you said, Daniel, if you find this interesting, we'll probably have to follow up and see some more deep analysis and possible interpretations of this.
Yeah.
Our next story is in the policy and safety space,and this one's actually really interesting about an

(41:50):
unpublished report on AI safety from the US government.
Back in October, a red teaming exercise was conducted at a computer security conference in Arlington, Virginia,
where AI researchers stress-tested some advanced AI systems. They identified 139 novel ways these systems

(42:14):
could misbehave, like generating misinformation,leaking personal data, things like this.
The key upshot of that exercise was that it revealed significant shortcomings in a new US government standard designed to help companies test AI systems.
But the National Institute of Standards and Technology didn't publish a report on those findings.

(42:34):
The reason for that, according to some sources, was that it, along with other AI documents from NIST, was withheld because there were concerns about conflicting with the incoming administration's policies.
Wired now has this unpublished report.
And I guess one of the key takeaways here is that this is an area that feels like it should be nonpartisan

(42:58):
and ideally not too influenced by politics, but it seems like there have been challenges faced in
publishing AI research under the Biden administration.
So just an interesting story in terms ofthe confluence of politics and AI safety.
Right.
This is a report from NIST, the National Institute of Standards and Technology, which was tasked with this kind

(43:22):
of thing, with creating standards for AI.
They created this NIST AI 600-1 framework to assess AI tools.
This is the Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile.
So this red teaming exercise basically was to evaluate this framework, which they published

(43:47):
about a year ago, I think mid-2024.
So, probably, yeah, not too surprisingly, you know, the Trump administration reversed Biden's actions on AI and recently published their own agenda on AI.
And it's very likely that these kinds of AI security initiatives are gonna see less interest,

(44:07):
less promotion under the current administration.
And another story about the US government, and kind of a surprising one.
The US government is going to take a cut of NVIDIA and AMD AI chip sales to China.
So we have talked quite a lot about export controls, about restrictions on NVIDIA being able to sell GPUs to China.

(44:32):
It's been a very evolving area under the Trump administration.
There was a time when the H20 chip, which for a long time was the one that NVIDIA sold to China specifically, was suddenly blocked from being sold.
And so this is kind of reversing that: now NVIDIA apparently is able to sell the H20

(44:54):
again, but will have to pay the US government a share of the revenue.
So Jeremy, unfortunately not here, would be the guy to give the most insight on this development, but this seems a bit surprising as far as the approach to export restrictions goes.
Moving on to something unrelated to the government, going to another topic.

(45:18):
We've talked quite a lot about the ongoing lawsuits over copyright for the major LLM providers.
So Anthropic has settled a high-profile AI copyright lawsuit brought by book authors.
This was initiated by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who

(45:43):
accused Anthropic of using their books without permission.
There were some, let's say, conflicting developments here.
A California district judge ruled Anthropic could use the books under fair use, but found that the acquisition method via shadow libraries constituted piracy.

(46:05):
And this is just one of multiple lawsuits, ongoing basically for years now, that could have major ramifications for how you can use and how you can acquire data for training AI models.
OpenAI and Anthropic and others kind of took the maximally permissive approach of

(46:28):
using a bunch of data without asking any permission.
And so with this settlement, it's hard to say as non-lawyers how significant an effect it will have on other ongoing legal developments.
But it does mark one piece of progress in this long ongoing story where, you know,

(46:53):
at least this lawsuit has reached an end.
Our last story is about AI companion apps, which are on track to pull in $120 million in 2025.
In the first half of the year, these apps already generated $82 million.

(47:14):
with downloads up by 88% year over year, reaching 60 million, and the top 10% of these apps account for 89% of the revenue, with 33 apps surpassing $1 million in lifetime consumer spending.
The popular ones in this space include Replika, Character.AI, PolyBuzz, and Chai, with

(47:37):
a significant portion of users seeking AI girlfriends.
You may have also seen commentary on Twitterabout AI boyfriends being very popular.
This is a really interesting, hairy space to me because I think it represents, or
sort of portrays, something pretty fundamental about the kind of companionship that people

(48:00):
seek and are willing to accept, and the different ways in which it can be met and not met.
Personally, I find AI companions a bit troubling for numerous reasons, but I won't get up on the soapbox about it here.
Yeah, well, I did include it in the policy and safety section very much because it has pretty, let's say, concerning or significant implications for society and for people's psychology.
(48:28):
We know in the modern age there's been very much a degradation in the amount of socializing and the number of close connections people have.
It's arguably one of the major health crises of the modern age: people's ability to have friends and close connections.
And so this market is growing significantly and getting a lot of revenue, according to this report.

(48:53):
There have been 112 apps published just in the first half of 2025, with the names of those apps having "girlfriend" in 56 of them, along with "fantasy," "boyfriend," "anime," "soulmate," "lover," "waifu."
A lot of, yeah, clearly romantic interaction apps.

(49:13):
And it's coming also in this current paradigm of dating apps.
I think the general consensus is it's a hard and unenjoyable process to try and find a human girlfriend or soulmate.
So, I mean, it's a little concerning.
I think it's fair to say, on the one hand you can treat it as a video game, as like

(49:37):
a role-playing exercise as a fun thing.
By the way, Character.AI, one of the players in the space for a while, which isn't focused on girlfriends but on general roleplay, still has millions of monthly active users.
Like, this is a very big space.
So it's likely to keep growing.
I mean, xAI recently launched Ani and their own Grok-based companions.

(50:01):
I don't know.
It's an interesting phenomenon for sure.
And, fun fact, the movie Her, which was all about this thing, directed by Spike Jonze, where the main character, played by Joaquin Phoenix, falls in love with an AI character, is set in 2025.
Lots of people are saying that this movie was incredibly prescient, and I think that's fair.

(50:24):
If you haven't seen Her, I highly recommend it.
Well, that is it for this episode.
As I've said, hopefully we are going to get back to a weekly schedule.
Thank you Daniel for fulfilling the guest co-host duties.
Always fun to have you on here.
Thanks for having me.
I always really love doing this.
And thank you to the listeners.

(50:45):
As always, we appreciate you tuning in and bearing with us as we skip some weeks at an unpredictable rate.
Always appreciate it if you leave reviews, if you share it with friends, and more than anything if you just keep tuning in.