Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:10):
Hello and welcome to thelast week in AI podcast.
We can hear us chat about what'sgoing on with ai As usual.
In this episode, we'll be summarizingand discussing some of last
week's most interesting AI news.
You can go to the episode descriptionfor all of the links to the news we're
discussing and the timestamps so you canjump to the discussion if you want to.
(00:32):
I am one of your regular hosts, Andre ov.
I studied AI in grad school and I nowwork with at a generative AI startup.
And I'm, you hear me typing here just'cause I'm making final notes on what
is an insane We can, if we do our jobsright today, this will be a banger of
an episode if we do our jobs wrong.
It'll seem like any other week.
(00:53):
This has been insane.
I'm Jeremy, by the way.
You guys all know that if you listenGladstone, ai, ai, national Security,
all that jazz this is pretty nuts.
Like we were talking aboutthis I think last week where.
You know, we're catching up on twoweeks worth of news and we were talking
about how every time we miss a weekand it's two weeks, inevitably it,
(01:13):
it's like the worst two weeks to miss.
And the AI universewas merciful that time.
It was not merciful.
This time, this was an insane,again, banger of a week.
Really excited to get into it, but.
Man, is there a lot to cover?
Exactly.
Yeah.
There hasn't been a week likethis probably in a few months.
You know, there was a, a similar weekI think around February where a whole
(01:36):
bunch of releases and announcements werebunched up from multiple companies and
that's what we're seeing in this one.
So just to give you a preview, themain bit that's exciting and very
full is announcements concerningtools and sort of consumer products.
So Google had their IO 2025presentation, and that's where
(01:59):
most of the news has come out of.
They really just went on the attack, youcould say, with a ton of stuff either
coming out of beta and experimentationbeing announced, being demonstrated.
Et cetera, and we'll begetting into all of that.
And then afterward, philanthropicwent and announced cloud four and
(02:22):
some additional things in addition tocloud four, which was also a big deal.
So those two together made fora really, really eventful week.
So that'll be a lot ofwhat we were discussing.
And then in applications as business,we'll have some stories related to OpenAI.
We'll have some interesting researchand some policy and safety updates
(02:46):
about safety related to these newmodels and other recent releases.
But yeah, the exciting stuff isdefinitely gonna be first up and
we're just gonna get into it.
First into and apps is Claude four.
Maybe 'cause of my own biasabout what's being exciting.
So this is Claude Opus fourand Claude Sonnet four.
(03:09):
This is the large and medium scalevariant of Claude from Tropic.
Previously we had CLA 3.7I think for a few months.
Cloud 3.7 has been aroundbut not super lengthy.
And this is pretty muchan equivalent update.
There'll be costing thesame as the 3.7 variance.
(03:33):
And the pitch here is that they'rebetter at coding in particular
and better at long workflows.
So they are able to maintain focusedeffort across many steps in a workflow.
This is also coming pairedwith updates to cloud code.
(03:56):
So it's now more tightly integratedwith development environments
coming with an SDK now.
So you don't have to useit as a command line tune.
You can use it programmatically.
And related to that as well, bothof these models, opus and sonnet,
are hybrid models, same as 3.7.
(04:16):
So you can adjust the reasoningbudget for, for models.
So I guess qualitatively not anythingnew compared to what philanthropic has
been doing, but really doubling downon the agentic direction, the kind
of demonstration that people seem tobe optimizing these models for the
(04:39):
task of like give a model some workand let it go off and do it and come
back after a little while to see whatit built with things like cloud code.
Yeah, the two models that arereleased, by the way, are, are Claude
Op Opus Four and Claude Sonnet four.
Note the Slight Change yetagain in naming convention.
So it's no longer Claude fourSonnet or Claude four Opus.
(05:01):
It's now Claude Sonnet four, ClaudeOpus four, which I personally like
better, but hey, you do you a lotof really interesting results.
Let's start with SWE Bench, right?
So Thiswe Bench verified the sortof like software engineering,
pseudo real world tasks benchmarkthat really opening AI polished up.
But that was anyway, developed alittle while ago in the first place.
So opening Eyes Codex One, which I'mold enough to remember a few days
(05:25):
ago when that was a big deal wasjust for context hitting about 72%,
72 point 0.1% on this benchmark.
that was like really quite high.
In fact, for all of 20seconds it was soda.
Like this was a big deal,like on whatever it was.
Tuesday when it dropped.
And, and now we're on Friday, no longer abig deal because sonnet four hits 80.2%.
(05:50):
Now going from 72 to 80%, thatis a big, big jump, right?
You think about how much, there'snot that much more left to go.
You've only got 30 more percentage pointson the table, and they're taking eight of
them right there with that one advance.
Interestingly Opus four scores79.4%, so sort of like comparable.
I. Performance to sonnet on that one.
(06:10):
And we don't have much informationon the kind of Opus four to sonnet
four relationship and how exactlythat distillation happened if
there was sort of extra training.
Anyway, so that's kind of a anotherthing that we'll probably be learning
a little bit more about in the future.
But and this is the numbers v like upperrange with a lot of compute that, you
(06:30):
know, similar to O three for instance,from OpenAI, when you let these models
go off and, and go work for a whileand not kind of a more limited variant.
E Exactly.
That's a really good flag.
Right.
So there's a, a range with inferencetime compute with test time compute
models where yeah, you have the,kind of like the lower inference
(06:51):
time, compute budget score, whichin this case is around 72, 70 3%.
And then the high inferencetime, compute budget, which is
around 80% for both these models.
Again, contrasting with codexone, which is sitting at
72.1% in this, in this figure.
And they don't actually indicate whetherthat's low compute mode or high compute
mode which is itself a bit ambiguous.
(07:11):
But in any case, this isa, it's a big, big leap.
And this bears out in the qualitativevaluations that a lot of the folks who
had early access have been sharing on X.
So, you know, you know, make, makeof that what you will, all kinds
of really interesting things.
So they've figured out how toapparently significantly reduce
(07:31):
behavior associated with getting themodels to use shortcuts or loopholes.
Big, big challenge with Codex One.
A lot of people have beencomplaining about this.
It's like, it's too clever by half, right?
The O three models have this problem too.
They'll sometimes like findjanky solutions that are a bit
dangerously creative, where you'relike, no, I didn't mean to have
you solve the problem like that.
Like that's a, that's kind of,you're sort of cheating here.
(07:54):
And and, and other things wherethey'll tell you they completed
a task but they actually haven't.
That's kind of a, a thing I found a littlefrustrating with especially with O three.
But so this model has significantlylower instances of that.
Both models, meaning Opus four andSonnet four, are, they say 65% less
likely to engage in this behaviorthan sonnet 3.7 on age agentic tasks
(08:17):
that are particularly susceptibleto shortcuts, end to loopholes.
So this is, pretty cool.
Another big dimension isthe memory performance.
So when developers build applications thatgive Claude local file access, Opus four
is really good at creating and maintainingmemory files to store information.
So this is sort of partial solution to theproblem of persistent LLM memory, right?
(08:40):
Like you can only put somuch in the context window.
These, these models are really good at,at building, like creating memory files,
explicit memory files, so not just storyin context, but then retrieving them.
So they're just really good ata kind of implicit rag, I guess
you could, you could call it.
It's not actual rag, it's just they're,they're that good at, at recall.
there are a bunch offeatures that come with this.
(09:01):
As with any big release, it'slike this whole smorgasborg of
different things and you gottapick and choose what you highlight.
We will get into this, but someof the most interesting stuff here
is in the Claude for System card.
And I think, Andre, correct me ifI'm wrong, do we have a section
to talk about the system cardspecifically later, or is this it?
(09:21):
I think we can, yeah, get backto it probably in the advancement
section just because there is somuch to talk about with Google.
So we'll do a bit more of a deep divelater on to get it to technicals.
But at a high level, I think, you know,as a user of chat gt, lms, et cetera,
(09:43):
this is a pretty major step forward.
And in particular on things like clot codeon kind of the ability to let these LMS
just go off and complete stuff for you.
And so moving on fornow from philanthropic.
Next up, we are gonna betalking about all that news from
(10:05):
Google that came from IO 2025.
Bunch of stuff to get through.
So we're gonna try and getthrough it pretty quick.
First up is the AI mode in Google search.
So starting soon, I guess youhave a tab in Google search
(10:25):
where you have AI mode, which isessentially like chat GP of search.
Google has had AI overviews for a whilenow, where if you do, I think at least
for some searches, you're gonna getthis LLM summary of various sources
with an answer to your query AI mode.
(10:48):
Is that, but more in depth.
It goes deeper on various sources andyou can do follow up questions to it.
So, very much now along the lines of whatperplexity has been offering, what Chad
GT search has been offering, et cetera.
And that is.
(11:09):
I guess really on par.
And, and Google has demonstratedvarious kind of bits and pieces there
where you can do shopping with it.
It has charts and graphs, it can dodeep search that is able to do, looking
over hundreds of sources, et cetera.
(11:29):
Yeah.
That kind of tight integration is,I mean, Google kind of has to do it.
One of the, the issues obviously withGoogle is when you're making hundreds
of billions of dollars a year from thesearch market and you have like 90%
of it, it's, it's all downside, right?
Like the, the thing you worry about iswhat if one day OpenAI like chat, GPT
just tips over some threshold and itbecomes the default choice over search.
(11:53):
Not even a huge default choice, justthe default choice for 5% more of users.
The moment that happens, likeGoogle's market cap, I. Actually we
would drop by more than 5% becausethat suggests an erosion in the
fundamentals of their business, right?
So this is, this is a really big fivealarm fire for Google, and it's the reason
why they're trying to get more aggressivewith the inclusion of generative AI in
(12:14):
their search function, which is overdue.
I think there are a lot ofpeople who are thinking, you
know, why did this take so long?
I think one thing to keep in mind too iswith that kind of massive market share,
in such a big market comes enormous risk.
So yes, it's all fine and dandy foropening AI to launch chat GBT and to
have it tell people to commit suicideor help people bury dead bodies.
(12:36):
Every once in a while, peoplekind of forgive it because
it's this upstart, right?
At least they did back in 2022.
Whereas with Google, if Google isthe one doing that, now you have
Congressional and Senate subpoenas.
Like people, people wantyou to come and testify.
They're gonna grill you.
You know, Josh Hawley's gonnalay into you hard as he ought to.
But that's the, that'skind of the problem, right?
(12:56):
You're reaching afundamentally bigger audience.
That's since Equilibrated.
So open AI is, is kind of benefitingstill from their brand of being
kind of swing for the fences.
So in some ways the expectations are abit lower, which is unfair at this point.
but Google definitely hasinherited that legacy of a big
company with a lot of users.
So yes, the rollouts will beslower for completely legitimate
(13:17):
sort of market market reasons.
So anyway I think this isjust like really interesting.
We'll see, we'll see ifthis actually takes off.
We'll see what impact that has too.
On, on chat.
GPT, I will say the Google product suite.
Is this sort of unheralded, relativelyspeaking unheralded suite of
very good generative AI products.
I use Gemini all the time.
(13:37):
People don't tend to talk about it much.
I find that really interesting.
I think it's a bit of a failure ofmarketing on Google's end which is
weird 'cause their platform is so huge.
So maybe this is a way for them to kindof solve that problem a little bit.
Well, we'll touch on possibly usagebeing higher than some people.
I think there might be aSilicon Valley bubble situation
going on here where, yeah.
(13:58):
Yeah.
Fair.
I get you're not Silicon Valley, butyou're like, you know spiritually in
Silicon Valley in terms of a bubble.
moving right along.
Next announcement was talkingabout Project Mariner.
So this was an experimentalproject from DeepMind.
This is the.
Equivalent to OP opening as operator,Amazon's Nova on Propex computer use.
(14:21):
It's an agent that can go off and usethe internet and do stuff for you.
It can, you know, go to a website,look for tickets to an event, order
with tickets, et cetera, et cetera.
So Google has improved this with thetesting and early feedback, and is now
gonna start opening up to more people.
(14:42):
And the access will be gated bythis new AI Ultra Plan, which is
$250 per month, which was introducedalso in the slate of announcements.
So this $250 per month plan is theone that will give you like all the
advanced stuff, all the models, themost compute, et cetera, et cetera.
(15:06):
And you'll have Project Mariner as well.
And with this update, you are gonnabe able to give Project Mariner up
to 10 tasks and it'll just go offand do it for you in the background.
Somewhat confusingly.
Also, Google had a demo of AgentMode, which will be in the Gemini app.
(15:28):
And it seems like agent modemight just be an interface.
To Mariner in the Gemini app maybeI'm not totally sure, but apparently
ultra subscribers will haveaccess to agent mode soon as well.
Yeah.
And it's so challenging to, to kindof, I find to highlight the things
(15:49):
that are fundamentally differentabout a new release like this.
Just because so often we findourselves saying like, oh, it's
the same as before, except smarter.
And that's kind of just true and thatis transformative in and of itself.
In this instance, there is one sort ofthing, you alluded to it here, but just
explicitly say it, the previous versionsof Project Mariner were constrained
(16:10):
to doing like one task at a time.
'cause they would actuallyrun on your browser.
And in this case, the, the big differenceis that because they're running
this in parallel on the cloud, yeah.
You can reach that kind of 10 or adozen tasks being run simultaneously.
So this is very much a, adifference in kind, right?
This is like many workers inparallel chewing on your stuff.
(16:30):
That's a change to the way people,people or you're more of an orchestrator,
right, in that universe than a sortof a leader of one particular ai.
it's quite interesting.
And moving right along.
The next thing we'll cover isVO free, which I think from just
kind of the wow factor mm-hmm.
(16:52):
Of the announcements is the highest one.
I think in terms of impact, probably notthe highest one, but in terms of just wow,
AI is still somehow blowing our minds.
VO three was the highlight of Googleio and that is because not only is
it now producing, you know, just mindblowingly coherent and realistic videos
(17:16):
compared to even a year ago, but it isproducing both video and audio together
and it is doing a pretty good job.
So there's been many demonstrationsof the sorts of things video can do.
The ones that kind of impressed me, andI think a lot of people, is you can make
videos that sort of mimic interviewsor, you know, typical YouTube style
(17:41):
content where you go to a conferencefor instance and talk to people and you
have people talking to the camera withaudio and it just seems pretty real.
And it's yes, you know, different in kindfrom video generation we've seen before.
And coming also with a new toolfrom Google called Flow to kind of.
(18:06):
Be able to edit togethermultiple videos as well.
So again, yeah, veryimpressive from Google.
And this is also underwear, AI Ultra Plan.
Yeah.
It's funny 'cause they also includea set of benchmarks in their, launch
website, which by the way are sortof, they're sort of hidden, right?
You actually have to clickthrough a thing to see any anyway.
(18:26):
And, and I always find itinteresting to look at these
when you've got that wow moment.
I, I don't mean to call itquite a chat GPT moment.
for text to video because we don't yetknow what the adoption's gonna look like.
But certainly from an impactstandpoint, it is a wow moment.
When you look at how that translatesthough, relative to VO two, which
again, relatively unheralded, like nota lot of people talked, they did at
(18:47):
the time, but it hasn't really stuck.
So 66% win rate, so thirds ofthe time it will beat VO two on a
sort of movie gen bench, which isa benchmark that meta released.
It.
It's just a, basically aboutpreferences regarding videos.
So it, it wins abouttwo thirds of the time.
It loses a quarter of the timeand then ties 10% of the time.
(19:07):
So it's a pretty, like, it looks likea, a fairly dominant performance,
but not, not like a, a knockoutof the sort that you might.
It's difficult to go fromthese numbers to like, oh, wow,
like this is the impact of it.
But it certainly is there.
Like when you look at these,it's, it's pretty remarkably good.
And this speaks to the consistencyas well of those generations.
(19:28):
It's not that they can cherry picknecessarily just a few good videos.
It does pretty consistentlybeat out previous versions.
Right.
And they also actually updated VO two.
So just demonstration of how crazthis was in terms of announcements.
VO two now can take kindof reference photos.
We've seen this with some other updates.
(19:50):
So you can give it an image of,you know, a t-shirt or a car and
it'll incorporate that into a video.
And all this is folded intothis flow video creation tool.
So that has camera controls, ithas sin Builder where you can.
Edit and extend existing shots.
It has this asset management things whereyou can organize ingredients and prompts.
(20:15):
And they also released this thingcalled Flow tv, which is a way to
browse people's creations with vo free.
So tons of stuff.
Now Google is competing more of arunway and kind of a, I guess what
OpenNet started doing with SOA whenthey did release SOA fully that had
(20:36):
some built-in editing capabilities.
Now VO isn't just text to video.
They have more of a full featuredtool to make text to video useful.
Yeah.
And the inclusion of audio too, I, Ithink is actually pretty important.
You know, it's, it's this other modality.
It helps to ground the model more.
And I suspect that because of the causalrelationship between video and audio,
(21:00):
that's actually quite a meaningful thing.
Like if they, you know, thisis interesting from that whole
positive transfer standpoint.
Do you get to a point wherethe models are big enough?
They're consuming enough data that whatthey learn from one modality leads them
to perform better when another modalityis added, even though the complexity
of the problem space increases.
And I suspect that will and probablyis already happening, which means we're
(21:21):
heading to a world by default withmore multimodal video generation that
wouldn't be too surprising, at least.
And next up, you know, Google, I guessdidn't just wanna do text to video.
So they also did text toimage with Imagine Four.
This is the latest duration of their,you know, flagship text to image model.
(21:42):
As we've seen with Text to Image, itis even more realistic and good at
following prompts and good at text.
They're highlighting really tinythings like the ability to do detailed
fabrics and fur or an on animals.
And also they apparently paid attentionto generation of text and typography
(22:08):
saying that this can be useful forslides and invitations and other things.
So, rolling out as wellfor their tool suite.
And last thing to mention aboutis they also say This will
be faster than Imagine Free.
The plan is to make it apparently upto 10 times faster than Imagine free.
(22:30):
Yeah.
And it is, it's, it's unclearbecause we're talking about a
product rather than a model per se.
It's unclear whether that's becausethere, you know, there's a compute cluster
that's gonna come online that's gonna,you know, allow them to just crunch
the crunch through the images faster,or that there's an actual algorithmic
advance that makes it say, 10 timesmore compute efficient or whatever.
So always hard to know with these things.
It's probably some mix of both,but Interesting that, yeah, I
(22:52):
mean, I'm at the point where I,it's like flying on instruments.
I feel like I can't tellthe difference between these
different image generation models.
Admittedly these photos looksuper impressive, don't get me
wrong, but I just, I can't tellthe incremental difference.
And and so I just end up lookingat like, yeah, how much per token,
like, or how much per image?
So you know, the, the priceand the latency are both
(23:13):
collapsing pretty quickly.
And moving right along.
We just got a couple more thingswe want to even covering all
of announcements from Google.
This is just a selection that Ithought made sense to highlight.
Next one is Google Meet is gettingreal time speech translation.
So Google Meet is the video meetingoffering from Google, similar
(23:34):
to Zoom or other ones like that.
And yeah, pretty much nowyou'll be able to have.
Almost real time translation.
So it's similar to having a realtime translator for like a press
con conference or something.
When you start speaking.
It'll start translating to thepaired language within a few
(23:55):
seconds, kind of following on you.
And they're starting to roll this outto consumer AI subscribers initially
only supporting English and Spanish.
And they're saying they'll beadding Italian, German, and
Portuguese in the coming weeks.
So something I've sort of been waitingon honestly, I've been thinking
(24:15):
we should have real time AI powertranslation that is very kind of
sophisticated and and powerful, andnow it's starting to get rolled out.
I personally thought people who spokelanguages other than English were just
saying complete gibberish up until now.
So this is a real shock that, that Yeah.
No, but it, it's, it'skind of funny, right?
This is another one of those thingswhere you hit a point of, you know, where
(24:37):
latency crosses that critical thresholdand that becomes the magic unlock.
Like a model that takeseven 10 seconds to produce a
translation is basically useless.
'cause it's at least this reallyawkward conversation, at least for
the purpose of, of Google Meet.
So another case where it did takeGoogle a little while, as you pointed
out, but the risk is so high ifyou mistranslate stuff and start
(24:58):
an argument or, you know, whatever.
That's a, that's a real thing.
And that're deploying it again,across so many, so many video chats
because of their reach, that that's,you know, gonna have to be part of
the, the corporate calculus here.
Right, and this is a thing we're not gonnabe going into detail on, but Google did
unveil a demo of their smart glasses.
And that's notable, I think, becauseMeta has their smart glasses and
(25:22):
they have real time translation.
So if you go to a foreign country,right, you can kind of have your in ear
translator and I wouldn't be surprisedif that is a plan as well for this stuff.
But last thing to mention, forGoogle, not one of the highlights,
but something I think notable aswe'll see compared to other things.
(25:46):
Google also announced a new JUULs AIagent that is meant to automatically
fix coding errors for developers.
So this is something you can use on GitHuband it very much is like GitHub co-pilot.
You will be able to task it with workingwith you on your code repository.
(26:09):
Apparently it's gonna be coming outsoon, so this is just announced.
And yeah, it will kind of make plans,modify files and prepare pull requests
for you to review in your coding projects.
And like literally every single productannouncement like this, they have
Google saying that Jules is in earlydevelopment, end quotes may make mistakes.
(26:33):
Which anyway, I think we'll be sayingthat until we hit super intelligence.
Just 'cause, you know, thehallucinations are such a, a persistent
thing, but there you have it.
Right.
And
next story actually is directlyrelated to that, it's that GitHub has
announced the new AI coding agent.
So GitHub co-pilot hasbeen around for a while.
(26:55):
You could task it with your reviewingyour code on a pool request on a,
on a request to modify a code base.
Google also had the ability tointegrate Gemini for viewing code.
So Microsoft very much.
Kind of competing directly with JUULsand Codex as well, with an offering of
(27:16):
an agent that you can task to go off andedit code and prepare a pool request.
So.
Just part of an interesting trend of allthe companies very rapidly pushing in
a direction of coding agents and agentsmore broadly than they had previously.
Yeah.
This is also notable because Microsoftand OpenAI obviously are in competition
(27:40):
and this frenemy thing copilot wasthe first, apart from Opening Eyes,
codex was the first sort of large scaledeployment at least of, of a coding auto
complete back in like, I wanna say 2020, 20 21, even just after GBT three.
And yeah, so the, they're continuing thattradition in this case, kind of being
fast followers too, which is interesting.
(28:01):
Like they're not quite first at thegame anymore, which is something
to note 'cause that's a big change.
one small thing worth noting.
Also, they did announce open sourcingof GitHub copilot for VS code.
So this is like a nerdy detail, butyou have also competition from Cursor
and these other kind of alternativedevelopment environments with a company
(28:23):
behind Cursor now being valued atbillions and billions of dollars.
And that is a direct competitorto Microsoft's Visual Studio
Code with GitHub copilot.
So them open sourcing with GitHubcopilot extension to Visual Studio
Code is kind of an interesting moveand I think they are trying to compete
(28:45):
against these startups that arestarting to dominate in that space.
And just one more thing to throw in here.
I figure we're flagging becauseof its relation to this trend.
Mistral, the French company that istrying to compete with OpenAI and
philanthropic has announced Devraa new AI model focusing on coding.
(29:11):
And this is being releasedunder an Apache to license.
It is competing with thingslike Gemma free 27 B, so like
a mid range coding model.
And yeah, Misra also working on a largeragentic coding model to be released soon.
Apparently with this being the smallermodel, that isn't quite that good.
(29:35):
This is also following up on codetrial, which was more restrictively
licensed compared to death trial.
So there you go.
Everyone is getting into codingmore than they have before.
You get an agent and you get an agent
and onto applications and business.
(29:57):
We have, first up, not the mostimportant story, but I think the
most I. Kind of interesting or weirdstory to talk about, which is this
open AI announcement of them fullyacquiring a startup from is it Johnny?
Ive Yeah.
Johnny Ive, yeah.
Johnny Ive, yes.
Who seemingly has had this startupio that was, the details here
(30:23):
are, are quite strange to me.
Yes.
So there's this startup that Joni,I've started with Sam Altman seemingly
two years ago that we don't knowanything about or what is done.
OpenAI has already owned 23% of thisstartup and is now going on a full equity
(30:44):
acquisition that they're saying they'repaying $5 billion for this IO company.
And that's a company with 55employees that again, at least
I haven't seen anything out of.
And this is, they're saying likethe employees will come over.
(31:07):
Johnny, I've will still be working.
Yeah.
At Love From, which is his designcompany, broadly speaking, which
has designed various things.
So Johnny, I've not a full-time employeeat OpenAI or io still sort of like a
part-time contributor collaborator.
And to top off all these variouskind of weird details, this came
(31:31):
with an announcement video of.
Sam Altman and John Johnny, ive walkingthrough San Francisco, meeting up in
a coffee shop and having like a eightminute minute conversation on values
and AI and their collaboration that justhad a very, very strange vibe to it.
(31:53):
That was, you know, trying to makethis very artsy, I guess feel to it.
They also released some glass post coffee,coffee, you know, called Joni and Sam.
Anyway, I, I just don't understandthe PR aspects of this with
business aspects of this.
(32:13):
All of this is weird to me.
it almost reads like a landingpage that Johnny Ive designed.
Like to, to announce it.
It's very like, kind of sleek, simpleApple style almost one might say very
similar to actually love from theirwebsite has the same style of Yeah.
Yeah.
The blog post is like thisminimalist centered text, large text.
(32:34):
And the headline is Johnny and Sam.
I think it's, I'm justgonna say it's weird.
So they, so there, this blog post, they'retalking about the origin story of this.
I think the news reports aroundthe time that IO was first launched
said, recall that like Sam Altmanand Johnny Ives new startup.
And the implication was that thiswas a company being co-founded by
(32:58):
Sam and Johnny together or something.
That's clearly not the case, atleast according to what they said.
They.
Imply they say something like, it wasborn out of the friendship between
Johnny and Sam, which is very ambiguous.
But the company itself was foundedabout a year ago by Johnny.
Ive, along with Scott Cannon,who's an alum at Apple.
(33:19):
and then Tang Tan and Evans Hanky.
Evans e Evans hanky actuallytook over Johnny's role at
Apple after Johnny departed.
So they're tight there.
A lot of shared history.
But none of the co the actualco-founders are Sam opening.
I already owns 23% of the company,so they're only having to pay 5
billion out of the total valuation of6.4 billion to acquire the company.
(33:40):
And then, as you said, somehow outof all this, Johnny ends up still
being a free ish agent to work atLove Run that, that, by the way,
highly, highly unusual to acquire acompany even at a $6 billion scale.
And to let one of the core, arguablythe most important co-founder
just leave, like this is normallynot how this goes usually.
(34:01):
famously with like the WhatsAppacquisition by Facebook it was like
a, I forget what it was, like, a $5billion acquisition, but the founder of
WhatsApp left Facebook early, and so hewas on an equity vesting schedule, so
most of his shares just vanished and hedidn't actually get the money that he
was entitled to if he'd stuck around.
So the common thing weird thatJohnny gets to just leave and fuck
(34:22):
off, and like, apparently, I don'tknow if he's still getting his money
from this or like, it's so weird.
This is like.
Very esoteric kind of deal it seems.
But bottom line is they're workingon a bunch of hardware things.
Opening I has hired, the former head ofMeta's, Orion Augmented Reality Glasses
(34:42):
Initiative, that was back in November.
And that's to head up its roboticsand consumer hardware work.
So there's a bunch ofstuff going on at OpenAI.
This presumably foldsinto that hardware story.
We don't have much information,but there's presumably some magic
device that is not a phone thatthey're working on together.
And who the hell knows?
Right.
So this announcement, which is veryshort like, I don't know, like maybe
(35:05):
nine paragraphs concludes with saying,as IO merges with OpenAI, Johnny and
Rum will assume deep design and creativeresponsibilities across OpenAI and io.
not like a strong commitment andas you said, said free agent.
Like what deep design andcreative responsibilities.
(35:27):
And yeah, IO was seemingly workingon a new hardware product, as you
said like a hardware interface forai, similar to the humane AI pin and
rabbit R one, famously huge failures.
Very interesting to see if they'restill hopeful that they can make this.
(35:48):
I. AI computer, or whatever you wannacall it, AI interface within OpenAI
and with Joni, ive, but anyways,yeah, just, such strange vibes out
of this announcement and this videoand with business story around this.
Can an announcement have code smell?
'cause I, I feel like that's what this is
(36:10):
and moving out to somethingthat isn't so strange.
We have details about Open AI'sPlanned Data Center in Abu Dhabi.
So they're saying that they're goingto develop a massive five gigawatt data
center in Abu Dhabi, which would be one ofthe largest AI infrastructures globally.
(36:30):
Yeah.
So this would span 10 square milesand be done in collaboration with
G 42 and would be part of OpenAI's Stargate Project, which.
I'm kind of losing track is open the EyesStargate Project, just like all their data
centers where they might wanna put them.
And this is coming after, youknow, of course, Trump's tour
(36:55):
in the Middle East with G 42.
Having said that, they're gonna cutties divest their stakes in entities
like Huawei and Reba genomics Institute.
this is pretty wild fromnational security standpoint.
It is not unrelated to the dealsthat we saw Trump cut with the
(37:15):
UAE and Saudi Arabia, la I won'tsay last week or the week before.
So for context, opening Eyes First,Stargate Campus in Abilene, Texas,
which we've talked about a lot, that'sexpected to reach 1.2 gigawatts,
really, really hard to find a sparegigawatt of power on the US grid.
That's one of the big reasons whyAmerica's turning to the Saudis
the Emiratis and, and so on andthe Qataris to find energy on these
(37:37):
kind of energy rich nations grids.
And so when you look at five gigawatts,you know, five times bigger than
what is being built right now inAbilene, that would make this by
far the largest structure or so,sorry, the largest cluster that
OpenAI is contemplating so far.
It also means that it would be based inforeign soil on the soil of a, a country
(37:58):
that the US has a complicated past with.
And just based on the workthat we've done on securing.
Builds and data centers I can tell youthat it is extraordinarily difficult.
To actually secure something whenyou can't control the physical
land, that that thing is goingto be built on ahead of time.
So when that is the case, you havea security issue to start with.
(38:21):
That is prima facie.
Not an option.
When you're building in the UAE fora variety of reasons, you may tell
yourself the story that you're.
Controlling that, that environment, butyou cannot and will not in, in practice.
And so from national security standpoint,I mean, I would really hope the
administration is tracking this veryclosely and that they're bringing in, you
know, the special operations, the, theIntel folks, including from the private
(38:42):
sector who really know what they're doing.
I, I, I gotta say the, the currentbuilds, including from the Stargate
family so far are the, the levelof security is not impressive.
I've heard a lot of private reportsthat are non-public that make it
very clear that that's the case.
And so this is a really, really big issue.
Like, we, we gotta figureout how to secure these.
There are ways to do it and ways notto do it, but opening eyes so far has
(39:06):
not been impressive in how seriouslythey've been taking the security story,
but they've been talking a big game.
but the actual on the groundrealities seem to be seem to
be quite different again, justbased on what we've been hearing.
So really interesting question.
Are we going to have this build go up?
Is it going to be effective froma national security standpoint?
And what's it gonnatake to, to secure this?
(39:27):
Yeah.
Anyway all part of that G 42 backstorythat we've been tracking for a long
time between Microsoft and OpenAI andthe United States and all that jazz.
Yeah.
And, and it seems like with Trumpin office, there's definitely set
to be a major deepening of tiesand open AI is opening Microsoft.
Other tech companies seem happyto jump on board with that move.
(39:52):
And yeah, as you said, there's been kindof a lot of investments going around from
that region into things like open ai.
So makes some sense.
It, it's worth it if you can secure it.
Like this takes immense pressureoff the US electric grid, right?
Like we're not gonna just build orfind five gigawatts like tomorrow
that takes, we, we actually don'tknow how to build nuclear reactors
(40:14):
in less than a decade in America.
So it's a really good option.
Saudi capital, UAE capital,those are great things.
If.
You know, they don't come withinformation rights or whatever.
But yeah, this is like if you, ifyou wanna like get the fruits of
the of, of sort of Saudi and Uuae energy, you gotta make sure that
you understand how to secure thesupply chain around these things.
(40:35):
'cause Yeah, well we've, the billionsof dollars this will surely cost.
You'd hopefully put ina little bit of effort.
Well, we'd be surprised.
You'd be surprised.
Yeah.
Yeah, yeah, yeah.
Security is, is expensive and, and it's,it's actually like, it can't necessarily
be bought for money because the, the teamsthat actually know how to secure these
sites to the point where they are robustto, for example, like Chinese or Russian
(40:59):
Nation state attacks are extremely rare.
And it's literally like a couple guys atlike Seal Team Six and Delta Force and the
agencies and like, yeah, their demands ontheir time are extreme and you probably
can't network your way to them unlessyou have a, a trusted way to get there.
So it like, it's a really tough problem.
Well onto the next story.
(41:20):
I think another sort of weird, almostfunny story to me that I thought were
covering LM Arena, which had the famous AIleaderboards that often have been covered.
We covered it just Ithink a few weeks ago.
There was a big controversy aroundseemingly the big commercial
(41:43):
players gaming their, to getahead of open source competition.
That organization has announced ahundred million dollars in a seed funding
round led by a 16 z and uc investments.
So this is gonna value them atsomething like $600 million and this
(42:07):
is coming after them having beensupported by grants and donations.
So.
Like, I don't understand.
What is the promise here forthis, leaderboard company,
organization is this just charity.
Anyway, it's very strange to me.
I would love to see thatslide deck that pitch deck.
(42:28):
There's a lot here.
That's.
Interesting to say the least.
So one thing to note by the way,is they raised it's a hundred
million dollars seed round.
This is not a priced round.
So, so for context, when you raise aseed round you're, oh man, this gets
into unnecessary detail, but basicallyit's a way of avoiding putting a
real valuation on your company.
(42:49):
If you raise it with, with safes,usually the whole thing with a seed round
is you don't give away a board seat.
Whereas if you raise a seriesA or series B, you're starting
to give away board seats.
So this implies that theyhave a lot of leverage.
Like if you're raising a hundredmillion dollars and you're calling it
a seed round, you're basically sayinglike, yeah, we'll take that money.
You'll get your equity, but don'teven think about getting a board seat.
That's kind of the frame here.
(43:10):
You can only do that typicallywhen you have a lot of leverage.
Which again brings us back to your very, Ithink, very good and fundamental question.
What is the profit story here and.
The, like, I have no idea.
But it's notable that like Ella Marina hasbeen accused of helping top AI labs game
its leaderboard, and they've denied that.
(43:32):
But when you think about like, okay, howcould a structure like this be monetized?
Well, maybe, showing some kindof, not overt preference, but,
subtle preference for or indirectpreference for certain labs.
Like, I don't, I don't know.
I'm speculating and thisshould not be taken.
Like we, I just don't see any informationon exactly what the profit play is, which
(43:53):
kind of makes me intrinsically skeptical.
And yeah, we'll, we'll,we'll see where this goes.
But again, there's a lot of leverage here.
There's gotta be a profit story.
It's being led by a 16 z so, you know.
There's a, there, there, presumably.
Yeah.
Apparently it cost a few milliondollars to run the platform
and, and they do need to do thecompute to compare these chatbots.
(44:19):
So er here is you get two generations,two outputs for a given input and
people vote on which one they prefer.
So it is costly in that sense and,and it does require you to pay for
the inference and what at least hasbeen said is this funding will be
used to grow a la marina and hire morepeople and pay for costs such as the
(44:43):
compute required to run this stuff.
So yeah, basically saying thatthey are gonna scale it up.
And grow it to, to something thatsupports the community and helps
people learn from human preferences.
Nothing related to how this a hundredmillion will be, you know, something
(45:05):
that the investors will get a return on.
But it could be a data play, like,you know, a kind of scale ai we're,
we're doing, you know, it, it, it is.
You've got some data labeling.
That's cool there.
I just like, yeah.
I'd love to see that deck.
Yeah.
And next up going back to hardwareNvidia, CEO has said that the next
(45:27):
chip after the aged 20 for Chinawill not be from the Hopper series.
So this is just a kind of smallremark, and it's not because
previously it was reported thatNvidia planned to release a downgraded
version of the H 20 ship for China.
In the next two months, thisannounced and mades a transition in
(45:53):
US policy as to restrictions on ships.
And, and after the sale of these age20 chips designed specifically for
China was banned only a few months ago.
looks like Nvidia is yeah,kind of having to change their
plans and adapt quite rapidly.
I. It seems like they will bepulling from the Blackwell line.
(46:16):
This makes sense.
Jenssen's quote here is, it's notHopper because it's not possible
to, to modify Hopper anymore.
So they've sort of movedtheir supply chains over onto
Blackwell, no surprise there.
And they've sort of squeezed allthe juice they can out of the, the
Hopper platform and presumably soldout of their stock when, when it was
(46:37):
announced that they couldn't do anymore.
Next up, I put this in a business sectionjust so that we could move on from Google
for a little bit, was announced thatGoogle Gemini AI app has 400 million
monthly active users, apparently,which is approaching the scale of Chad
GPT, apparently that had 600 millionmonthly active users as of March.
(47:02):
So yeah, as I, as I previewed, Iguess seems very surprising to me
because Gemini as a chatbot hasn'tseemed to be particularly competitive
with offerings like Chatt and Claudeand haven't seen many people be
big fans of Gemini or a Gemini app.
(47:22):
But according to this announcement,lots of people are using it.
Yeah, and apparently so, so thecomparable here, there are recent
court filings where Google estimatedin March that chat GPT had around
600 million monthly active users.
So, you know, this is like two thirdsof, of where chat GPT was back in March.
So, you know, to the extent that, youknow chat GPT and open AI encroaching
(47:45):
on Google's territory, well Google's,you know, starting to do the same.
So yeah, it, this is all obviouslya competition as well for data
as much as for, you know, moneyin the form of subscription.
So this is all self licking ice creamcones, if you will, or flywheels that both
these companies are trying to get going.
Right.
And I think also part of a broaderstory, this whole thing with Google
(48:07):
IO 2025 and then this announcement aswell, I think demonstrates that over
the last few months really Google hashad a real shift in fortune in terms
of their place in the AI race andcompetition basically until like 2025.
They've seemed to be surprisingly behind.
(48:28):
Gemini was like surprisingly bad,even though the numbers looked
pretty good and their web offeringsin terms of search lacked behind
perplexity and Chatty P search.
Then Gemini 2.5 was updated or, orreleased I think in late January and kind
of blew everyone away how good it was.
Gemini 2.5 and Gemini Flashhave continued to be updated
(48:52):
and, continue to impress people.
And now all this stuff wouldbe a free Imagine four.
The agents all these like 10 differentannouncements, really position
Google as, as I think for manypeople in the space looking at who
is in the lead or who is killing it.
Google is killing it right now.
(49:14):
They are, and this is, you know,we've talked about this before, Google
being the sleeping giant, right?
With this massive, massive poolof compute available to them.
They were the, the first to, I mean,there's the first to recognize scaling
in, in the sense that OpenAI didwith GPT two and then G PT three, but
then there's the first to recognize.
(49:35):
Let's just say the need fordistributed computing infrastructure
in a more abstract sense, andthat was certainly Google.
They invented the TPU explicitlybecause they saw where the wind was
blowing, and then they, now they havethese massive TPU fleets and, and a
whole integrated supply chain for them.
You know, OpenAI really woke the dragonwhen they, when they went toe to toe
(49:56):
with Google via chat, GPT and Microsoft.
And so.
Yeah, I mean, to some degree thisis not to some degree entirely.
This is the reason why you're seeingthat, you know, five gigawatt UAE build,
that's, that OpenAI is gonna build.
They need to be able to compete ona flop per flop basis with Google.
If they can't, they're done.
Right.
This is kind of just how the story ends.
(50:17):
So that's why all the CapEx is being spentanyway, just these announcements that
we're seeing today are the product ofCapEx that goes back, you know, two years,
like breaking ground on data centers twoyears ago and making sure chip supply
chains are ready three years ago anddesigning the chips and all that stuff.
So, you know, this is really along time in the making, every time
you see a big rollout like this.
Yeah.
(50:37):
And, and not just theinfrastructure, I mean.
Having DeepMind, having Google ai.
Yeah.
You know, Google was the firstcompany to really go in on AI in a
big way spending, you know, billionsof, of dollars on DeepMind for many
years as just a pure r and d play.
(50:58):
Microsoft later, you know, also startedinvesting more in, in meta and so on.
But yeah, Google has been around for awhile in research and that's why it was
to a large extent, kind of surprising howlagging they were on the product front.
And now seemingly they're catching up.
And just one more thingto cover in the section.
(51:19):
We have a bit of a analysis onthe state of AI servers in 2025.
This is something Jeremy, you linkedto just on x. So I think I'll,
I'll just let you cover this one.
It's sort of like a, a random assortmentof of excerpts or, or take homes from this
big JP Morgan report on AI servers fromtheir Asia Pacific Equity Research branch.
(51:44):
And there's just like a bunch of, abunch of little kind of odd tidbits.
We won't spend much time onthis 'cause we gotta go, man.
There's more news.
Just looking at the, the mismatch betweenfor example packaging production, so,
so tsm C'S ability to produce wafersof like, kind of packaged chips.
And how I. And then downstream,GPU module assembly and how
(52:09):
that compares to GPU demand.
And they're just kind of flaggingthis interesting mismatch where it
seems like there's about a 1 millionor 1.1 million GPU unit oversupply
currently expected heading into anywaythe next few quarters, which is really
interesting given where things wereat just like two years ago, right?
That massive, massive shortagethat saw prices skyrocket.
(52:30):
So cur, you know, kind of curious to seewhat that does to margins in the space.
This is all because of NVIDIA inventorybuildup, basically, like there, there's
a whole bunch of, of excess there.
Anyway, and, and there weresome yield issues and things
like that that are being fixed.
Anyway, they, they've got interesting,interesting numbers about the whole space.
(52:50):
CapEx increasing across the boardfrom these massive cloud companies
by like pretty wild amounts.
And in particular.
ASIC shipments.
So basically AI chip shipmentsprojected to go up 40% year
over year, which is huge.
I mean, that's like, that'sa lot more chips in the world
than there were last year.
(53:10):
And keep in mind, those chipsare also much more performant
than they were before.
So it's 40% year over year growth on a perchip basis, but on a per flop basis, a per
compute basis, it's even more than that.
You know, we may be like doubling theamount of compute or, or actually more
that there is in the world based on this.
Anyway, you can check it out if you're,if you're a nerd for these things and
you wanna see, you know, what's happenedto Amazon's training two demand it's
(53:35):
up 70%, by the way, which is insane.
And a bunch of other cool things.
So, so check it out if you're like asort of like finance and compute nerd.
'cause this is just gonna, justgonna be your, your weekend read.
Onto the next section,projects and open source.
We just have one story hereI guess to try and save time
'cause there is a lot more after.
And the story is pretty simple.
(53:56):
Meta is delaying the rollout ofthe biggest version of llama.
So when they announced four, they alsowere previewing lama for behemoth,
their large variant of LAMA fourThat is meant to be competitive with,
you know, cha, BT, and, and Quad.
(54:17):
And.
Basically the frontier models.
So it seems like, and according tosources, that they initially planned
to release this behemoth in April, thatwas later pushed to June, and it has now
been pushed again until at least fall.
So this is all, you know, kind ofinternal, they never committed to
(54:40):
anything, but it seems like per kindof the reports and general, I think
things that are coming out that Metais struggling with training, with model
to be as good as they want it to be.
Yeah, I think this is a, actuallya really bad sign for meta.
Because also they have a reallybig compute fleet, right?
(55:02):
They, they have huge amounts ofCapEx that they've poured into
into AI compute specifically.
And what this shows is that theynow have consistently struggled
to make good use of that CapEx.
They have been consistently pumping outthese, like pretty mid models unremarkable
and then to make up for that, gaming themto make them look more impressive than
(55:22):
they are in a context where deep seek iseating their lunch, both from a marketing
branding standpoint and also just rawperformance and compute efficiency.
And so.
Yeah, this is, this is really bad.
The, the whole reason that metaturned to open source, it was never
because they thought that theywere going to somehow open source
a GI that was never gonna happen.
(55:43):
Anybody who has a GI locks it downand uses it to like, bet on the
stock market and then funds the nextgeneration of scaling in that shit.
But and then obviouslyautomates AI research.
It, it was eventuallygonna get locked down.
This was always a recruitment playfor meta and there were some other
ancillary infrastructure things,getting people to build on, build
on their platforms and that.
But the biggest thing absolutelywas always recruitment.
(56:04):
And now with that story just fallingflat on its face it's really difficult.
Like if you wanna work atthe best open source AI lab.
A like, unfortunately, it looks likeright now there are Chinese labs
that are absolutely in the mix, butB there are a lot of interesting
players who seem to be doing a betterjob on a kind of per flop basis.
(56:28):
Over here.
You look at even Alan ai, right?
They, they're putting out somereally impressive models themselves.
You've got a lot of really,anyway, impressive open source
players who are not meta.
So I think like Zuck is in areal bind and they're doing a
lot of damage control these days.
Yeah, and I think this speaks tolike Meta has really good talent.
(56:48):
They have been publishing justfantastic work for many, many years,
but my sense is that the skills andexperience and knowledge needed to
train a massive, massive LLM model Yeah.
Is very different.
And the competition forthat talent is just immense.
(57:09):
X Xai when it came out, I think was.
Seemingly providing just really, reallybig packages to try and get people who
have experience in that philanthropic hashad very high retention of their talent.
I think I, I saw a numbersomewhere like 80% retention.
(57:30):
We've seen people leaving fromGoogle to go do their own startups.
So I think meta, presumably that's partof a problem here is this is a pretty
specialized skillset and knowledge andthey've been able to train good lms.
But to really get to thefrontier is not as simple as
(57:50):
maybe, you know, just scaling.
After research and advancements,and we begin with not a paper
and not a very detailed kind ofadvancement, but a, a notable one.
And this is also from Google, sosort of under radar, just as a little
research announcement and demo, theydid announce a Gemini diffusion.
(58:16):
And this is the kind of demonstrationof doing language modeling via
diffusion instead of auto regression.
So typically any chat bot you use thesedays essentially is generating one token
at a time, left to right, start to finish,you know, it picks one word, then it
(58:37):
picks the next, then it picks the next.
And we I think recently covered.
Efforts to move that to the diffusionparadigm where you basically
generate everything all at once.
So you start with all the text andthere's some messy kind of initial state,
and then you update it to do better.
(58:59):
And the benefit of that is youcan do be just way, way, way, way
faster compared to generating oneword, a one token at the time.
So DeepMind has come out witha demonstration of diffusion
for Gemini, for coding.
That seems to be pretty good.
Seems to be comparable with Geminitwo flash light with smaller kind
(59:24):
of not quite as powerful fast model.
And they are claiming speeds of about.
1500 tokens per second withvery low initial latency.
So something roughly on a scaleof 10 times faster than GBT 4.1,
for example, just lighting fastspeeds not much more details.
(59:49):
Here, you can get access to thedemo signing up for a wait list.
And yeah, if they can push thisforward, if they can actually make
diffusion be as performant as just ataggressive generation at the frontier.
Really, really big deal.
(01:00:10):
And diffusion.
So conceptually diffusion is, is quiteuseful from a parallelization standpoint.
It, it, it's got propertiesthat allow you to paralyze at a,
just in, in more efficient waysthan transformers potentially.
One of the consequences of that,they show a case where the model
generates 2000 tokens per second ofeffective effective kind of token rate
(01:00:33):
generation, which is pretty, pretty wild.
It means you're almost doing likeinstant generation of chunks of code.
There's, to kind of give you a sensefor why this would matter, there's a
certain kind of, some sometimes knowas like non causal reasoning that these
models can do that your traditionalauto regressive transformers can't.
(01:00:55):
So an example is you can saylike, solve this math problem.
First give me the answer and then afterthat, walk me through the solution.
Right?
So give the answer first, then givethe solution that's really, really hard
for standard auto aggressive modelsbecause what they wanna do is spend
their compute first to spend theirinference, time compute, generating a
(01:01:18):
bunch of tokens to reason through theanswer, and then give you the answer.
But they can't, they're being asked togenerate the solution right away and only
generate the the sort of derivation after.
Whereas with diffusion models,they're seeing, they're, they're
generating the whole thing all at once.
They're seeing the wholecanvas all at once.
And so they can start by having, you know,a crappy solution in the first cycle of
(01:01:38):
generation and, and a crappy derivation.
But as they modify their derivation,they modify the solution, blah,
and then eventually they get, youknow, the right answer on the whole.
So this may seem like a pretty nichething, but it can matter in certain
sort of specific settings where certainkind of causalities at play and you're
trying to solve certain problems.
And just generally it's, it's good tohave other architectures in the mix
(01:02:01):
because if nothing else you could dolike a kind of mixture of models where
you have some models that are betterat solving some problems than others.
And, and this gives you anarchitecture it's a bit more
robust for, for some problems.
Right, and like intuitively, youknow, you're so used to when you are
using chat GBT or these LMS to thisparadigm of like, you enter something
(01:02:22):
and then you see the text kinda popin and you almost are reading it as
it is being generated with diffusion.
What happens is, like all the textskind of just shows up it's near
real time and that is a real kindof qualitative difference where
it's no longer, you know, waitingfor it to complete as you're going.
(01:02:43):
It's more like you enter something and youget the output almost immediately which
is kind of bonkers if you think it can bemade to work anywhere near as well as just
the auto regression paradigm, but not manydetails here on the research side of this.
Hopefully they'll release more'cause so far we haven't seen very
(01:03:05):
successful demonstrations of it.
And moving on to an actualpaper, we have a chain of model
learning for language model.
So the idea here is you canincorporate what they're saying as
hierarchical hidden state change chainswithin transformer architectures.
(01:03:26):
So what that means is you canhidden states in neural nets is
basically just the soup of numbersin between your input and output.
So you take your input, it goes througha bunch of neural computing units
and generate all these intermediaterepresentations from the beginning
(01:03:47):
to end and, and keep updatinguntil you generate the output.
So.
The gist of a paper is that if youstructure that hidden state hierarchically
and have these chains that are processedat different levels with different
levels of granularly granularity and withdifferent levels of model complexity and
(01:04:10):
performance, you can be more efficient.
You can use your compute in moredynamic and more kind of flexible ways.
So that's, I think, the gist of this.
And I haven't looked into thisdeeper sort of, Jeremy, maybe
you can offer more details.
Sure.
I, I think this is kindof a, a banger of a paper.
(01:04:31):
It's also frustrating that this is, Imean, this is a, a multimodal podcast.
We have video, but we don't like, youknow, there's like an image in the
paper that makes it make a lot of sense.
It's figure two that just sort of showsthe architecture here, but high level
you can imagine, neural network has,you know, layers of neurons that are st,
you know, stacked on top of each other.
(01:04:52):
And typically the, you know, theneurons from the first layer are
e, each one of them is connectedto each neuron in the second layer.
And each neuron in the secondlayer is connected to each neuron
and the third layer and so on.
So you kind of have this dense mesh of,of neurons that are linked together.
So there's a width, right?
The number of neurons per layer,and then there's a depth, which is
(01:05:12):
the number of layers to the network.
in this case, what they're gonnado is they're kinda gonna have
a slice, a very small, narrowof width slice of this network.
And they're going to essentially makethat the backbone of the network.
So let's imagine there's like, youknow, two, two neurons in each layer.
And the two neurons from layer oneare connected to the two neurons from
(01:05:35):
layer two and layer three and so on.
And the two neurons, say at layertwo can only take input from
the two neurons at layer one.
They, they can't see any ofthe other neurons at layer one.
That then becomes this like prettycordoned off structure within a structure.
So if, if you have like a, a largernumber of neurons in each layer
(01:05:55):
that are only connected to the, theadditional anyway sets of neurons at
each layer, hopefully you can justcheck out the figure and see it.
You can kind of see how thisallows you to kind of increase.
the size you can run your model in, inlarger mode, either by only using the
thin slice of say, two neurons thatwe talked about, or by considering a
(01:06:17):
wider slice, you know, four neuronsor, or eight or 16 or whatever.
And so what they do is they find away to train this model such that
they are training at the same time,all these kind of smaller submodels,
these thinner submodels, so thatonce you finish training, it like
costs you the same amount basicallyto train these models, but you end
(01:06:37):
up for free with a bunch of smallermodels that you can use for inference.
And the other thing is, because ofthe way they do this, the way they
engineer the loss function is such that.
The smaller slices of themodel they have to be able to
independently solve the problem.
So the, the thinnest slice ofyour model has to be able to make
decent predictions all on its own.
(01:06:59):
But then if you add the next coupleof neurons in each layer to your
model and get the slightly widerversion, that model is gonna perform
a little bit better 'cause it's gotmore scale, but it also has to be able
to independently solve your problem.
And so it ends up those extraneurons end up specializing in
kind of refining the answer thatyour first thinner model gives you.
(01:07:22):
So there's this sort of idea whereyou can gradually control, you can,
you can tune the width of your modelor effectively the level of, capacity
that your model has dynamically.
At Will.
And from an almost interpretabilitystandpoint, it's quite interesting
'cause it means that the, neuronsfrom that thinnest slice of your
network that's still supposedto be able to operate coherently
(01:07:43):
and solve problems independently.
Those neurons alone must be kind offocused on more foundational basic
concepts that generalize a lot.
And then the neurons that you'readding to the side of them
are more and more specialized.
As you add onto them.
They're, they're gonna allow the model toperform better when they're included, but.
excluding them still resultsin a functional model.
(01:08:04):
So this is, there's a lot ofdetail to get into in the paper we
don't have time for, but I highlyrecommend taking a look at it.
I wouldn't be surprised if something likethis ends up becoming fairly important.
It just, it smells of goodresearch taste, at least to me.
It is a Chinese lab that came outwith it, which is quite interesting.
But in any case check it out.
Highly recommend.
It's yeah, it's cool paper.
(01:08:25):
Yeah.
Actually a collaboration betweenMicrosoft Research and Food University.
right.
Several others.
But they did open source or save a willopen source for code for this stuff.
And, but paper is is kind of funny.
They.
It produced a lot of terms where like,here's yes, the notion of chain of
(01:08:48):
representation, which leads into chain oflayer, which leads into chain of model,
which leads into chain of language model.
Where we idea that yeah, these kind ofcumulatively lead up to the notion that
when you train a single large model,it contains these sort of submodels
(01:09:11):
and it is quite elegant, as you say.
Now that I've taken abit of a deeper look,
next paper is seek in the darkreasoning via test time instance
level policy gradient in latent space.
So the idea and or problem here is.
variant of test time compute whereyou want to be able to do better for
(01:09:33):
given input by leveraging computationat test time rather than train time.
You're not updating your parameters atall, but you're still able to do better.
And the idea of how this is done hereis sort of mimicking prompt engineering.
So you're tweaking the representationsof the input for the model, but instead
(01:09:56):
of actually literally tweaking theprompt for a given input, it's tweaking
the representations within the model.
So they are using every rewardfunction to update the token wise,
latent representations in the processof decoding, and that they show
(01:10:18):
can be used to for given input,improved performance quite a bit.
So they're kind of optimizing the.
Internal computations in an indirectway that is yet another way to
be able to scale at test time.
Quite different from, forinstance, chain of thought.
Yeah.
So that was actually really good.
(01:10:38):
I never thought of this as analternative to prompt engineering,
but I think you're exactly right.
Right?
It's like a activation space, promptengineering, or at least that's that,
that's a really interesting analogy.
Yeah.
It, it's so, so this, this isanother in my opinion, another
really interesting paper.
So is the basic idea is you're gonnatake a prompt, you know, feed it to
(01:11:00):
your model is in this case you, you'regonna give it a reasoning problem
and get the model to generate a, acomplete chain of thought, right?
So the model itself just generatesthe full chain of thought, vanilla
style, like nothing unusual.
And then you're gonnafeed the chain of thought.
To the model, and you're going to, thisis going to lead to a bunch of activations
(01:11:21):
at every layer of the model as usual.
Now, the final layer of the model,just before it gets decoded, you
have activations there and you'regonna say, okay, well why don't we
essentially build a reinforcementlearning model and have that, that
model play with just those activations.
And what we're gonna do is we'regoing to get the model itself to like
(01:11:42):
decode and then, estimate the expectedreward on this task for, for the, the
final kind of decoding the answer.
And you're gonna do it in a very,very, kind of simple, greedy way.
So whichever token is, isgiven highest probability.
That's just the one thatyou're gonna predict.
And you're gonna use essentially a versionof the same model to predict the reward.
(01:12:03):
And then, like, if the reward islow, you're gonna go in and modify.
So according to the model's ownself-evaluation, if the, if the reward is
low, you're gonna modify the activationsin that final layer, the activations
that sort of represent or encode thechain of thought that was fed in.
So you're gonna tweak those.
And then you'll, you'll tryagain decode, and then get the
(01:12:26):
model to evaluate that output.
Oh, you know, I, I think itneeds to be, you know, we
need to do some more tweaking.
So you go back and you tweakagain the activations and you can
do a bunch of loops like this.
Essentially.
It's like getting the modelto, to correct itself.
And then based on those corrections,it's actually changing its own
representation of the chain ofthought that it was chewing on.
(01:12:48):
And it's really quite interesting.
And, and again, it feels it sortof feels obvious when you see it.
But somebody had to actuallycome up with the idea.
A couple observations here.
So there's an interesting scalingbehavior as you increase the number
of iterations of the cycle, right?
Get, get the model to actuallydecode, evaluate its own output,
(01:13:08):
then tweak The activation's a bit.
What you find is there's typically likea, an initial performance improvement
that's followed by a plateau.
And that plateau seems to comefrom the model's own ability to
evaluate, to predict the rewardsthat would be assigned to its output.
when, instead of the modelself-evaluating, you use
(01:13:31):
an accurate reward model.
One that always gets thereward prediction, right?
Then all of a sudden that plateaudisappears and you actually get
continuous scaling, like the moreof these loops you do, as long as
you're correctly assigning the reward.
And, and it corresponds tolike the true base reality.
You just continue, continue,continue to improve with scale.
(01:13:53):
So that's another scaling law impliedin here, which is quite impressive.
There's also a bunch of likecompute efficiency stuff, so.
There's a question of like, do we,do we think of the, the playing
field as every activation in thefinal layer of the transformer?
Or as a subset, we could imagineonly optimizing, only doing
(01:14:13):
reinforcement learning to optimizesay, 20% of those activations.
And in fact, it turns out that thatends up being the optimal way to go.
And 20% is a, a prettygood a pretty good number.
They find don't, don't optimizeall of those activations,
just optimize some of them.
And at least for me, thatseemed counterintuitive.
Like why wouldn't you wanna optimizethe full set of activations?
(01:14:35):
It turns out, you know,a couple of reasons.
One is just optimization stability, right?
So if you're updating everything,there's a risk that you're just gonna
go too far off course and you need tohave some anchoring to the original
meaning of the chain of thought.
So you don't yeah.
Steer way off.
And then there's issues ofrepresentational capacity.
So just having enough latentrepresentations to allow you
(01:14:55):
to do effective extrapolation.
Anyway, this is a really, I think,interesting and important paper.
Wouldn't be surprised to find it turn intoanother dimension of test time scaling.
So yeah, just thought, thoughtit was worth calling out.
Yeah, it's, it's interesting in asense of, I don't know, it's, it's
like you have an auxiliary model or youcould conceptually have an auxiliary
(01:15:18):
model that's just for evaluatingthis, like in-between activation
and doing sort of side optimizationwithout updating your main model.
something about, it seems a bit strangeconceptually and, and maybe there's
like equivalent versions of this, butthat's just a gut feeling somehow I get,
and next we have two experts areall you need for steering thinking.
(01:15:44):
Reinforcing cognitive effort in MOEreasoning models without additional
training is the title of this paper.
So this is a way to improve reasoningin mixtures of experts model
it without additional training.
Mixture of experts is when you havea model that sort of splits a work
(01:16:04):
across subsets of it, more or less,and they are aiming to focus on
and identify what we are callingcognitive experts within the model.
So they're looking for correlationsbetween undesirable reasoning behaviors
and the activation patterns of specificexperts in NME mixtures of experts models.
(01:16:30):
So basically just large languagemodels that have mixtures of experts.
And then when they find the expertsthat turn out to have the best kind of
reasoning behavior, they amplify thoseexperts in the computation of the output.
And typically the, they make sure ofexperts work is you like, route your
(01:16:53):
computation to a couple of experts andthen you sort of average out the outputs
of those experts to decide what to output.
So conceptually you can sort of givemore weight to certain experts or route
the data to certain experts more often.
So when they find these, have thesetheoretical cognitive experts, they show
(01:17:18):
that in fact, this seems to be somethingthat can be done in practice for L LMS
that have MOE for reasoning applications.
Yeah.
And it's, it's kind of like, I wanna sayembarrassingly simple how they go about
identifying what are the experts, whatare the components of the model that
are responsible for doing reasoning?
(01:17:40):
And so it turns out, when you look at theway deeps, SEEQ, R one is trained, right?
It's trained to put, itsthinking, it's reasoning between
these thinking tokens, right?
So they kind of have, it's likeHTML if you're familiar with that.
Like, you know, you have likebracket, think bracket and then your
actual thinking text and then closebracket think bracket what they end
(01:18:02):
up doing is they say, okay, well,like, let's see which experts I.
Typically get activated on the thinkingtokens, and it turns out that it's
only a small number that consistentlyget activated on the thinking tokens.
So, hey, that's a pretty goodin that those are the experts
involved in, the reasoning process.
So the way they test that intuitionis they say, okay, well if that's
(01:18:22):
true, presumably, like you said,Andre, if, if I just dial up the
contribution of those experts ofthe reasoning experts on any given.
Prompt that I give them, then Ishould end up seeing more effective
reasoning, or at least a greaterinclination towards reasoning behavior.
That's exactly what happens.
(01:18:44):
So this is pretty, I, I would'velike, it's, this happens so often,
but like, I would've been embarrassedto suggest this idea, like in a, it
just seems so obvious, and yet theobvious things are the ones that work.
And in fairness, they onlyseem obvious in hindsight.
This is obviously a, a very good idea.
Anyway, so they, they use a, a, a metriccalled pointwise Mutual information
(01:19:04):
to measure the correlations betweenexpert activations and reasoning tokens.
It's actually a pretty simplemeasure, but it, there's no
point going over it in detail.
one interesting thing is there'scross-domain consistency though.
So the same expert pairs consistentlyappeared as the top reasoners the top
cognitive experts across a whole bunchof domains, math, physics a bunch of
stuff, which really does suggest that theyencode general reasoning capabilities.
(01:19:28):
I wouldn't have bet on this,like the idea that there is.
An expert in an MOE thatis the reasoning guy.
One thing, they don't touch on thisin the paper, but I would be super
interested to know how are thedifferent so-called reasoning experts?
I. Different, right?
So like they're saying there's tworeasoning experts basically in this
model that you need to care about.
(01:19:49):
So how do, like what, in what waysdo their behaviors differ, right?
What is, what are the differentkinds of reasoning that the model
is capable of or, or wants to dividebetween two different experts?
I think that'd be really interesting.
Anyway, so a whole bunch of other stuff wecould get into about computer efficiency.
But there is no time.
There is no time.
We have quite a, a fewmore papers to discuss.
(01:20:13):
So a lot of research also this week
and next one is anotherGemini related paper.
It's lessons from defending Geminiagainst indirect prompt injections.
Coming from Google.
Quite a detailed report,something like I think 16 pages.
No, actually like dozens of pagesif you include the appendix with,
(01:20:37):
yeah, all the various details.
The gist of it is.
You're looking at indirectprompt objections.
So things like embedding data in awebsite to be able to get an AI agent
that's been, you know, directed to gooff and do something go off course.
(01:20:57):
And the short versionI'll provide us a summary.
And Jeremy, you can add more detailsas you think you know is appropriate,
is that they find that it is possibleto apply known techniques to do
better so you can protect againstknown attacks and do that via
(01:21:20):
adversarial, fine tuning, for instance.
But the high level conclusion is thatthis is an evolving kind of adversarial
situation where you needs to essentially.
Be continually on it and see whatare these new attack techniques
(01:21:42):
to be able to deploy new defensetechniques as things evolve.
I, I think that's a, a great summary,especially given time constraints.
Yeah, the I'll justhighlight two quick notes.
So first is they find adaptiveevaluation of threats is critical.
So a lot of the defenses that do reallywell on static attacks can be tricked
(01:22:04):
by really small adaptations to attack.
So tweak an attack very slightlyand then it suddenly works, right?
So this is something that,anyway, we see all the time.
and then there's this other notionthat if you, if you use adversarial
training to help your models get morerobust to these kinds of attacks that's
gonna cause the performance to drop.
And what they find is that'sactually not the case.
(01:22:26):
One of the most interesting things aboutthis paper is just like the list of
attacks and defenses to prompt injectionattacks that they, they go over.
I'm gonna mention oneand then we will move on.
But it's just called thespotlighting defense.
I actually had never heard of this before.
So if you have an attacker who injectsa prompt into, or some, some dangerous
text into a prompt, like ignore previousinstructions and, do some bad thing.
(01:22:50):
The spotlighting defense, whatit does, it will insert what
are known as control tokens.
So they're basically just like newdifferent kinds of tokens at regular
intervals that just break up yourtext so that, you know, ignore
previous instructions, get split upand you have, you know, IG and then
control token and then no, and thenpre, and then another control token.
(01:23:11):
And it, it just has a way of and thenyou tell the model, sorry, in the
prompt you tell it to be skepticalof text between those control tokens.
And so that kind of teaches themodel to, you know, be a little
bit more careful about it.
And it has anyway,really effective results.
There's a whole bunch of otherdefenses and attacks they go into.
If you're interested in the attack defensebalance and the zoo of possibilities
(01:23:32):
there, go check out this paper.
It's a good catalog.
Next up we have from Epic ai, how FASTCAN Algorithms, advanced Capabilities,
so this is a blog post associated witha previously released paper titled LLM.
E guess can LLM capabilitiesadvanced without hardware progress?
(01:23:55):
The motivation of the research isbasically asking the question of can we
find software improvements that yieldbig payoffs in terms of better accuracy?
So it ties into this hypothesisthat if LLMs get good enough at
conducting good AI research, they canfind breakthroughs to self-improve.
(01:24:21):
And then you get this circle calledintelligence explosion, where you, the
VMs get better at research, they findnew insights as to how to train better
LLMs, and then the better lms keep findingbetter algorithmic insights until you
become super, super ultra intelligent.
And this is one kind of commonly.
(01:24:44):
Believe hypothesis as towhy we might get what is it?
SAI, super intelligent ai.
A SI, yeah, A-S-I-A-S-I relatively soon.
So this blog post is essentially tryingto explore how likely that scenario
is based on the trajectory and historyof Val Go make progress so far.
(01:25:07):
And the gist of their conclusion isthat there are two types of investments.
They are compute, dependent andcompute independent insights.
So there are some insightsthat only demonstrate their,
true potential at large scales.
Things like transformers, mixtures ofexperts sparse attention that, you know,
(01:25:30):
with smaller models when you're testingmay not fully show you how beneficial
they are, how promising they are.
But as you scale up, youget way, way stronger.
Benefits of like 20 times a performance,30 times a performance versus smaller
things like layer norm, where you canreliably tell that this algorithmic
(01:25:54):
tweak is gonna improve your model.
You know, and you can verify that at ahundred million parameters instead of 10
billion parameters or a hundred billionparameters, meaning that you can do
research and evaluate these things withoutlike ultra large hardware capacity.
So.
(01:26:14):
The, the basic conclusion of a paperis that the idea that you can get
intelligence explosion needs to be aresult of finding these compute dependent
algorithmic advances being easier to find.
So you need to find the advancementsthat as you scale up compute will
(01:26:36):
yield like big, big payoffs ratherthan relatively small payoffs.
Yeah.
The, the frame is that these, so thesecompute dependent advances are you, like
you said, you only see the, the return oninvestment at large scales for the full
return on investment at large scales.
And they point out that when you lookat the the, the boosts in algorithmic
(01:26:57):
efficiency that we've seen over the years.
These are dominated bycompute dependent advances.
So you look at the transformer,the MOE mixed query attention,
sparse attention, these thingscollectively, they're like 99% of
the compute efficiency improvements.
We've seen 3.5 x according to themfrom compute independent improvements
(01:27:17):
like flash attention and rope.
But but they don't hold the candle tothe these like approaches that really
leverage large amounts of compute.
And so I think in their minds, thecase that they're making is like.
You can't have a software onlysingularity if you need to like
leverage giant amounts of physicalhardware to test your hypothesis, to
(01:27:38):
validate that your new algorithmicimprovement is actually effective.
You need to actually work in thephysical world to gather more hardware.
I think this frankly doesn't dothe work that it thinks it does.
There, there are a couple of issues withthis and, and actually Ryan Greenblatt on
x has a, a great tweetstorm about this.
By the way, first of all, love that epic.
KI is doing this.
(01:27:59):
Really important to have theseconcrete numbers so that they can
facilitate this sort of debate.
But I think the key thing here is sothey highlight, look, transformers,
transformers only kind of give youreturns at outrageous, or, sorry,
give you the greatest returnsat outrageous levels of scale.
So therefore they're acompute dependent advance.
I don't think that'swhat actually matters.
I think what matters is, wouldan automated software only
(01:28:22):
process have discovered thetransformer in the first place?
And to that, I think the answer isactually probably yes, or at least
there's no clear reason that it wouldn'thave, in fact, the transformer, I.
MOE mixed query attention.
They were all originally found atTiny Scale as Ryan points out about
one hour of compute on an H 100 GPU.
(01:28:43):
So that's like quite small even, youknow, even back in the day in relative
terms, it was certainly doable.
And so this is like the, the actualquestion is do you discover things
that give you a little lift thatmakes them seem promising enough to
be worthy of subsequent investment?
The answer seems to be that actuallybasically all of the advances that
they highlight is the most important.
(01:29:04):
Compute dependent advances havethat property we're discovered
at far, far lower scale.
And we just keep investing inthem as they continue to, to
kind of show promise and value.
And so it's almost like, you know,any startup, you keep investing more
capital as it shows more traction.
Same thing.
You should expect the decisiontheoretic loop of a software only
singularity to like, to latch onto that.
(01:29:24):
'cause that's just good decision theory.
So anyway, I, I think this isa really rich area to dig into.
I have some issues as wellwith their, their frame.
They, they look at a deep seek andthey kind of say that the deeps seek
advances were all kind of computeconstrained advances or compute dependent.
But again, the whole point ofdeep seek was that they used
such a small pool of compute.
(01:29:44):
And so I almost wanna say like.
To the extent that compute independentmeans anything deep seek, a lot of
their advances really should be viewedas statutorily compute, independent.
Like the point is that theyhad very little compute.
This is actually a great testbed for what a software only
process could unlock potentially.
So lots of stuff there.
You can look into it.
I think it's, it's a great reportand and great room for discussion.
(01:30:07):
Yeah, I think it's, it's kindaintroducing the conceptual idea of compete
dependent versus independent algorithms.
And then there are.
Questions or ideas you can extrapolate.
Last paper really quickly.
I'll just mention without goinginto depth, there is a paper titled
Reinforcement Learning Fine Tunes, smallSub-Networks in Large Language Models.
(01:30:31):
the short, short version is whenyou do alignment via reinforcement
learning that turns out to update asmall number of the model parameters.
Something like 5% or sorry, 20%versus doing supervised fine tuning.
You update all the weightsas you might expect.
(01:30:54):
So this is a very strangeand, and interesting.
Kind of behavior of reinforcement learningalignment versus supervised alignment.
And I figured should just mentionit as an interesting paper,
but no time to get into it.
So moving on to policy and safety.
We have first and exclusive witha report on what OpenAI told
(01:31:21):
California's Attorney General.
So this is, I suppose, a leak or, orperhaps, I don't know, a demonstration
of this response to petition for attorneygeneral action to protect the charitable
nature of open AI's sent to the AttorneyGeneral on May 15 by OpenAI basically has.
(01:31:45):
All their arguments in a position tothe groups that want to stop OpenAI from
restructuring and really just restatingwhat we've been hearing a whole bunch.
You know, Musk is just doing this as acompetitor and is being, is harassing
(01:32:06):
us and has misinformation and basicallysaying, you know, ignore this petition
to block us from doing what you want.
It isn't valid.
Yeah, and, and it's, so there are a wholebunch of interesting contradictions in
there as well with some of the claimsopening AI's been made, or at least the
vibes they've been putting out, whichis pretty standard open AI fair that
(01:32:28):
they'll, you know, they get, they tryto get away with a lot, it really seems.
And there's a lot ofexamples of this here.
So one item is so they, theysuggested the nonprofit.
By the way, so some of this is revealinginformation, material information about
the nature and structure of the deal.
This sort of nonprofit transition thingthat was not previously public, right?
(01:32:52):
So opening AI recently came out andsaid, look, this whole plan we had
of having the for-profit kind of getout from under the control of the
nonprofit, we're, we're gonna scrap that.
Don't worry guys, wehear you loud and clear.
There are now a bunch of caveats.
We, we highlighted, I think lastweek that there would be caveats.
The story is not as simple asOpenAI has been making it seem.
(01:33:14):
A lot of people have kindof declared victory on this.
It said, great, you know, thenonprofit transition isn't happening.
Let's move on.
But hold on a minute.
This, this is OpenAI doingtheir usual best to to kind of
control the, the PR around this.
And, and they have donea good job at that.
So here's a quote and let me justreally quickly mention for context.
This is partially in reply to thisnot for private gain coalition
(01:33:39):
that has a public letter.
They released a public letter in April 17.
They updated their letter in May 12th inresponse to on May 5th, OpenAI announcing
that they're kind of backing off fromtrying to go full for profit with this
new plan of the public benefit corporationand the kind of not going for profit.
(01:34:04):
So this not for gain, a coalitionupdated their stance and
essentially has criticism still.
And this letter on May 15th is inresponse to whole chain of criticism.
Yeah.
So if it wasn't complicatedenough already, already yeah.
And, and so here's, here's a linefrom the opening eye statement here.
(01:34:27):
The, the nonprofit will exchange itscurrent economic interests in the capped
profit for a substantial equity stake.
I. In the new Public BenefitCorporation and will enjoy access
to the public benefit corporation's,intellectual property and
technology personnel and liquidity.
That sounds like a goodthing until you realize that.
(01:34:47):
Well, wait a minute.
The nonprofit did not justenjoy access to the technology.
It actually owned or controlledthe underlying technology.
So now it's gonna justhave a license to it.
Just like opening Eyescommercial partners.
That is a big, big caveat, right?
That is not consistent with reallythe, the spirit potentially, but
certainly the, the facts of the matterassociated with the previous agreement
(01:35:08):
as I understand them under the.
The current structure opening AI's,LLC, so the, the sort of main operating
agreement explicitly states that thecompany has a duty to its mission and the
principles advanced in the OpenAI chartertake precedence over any obligation to
generate a profit that creates a legallybinding obligation on the directors of
(01:35:30):
the company, the company's management.
Now, under the new structure, though,the directors would be legally required
to balance shareholder interests withthe public benefit purpose, and so.
This is like the fundamental obligations,legal duties of the directors is now going
to be to shareholders over, or potentiallyalongside, I should say, the mission.
(01:35:51):
And that shift is probably a bigreason why investors are more
comfortable with this arrangement.
We heard SoftBank say, you know, look,from our perspective, everything's fine.
After they had said, OpenAI has gotto get out from under its nonprofit
in order for us to, to keep ourinvestment in, and now they're making
these noises like they're satisfied.
So clearly for them, defacto,this is what they wanted.
(01:36:12):
Right?
So there's something going onhere that doesn't quite match up
and, and this is certainly part ofit or at least seems like it is.
By the way, no publicbenefit company in Delaware.
Garrison Lovely, who's the author of thissays no Delaware PBC has ever been held
liable for failing to pursue its mission.
Their legal scholars can't find a singlebenefit enforcement case on the books.
(01:36:33):
So.
In practice, this is avery wide latitude, right?
There's a lot of shitthat this could allow.
in this letter, they're trying toframe all the criticism of this very
controversial and I think prettyintuitively kind of inappropriate attempt
to, to try, you know, convert the thekind of nonprofit or the, all that jazz.
(01:36:56):
They're trying to pin it on Elon andsay basically he's like the only critic
or that that's sort of the frame.
Just 'cause it's easy to dismiss him asa competitor and for political reasons.
He's a, he's an easy whipping boy.
But there's a wholebunch of stuff in here.
You know, there, there's like, I'll,I'll just read one last one last
excerpt 'cause we gotta go but openAI's criticism of the coalitions.
(01:37:18):
This is the coalition you referred toAndre April 9th letter is particularly
puzzling the company faults, thecoalition for claiming that quote.
OpenAI proposes to eliminate anyand all control by the nonprofit
over open AI's core work.
This criticism is perplexing because asOpenAI itself later demonstrated with
its May 5th reversal, that was preciselyopening AI's publicly understood plan.
(01:37:42):
At the time the coalition made itsstatement, the company appears to
be retroactively criticizing thecoalition for accurately describing
open AI's proposal as it stood.
So you could be forgiven for seeing alot of this as kind of manipulative,
bad faithy comms from OpenAI,especially given that this letter
was not meant to be made public.
And it fits, unfortunately, apattern that we have seen the
(01:38:05):
people, at least many people believe,they have seen many times over.
We'll see where it all goes, butthis is a, a thorny, thorny issue.
Yeah, I think we've gotten hints atkind of the notion that open the eye
legally, I. Has tried to be aggressivenot just legally, also publicly in
terms of arguing with Musk and so on.
And we only have time for one morestory, so we're gonna do that.
(01:38:29):
We have activating ai, safety levelfree protections from philanthropic.
So Anthropic has their responsiblescaling policy that sets out various
thresholds for when they need to.
Have these safety level protectionswith additional safety levels
(01:38:50):
requiring a greater scrutiny morestringent processes, et cetera.
So with Claude Opus four, they are nowimplementing these AI safety level three
measures as a precautionary measure.
So they've said, we are not sure if Opusfour is kind of at the threshold where
(01:39:11):
it would be dangerous to the extentthat we need this set of protections,
but we are gonna implement them anyway.
And this comes with a kindof a variety of stuff.
They're committing to do it.
They are making it harder to Jailbreak.
They are adding additionalmonitoring system.
(01:39:32):
They have a bug bounty program,have synthetic jailbreak data
security controls, making sure theweights cannot be stolen and so on.
Got quite a few things.
They released A PDF with theannouncement that is you know, like
something like a dozen pages withadditional details in appendix.
(01:39:55):
Yeah.
And so the specific thing that'scausing them to say, we think we, we are
flirting with the ASL three thresholdis the the bio risk side, right?
The ability they think potentially of thismodel to significantly help individuals
with basic technical backgrounds,like we're talking undergraduate STEM
degrees to create or obtain and deploy.
(01:40:17):
Biological weapons.
Right?
So that's really where, wherethey're at here specifically.
This is not I don't think associatedwith the autonomous research or
autonomous autonomy risks thatthat they're also tracking.
But we got early glimpses ofthis, right, with SONET 3.7.
I think the language theyused it was either ANTHROPIC
or open AI with their model.
It was sort of similar was we areon the cusp of that next risk risk
(01:40:40):
threshold where really it's kindof similar whether you look at the
open AI preparedness framework orphilanthropics L three in terms of how
they define some of these standards.
The security measures are reallyinteresting especially kind of given
our work on the data center securityside and the cluster security side.
One of the pieces.
And this echoes a re recommendation ina, in a RAND report on securing model
(01:41:02):
weights that came out over a year ago now.
They have implemented preliminaryegress bandwidth controls.
So this is basically restricting theflow of data out of secure computing
environments where AI model weights are.
So literally like at the hardwarelevel, presumably that's at least
how I read this making it impossibleto get more than a certain amount of
(01:41:25):
bandwidth to pull data of any kindout of your, outta your servers.
That's meant to make it so that ifsomebody wants to steal a model,
it takes them a long time, atleast if they're gonna use your
networks, your infrastructure.
And there are ways to kind of calculatewhat the optimal bandwidth would be
under certain conditions for that.
But that was kind of interesting.
That's a, a, a big piece of reallyr and d that they're doing there.
(01:41:47):
Also a whole bunch of managementprotocols and point software controls,
and there's a bunch of stuff here.
this is a big leap, right?
Moving to, to a L three.
So this is fundamentally increasing.
Th it means that they're concernedabout threat actors, like terrorist
groups and organized crime.
That, that they would start to derivea lift a, a significant benefit
(01:42:08):
potentially from accessing Anthropics ip.
They are not, you know, ASL three doesnot cover nation state actors like China.
So they're not pretending that they candefend against that level of, of attack.
It's, it's sort of likeworking their way there.
As their models get more powerful,they wanna be able to defend against
a higher and higher tier of adversary.
so there we go.
(01:42:28):
Curious to see what the otherlabs respond with as, as their
capabilities increased too.
Yeah.
And, and we're seeing hints thatmaybe we'll cover more next week,
that and we've already covered tosome extent that these reasoning
models, these sophisticated models.
Are maybe harder to align and, andare capable of some crazy new stuff.
So this also makes sense for that.
(01:42:50):
Yeah.
But we are gonna call it with that forthis episode, thank you for listening.
As always, we appreciateyou sharing commenting and
listening more of an anything.
So please do keep tuning in.