
April 1, 2025 94 mins

Our 205th episode with a summary and discussion of last week's big AI news! Recorded on 03/28/2025

Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

Check out our text newsletter and comment on the podcast at https://lastweekin.ai/.

Join our Discord here! https://discord.gg/nTyezGSKwP

In this episode:

  • OpenAI's new image generation capabilities represent significant advancements in AI tools, showcasing impressive benchmarks and multimodal functionalities.
  • OpenAI is finalizing a historic $40 billion funding round led by SoftBank, and Sam Altman shifts focus to technical direction while COO Brad Lightcap takes on more operational responsibilities.
  • Anthropic unveils groundbreaking interpretability research, introducing cross-layer transcoders and showcasing deep insights into model reasoning through applications on Claude 3.5 Haiku.
  • New challenging benchmarks such as ARC-AGI-2 and complex Sudoku variations aim to push the boundaries of reasoning and problem-solving capabilities in AI models.

Timestamps + Links:

  • (00:00:00) Intro / Banter
  • (00:01:01) News Preview
  • Tools & Apps
  • Applications & Business
  • Projects & Open Source

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:11):
Hello and welcome to the Last Week in AI podcast,
where you can hear us chat about what's going on with AI. As usual,
in this episode, we will summarize and discuss some of last week's most
interesting AI news, and you can go to the description of the episode
for all the timestamps, all the links, and all that.
I am one of your regular hosts, Andrey Kurenkov.

(00:32):
I studied AI in grad school and I now work at a generative AI startup.
And I'm your other host, Jeremie Harris.
I'm at Gladstone AI doing AI national security stuff.
This has been a crazy hectic day, a crazy hectic week.
So I'm gonna say right off the bat, there's a big, big
Anthropic story that I have not yet had the chance to look at.

(00:54):
Andrey, I know you've done a bit of a dive on it, so I'm gonna maybe punt
my thoughts on it to next week.
But yeah, this has just been wild.
Yeah.
It's been a wild week after the last couple ones.
You know, we had slightly quieter weeks without anything huge,
and then this week, multiple huge stories coming out and really being

(01:14):
surprising and actually quite a big deal. I think not since Grok 3
and that slate of models, Claude 3.7, have we had a week that was this big.
So it's an exciting week and we're gonna probably dive straight into it.
Let me give a quick preview of what we will be talking about.
Tools and Apps:
we have Gemini 2.5 coming out and kind of steamrolling everyone's

(01:38):
expectations, I would say.
And we have image generation from GPT-4o
by OpenAI, similar to what we saw with Gemini: taking image generation
to the transformer, getting rid of diffusion seemingly,
and being, like, mind-blowing.
Then we go to Applications and Business: OpenAI getting some

(02:00):
money, and a few stories related to hardware. Projects and Open Source:
some very exciting new benchmarks where we continue to try to
actually challenge these new models. Research and Advancements:
as you said, Anthropic has a really cool interpretability
paper that we will start talking about, but there's a lot to unpack.
So we might get back to it next week.

(02:22):
And then Policy and Safety: some kind of smaller stories related to what the
federal government in the US is doing,
and actually some updates on copyright law stuff in the last section.
So a lot to get through.
We'll probably be talking a little bit faster than we
do in our typical episodes.
Maybe that's good; we'll see if we're able to keep it a bit more efficient.

(02:46):
So let's get straight to it.
Tools and apps.
We have Gemini 2.5, what Google is calling their most intelligent AI model.
And this is one of their, I guess, slate of thinking models.
Previously they had Gemini 2.0 Flash Thinking, which was kind of

(03:06):
a smaller, faster model; here Gemini 2.5 represents their bigger models.
We had Gemini 2.0 Pro previously, which came out as their biggest
model, but at the time it was kind of not that impressive, just
based on the benchmarks, based on people using it, and so on.
So Gemini 2.5 came out topping the leaderboards by a

(03:29):
good margin, which we haven't seen for a while with benchmarks.
Like, its performance on the benchmarks is significantly higher
than the second best one on pretty much any benchmark you can look at,
even ones that seemed saturated to me.
And not only that, just based on a lot of anecdotal reports I've been seeing,
in terms of its capacity for things like coding compared to Claude, for its

(03:53):
capacity for writing, problem solving.
It's just, like, another kind of class, a model that is able to one-shot:
just given a task, nail it without having to get feedback or having to
do multiple tries, things like that.
So, yeah.
Super impressive.
And to me, kind of a surprising leap beyond what

(04:17):
we've had. Yeah, absolutely.
And I think one of the surprising features is where
it isn't SOTA quite yet, right?
So SWE-bench Verified, right?
It's actually a benchmark that OpenAI first developed.
Well, you had SWE-bench, essentially real-world-ish software engineering tasks;
SWE-bench Verified is the cleaned-up OpenAI version of that.
So on that benchmark Claude 3.7 Sonnet is still number one, and

(04:40):
by quite a bit. Like, this is pretty rare looking at 2.5, which
just crushes everything else in just about every other category,
but still that quite decisive edge, basically 6% higher in performance on
that benchmark, is Claude 3.7 Sonnet still. But that aside, Gemini 2.5
Pro is, I mean, just crushing, as you said, so many things.

(05:01):
One of the big benchmarks a lot of people are talking about is this sort
of famous Humanity's Last Exam, right?
This is the benchmark that Dan Hendrycks, Elon's AI advisor who works at
the Center for AI Safety, put together.
I mean, it is meant to be just ridiculously hard reasoning questions
that call on general knowledge and reasoning at a very high level.
Previously OpenAI's o3-mini was scoring 14%.

(05:23):
That was SOTA.
Now we're moving that up to 18.8%.
We're gonna need a new naming scheme for benchmarks that doesn't
make them sound as final as Humanity's Last Exam, by the way.
But we're on track right now to, I mean, we're gonna be
saturating this benchmark, right?
That's gonna happen eventually.
This is a meaningful step in that direction, and things have been moving
really quickly with inference-time reasoning, especially on that benchmark.

(05:44):
But a couple things, I guess, to highlight on this one.
Google coming out and saying, look, this is, so, by the way, this is
their first 2.5 release, Gemini 2.5.
It's an experimental version of 2.5 Pro.
What they're telling us is, going forward, they are going to be doing
reasoning models across the board.
So like OpenAI, don't expect more base models to be released

(06:06):
as such anymore from DeepMind.
So everybody kind of migrating towards this view that, like, yep, the default
model should now be a reasoning model.
It's not just gonna be GPT-4.5 and so on.
It's really gonna be reasoning driven.
And the stats on this are pretty wild.
There's so much stuff; I'm trying to just pick one.
For one, I mean, it tops the LM Arena leaderboard, which is a cool kind of

(06:27):
rolling benchmark because it looks at human preferences for LLM outputs,
and then it gives you essentially an Elo score for those, and
a pretty wide margin for Gemini 2.5.
So subjectively it's getting really good scores; as you said, Andrey, this is kind
of like the measured-subjectivity side. On SWE-bench Verified,
63.8% is really good, especially given, you know, even though it comes in

(06:50):
second place to Sonnet, when you look at the balance of capabilities, this
is a very wide capability envelope.
They do say they specifically focused on coding, so again,
still kind of interesting that they fall behind 3.7 Sonnet.
Maybe the last spec to mention here is it does ship today with a 1
million token context window.
So in the blog post announcing this, Google made a big stink about how they

(07:13):
see one of their big differentiators as large context, and they're gonna
be pushing to 2 million tokens of context soon as well, apparently.
Right.
And that is a significant detail, because 1 million...
I haven't been keeping track.
I do think we had Claude Opus in that space of very large context.
No? But 1 million is still very impressive, and going to

(07:37):
2 million is pretty crazy.
Again, you keep having to translate how big 1 million tokens is.
Well, that's, I don't know... slightly less than a million words.
'Cause yeah, maybe 700,000 or something.
Maybe 700,000; 2 million would be over a million words.
It's a lot of content.

(07:57):
You can fit an entire manual, an entire set of
documents, et cetera in there.
And of course, as with other Gemini things, it is multimodal: it takes
in text, audio, images, video.
I've also seen reports of it being very capable at processing
audio and images as well.
And to that point, it's starting to roll out as an experimental model.

(08:21):
You can already use it in Google AI Studio, and if you're
paying for Gemini Advanced,
you can also select it in the model dropdown and just try it.
And that's part of how we've been seeing people try it and
report really good outcomes.
So very exciting.
And on to the next story, also something very exciting and also

(08:44):
something that's kind of mind-blowing to an unexpected degree.
So OpenAI has rolled out image generation powered by GPT-4o
to ChatGPT. To my understanding, and I'm not totally sure these are exactly
the right details, but similar to Gemini... last week, was it last week or
(09:05):
two weeks ago, I don't know, from Google.
The idea here is, instead of having a separate model that is typically
a diffusion model, where the LLM is like, okay, let me give this prompt
over to this other model that is just text-to-image and that will
handle this and return the image.
This is taking the full kind ofend-to-end approach where you have a

(09:28):
multimodal model able to take in textand images, able to put out text and
images just via a set of tokens, andas a result of moving to this approach
of not doing diffusion, doing full on,token language modeling, I. These new
category really of text to image modelsor image plus text to image models

(09:53):
have a lot of capabilities we haven't seen with traditional text-to-image.
They have very impressive editing right out of the box.
They also have very, very good ability to generate text, a lot of text in
an image, with very high resolution.
And they seem to just really be capable of very strict prompt adherence and

(10:18):
making very complex text descriptions work in images and be accurate.
And we've also discussed how with image models it's been
increasingly hard to tell the difference or, like, see progress.
Yeah.
But I will say also, you know, especially with DALL-E, and to
some extent also rival models,

(10:40):
there has been a sort of pretty easy telltale sign of AI generation,
with it having a sort of AI style, being a little bit smooth, being, I
don't know, sort of cartoony in a very specific way, especially for DALL-E.
While this is able to do all sorts of visual types of images, so it can be

(11:02):
very realistic, I think, differently from what you saw with DALL-E from OpenAI.
And it can do, yes, just all sorts of crazy stuff, similar to what we
saw with Gemini, in terms of very good image editing, in terms of very accurate
translation of instructions to image.
But in this case, I think even more so, just the things people have been

(11:24):
showing have been very impressive.
Yeah.
And I think a couple things to say here.
First of all, astute observers or listeners will note, last week
we covered Grok, now folding into its offering internally an
image generation service, right?
So this theme of the omni-modal, at least, platform, right?
Grok is not necessarily going to make one model that can do everything

(11:46):
eventually.
I'm sure it will.
But we're making kind of baby steps on the way there.
This is OpenAI kind of doing their version of this and going
all the way omni-modal, with one model to rule them all.
You know, big strategic risk if you are in the business of doing
text-to-image or audio-to-whatever: assume that all gets soaked up,
because of positive transfer, which does seem to be happening, right?

(12:09):
One model that does many modalitiestends to be more grounded, tends to
be more capable at any given modalitynow just because it benefits from that
more robust representational space.
'cause it has to be able to representthings in ways that can be decoded
into images, into audio, into text.
So just a much more robustway of doing things.
One of the key wordshere is binding, right?

(12:29):
One of the key capabilities of thismodel, it's binding, is this idea
where essentially looking at how well.
Multiple kind of relationshipsbetween attributes and objects can
be represented in the model's output.
So if you say, you know, draw me ablue star next to a red triangle, next
to a green square you wanna make surethat blue and star are bound together.

(12:53):
You want to make sure thatred and triangle are bound
together faithfully and so on.
And that's one of the thingsthat this model really,
really does well, apparently.
So apparently it can generate correctlybound attributes for up to 15 to 20
objects at a time without confusion.
This, in a sense, is the text toimage version of the needle in
a haystack eval, right, where wesee like many different needles.

(13:13):
In the haystack.
Well, this is kind of similar, right?
If you populate the context window with a whole bunch of these relationships,
can they be represented, let's say, with fidelity in the output?
The answer, at least for 15 to 20 objects in this case and relatively
simple binding attributes, is yes.
Right?
So that's kind of one of the key measures that suggests there's
actually something different here.
I wouldn't be surprised if this is a consequence of just having that
I wouldn't be surprised if this isa consequence of just having that

(13:35):
more robust representational space.
You know, that comeswith an omni modal model.
One other thing to highlight here is we do know that this is an
autoregressive system, right?
So it's generating images sequentially from left to right and top to bottom,
in the same way that text is trained and generated in these models.
That's not gonna be a coincidence, right?
If you want to go omni-modal, you need to have a common way of generating

(13:58):
your data, whether it's video, audio, text, whatever, right?
So this is them saying, okay, we're going autoregressive, presumably an
autoregressive transformer, to just do this.
So, pretty cool.
There's a whole bunch of, anyway, cool little demos that they showed in
their launch, worth checking out.
One last little note here is they're not including any visual water-

(14:19):
markings or indicators that show that the images are AI generated, but they
will include what they call their standard C2PA metadata to mark the
image as having been created by OpenAI,
which we've talked about in the past.
If you're curious about that, go check out those episodes.
But yeah, so OpenAI kind of taking a bit of a middle-ground
approach on the watermarking side.

(14:40):
Yeah.
And they also are saying there'll be some safeguards, certainly compared
to things like Grok, where you won't be able to generate sexual imagery.
You won't be able to, for instance, have politicians with
guns or something like that.

(15:01):
Of course, you're gonna be able to get around these safeguards to some extent.
But certainly a more controlled type of model, as you would expect.
Last thing I'll also say is you've seen a ton of different use cases
for this popping up on social media.
The one you may see covered in media is the Ghibli-fication of images, where
it has turned out that you can take in a photo and tell the system to

(15:26):
translate it to a Ghibli style.
Ghibli is a very famous animation studio from Japan, and it does a very good
job, like a very faithful rendition.
Definitely looks like Ghibli.
And that kicks off a whole set of discussions as to, again,
AI and what it means for art, you know, the ethics of it.

(15:51):
There are also discussions as to what this means for Photoshop,
because it can do image editing.
It can do design.
You know, again, this is, I think, a surprising thing where we haven't
talked about text-to-image as being mind-blowing in a little while, and it kind
of seemed to plateau for a while, and now it is, to me, certainly mind-blowing
again to see the stuff you can do.

(16:14):
On to the lightning round, and we actually have a couple more image generators to cover.
I don't know if they decided to come out at the same time
or what, but there are a few.
Starting with Ideogram, they are presenting version 3 of their system.
Ideogram is one of the leading text-to-image focused businesses out there.

(16:34):
Early on, their claim to fame was being able to handle text better.
But these days, of course, that's not unique anymore.
They say that this 3.0 of their system is able to create more
realistic and stylized images.
In particular, they have the ability to upload up to three reference
images to guide the aesthetic output.
And there's 4.3 billion style presets.
So I think this reflects Ideogram being a bit more of a business,
and this being more of a product for them, like as a primary focus.
So again, now with GPT-4o, this is nowhere near that, but for specialized

(17:19):
use cases, it could still be the case that something like Ideogram can, you
know, hold on for a while; we'll see.
You can almost hear yourself arguing the TAM, the total addressable
market size, for these products down and down and down as ChatGPT, as
all the big players, kind of grow and grow and grow their own TAM.
This is one of the problems we've talked about for a long time on the podcast.

(17:42):
I think, I remain to be proved wrong here
and expect to look stupid for any number of reasons as usual,
but I think Ideogram is dead in the medium term,
like a lot of companies in this space. Look, they do say
4.3 billion style presets.
We, of course, as extremely competent AI journalists, have tested every
single one and can report that
(18:04):
they are pretty good actually.
You're saying, Andrey, that the text-in-image feature is a kind of
lower value thing now because of the competition; a hundred percent the case.
This is why Ideogram is now choosing, forced to, maybe, emphasize photorealism
and professional tools, right?
That's kind of what they're making their niche, but they're
gonna get more and more niche.

(18:24):
This is gonna keep happening as their territory gets encroached on by the
sort of blessings of scale that the true hyperscalers can benefit from.
So very cool.
But kind of overshadowed by GPT-4o.
I will say one last point: it could still be the case that as this
specialized sort of model or business, for this case where they focused

(18:46):
on, let's say, business use cases for, I dunno, posters, maybe they have
training data that allows them to still be better for a particular niche.
I don't know.
I think OpenAI's buying power for that training data
is gonna vastly exceed theirs.
And I think also, well, I would say proprietary data from
users of a platform perhaps.

(19:06):
Oh, a hundred percent.
Yeah.
Yeah.
I mean, I think they're also fighting positive transfer as well.
There are a lot of secular trends here, but you're right.
At a certain point, if you can protect that data niche...
Yeah, you're absolutely right.
That's the one way out that I can see, at least for sure.
Yeah.
And the next story, also a new image generator that was also
mind-blowing before GPT-4o.

(19:27):
So the headline is: New Reve image generator beats AI art
heavyweights like Midjourney and Flux at pennies per image.
Before this came out, there was a model code-named Halfmoon that was
already impressing everyone.
It has come out now as Reve Image 1.0.

(19:47):
They are providing a service for it.
You can get a hundred free credits, and then credits at $5 for 500 generations.
And, you know, this was pre-GPT-4o.
Again, really impressive in terms of its prompt adherence, in terms of

(20:08):
being able to construct complex scenes and just generally kind of do better
at various more nuanced or tricky tasks than other image generators.
Seemed like the best, like an exciting new step in image generation.
I'll keep saying it: GPT-4o, to some extent also like Gemini before, to be fair, is

(20:31):
still kind of more mind-blowing than these things.
Yeah.
I mean, approximately, take my last comments on Ideogram and copy-paste them in here.
I think it's all roughly the same, but it's a tough space now, right?
It's really getting commoditized.
Right?
And one thing also worth noting quickly is one
differentiator could be cost.

(20:52):
Because with the autoregressive model, you're using LLMs, you are using,
you know... cost and speed also, because LLMs are typically slower;
you are decoding these things.
If you're still using diffusion models, it could be cheaper and could be faster,
which, yeah, could be significant.
I don't know.
I think in practicethis is really tough.

(21:14):
I mean OpenAI gets to amortizetheir inferences over way
larger in batch sizes.
And that's really the key number that,you know, you care about when you're,
when you're tracking this sort of thing.
There's also, you know,they're not gonna be using.
If it makes economic sense, OpenAI willjust distill smaller models and or have
models, you know, specialized in this.

(21:35):
So I think again, like long run, it'sreally kind of batch size versus batch
size compute fleet versus compute fleet.
in my mental picture of this, therich get richer, but again, like
very, very willing to look like anidiot at some point in the future.
Yeah, I'm certain these companiesare definitely thinking about
yeah, their odds as well.

(21:56):
Next up, moving away from image generation but sticking with
multimodality: Alibaba is releasing Qwen 2.5 Omni, which is adding
voice and video modes to Qwen Chat, or also adding these things.
So they are open sourcing Qwen 2.5 Omni 7B. That is a multimodal model that

(22:20):
handles text, image, audio, and video, and it's under the Apache 2.0 license.
And it is somewhat significant, 'cause from my memory, in the multimodal
model space we don't have as many strong models as just pure LLMs.
Yeah, we have started seeing more of that with things like Gemma, but this

(22:42):
has text, images, audio and video.
So possibly, if I'm not forgettinganything, kind of a pretty significant
model to be released under Apache2.0 with this multimodality.
Yes.
And, and kind of seeing, you know,maybe some of the blessings of
scale, positive transfer stuffstarting to show up here as well.

(23:03):
Interesting to see it as an open source model, and yet again, you know,
the Chinese models being legit, like, I mean, this is no joke.
The benchmarks here are comparing favorably, for example, to
Gemini Pro on OmniBench.
That's a... sorry, let me be careful:
Gemini 1.5 Pro, right?
We're two increments beyond that as of today,
but still, this is stuff from like six months ago, and beating

(23:25):
it handily in the open source.
So that's a pretty big development.
Right.
And can you imagine if we had a history where OpenAI didn't create
this, you know, versioning system for models and we had actually new names
for models? Wouldn't that be cool?
You know, it also makes you want to go, kinda like,
you know, 1.5 for this lab should be the same as 1.5 for this one.

(23:47):
And you even see some of the labs trying to, kinda like,
number their things out of sequence, just to kind of signal to you
how they want to be compared.
It's a mess.
Yeah.
And speaking of impressive models out of China, next we have T1 from Tencent.

(24:07):
So this is their thinking model.
This is kind of their equivalent to, I guess, o1, and it is
available on Tencent Cloud, priced pretty competitively. It also tops
leaderboards, beating R1 along with o1.

(24:28):
So another kind of impressive release.
I couldn't see many technical details on this, and in fact it didn't seem
to be covered in Western media that much, but it could be a big deal,
Tencent being a major player in the Chinese market.
Yeah, the big kind of release announcement is that, first of all, it
is, interestingly, the hybrid Mamba architecture, by which they presumably

(24:51):
mean the combination of transformer and Mamba that we've talked
about before, that a lot of people see as this way of kind of
covering the downsides of each. Check out our Mamba episodes, by the
way, for more on that, 'cause it's a bit of a deep dive.
But yeah, they claim, they refer to it as the first lossless application
of the hybrid Mamba architecture.
I don't know what lossless means in this context.

(25:14):
So I asked Claude and it said, well,in this context it probably means,
you know, there was no compromiseor degradation in model quality
adapting the Mamba architecture forthis large scale inference model.
Okay, fine.
You know, if, if that's the case.
But again, this is wheredeeper dive would be helpful.
And it'll be interestingto, to see Mamba.
We haven't seen much, I haven't seenmuch about Mamba in, in quite a while.

(25:35):
That doesn't mean it's not being used,you know, in a proprietary context
by labs that we don't know about, butsort of interesting to see another
announcement in that direction.
and moving on to applications andbusiness, we begin with OpenAI and
the story that they're now close tofinalizing their $40 billion raise.

(25:56):
This is led by SoftBank, and they have various investors in here,
things like Founders Fund, Coatue Management, things that you
actually don't know too much about.
They have an Illinois-based hedge fund that is
contributing up to 1 billion.
But the leader certainly seems to be SoftBank.

(26:17):
They're saying they'll invest an initial 7.5 billion there, along with 2.5
billion from, I dunno, other sources.
And this will be the biggest round ever in fundraising, right?
40 billion is more than most companies' market cap.

(26:39):
So it's crazy.
And funnily enough, the shares of SoftBank dropped
in Japanese share markets,
I think because people are like, SoftBank, you're giving
a lot of money to OpenAI.
Is that a good idea?
What, has SoftBank made giant multi-billion dollar capital
allocation mistakes before, Andrey?

(27:00):
Like, I certainly can't remember.
Yeah, I mean, there's no company that starts with "We" where
SoftBank was famously involved.
Yeah, yeah, yeah.
But no, they've had a pretty rough time.
And so SoftBank obviously is famous for those calls.
I actually can't remember.
I know that there's a noteworthy story to be told about their

(27:20):
performance over the last few years, and my brain's fried.
I'm trying to remember if it's like, SoftBank is doing well actually,
or SoftBank is completely fucked.
It's one of those two, I think. The investors apparently include this
Magnetar Capital I had never heard of.
To your point, the only one that I'd heard of in the list here is
(27:40):
Founders Fund, which, by the way, I mean these guys just crush it:
SpaceX, Palantir, Stripe, Anduril, Facebook, Airbnb, Rippling. Like,
Founders Fund is just absolute God tier.
But apparently Magnetar Capital has $19 billion in assets under management,
and so they're gonna put in up to 1 billion alone in this round.

(28:01):
So that's pretty cool, pretty big.
So yeah, going up to $300 billion would be the post-money valuation,
which has basically doubled the last valuation of 157 billion.
That was back in October.
So, I'm sorry, has your net worth not doubled since October?
What are you doing, bro?
Like, get out there and start working,
'cause OpenAI... that's pretty wild.

(28:22):
So yeah.
Anyway, there's, there's a wholebunch of machinations about how the
capital is actually gonna be allocated.
SoftBank's gonna put in an initial$7.5 billion into OpenAI, but
then there's also 7.5 billionthat they presumably have to raise
from a, a syndicate of investors.
They don't, you know, necessarily havethe full amount that they need to put
in on their balance sheet quite yet.

(28:42):
And I think this was part of what Elonwas talking about in the context of the
Stargate build, saying, Hey, you know,I'm like looking at SoftBank, these
guys just don't have a balance sheetto support a $500 billion investment
or a $100 billion or whatever.
You know, it was claimed at the time, and this is kind of true,
and that's part of the reason why there's a second tranche of $30 billion

(29:02):
that's gonna be coming later this year.
That will include 22 billion fromSoftBank and then more from a syndicate.
So it's all this kind of staged stuff.
there's a lot of people who stillneed to be convinced or, you know,
when you're moving money flows thatare that big, obviously there's
just a lot of, a lot of stuff youhave to do to free up that capital.
So This is history's largest fundraise.
If it does go throughthat's, that's pretty wild.

(29:23):
Next up, another story about OpenAI and some changes in
their leadership structure, which is somewhat interesting.
So Sam Altman is seemingly kind of not stepping down, but stepping to the side,
and meant to focus more on the company's technical direction and guiding

(29:43):
their research and product efforts.
He is the CEO, or at least was the CEO, meaning that of course as a
CEO, you basically have to oversee everything, lots of business aspects.
So this would be a change in focus, and they are
promoting, I guess, or it doesn't seem like they announced changes
(30:04):
to titles, at least not that I saw.
But the COO, Brad Lightcap, is going to be stepping up with some
additional responsibilities, like overseeing day-to-day operations
and managing partnerships, international expansion, et cetera.
There's also a couple more changes.
Mark Chen, who was, I think, an SVP of research, is now

(30:26):
the Chief Research Officer.
There's now a new Chief People Officer as well.
So a pretty significant shuffling around of their C-suite,
of course following up on a trend of a lot of people leaving, which we've
been covering for months and months.
So I don't know what to read into this.
I think it could be a sign of trouble at OpenAI that requires restructuring.

(30:50):
It could be any number of things.
But it is notable, of course.
There's, you know, that iceberg meme where they're like, you know, you
got the regular theories at the top and then the kind of deep dark
conspiracy theories at the bottom.
There are two versions of this story, or a bunch, and I've heard one
person, at least a former OpenAI person, speculate about a kind

(31:10):
of, like, darker reason for this.
But, so Brad Lightcap, you're right, was the COO before and is still the COO.
All that's happening here, presumably, is a widening of his mandate.
This is notable because Sam Altman is a legendarily good fundraiser,
and one would assume corporate partnership developer, and you can
see that in the work that he did with Microsoft and Apple; like, very

(31:31):
few companies have deep partnerships with both Microsoft and Apple,
who under any typical circumstance are at each other's throats.
I'll also say, quick on this Altman note, the fact that he got, like,
friendly with the Trump administration under Elon Musk's nose is also
pretty legendary in my opinion.

(31:52):
Yeah.
Yeah.
I mean, yeah, he managed to turn around essentially a lifetime
of campaigning as a Democrat and for Democrats to kind of, like,
yeah, tighten his tie and make nice with elements of the campaign.
You know, it's always hard to know.
But yeah.
You know, one take on this is, well, Mira Murati has not been
replaced, and, you know, Sam has said there's no plan to replace her.

(32:12):
He essentially is stepping in to fill that role, and it's founder mode stuff.
He wants to get, you know, closer to gears level.
I'm sure that's a big part of it no matter what,
and it may be the whole thing.
Another take that I have heard is that as you get closer to
superintelligence, the people at the command line, the people at the console
that get to give the prompts to the

(32:36):
model first are the ones to whom thepower tends to accrete or, or with
whom the power tends to, to accrete.
So you know, wanting to get moretechnical wanting to be, to turn into
more of a Greg Brockman type makessense if you think that that's where,
you know, if you're power driven.
And that's kind of like where,you know, where you wanna go.
Anyway, in interestingkind of iceberg meme thing.

(32:58):
Last thing I'll mention is Mark Chen, who's mentioned here in the list
as one of the people who's promoted, who you mentioned. You may actually
know him from all the demo videos, right?
So the, you know, deep research demo, I guess the
o1 demo when it launched.
He's often there as Sam's kind of right-hand demo guy.
So anyway, his face and voice will probably be
familiar to quite a few people.

(33:19):
Next up, we have kind of a follow-on story from what
we covered a lot last week.
So we were covering the Rubin GPU announcements from Nvidia.
This is a story specific to the 600,000-watt Kyber racks and infrastructure
that is set to ship also in 2027, along with their announcements.

(33:44):
So I'll let you take over on this one, Jeremy, if you know the details.
It's just a little bit more on rack power density.
So, for context, you know, you have the GPUs themselves, right?
So currently the Blackwells, like the B200, that's the GPU.
But when you actually put it in a data center, it sits on a

(34:05):
board with a CPU and a bunch of other supporting infrastructure,
and that is called a tray.
So multiple of these trays with GPUs and CPUs and a bunch of other shit
get slotted into these server racks.
And together we call that whole thing a system.
A system with 576 of these GPUs, like if you counted all the GPUs up in that

(34:27):
system, that would be the NVL576 Kyber rack.
This is a behemoth; it's gonna have a power density of 600 kilowatts per rack.
That is 600 homes' worth of power consumption
for one rack in a data center, right?

(34:47):
600 homes in one rack, that is insane, right?
The cooling requirements are wild.
For context, currently with your B200 series, you're looking at
about 120 kilowatts per rack.
So that's like a five-x-ing of power density.
It's pretty wild.
And Jensen, while they haven't provided clear numbers, has said that we're
heading to a world where we're gonna be pushing one megawatt per rack.

(35:11):
So a thousand homes' worth of power per rack. Just kind of
pretty wild for this Kyber system;
just gives you a sense of how crazy things are gonna be getting.
And another story on the hardware front,
this time from China. We have China's SiCarrier, I dunno how to say it.
Si-carrier, I dunno.
(35:32):
Si-carrier.
Yeah, that's what you had to say first.
Yeah.
A Shenzhen-based company that is coming out as potentially a
challenger to ASML and a number of fab tool developers.
So, as we've covered many times probably at this point, ASML is
one of the pivotal parts of the ability to make advanced chips.

(35:54):
They are the only company providing the really kind of
most advanced tools to be able to fabricate, you know, at the
leading edge of tiny node sizes.
Nobody is able to match them.
And so this is very significant
if in fact there will be
kind of a Chinese domestic company able to provide these tools.

(36:20):
Yeah.
What's happening in Chinais kind of interesting.
Over and over we're seeing them tryto amalgamate to concentrate what
in the US would be a whole bunch ofdifferent companies into one company.
Right.
So Huawei, SMIC seemed to kind of beforming a complex, it's like as if
you glued Nvidia to TSMC, the chipdesign with the chip fab, right?

(36:40):
Well, here's another company, SiCarrier, C-carrier, I dunno,
but silicon carrier, that's essentially integrating a whole bunch of
different parts of what's known as the front end part of the fab process.
So when you manufacture semiconductors, you know, the front end is the

(37:01):
first and most complex phase of manufacturing, where your, like,
circuits are actually gonna be created on the silicon wafer.
There's a whole bunch of stuff you have to do for that.
You have to prepare wafers; you have to actually have a photolithography
machine that, like, fires basically UV light onto your wafer to
then eventually do etching there.
Then there's the etching, the doping with ions, deposition.

(37:25):
There's all kinds of stuff.
They have products now across the board.
They just launched a whole suite of products kind of
covering that end-to-end.
So that puts them in competition not just with ASML, but also with Applied
Materials, with Lam Research, with a lot of these big companies that own
other parts of the supply chain that are maybe a little easier to break
into than necessarily lithography.

(37:45):
But then on the lithography side, SiCarrier also claims that they
built a lithography machine that can produce 28 nanometer chips.
So less advanced, way less advanced than TSMC, but it brings China
one step closer, if this is true.
If this is true, and if it's at economic yields, it brings them one step closer
to having their answer to ASML, which they're still a huge long way off from.

(38:08):
This should not be overstated, kind of; the jump from 28 nanometer lithography machines
to, like, you know, seven nanometer, like DUV, let alone EUV, is immense.
You can check out our hardware episode to learn more about that, but it's
the closest that I've heard of China having an answer to ASML on the
litho side, and they're coming with a whole bunch of other things as well.
Again, more and more kind of integration of stuff in the Chinese supply chain.

(38:31):
And the last story of the section, also about China: Pony AI.
Pony AI wins the first permit for fully driverless taxi operation
in China's Silicon Valley.
So they are gonna be able to operate their cars in Shenzhen's

(38:54):
Nanshan district, a part of it.
And this is quite significant becausethe US based companies, Tesla and
Waymo, are presumably not goingto be able to provide driverless
taxi operation services in China.

(39:14):
And so that is a huge marketthat is very much up for grabs.
And pony AI is one of theleaders in that space.
Yeah, China is makinglegitimate progress on AI
that should not be ignored.
One of the challenges withassessing something like this
is also that you have a very,a sort of friendly regulatory
environment for this sort of thing.

(39:35):
China wants to be able to make headlines like this, and also has a
history of burying, right, fatalities associated with all kinds of accidents,
from covid to otherwise.
And so it's always hard to do apples to apples here on
what's happening in the West.
But they do have, youknow, a big data advantage.
They have a big dataintegration advantage.
Big hardware manufacturing advantage.
Wouldn't be surprising if thiswas, if this was for real.

(39:58):
So there you go.
Maybe an interesting kind of jockeyingfor position as to who's, who's gonna
be first on full driverless, right.
And Pony AI has been around for quite a while,
founded in 2016.
Yeah.
Actually in Silicon Valley.
So yeah, they've been leading the pack to some extent for some
time, and it makes sense that they're perhaps getting close to this working.

(40:24):
On to Projects and Open Source.
And we begin with a new challenging AGI benchmark,
to your point of us having to continue making new benchmarks.
And this is coming from the ARC Prize Foundation.
We covered ARC-AGI previously at a high level.
(40:45):
The idea with these ARC benchmarks is they test kind of broad abstract ability to
do reasoning and pattern matching,
and in particular in a way where humans tend to be
good without too much effort.
So 400 people took this ARC-AGI-2 test and were able to get

(41:12):
60% correct answers on average.
And that is outperforming AI models.
And they say that non-reasoning models like GPT-4.5, Claude 3.7, Gemini 2.0 are each scoring
around 1%, with the reasoning models being able to get between 1% and 1.3%.

(41:33):
So also, this is part of a challenge, in that there is this challenge to
be able to beat these tests under some conditions: operating locally
without an internet connection,
and I think on a single GPU, I forget. And I think just with one arm.
Yeah, exactly.

(41:54):
Half the transistors have to be turned off.
That's right.
Yeah.
So yeah, this is an iteration on ARC-AGI. At the time,
we did cover also a big story where o3 matched human
performance on ARC-AGI 1.

(42:15):
At a high computational cost.
So not exactly at the same level, but still, they kind of beat
the benchmark to some extent.
On this one, they're only scoring 4%, using the level of
$200 of compute per task.
So it clearly is challenging, clearly taking in some of the lessons of these

(42:36):
models beating ARC-AGI 1, and I do think a pretty, yeah, important thing
or interesting thing to keep an eye on.
Yeah, they are specificallyintroducing a new metric here,
the metric of efficiency.
The idea being that, you know,they don't want these models
to just be able to brute forcetheir way through the solution,
which I find really interesting.

(42:57):
There's this fundamental question of, is scale alone enough?
And a scaling maximalist would say, well, you know, what's
the point of efficiency?
The cost of compute is collapsing over time.
And then algorithmic efficiencies themselves: there's kind of
algorithmic efficiencies where conceptually you're still running
the same algorithm, but just finding more efficient ways to do it.

(43:19):
It's not a conceptual revolution in terms of the cognitive mechanisms
that the model is applying.
So think here of, like, you know, the move from attention to
FlashAttention, for example, right?
This is like an optimization, or like KV-cache-level optimizations,
that just make your transformer kind of run faster,
train faster, and inference cheaper.

(43:40):
That's not what they, theyseem to be talking about here.
They seem to be talking about just, youknow, how many cracks at the problem
does the, does the model need to take?
And there's an interesting fundamentalquestion as to whether that's a
meaningful thing given that we aregetting sort of these more algorithmic
efficiency improvements withoutreinventing the wheel and hardware
is getting cheaper and, and, andall these things are compounding.

(44:01):
So if you can solve the benchmark this year with a, you know, certain hardware fleet, then presumably
you can do it, you know, six months from now with like a tenth of the hardware, a tenth of the cost.
So it's kind of an interesting argument. Whoever designed this benchmark is obviously
on one side of it, saying, hey, in some sense, the, like, elegance
of the solution matters as well.

(44:23):
It's, yeah, I think it's sort of fascinating.
To give you a sense of how performance moves on this:
so OpenAI's o3 model, o3 low,
the version of it that is spending less test-time compute, was apparently
the first to re... well, sorry, famously, I should say, not apparently.
It was the first to reach, like, some basically close to

(44:45):
saturation point on ARC-AGI 1; it hit about 76% on the test.
That was what got everybody talking like, okay, well we need
a new AGI benchmark.
That model gets 4% on ARC-AGI 2 using $200 worth of
computing power per task, right?
So that gives you an idea that we are suppressing the curves yet again.

(45:06):
But if past performance is any indication, I think these get saturated
pretty fast and we'll be having the same conversation all over again.
Next story, also related to a challenging benchmark.
This is a paper, Challenging the Boundaries of Reasoning:
An Olympiad-Level Math Benchmark for Large Language Models.
So, new math benchmark they are calling OlymMATH.

(45:30):
And this has 200 problems with two difficulty levels, easy and hard,
where easy is similar to AIME, an existing benchmark, and hard being,
I suppose, very hard, like super advanced types of math
problems that even humans struggle with.
They curate these problems from textbooks, apparently printed materials.

(45:55):
And they specifically excludedonline repositories and forums
to avoid data contamination.
And in their experiments, we're seeing that advanced reasoning models DeepSeek
R1 and o3-mini achieve only 21.2% or 30% accuracy respectively

(46:17):
on the hard subset of the data set.
So, still some challenge to be solved.
I guess in a few months we'll be talking about how we are getting to 90% accuracy.
Yeah.
We'll have the next version of OlymMATH.
Yeah.
They came up with a couple of, you know, pretty
interesting observations.
Maybe not too surprising.

(46:39):
Apparently models are consistently better on the English
versions of these problems compared to the Chinese versions,
'cause they collected both.
So that's kind of cool.
And they do still see quite a bit of guessing strategies;
sorry, the models get to the end of the thread and they're kind of just throwing
something out there, which presumably is increasing the score somewhat
through, you know, false positives.

(46:59):
One thing I will say, first of all, like, yeah, kudos, an
interesting strategy to go out into the real world and bother
collecting these things that way.
It does make me wonder, like, how well you could meaningfully scrub your
data set of problems that you see in magazines, say, and, like, be confident
that they don't exist somewhere.
Obviously there are all kinds ofdata cleaning strategies, including

(47:20):
using, you know, language models and other things to peruse your
data to make sure that it isn't referenced on the internet.
But these things aren't always foolproof, and there've been quite a few cases
where people think they're doing a really good job of decontaminating and
not leaving any of that material in, but essentially, like, have models that
have been trained on it already. So yeah, I'm kind of curious how, you know, whether

(47:41):
we'll end up discovering that part ofthe saturation on this benchmark is
at least at first due to overfitting.
Yeah.
And part of the challenge with knowing is we don't know the training data
sets for any of these companies, right?
For OpenAI, Anthropic,
these are not publicly released data sets.
And I would say there's a 100% chance that they have a bunch of

(48:03):
textbooks that they bought and scannedand included in their training data.
Oh, yeah.
So, who knows, right?
A couple more stories. Actually, another one coming out of China:
we have Wan, open and advanced large-scale video generative models.
And this is coming from Alibaba.
So, as the title says, this is a big model, 14 billion

(48:28):
parameters at the largest size.
And they also provide a 1.3 billion parameter model that is more
efficient, trained on a whole bunch of data, open sourced, and seemingly
outperforming anything that is open source in the text-to-video space

(48:49):
quite a bit, both on efficiency, on speed, and on kind of appearance.
The only one that's competitive is actually Hunyuan Video, which I
think we covered recently; things like Open-Sora are quite a bit
below in terms of, I guess, appearance measurements and stuff.
So open source, you know, steadily getting to a point where we
(49:12):
have good text to video aswe had with a text to image.
Yeah, and just, I mean, anecdotallysome of the, the images are,
are pretty, they're pretty,pretty damn photorealistic.
Or some of the, sorry,some of the yeah, stills.
I will note there's kind ofthis amusing, I don't know
if this is intentional.
can't see it to see the promptanywhere, but there is a photo on

(49:33):
page four of this paper that looks an awful lot like Scarlett Johansson.
So that's kind of, if intentional, I guess a bit of a swipe at
OpenAI there, which is mildly amusing.
But anyway, yeah, there you go.
I mean, China, especially on the open source stuff, is serious.
I mean, this is Alibaba, right?
So, you know, they've got access to scaled training budgets, but they're not

(49:55):
even China's, like, leading lab, right?
For that you wanna look at Huawei and you
wanna look at DeepSeek.
But yeah, pretty impressive.
Exactly, and, and I thinkkind of interesting to
see so many open sourcing.
Like Meta is the maybe onecompany in the US that's doing
a bunch of open sourcing still.

(50:17):
Google doing a littlebit with smaller models.
Basically the only modelsbeing released are the smaller
models like Gemma and Phi.
But we are getting more impressive models out of China.
And there's certainly a lot of people using R1 these days
because it is open source.
Speaking of that, the next story is about DeepSeek V3.

(50:39):
We have a new version as of March 24th.
This is another naming convention that's kind of lame, where the model
is DeepSeek V3-0324.
Just a kind of incremental update, but a significant update, because this is
now the highest scoring non-reasoning model on some benchmarks, exceeding

(51:04):
Gemini 2.0 Pro and Meta's Llama 3.3 70B.
Yeah, outperforming most models basically while not
being a reasoning model.
So presumably this is an indicator... R1 was based on DeepSeek V3.
V3 was a base model.

(51:24):
V3 was also a very impressive model at the time,
trained cheaply; that was a big story.
Presumably the group there is able to improve V3 quite a bit,
partially because of R1, synthetic data generation, things like that.
And certainly they're probably learning a lot about how to squeeze

(51:45):
out all the performance they can.
Yeah, I think this is a case where there are just so many caveats, but
any analysis of something like this has to begin and end with a frank
recognition that DeepSeek is for real.
This is really impressive.
And now I'm just gonna add a couple of buts, right?
So none of this is to take away from that top line.
We've talked about in this episode quite a few times how the labs,

(52:07):
so Gemini 2.5 is no longer just a simple base model.
All the labs are moving away from that, by default not
releasing new base models.
And so yes, DeepSeek V3, the March 2025 version, is better than
all of the base models that are out there, including the proprietary ones.
But labs are losing interest in the proprietary base model.

(52:28):
So that's an important caveat.
It's not like DeepSeek is moving at full speed on just the base
model and the labs are moving at full speed on just the base model,
and that's kind of apples to apples. But the most recent
releases of base models
are still relatively recent; still,
GPT-4.5, OpenAI has been sitting on it for a while as well.
So it is so difficult to know how far behind this implies

(52:52):
you know, DeepSeek is from the frontier.
This conversation will just continue.
And the real answer is known only to some of the labs
who know how long they've been sitting on certain capabilities.
There's also just a question of, like, DeepSeek could have just chosen
to invest more, certainly now that they have state backing, who knows,
into, you know, meeting this benchmark for publicity reasons as well.
So none of this is to take away from this model.

(53:14):
It is legitimately very, very impressive.
By the way, all the specs are basically the same as before,
so context window of 128,000 tokens, you know, anyway, same
parameter counts and all that.
But still, I think a very impressive thing with some
important caveats, not to
read this right off the prompter, so to speak, in terms of
assessing where China is, where DeepSeek is, right.

(53:36):
And pretty significant, I would say, because also DeepSeek V3
is fairly cheap to use.
And you can also use it on providers like Groq, Groq with a Q.
So if it is exceeding, you know, models like Claude and OpenAI
for real applications, it could actually significantly hurt the

(53:58):
bottom line of OpenAI and Anthropic, at least with startups and the
non-enterprise customers. Yeah.
For people using the base model, right.
And I guess that's the bet that everybody's making, is that that will
not continue to be the default use case. If you're doing open source,
it's much, much more interesting to be shipping base models, right?
Because then other people can apply their own RL and

(54:19):
post-training schemes to it.
So you're gonna see probably opensource continue to disproportionately
ship some of these base models.
I wouldn't be surprised to find the,the full frontier of base models
be dominated by, by open source forthat reason in, in the years to come.
but there's a question of like,yeah, the value capture, right?
Are people spending moremoney on base models?
I don't think so.

(54:40):
I think they're spending more money on agentic models that
we're seeing start to dominate.
And one more story.
This one is about OpenAI and not about a model.
We saw an announcement, or at least a post on Twitter, with Sam
Altman saying that OpenAI will be adding support for the Model Context

(55:00):
protocol, which we discussed last week.
And that is an open source standard that is basically defining
how you can connect models to tools and data when you use the API.
You know, a bit significant
'cause it's coming from Anthropic; OpenAI is not introducing a competing standard.

(55:21):
They're adopting what is now an open standard that the
community got excited about.
So I guess that's cool.
It's nice to see some coalescing, and of course when you have a new standard
and kind of everyone jumping on board, that makes it much easier to
build tools, and kind of the whole ecosystem benefits if a standard

(55:43):
turns out to be what everyone uses.
And there's no, like, weird competingdifferent ways to do things.
Yeah.
And I think there was so much momentum behind this already that, you know, it
just made sense even at scale for OpenAI to move in that direction.
Yeah.
On to Research and Advancements.
And as we previewed at the beginning, the big story this week is coming out
of Anthropic.
This came out just yesterday, so we are still kind of absorbing it and

(56:06):
can't go into full detail, but we will give at least an overview and
the implications and kind of results.
So there's a pretty good summary article actually you can read that is less
technical, from MIT Technology Review.
The title of that article is Anthropic Can Now Track
the Bizarre Inner Workings of a Large Language Model.

(56:28):
And this is covering two blog posts from Anthropic.
One is called Circuit Tracing: Revealing Computational Graphs in Language Models.
They also have another blog post, which is On the Biology
of a Large Language Model,
essentially an application of the approach in the first
blog post to Claude 3.5 Haiku, with a lot of interesting results.

(56:51):
So there's a lot going on here, and I'll try to give a summary
of what this is presenting.
We've seen work from Anthropic previously focusing on interpretability, you
know, exposing kind of the inner workings of models in a way that is
usable and also kind of more intuitive.

(57:14):
So we saw them, for instance, using techniques to be able to see
that models have some high level features, like the Golden Gate Bridge
famously, and you could then tweak the activations for those features
and be able to influence the model.

(57:34):
This is essentially taking that to the next step, where you are able to see a
sequence of high level features working together and coalescing into an output
from an initial input set of tokens.
So they are doing this again as a sort of follow-on of the initial approach.

(57:58):
At a high level, it's taking the idea of replacing the layers of the MLP
bits of the model with these high level features that they discover via,
you know, a previous kind of technique.
They have a new technique here called a cross-layer transcoder.

(58:20):
So previously you were focusing onjust one layer at a time, and you're
seeing these activations in one layer.
Now you're seeing these activations.
In multiple layers and you see thekind of flow between the features via
this idea of a cross layer transcoder.
And there are some more detailswhere you start with a cross layer

(58:43):
transcoder, you then create somethingcalled a replacement model, and
they also have a local replacementmodel for a specific prompt.
The idea there is you're basicallytrying to make it so this replacement
model, which doesn't have the sameweights, doesn't have the same set
of nodes or computational units asthe original model, has the same
overall behavior, has the sameroughly is equivalent and, and matches

(59:09):
the model closely, as closely aspossible so that you can then see the
activations of a model in terms offeatures and can map that out to the
original model sort of faithfully.
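To make the cross-layer transcoder idea a bit more concrete, here is a minimal toy sketch in PyTorch of the training setup as we understand it: sparse features are read off the residual stream and trained to reconstruct the MLP outputs of that layer and of later layers. This is our own simplified caricature, not Anthropic's architecture or code; the single shared encoder, the dimensions, and the loss are all assumptions.

```python
import torch
import torch.nn as nn

class ToyCrossLayerTranscoder(nn.Module):
    """Toy sketch: sparse features read from the residual stream, with one
    decoder per downstream layer so a feature can contribute to the MLP
    outputs of several layers at once. Purely illustrative."""

    def __init__(self, d_model: int, n_features: int, n_layers: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)              # residual stream -> features
        self.decoders = nn.ModuleList(
            [nn.Linear(n_features, d_model) for _ in range(n_layers)]
        )

    def forward(self, resid: torch.Tensor):
        feats = torch.relu(self.encoder(resid))                    # sparse (ReLU) feature activations
        recons = [dec(feats) for dec in self.decoders]             # predicted MLP output per layer
        return feats, recons

def transcoder_loss(feats, recons, true_mlp_outs, l1_coef=1e-3):
    # Reconstruct each layer's true MLP output, and keep features sparse.
    recon = sum(((r - t) ** 2).mean() for r, t in zip(recons, true_mlp_outs))
    return recon + l1_coef * feats.abs().mean()
```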
So let's just get into a couple of examples. One they present in
the blog post, figure five: you can see how they have an input of "the

(59:32):
National Digital Analytics Group",
and they are then showing how each of these tokens is leading to a
sequence of features in this graph.
So you start with "Digital Analytics Group", and that maps onto features that
correspond to those specific words,
then an open parenthesis.

(59:53):
After the parenthesis, there's a feature that's just "say
slash continue an acronym".
And then at, and, and the second layer of the computational graph you have, roughly, "say
something starting with D", "say something with an A", and then "say something with a G" as three features.

(01:00:17):
And there's another feature called "say DA", and that
combines with "say G" to say "DAG".
And DAG is the acronym for Digital Analytics Group.
So that's showing the general flow
of features.
They also have, very interestingly, a breakdown of math.
They have something, I think 36 plus 59, and they're showing that there's a

(01:00:42):
bunch of weird features being used here.
So 36 maps onto "roughly 36" and "something ending in 6"; 59 maps onto
"something starting with 5", "roughly 59", and "something ending in 9".
Then you have features like "40-ish plus 50-ish" and "36 plus 60-ish", and then

(01:01:05):
eventually, through a combination of various features, you wind up
at the output: 36 plus 59 is 95.
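As a loose caricature of what that combination looks like, you can think of two parallel, approximate paths whose outputs get stitched together. The snippet below is purely illustrative; it is not the circuit Anthropic found, and in the real model these are learned lookup-like features rather than arithmetic.

```python
# Toy caricature of the parallel heuristic paths for 36 + 59 (not Anthropic's code).
a, b = 36, 59

# Path 1: a fuzzy magnitude estimate, "roughly 36 plus roughly 59 lands in the 90s".
# The +10 is just a stand-in for the model's learned lookup, not real arithmetic.
rough_sum = (a // 10) * 10 + (b // 10) * 10 + 10   # ~90-something

# Path 2: a ones-digit lookup, "...6 plus ...9 ends in 5".
ones_digit = (a % 10 + b % 10) % 10

# The output feature reconciles the two paths into the final answer.
answer = (rough_sum // 10) * 10 + ones_digit
print(answer)  # 95
```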
So that's the high-level thing.
It is giving us a deeper glimpse into the inner workings of LLMs in terms
of the combinations of high-level features and the circuits that they

(01:01:27):
are executing internally. It's building on, actually, a paper from last year
called Transcoders Find Interpretable LLM Feature Circuits, from Yale
University and Columbia University.
They use a similar approach here, but of course scaled up.
So as with the previous work from Anthropic, this is, to my mind, kind of

(01:01:48):
some of the most impactful research on interpretability and some of the
most successful research, because it really is showing, I think at
a much deeper level, what's going on inside these large language models.
Absolutely.
And, and again, this is where I think I was caveating at
the outset of today's episode.
I haven't had the chance to look at this yet,
and this is exactly the kind of paper that I tend to spend the most time on.

(01:02:10):
So apologies for that.
I may actually come back next week with some hot takes on it.
It looks like fascinating work,
from what I have been able to gather.
It is, I mean, it's pretty close to a decisive repudiation of the whole,
and not that people make this argument so much anymore,
the stochastic parrot argument of people like Gary Marcus, who like to say, you

(01:02:31):
know, oh, LLMs and autoregressive models are not that impressive.
They're really just kind of predicting the next token.
They're stochastic parrots.
They're basically like robots that just put out, mindlessly, the
next word that's most likely.
I think anybody following the interpretability
space for the last, like,
two, three years has known that this is pretty obviously untrue,

(01:02:52):
as well as people following the capability side, just with some
of the things we've seen. But one example they gave was, there's a
question as to whether a model uses fully independent reasoning threads
for different languages, right?
So if you ask what is the opposite of small in English and in French, will
the language, sorry, will the model use language-neutral components?

(01:03:16):
Or will it have a notion of smallness that's English, a notion
of smallness that's French?
That's maybe what you would expect on the stochastic
parrot hypothesis, right?
That, well, you know, it's an English sequence of words, so I'm, I'm gonna
use my kind of English submodel.
Turns out that's not the case, right?
Turns out that,
instead, it uses language-neutral components related to smallness and

(01:03:39):
opposites to come up with its answer, and then it'll pick, only after that,
only after it's sort of reasoned in latent space at the conceptual level,
only after that does it sort of decode in a particular language.
And so you have this unified reasoning space in the model that is decoupled
from language, which in, in a way you should expect to arise because

(01:04:00):
it's kind of a more efficient way to compress things, right?
That's just, like, you have one domain, and essentially all the
different languages that you train the thing on are a kind of regularization.
You're kind of forcing the model to reason
in a way that's independent of the particular language
that you're choosing to use to reason; the ideas are still the same.
And then, yeah, there's this question around interpretability, right?

(01:04:21):
This thing will confabulate. You gave that example of adding 36 and 59.
It does this weird reasoning thing where it's almost doing, like,
if you like math, you know, something like a, I dunno, Taylor
approximation, where you kind of get the, the leading digit, right?
Then the next digit, then the next digit, rather than actually
doing it in a symbolic way.
But then when you ask it, okay, how did you come up with that answer?

(01:04:44):
It will give you the kind of common sense "I added the ones, I carried
the one, I added the tens", you know, that sort of thing, which is
explicitly not the true reasoning
it seems to have followed, at least based on this assessment.
This raises deep questions about how much you can trust things like the
reasoning traces that have become so popular, that companies like DeepSeek
and OpenAI have touted as their,

(01:05:06):
in some cases, chief hope at aligning superintelligent AI. It seems like
those reasoning traces are already decoupled from the actual reasoning
that's happening in these models.
So a bit of a warning shot on that too, I think.
Right?
And to that point about the multilingual story, pretty notable,
not just the technique itself, but the second blog post, On the

(01:05:26):
Biology of a Large Language Model.
They applied it to Claude 3.5 Haiku and have a bunch of results.
They have the multilingual circuits, they have addition, medical
diagnosis, life of a jailbreak.
They're showing you how a jailbreak works, actually, also how refusal works.
So some very, kind of pretty deep insights that are pretty usable, actually,

(01:05:51):
in terms of how you build your LLMs.
So much to cover in terms of the things here.
So we probably will do a part two next week.
And onto the next story.
We have Chain-of-Tools: Utilizing Massive Unseen Tools in the
Chain-of-Thought Reasoning of Frozen Language Models.
This is a new tool-learning method for frozen LLMs that lets

(01:06:18):
them efficiently use unseen tools during chain-of-thought reasoning.
So you can use this to integrate unseen tools.
And they have actually a new dataset,
also, SimpleToolQuestions, that has 1,836
tools that can be used to evaluate tool selection performance.

(01:06:42):
And tools, by the way, means, you know, kind of calling an API. The LLM
can say, okay, I need to do this web search, or I need
to do this addition, whatever.
And it can basically use a calculator or it can use Google, right?
So pretty important to be able to do various things.

(01:07:04):
And this is gonna be adding onto the performance of reasoning models.
This is a, a really interesting paper.
There's a classic multi-headed hydra of things that you're trading off anytime
you want to do tool use in, in models.
So some, you know, some of these techniques, like, you
imagine fine-tuning. If you fine-tune your models to use tools,

(01:07:27):
well, you can't use your base model, right?
You can't just use a frozen LLM.
You're not gonna succeed at using a huge number of tools, because the more you
fine-tune, the more you forget, right?
There's this catastrophic forgetting problem.
And so it can be difficult to have the model simultaneously know how
to use, like, over a thousand tools.
And if you fine-tune,
you're never gonna be able to get the model to use unseen tools, because you're

(01:07:50):
fine-tuning on a specific tool set
you want to teach the model to use.
There's similar challenges with, like, in-context learning, right?
So if you're doing in-context learning, you have a needle-in-a-
haystack problem if you have too many tools to pick from, and the
models start to, start to sort of fail.
So anyway, all kinds of, like, challenges with existing approaches.
So what's different here?
What are they doing?
So, start with a frozen LLM. That's a really important ingredient.

(01:08:13):
They wanna be able to use preexisting models without any modifications,
and they are gonna train things.
They're gonna train models to help that frozen LLM that you
start with do its job better.
But that's not gonna involve training any of the original LLM's parameters.
So they're gonna start by having a tool judge.
And basically this is a model that,

(01:08:34):
when you feed a prompt to your base LLM, it's gonna look at the
activations, the, the hidden state representation of that input at any
number of layers, and it's gonna go, okay, based on this representation,
for this particular token that I'm at in the sequence, do I expect

(01:08:54):
that a tool should be called?
Is the next token gonna be a call to a calculator, a call to a, a
weather app, or something like that?
And so this tool judge, again, operating at the activation level,
at the sort of hidden state level, which is really interesting, it's gonna
be trained on a dataset that has explicit annotations of, like, you

(01:09:15):
know, here are some prompts and here is where tool calls are happening.
Or, sorry, here's some text,
and here, annotated, are where the tool calls are happening.
But that data's really expensive to collect,
so they also have synthetic data that shows the same thing.
So they're using this to kind of
get the tool judge to sort of learn what is and isn't, in activation space,
what does and doesn't correspond to a tool call.

(01:09:36):
So essentially training just a binary classifier here.
And then during inference, if the judge's score for a given token,
if the tool call probability is above some threshold, then the
system will go ahead and call a tool.
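Here is a minimal sketch of that tool judge idea as we understand it: a small probe over the frozen LLM's hidden state that outputs the probability that the next token should be a tool call. The probe architecture, hidden dimension, and threshold below are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

class ToolJudge(nn.Module):
    """Sketch of a binary classifier over a frozen LLM's hidden state.
    It predicts whether the next token should be a tool call."""

    def __init__(self, d_model: int):
        super().__init__()
        self.probe = nn.Sequential(
            nn.Linear(d_model, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        # Returns P(next token is a tool call) for the given position.
        return torch.sigmoid(self.probe(hidden_state))

# Usage sketch: run the frozen LLM, grab the hidden state at the current
# position, and hand off to tool retrieval if the score clears a threshold.
judge = ToolJudge(d_model=4096)
hidden = torch.randn(4096)            # stand-in for the LLM's hidden state
if judge(hidden).item() > 0.5:
    pass                              # hand off to the tool retriever (sketched below)
```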
When it does that, it does it via a separate
kind of model called a tool retriever.

(01:09:57):
And this tool retriever is, I mean, it's not a model,
it's a, it's itself a system.
It uses two different models, a query encoder and a tool encoder.
So this is basically RAG, right?
This is retrieval-augmented generation.
You have the tool encoder represent all of your different tools,
your thousand or 2,000 different tools.

(01:10:17):
And then you have a way of representing, of embedding the query, right?
Which is really
a modified version of the activations associated with the token that the tool
judge decided was a tool call. My God.
Anyway, from here it's RAG.
So if, if you know the RAG story, that's what they're doing here, and then
they, they, anyway, they call the tool.

(01:10:37):
So a couple of advantages here, right?
Frozen LLM, don't need to fine-tune, no catastrophic forgetting issues.
They are using just the hidden states,
so that's sort of fairly simple.
And the tool retriever, right?
This, this thing, the system that's deciding which tool to call is,
interestingly, trained using contrastive learning, to basically, like,

(01:11:00):
in each training mini-batch, when you're feeding just a batch of training data
to the system to get it trained up,
you're basically, instead of comparing
one tool versus all the other tools in the dataset to figure out, like,
should I use this one or, or another,
you're just comparing it batch-wise, like, to all the tools that
are called or referenced within that batch, just to make it more tractable

(01:11:23):
and, and computationally efficient.
So anyway, if you know contrastive learning, that's how it works.
If you don't, don't worry about it.
It's a bit of a detail.
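For the curious, here is a small sketch of that in-batch contrastive training: each query embedding is pushed toward its own tool's embedding and away from the other tools that happen to appear in the same mini-batch. The normalization and temperature details are assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb: torch.Tensor,
                              tool_emb: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """Sketch of in-batch contrastive training for a tool retriever.
    query_emb and tool_emb are [batch, dim]; row i of each corresponds to the
    same (query, correct tool) pair, and the other rows serve as negatives."""
    q = F.normalize(query_emb, dim=-1)       # output of the query encoder
    t = F.normalize(tool_emb, dim=-1)        # output of the tool encoder
    logits = q @ t.T / temperature           # similarity of every query to every tool in the batch
    labels = torch.arange(q.size(0))         # the matching tool sits on the diagonal
    return F.cross_entropy(logits, labels)

# At inference, retrieval is the usual RAG step: embed the query (derived from
# the activations the tool judge flagged), embed all tools once, and pick the
# tool with the highest cosine similarity.
```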
But it's, I think, a really important and, and interesting paper, because
the future of, of AGI has to include essentially unlimited tool use, right?
That's something that I think everybody would reasonably expect,

(01:11:43):
and the ability to learn how to use new tools, and this is one way to
kind of shoehorn that in, potentially.
And just a couple more papers.
Next one, also an interpretability paper.
The title is Inside-Out: Hidden Factual Knowledge in LLMs.
And it's also quite interesting. So the quick summary is, they are looking

(01:12:07):
to see what knowledge is encoded inside an LLM that it doesn't produce.
So it may have some hidden knowledge, where it knows facts,
but we can't sort of get it to tell us that it knows those facts.
The way they do that is they define knowledge as whether you rank

(01:12:28):
a correct answer to a question higher than an incorrect one.
So you basically know which fact is a fact based on it being what you
think is the right continuation. And the comparison of external knowledge
to internal knowledge is: externally,
you can use the final token probabilities, the visible,

(01:12:50):
kind of, final external thing.
Internally,
you can only use, you can use internal activations to get
that estimate of rankings.
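A small sketch of how you might operationalize that ranking-based definition of knowledge follows; the scoring functions and numbers are made up for illustration, and the paper's exact scoring and aggregation details may differ.

```python
def knowledge_score(score_fn, correct: str, incorrect: list[str]) -> float:
    """Fraction of (correct, incorrect) answer pairs where the scorer ranks the
    correct answer higher. Plug in output-token probabilities for the
    'external' estimate, or a probe over internal activations for the
    'internal' one."""
    wins = sum(score_fn(correct) > score_fn(wrong) for wrong in incorrect)
    return wins / len(incorrect)

# Hypothetical usage with made-up scores, just to show the shape of the comparison.
external_scores = {"Paris": 0.20, "Rome": 0.35, "Berlin": 0.10}   # from output probabilities
internal_scores = {"Paris": 0.80, "Rome": 0.30, "Berlin": 0.25}   # from an activation probe

external = knowledge_score(external_scores.get, "Paris", ["Rome", "Berlin"])  # 0.5
internal = knowledge_score(internal_scores.get, "Paris", ["Rome", "Berlin"])  # 1.0
hidden_knowledge_gap = internal - external   # knowledge the model has but doesn't express
```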
And there's an interesting result here:
LLMs encode 40% more factual knowledge internally than,
than they express externally.
And in fact, you can have cases where an LLM knows

(01:13:16):
the answer to a question perfectly, but fails to generate
it even in a thousand attempts.
And that's due to, you know, sampling processes, I suppose, and, and
perhaps I need to do a deeper dive,
but it could be various reasons as to why you're failing to sample it.

(01:13:36):
It could be, you know, too niche, and your prior is overriding it.
It could be sampling techniques.
But either way, another interesting finding about the internals of LLMs.
Yeah.
This is almost, it's the closest I've seen to, like, hardcore quantification
of Gwern's famous aphorism, I guess, where he says prompting can reveal

(01:13:59):
the presence, but not the absence, of capabilities in language models.
Right?
It can reveal that a model has a capability; it can't reveal
that it doesn't have the capability, and this is what you're seeing, right?
It's pretty, it's pretty kind of intuitive.
If you try a thousand times and you don't get an answer that you know
the system is capable of delivering, then that means that you just

(01:14:20):
haven't found the right prompt.
And in general, you'll never find the right prompt for every case, right?
So you will in general always underestimate the capabilities
of a language model,
certainly when you just look at it in output space, in token space. This is why,
increasingly, like, all of the strategies, the kind of safety strategies, like
the ones that OpenAI is pitching with just looking at reasoning

(01:14:41):
traces, are looking really suspect and kind of fundamentally broken.
You need
representational-space interpretability techniques if you're going to
make any, any kinds of statements.
And even then, right, you have all kinds of interesting steganography issues at
the level of the activations themselves.
But interesting paper.
We'll, I guess we'll have to move along 'cause, we're,

(01:15:03):
we're on a compressed timeline.
Oh yeah.
We're doing, like, a shorter episode today just because we got
started, like, half an hour late.
So this is, this is why we're, we're, like, kind of blasting through.
But I do think this is a really important and interesting paper.
Last paper, we are wrapping up with another new benchmark.
This is from Sakana AI, and their benchmark is based on Sudoku. I

(01:15:26):
think it's called Sudoku-Bench.
And this benchmark has not just the classic Sudoku you've seen,
but also a bunch of variations of Sudoku with kind of increasingly
complex rule sets as to how you can fill in numbers on the grid.
Sudoku, by the way, is: you have a grid,

(01:15:48):
there are some rules, and according to those rules, you have to figure out
which numbers go where, basically.
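For reference, the classic rule set they start from is simple enough to check in a few lines. The sketch below only validates a completed standard 9x9 grid and doesn't model any of the Sudoku-Bench variant rules.

```python
def is_valid_solution(grid: list[list[int]]) -> bool:
    """Check a completed classic 9x9 Sudoku: every row, column, and 3x3 box
    must contain the digits 1-9 exactly once."""
    target = set(range(1, 10))
    rows = all(set(row) == target for row in grid)
    cols = all({grid[r][c] for r in range(9)} == target for c in range(9))
    boxes = all(
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)} == target
        for br in range(0, 9, 3) for bc in range(0, 9, 3)
    )
    return rows and cols and boxes
```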
And so they introduce this benchmark, and because there is a progression
of complexity, you see that
even top-of-the-line reasoning models, you know, can crack the easier
ones, but they are not able to beat the more complex sides of this.

(01:16:13):
And, you know, there's pretty much a fair amount of distance to go for models
to be able to beat this benchmark.
Yeah, the take-home for me from this was, as much as anything, that I have
no idea how Sudoku works, because apparently there are all these variants.
Like, I remember being in high school, I had friends who loved Sudoku, and,
you know, it was just that, that thing that you mentioned, where there's,

(01:16:34):
like, a, you know, a nine-by-nine grid and, and you know, you have to
put the numbers from one to nine
in each of the components of the grid, using them only
once, and then all that jazz.
But now apparently there are all kinds of versions of Sudoku, unlike
chess and Go that, you know, those have the same rules every time.
This is, like, so some versions, apparently, they give as examples,

(01:16:56):
they require deducing the path that a rat takes through a maze of teleporters.
So, like, if the rat goes to position X, then it gets magically teleported
to position Y, which could be somewhere completely uncorrelated,
and that's kind of framed up in a Sudoku context.
There's another one that requires moving obstacles, cars, they say, into the
correct locations before trying to solve.
There's all kinds of weird variations on this.

(01:17:18):
And they basically design a spectrum, right, from really, really simple, like
four-by-four Sudoku, all the way through,
with more and more constraints and kind of sub-rules added. It seems
to just generally be a very fruitful way to play a combinatorial game and,
and procedurally generate all these different games that can be played.

(01:17:39):
And then ultimately where they land is they share this kind of dataset
of how these models perform.
You can sort of think of this as another ARC-AGI kind of benchmark.
That's what it felt like to me. It was, you know, an interesting
coincidence to see this drop the same week as ARC-AGI 2.
Basically all the models suck.
That's the take-home.
The one that sucks the least is o3-mini, from, from January 31st.

(01:18:03):
And so it has a correct solve rate of 1.5% for the full-
scale version of these problems.
They have simplified problems as well, so you can actually kind of
track progress in that direction.
But anyway, I thought this was really interesting.
They have a collaboration with a YouTube channel called Cracking the Cryptic
to kind of put together a bunch of essentially kind of training data, I

(01:18:24):
guess, evaluation data for these things.
But yeah, this is, you know, Sakana AI, and, and they are the company
that put together that AI Scientist paper that we covered a while back.
They're back at it with this, I, I wanna call it an AGI benchmark, 'cause
that's kind of what it feels like.
Moving on to policy and safety.
First up, some US legislation. Senator Scott Wiener is introducing the bill SB

(01:18:49):
53, meant to protect AI whistleblowers and boost responsible AI development.
So this would first be including provisions to protect whistleblowers
who alert the public about AI risks.
It's also proposing to establish a CalCompute research cluster

(01:19:13):
to support AI startups and researchers with low-cost computing.
This is in California in particular.
So this would be protecting, presumably, whistleblowers from some
notable California companies, and letting startups perhaps compete.
Yeah, this is actually really interesting, right, because we covered

(01:19:35):
extensively SB 1047, which was the bill that successfully came out of the
California legislature, which Gavin Newsom vetoed over the objections of
not only a whole bunch of whistleblowers in the AI community, but also Elon Musk,
who actually did come out and endorse it.
Very unusual for him, you know, being a sort of, like, libertarian-oriented guy.

(01:19:56):
He endorsed SB 1047. The original version of SB 1047 contained
a lot of things, but, but basically three things, right?
So one was the whistleblower protections; that's included in SB 53.
The other was CalCompute; that's included in SB 53, which
leaves us to wonder, well, what's the thing that's missing?
Right?
What's the difference with SB 1047? And it's the liability regime.

(01:20:21):
So SB 1047 included a bunch of conditions where developers of
models that cost over a hundred million dollars to develop could be
on the hook for disasters if their safety practices weren't up to par.
So if they developed a model and it led to a catastrophic incident,
and it cost them over a hundred million dollars just to develop the

(01:20:42):
model, essentially this, like, this means you've gotta be super, super
resourced to be building these models.
Well, if you're super resourced and you're building a model that, that's
like a hundred million dollars plus to train, yeah, you're on the hook for
literally catastrophic outcomes that come from it.
I think a lot of people looked at that and said, hey, that
bar doesn't sound too low.
Like, that's a pretty reasonable bar to meet for these companies.

(01:21:05):
But that was vetoed by Gavin Newsom.
So now essentially what they're doing is they're saying, okay, fine, Gavin,
like, what if we get rid of that liability regime and we try again?
That's kind of the, the state that we're at here.
So they're working their way through the California legislature.
We'll see if it ends up on Newsom's desk again, and if so,
if we get, you know, yet another scrapping of the legislation.

(01:21:26):
Right.
I should be clear that this is a senator in the California legislature,
not in the federal government.
Yes.
He represents San Francisco, actually, a Democrat from San Francisco,
which is kind of interesting.
And yeah, the main pitch is balancing the need for safeguards
with the need to accelerate, and, you know, in, in response to

(01:21:46):
the objections raised to 1047.
Next up, we have a story related to federal US policy.
The title is Nvidia and Other Tech Giants Demand Trump Administration
to Reconsider AI Diffusion Policy Set to Be Effective by May 15th.
So this is a policy initially introduced under the Biden administration that

(01:22:09):
broadly categorizes countries into three groups based on how friendly they are
with US national security interests.
So the first category would be friends, and they can import
chips without restrictions.
Second would be hostile nations, which are completely barred from
acquiring US-origin AI technology.

(01:22:31):
And then there are other countries, like India, which face limitations.
And of course, companies like NVIDIA aren't very happy about
that, 'cause that would mean fewer people buying their chips.
I think that's basically the story.
Yeah.
No surprise.
There's a lot of lobbying against the AI diffusion policy.
By the way, this is one that came out of the Biden
administration but, interestingly, has so far not been scrapped.

(01:22:56):
That's really interesting,
because, you know, so many executive orders from the Biden
administration have been gotten rid of, as you would expect, as part
of the, the Trump administration settling into their, their seats.
So yeah, I mean, this is, you know, Nvidia kind of trying it on again, Oracle
trying it on again, you know, see if we can loosen up those constraints.
We'll, I'm sure, be talking more about this going forward.

(01:23:17):
And next up, another story related to export controls,
our favorite topic. The story is that the US has added over 50 Chinese
companies to the export blacklist.
So this is from the Commerce Department's Bureau of Industry and Security.
There are now actually 80 organizations on this entity list,

(01:23:40):
with more than 50 from China.
And these are companies that are allegedly acting against US national
security and foreign policy interests.
Yeah.
And so, for example, yeah, they're blacklisted from acquiring
US items to support military modernization or for advancing quantum

(01:24:01):
technology and things like AI.
Yeah, this is, this is one of the, the cases where I just, I don't know,
when it comes to the policy side, and these are still, I think
they're still Biden-era policies basically that are operating here.
We may see this change, but for now, like, dude, come on.
So get this: like, two of these firms that they're adding to the

(01:24:24):
blacklist were supplying sanctioned entities like Huawei and its
affiliated chip maker, HiSilicon.
Okay.
So HiSilicon is, is basically, it is basically Huawei. It's kind
of a division of Huawei that is Huawei's Nvidia, if you will.
They do all the chip design.
So then they blacklisted 27 entities for acquiring stuff to support

(01:24:45):
the CCP's military modernization and a bunch of other stuff.
When it comes to the AI stuff, like, okay,
among the, the organizations on the entity list, they say, were also six
subsidiaries of Chinese cloud computing firm Inspur Group.
So Inspur is a giant, giant cloud company in China.

(01:25:05):
They actually famously made essentially China's answer to GPT-3
back in the day. You may remember this, if you were
tracking it. It's called Yuan 1.0, or
it was also called, like, Source 1.0.
But the, the fact that, like, this is China's game, right?
They keep spinning up these, like, stupid subsidiary companies and taking
advantage of the fact that, yeah, like, we're not gonna catch them.
We're playing a losing game of whack-a-mole.

(01:25:26):
It's super cheap to spin up subsidiaries, and you import shit that you shouldn't
until they get detected and shut down.
And then you do it again.
Until we move to a blacklist model, sorry, a whitelist model rather
than a blacklist model with China, this will continue to happen, right?
Like, you need to have a whitelist where by default it's a no, and
then certain entities can import, and then you wanna be really, really

(01:25:48):
careful about that, because basically, because of civil-military fusion,
any private entity in China is a PLA, People's Liberation Army, a military,
Chinese-military-affiliated entity.
That's just how it works.
It's different from how it works in the US. That's just
the, you know, the fact of life.
But until you do that whitelist strategy, like, you are just waiting
to be made to look like a fool.

(01:26:09):
Like, people are gonna spin up new subsidiaries, and we will be
doing articles and stories like this until the cows come home,
unless that, that changes.
So this is kind of one of those things where, you know, I don't know
why the Biden guys didn't do this.
I get that there's tons of pressure from, from US industry folks,
'cause it's, it is, like, it is tough.
But at a certain point, if the goal is to prevent the CCP military from

(01:26:33):
acquiring this, this capability, we gotta be honest with ourselves
that this is the solution.
There is no other way that will succeed
at this kind of whack-a-mole game.
And one more story.
This one focused on safety more so than policy.
Netflix's Reed Hastings gives 50 million to

(01:26:58):
Bowdoin College to establish an AI program.
This would be a research initiative called AI and Humanity, with a focus
on the risks and consequences of AI, rather than sort of traditional
computer science AI research.
The college will be using these funds to hire new faculty and support

(01:27:19):
existing faculty on this research focus.
50 million is, is quite a bit, I would imagine, for doing this kind of thing.
Yeah, it's sort of interesting, 'cause I wasn't aware of, you know, there
are all these kind of big luminaries every which way on, on this issue,
and we hadn't heard anything from, from Netflix, right,
from Reed Hastings.

(01:27:39):
And so I guess now we know where, where at least that part of
FAANG comes down on the equation.
Yeah, it's interesting.
Of course, this is a gift to Hastings' alma mater.
He graduated from this college decades ago.
On to Synthetic Media and Art.
We have just a couple more stories.
First up, a judge has allowed the New York Times copyright case

(01:28:03):
against OpenAI to go forward.
OpenAI was requesting to dismiss the case, and so that didn't happen.
The judge has narrowed the lawsuit's scope, but upheld the
main copyright infringement
claims.
The judge is also saying that they will be releasing a detailed

(01:28:24):
opinion, not released yet.
So pretty significant, I think, because, you know, there's a
bunch of lawsuits going on.
But this is the New York Times, you know, a big boy in media publishing,
certainly probably has experienced lawyers on their side and able to throw
down in terms of resources with OpenAI.

(01:28:47):
So the fact that this is going forward is pretty significant.
Yeah, I mean, actually, I mean, nowadays they, they may not have the
kind of resources that they, they once had. Especially, surprisingly,
they're, they're kind of successful, you think, 'cause they managed to
move to a subscription-based online model and survive better than other
media entities in recent decades.

(01:29:10):
I don't know if they're as big as they used to be, but they're
surprisingly successful still.
Yes, I was just looking it up.
They're, apparently their subscription revenue is, is,
let me see, in the quarter.
Okay.
Quarterly subscription revenue of $440 million.
Jesus.
Okay.
That's pretty good.
That's pretty good.
Wow.
Okay.

(01:29:30):
I, I would not have, would not have expected that.
I'll, I'll have to update my, well, there you go.
I mean, we will, we'll get, we'll get the, the opinion, whatever
Judge Stein means when he says "expeditiously", which I guess in
lawyer talk or, or legal talk probably means sometime in the next decade.
But there you go.
Another similar story, although this time to the other side: a judge has

(01:29:52):
ruled that Anthropic can continue training on copyrighted lyrics,
for now. This is part of a lawsuit from the Universal Music Group that
was seeking an injunction to prevent Anthropic from using
copyrighted lyrics to train the models.
That means that, yeah, Anthropic can keep doing it, assuming it is doing it.

(01:30:17):
And this is also saying that the lawsuit is gonna keep going.
There's still an open question as to whether it is legal for Anthropic
to do it, but there's not yet a restriction prior to the actual case.
Yeah, this, this is, like, very much not-a-lawyer territory.
So injunctions, to my understanding, essentially are just things where the

(01:30:40):
court will say ahead of time, before something would otherwise happen,
they will step in and say, oh, up, up, up.
Like, just so you know, like, don't do this.
And then if you violate the injunction, it's a particularly bad thing to do.
So this is sort of like, it would be the court anticipating
rather than reacting to something.
So that's, that's what the publishers are asking for, hence the statement

(01:31:03):
from the judge on the case saying, publishers are essentially asking
the court to define the contours of a licensing market for AI training,
where the threshold question of fair use remains unsettled.
The court declines to award publishers the extraordinary
relief of a preliminary injunction based on legal rights.
So basically, we're not gonna step in and, and kind of anticipate

(01:31:24):
where this market is going for you and just say, hey, you can't use
this, based on legal rights that have not yet been established.
So essentially it's for another court to decide what the actual legal rights are.
We're not in a position, until that happens, to grant injunctions on the
basis of, of what is not settled law.
So once we have settled law, yeah, if it says that you're not allowed to do this,

(01:31:45):
then sure, we may grant an injunction saying, oh, Anthropic, don't do that.
But for right now, there's no law on the books, and we don't
really have precedent here.
So I, I'm not gonna give you an injunction.
That's kind of the, at least my read on this.
Again, lawyers listening may be able to just smack me in the face and set
me right, but kind of interesting.
Yeah, sounds right to me.
So, well, with that, we are done with this episode of Last Week in

(01:32:10):
AI. Lots going on this week, and hopefully we did cover it.
And as we said, we'll probably cover some more of these details next week,
just because there's a lot to unpack.
But for now, thank you for listening through, apparently, this entire episode.
We would appreciate your comments, reviews, sharing the podcast, et
cetera, but more than anything, we appreciate you tuning in, so please.