Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Andrey (00:21):
Hello and welcome to the Last Week in AI podcast
where you can hear us chat about what's going on with AI.
As usual in this episode we will summarize and discuss some of last week's most interesting AI news, and as always you can also go to lastweekin.ai, our text newsletter, for even more AI news we won't be covering.
I am one of your hosts, Andrey Kurenkov.
(00:43):
My background is that I finished a PhD focused on AI at Stanford, and I now work at a generative AI startup.
Jeremie (00:47):
And I'm your other host, Jeremy Harris, co-founder and CEO of Gladstone AI, the AI safety and national security company.
I guess that's, I mean, we've said that many times, but, but now, now you really, really know.
I don't know, yeah,
Andrey (01:00):
how many episodes have we done now?
It must be approaching a hundred.
It's almost two years now.
Jeremie (01:08):
Yeah.
Right.
You're right.
We've missed a couple, but I mean, it's gotta be knocking on the door of a hundred.
I remember when we started, it was like in the wake of ChatGPT, or that's when I came on; we'd each been doing separate podcasts in the meantime,
but, uh, yeah, it just like, all of a sudden everything went crazy.
Andrey (01:23):
At least this week won't be too crazy.
I'll do a quick preview of what we'll be covering.
So, no huge stories this week.
We got some neat new features being introduced by OpenAI and Anthropic. On the business front, we got some stories of fun things OpenAI is up to, and a few fun open source projects and models this week.
(01:45):
So I think that'll be interesting.
Some research on interpretability and, uh, efficiency for small models, and then policy and safety will be maybe the most meaty section we'll be covering.
Let's just say we will be covering the implications of Donald Trump's victory for AI.
And as always, talking a little bit about what's going on with China and hardware and the US restrictions.
(02:10):
Before we get into the news, as always, do you want to acknowledge some listener comments?
We had a few on YouTube.
I always like seeing those.
One person did say they like the idea of a community or discord.
So that's interesting.
I'm not going to make the call yet, but if we hear a few more, you know, maybe we'll make it and we can chat about AI news on there.
(02:34):
And, uh, Jeremy, we did have a comment saying that a person loved your take on Meta and releasing of weights with regards to national security, which I think, I mean, it was.
Yeah,
Jeremie (02:48):
it was mildly spicy.
I, by the way, I want to, I want to add just a little, a little modifier to that.
Um, so the context was like, um, you know, some Chinese companies were shown to be, or sorry, China was being shown to use and
rely on Meta's open source models as a kind of, um, floor to their capabilities.
Very important.
(03:09):
We've known about this for a long time.
Obviously when I say we, I mean the world.
Um, and, uh, you know, and so I basically said, like, I think we're getting to the point where it's indefensible.
Um, you know, I think one dimension, somebody, um, uh, just discussed this on Twitter with me.
It was a really good, good tweet.
And I think something we've talked about earlier on the podcast, but I wanted to resurface here.
(03:30):
They said, you know, Um, the, uh, the advantage of open source obviously is you could put backdoors in these models, um, and thereby, you know, use them as a national security asset, have
China use Western open source models that have back doors in them that we can then undermine.
I think there are a variety of reasons why I don't think that's what's actually going on here.
I don't think Meta is actually doing this strategy, uh,for, for several reasons that, that we could discuss.
(03:55):
But, um, yeah.
I think it would be interesting.
I think backdoors are going to be really hard to train out because unlearning is notoriously fickle and superficial.
So I just wanted to call that out.
I think an important kind of additional level of detail to flesh that out with.
So there you go.
You can append this to my rant in the last episode if you want.
Andrey (04:16):
A little more nuance there, which is always good.
And, uh, also shout out to a couple more reviews.
Uh, one of them did say, keep up alignment comments.
And he even said that we are hitting the Goldilocks zone on the existential risk talk, which I, I feel pretty proud of.
I think that's, that's the intent.
A lot of work went into that.
Yeah.
(04:36):
Uh, and we did have a critical review, which I appreciate, uh, calling out the intro AI music. Uh, seems that not everyone is a fan: "terrible, truly terrible AI generated songs for the intro," which
I don't know, I, I like them, but maybe I'll keep them to like 15 seconds instead of 30 seconds.
(04:57):
And as always, I'll put them at the end for people who do enjoy them.
And one last thing before the news, once again, we do have some sponsors to give a shout out to, as with the last couple of weeks.
The first one is The Generator, which is Babson College's interdisciplinary AI lab focused on entrepreneurial AI.
(05:18):
Babson College is the number one school for entrepreneurship in the U.S., and that has been the case for 30 years.
And just last fall, professors from all across, uh, Babson, uh, partnered with students to launch this, uh, Generator, which is a lab, uh, that is organized into eight
groups, such as AI Entrepreneurship and Business Innovation, AI Ethics and Society,
(05:41):
And things like that.
And it has now led peer training of faculty all across Babson.
Their intent is just to accelerate entrepreneurship, innovation, and creativity with AI.
So yeah, it's a very cool initiative.
We will have a link for you to check it out.
(06:01):
And one new one, actually, we do have a second sponsor, and it is Darren McKee promoting his engaging AI safety book, Uncontrollable.
The full title of it is Uncontrollable: The Threat of Artificial Superintelligence and the Race to Save the World.
So if you do like the AI risk talk, I think you might be interested in this book.
(06:23):
Uh, Max Tegmark, who you would know if you care about AI safety, said that Uncontrollable is a captivating, balanced, and remarkably up-to-date book on the most important issue of our time.
It explores topics like uncertainty, control, and risk, and yeah, makes a case that we should be concerned about advanced AI, but it's not a doomer book.
(06:47):
It lays out a reasonable case for AI safety and what we can do about it.
We'll have a link to it on Amazon in the show notes, and it's also on Audible.
You can just search for it.
The title is Uncontrollable.
Jeremie (07:00):
Yeah, I've actually, uh, have had quite a few conversations with Darren, uh, on this topic too.
So he's, uh, you know, he thinks a lot about it.
He's, he's talked to a lot of people as part of his research for this book.
So, um, certainly if you're, if you're interested in that, that space, I'd definitely want to pick it up and read it. Again, you know, Max Tegmark, one out of
one Max Tegmarks agree, uh, that this book is, uh, is a book and a great book.
(07:25):
And maybe the best book probably.
Andrey (07:27):
That's a little preview maybe of what's coming.
All righty, and now on to the news.
We are starting as always with tools and apps.
And the first story is about OpenAI introducing a predicted outputs feature.
This feature can speed up GPT-4o by up to four times for tasks like editing documents or refactoring code.
(07:51):
So the gist is, uh, many times when you're using an LLM, you may only want to tweak your input.
So you may give it some text or some code and say, you know, uh, correct any grammar mistakes in this document, for instance.
And so that means that you're mostly going to be spitting out what you take in with just a few tweaks.
(08:13):
And that is the gist of what this is.
If you use this, then you can have much faster outputs.
For me, it's actually a little surprising.
It's taken this long for this feature to come out.
It, I think, is pretty well established as something you can do.
But nice to see both Anthropic and OpenAI introducing more and more of these
(08:36):
really developer-friendly features, you could say.
Jeremie (08:38):
Yeah, this is definitely part of that productization push, right, towards, uh, more and more kind of application-specific tooling that OpenAI is focusing on.
Um, you know, one of the things that is making this possible is speculative decoding.
This is a technique that, um, it's been around for a little bit now, but now we're seeing it productized.
Uh, the basic idea behind it is you get two different models.
(09:01):
You have a draft model, basically this like very small, cheap model.
And at any given time, um, you can get that draft model to propose, like, what are the next five tokens or something like that?
Right.
So get it to cheaply produce predictions for those tokens.
And then what you can do is, uh, feed all five of those tokens in parallel to a larger model that has more expensive computation, but it can handle them in parallel all
(09:27):
in one forward pass, spending the same amount of compute as it would if it was just one, uh, input that it was trying to process. And then you essentially get out
predictions for how accurate the draft model's, um, token proposals were.
And so this allows you to essentially amortize the cost of that more expensive model over a large number of tokens, get it to do sort of
(09:48):
editing and cleanup, so to speak, um, a lot faster and a lot cheaper.
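To make the mechanics concrete, here's a toy sketch of speculative decoding; it's a simplified greedy-verification variant written just for illustration (production systems use a rejection-sampling acceptance rule and run the check as one batched forward pass), and draft_next / target_next are hypothetical stand-ins for the small and large models.

```python
# Toy sketch of speculative decoding (greedy-verification variant) for illustration.
# draft_next / target_next are hypothetical stand-ins: each maps a token list
# to that model's greedily chosen next token.

def speculative_decode(draft_next, target_next, prefix, k=5, max_new=50):
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        # 1) The cheap draft model proposes k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            tok = draft_next(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2) The expensive target model checks every proposed position.
        #    Conceptually this is ONE batched forward pass (that's the speedup);
        #    it's written sequentially here for clarity.
        for i in range(k):
            expected = target_next(out + proposal[:i])
            if expected != proposal[i]:
                # Keep the verified prefix plus the target model's own token.
                out = out + proposal[:i] + [expected]
                break
        else:
            out = out + proposal  # all k draft tokens were accepted
    return out
```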
So this is, uh, a practical implementation of speculative decoding. Uh, it's one of those things where, you know, it's funny, you read the paper and then a couple of months later, it's
like, boom, you know, people are putting it into production, actually saving a lot of money.
So, um, this is, this is the whole idea.
Another advantage, of course, is you don't have the problem that the model might, you know, hallucinate changes to the stuff that should stay solid.
(10:12):
Like if you, if you have, you know, some small part of like a JSON file or something that you want to tweak and you want the rest of the file to be
anchored to be exactly the same, then this allows you to do that, right?
It allows you to, to fix.
So what they're doing during speculative decoding is they're actually fixing the part of the output that should be fixed, and only having the large and expensive
model make predictions, presumably, on the variable parts of that output.
(10:35):
So this is, um, a bit of a janky reimagining of what speculative decoding looks like, with this added constraint that, you know, the stuff before and after this window that
you're actually going to try to sample in, um, is kind of concrete, is locked in.
So, um.
I think kind of cool.
Um, I'm curious about the economics. What they are doing, by the way, is they're only charging you for the tokens that are actually getting kind of
(11:00):
modded in the middle, let's say wherever you want the modifications to occur.
So that seems fair, right?
You're giving a strong prior on like, keep the beginning and the end, say the same.
So don't charge me for generating those tokens, only charge me for generating the ones that I care about, which again makes a lot of economic sense.
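On the API side, here's a minimal sketch of what using predicted outputs looks like; the prediction parameter shape below is our reading of OpenAI's launch documentation, so treat the exact field names as an assumption.

```python
# Hedged sketch of OpenAI's predicted outputs feature. The "prediction"
# parameter shape is our reading of the launch docs; treat field names as
# an assumption rather than gospel.
from openai import OpenAI

client = OpenAI()

original = open("config.json").read()  # file we want lightly edited

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user",
         "content": "Change the 'port' value to 8080 and return the full file."},
        {"role": "user", "content": original},
    ],
    # Most of the output should match `original`, so the API can skip
    # regenerating the unchanged spans (and, per the hosts' description,
    # not bill you for them).
    prediction={"type": "content", "content": original},
)

print(resp.choices[0].message.content)
```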
Andrey (11:16):
That's right.
And, uh, there was also, uh, I guess a partnership of Factory AI with OpenAI to test this new feature in their API, and they have a few metrics.
It's not like, you know, there's no benchmark that they report here, but they do have some numbers of what they did find in practice.
(11:36):
Two to four times faster response times while maintaining accuracy.
And they have examples of large files that would take 70 seconds now taking, uh, 20 seconds roughly.
So yeah, very easy to see how this is useful in practice for various applications.
Next up, we are moving to Anthropic and a price increase for Haiku 3.5.
(12:02):
It costs four times more than the predecessor.
Uh, the claim I think is that the price hike is at least partially because Haiku 3.5 is superior to the previous version, but, uh, rather surprising.
So that's $1 per million input tokens and $5 per million
(12:25):
output tokens.
And that's again, four times more than the previous one.
Jeremie (12:30):
Yeah.
It's also almost 10 times more expensive than GPT-4o mini, right?
So when, when we look at that, like that's pretty remarkable, right?
It's, um, in fact, it's only two and a half times cheaper than the full GPT-4o.
So GPT-4o mini was supposed to be the Haiku, say, of the OpenAI series, of the GPT-4o series.
Right.
(12:50):
And so.
Here we have essentially a model, Haiku, that's coming out and saying, hey, I'm still the small one, except I'm now going to cost you, you know, something closer to the full-size model.
That's a really interesting play.
And I think this speaks to something very interestingthat's happening with the economics of these models, right?
Like one of the big questions has been, we've talked about it a lot here, but, um, to what extent do LLMs just get commoditized, right, to the point where the margins go to zero?
(13:15):
Like your model is basically the same as their model is basically the same as your other competitor's model.
And so everybody has to just basically price based on the raw cost, pretty much, of producing the model and serving it.
And at that point, your, your profits go to zero, or, you know, this is kind of what happens economically.
And one of the challenges is you can't do that and build enough bank to spend for the next generation of massive multi-billion-dollar data
(13:41):
centers if you're just living a hand to mouth existence like this.
So a bit of a structural problem.
This is the first time we've seen that trend bucked, where we've seen a model come out and say, hey, you know what?
On the basis of my higher quality, I'm going to up the cost associated with using this model.
You know, a lot of developers, um, some might say fairly understandably, are coming back and saying, hey, you know, this is an unwelcome development.
(14:02):
Um, not necessarily because of the price increase per se, but because the framing, um, is, is that, hey, this is better quality.
So therefore we're charging you more.
Um, this is really interesting, right?
There's this classic thing when you do startups, when you do, I mean, it's more broad than that.
It's economics really.
Uh, when you want, when you're trying to sell something and make a profit, uh, value-based pricing is the thing you go with, right?
(14:26):
You, you advertise how much value you can bring to the customer, how good your product is, rather than talking about
the cost. When you talk about how much it costs you to make a thing, that's a hint that your whole industry has been commoditized, right?
So when you go out to McDonald's and you say, like, hey, well, can you give me the same burger just a buck cheaper?
They'll tell you, like, no, the patty costs this much, the bun costs this much, the cashier's time costs this much.
(14:50):
So therefore I have to sell it to you at this price.
They probably won't do that.
They'll probably tell you to leave, but whatever.
Um, so.
Um, they'll literally tell you, sir, this is a Wendy's. Uh, anyway, um, you kind of get it, right: when you're dealing with a commoditized industry where everybody
can basically offer you the same product, your margins go to zero, you argue based on cost.
This is different.
Claude and Anthropic are coming out and saying 3.5
(15:10):
Haiku is higher quality.
Therefore we'll charge you more.
People pushing back on that is an indication that, well, actually, this space is pretty commoditized, you know. Like, anyway, I think this is a really interesting tell.
Um, one of the big consequences, by the way, of all this stuff, as you see prices going up and down and side to side, and you've got new products coming online, um, it really makes a lot of sense,
(15:31):
if you're a company working in this space, to have the
scaffold you need to very quickly assess, through automated evaluations, whether the task you care about is being performed well by a given LLM.
So a new LLM comes online with a new price point.
You should be able to very quickly and efficiently assess: does this LLM, at this price point, at this quality, make sense for my use case, right?
(15:54):
If you can't do that, then you can't ride efficiently this wave of lower and lower LLM prices.
You're not going to benefit from that in your product.
So just, I guess, a side thought there: you know, really important for companies to get into the habit of checking these latest models, because there are companies for whom Haiku 3.5 is going to be way, way better than the other options.
(16:15):
Um, but the question is, what are you competing against?
Are you competing against GPT-4o, or are you competing against GPT-4o mini?
And you know, right now we're somewhere in between.
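To make that "automated evaluations scaffold" point concrete, here's a bare-bones sketch; every name in it (the model list, prices, tasks, and the call_model helper) is hypothetical and only illustrates the shape of the loop you'd want ready when a new model or price point lands.

```python
# Hypothetical "re-evaluate every new model" scaffold. Model names, prices,
# tasks, and call_model are all illustrative stand-ins, not real figures.

MODELS = {
    # name: (input $/Mtok, output $/Mtok) -- made-up placeholder pricing
    "model-a-haiku-class": (1.00, 5.00),
    "model-b-mini-class": (0.15, 0.60),
}

TASKS = [
    {"prompt": "Summarize this ticket: ...",
     "check": lambda out: "refund" in out.lower()},  # automated grader stub
]

def evaluate(call_model):
    """call_model(name, prompt) -> str is your own API wrapper."""
    for name, (p_in, p_out) in MODELS.items():
        passed = sum(1 for t in TASKS if t["check"](call_model(name, t["prompt"])))
        print(f"{name}: {passed}/{len(TASKS)} passed (${p_in}/M in, ${p_out}/M out)")
```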
Andrey (16:23):
This is, uh, yeah, I think to me a little surprising.
Uh, the announcement of 3.5 Haiku was at the same time as 3.5 Sonnet, which we covered, I think, about two weeks ago now.
And it was just this past week that they announced a price change.
And that is what led to people responding.
I mean, you know, a four, four times raise in price is pretty dramatic, so it must be a mix of, it was underpriced to begin with, perhaps like significantly underpriced.
(16:55):
Uh, and I guess there's also perhaps a factor of them just emphasizing 3.5 Sonnet as the main one they want to compete with going forward.
I don't know.
Yeah.
It's a, it's a certainly an interesting move from a competitive perspective.
On the lightning round, we are starting with Flux 1.
(17:15):
1 Pro Ultra and Raw.
So, Flux 1.1 Pro from Black Forest Labs, one of the leading AI image generator providers, has now been upgraded to support 4x higher image resolution, up to 4 MP, I don't know what MP stands for, million pixels maybe, but really high resolution.
(17:37):
And it still has faster generation times of 10 seconds per sample.
And this is priced at just 6 cents per image.
And they do have this raw mode as well, which just leads to kind of realistic-looking images, more akin to photography.
(17:58):
So, yeah, I guess not too surprising.
We keep getting better and better models, more and more realistic, but I think we're keeping up with Black Forest Labs and, you know, they're moving pretty rapidly in the space.
Jeremie (18:10):
Yeah.
And they're the ones who, if memory serves, partnered up with, uh, with X, uh, you know, formerly known as Twitter, uh, to, uh, support the
Grok app and the image generation functionality that they're developing.
So, uh, you know, this is.
This is them continuing to pump out their own independent product line, which, I don't know, may be integrated as well with, with Grok at some point.
Um, yeah, looking at the images again, I mean, I, I find myself continually saying this, I'm not a, an image guy.
(18:34):
So I, like, I don't know, um, you know, the kind of, uh, the aspects of image generation, let's say, that are of greatest interest to people
who really dig into the space, but the images look like they're really high quality.
The raw mode especially does look really gritty and real.
Um, Because I'm a bit of a buffoon in this space, I kind of look at these and go, uh, cool.
(18:56):
I feel like I've seen a lot of other models that have the same quality.
Um, so I'm kind of not sure, you know, where the moat is in this space, but, but still, um, it does look impressive.
And, uh, Flux has kind of come out of nowhere too, uh, with these new models.
Andrey (19:11):
And speaking of X and Grok, we have a bit of a story on that.
X is testing a free version of a Grok chatbot in some regions.
So this was previously exclusive to premium and premium plus users of X.
And now there is a free tier where you can do 10 questions in two hours for the Grok 2 model and 20 for the Grok 2
(19:38):
mini model, plus a few image analysis questions per day.
So, uh, you do have to sign up to X, of course, and you do need to have a linked phone number.
But, uh, certainly, you know, this is something that you have in ChatGPT, I think also in Anthropic's Claude, the ability to use the chatbots for free.
So this is just being tested in New Zealand now, but it'll be interesting to see if they continue the expansion to more users.
Jeremie (20:06):
Yeah, and obviously a big goal anytime you launch something for free like this is to collect user data, right?
Upvotes and downvotes for, say, RLHF or something else.
Um, and, uh, and also just to own more mindshare.
I think one of the things that OpenAI continues to enjoy a massive lead on is the fact that ChatGPT is a household name, whereas, you know, Claude is not.
(20:26):
Um, and Grok increasingly is becoming one, but that's only thanks to the distribution they get through X.
And so I think, um, you know, at this point you combine
the X distribution factor with the X factor, if you will, uh, with the fact that this is free. That could be really interesting, but the, the quota is interesting too, right?
Like a, a query quota of 10 questions within two hours, I don't know about you, but when I'm sitting down with, like, with Claude, which is, you know, for some
(20:52):
of the work that I do, I tend to spend quite a bit of time with Claude actually.
Um, there are long sessions and there's a lot of back and forth, and there's a lot of, like, going back and editing questions and, you know, tweaking prompts. So, uh,
that quota might be challenging for some of the heavier use cases, which makes sense.
Andrey (21:08):
Yeah, this feels like, you know, you want to give people a taste, so to speak, so that people might consider subscribing to X, which, probably, hard to say,
I'm not sure if Grok will convince people who aren't subscribers to do so,
Jeremie (21:23):
but you know, maybe.
No, you're right.
I mean, there's value in bundling it on with, with X, right?
Like I was going to say, there are other free chat platforms that, you know, that don't give you a limit, but the fact of the X integration, that distribution is so, so key.
And I think it's still probably being underrated.
So.
We'll see.
Andrey (21:39):
Moving on to applications and business.
Speaking of chatbots, we have a kind of fun story, not
Uh, OpenAI has acquired the domain chat.com.
Uh, and I, we don't know the exact details of how much it cost, but it appears to have cost a lot, like in the maybe, uh, $10 million-ish range.
(22:05):
Uh.
We know that it was previously acquired by HubSpot co-founder Dharmesh Shah for $15.5 million.
I think just roughly two years ago or so.
And it has now been revealed that he sold chat.com to OpenAI, and, uh, Sam Altman on X tweeted or posted just chat.
(22:27):
com.
That was the entire post, I guess, showing off.
So, uh, it's not yet, uh, I guess been promoted heavily.
There's no new brand.
Uh, it's still called ChatGPT, but you know, I mean, $10 million for a URL, that's pretty significant.
Jeremie (22:45):
Yeah.
I mean, if it were 10 million, uh, that would be a haircut on the initial acquisition cost of $15.5 million, which, uh, you know, is pretty significant, but, uh, from, from
the context, it seems like something more interesting is maybe going on here.
It seems apparently as if, so Dharmesh Shah, the, uh, the guy who acquired it, uh, may have been paid in OpenAI shares.
(23:07):
So if that's the case, that would be kind of interesting too.
Uh, he had this somewhat cryptic post on X.
Um, all of this is very cryptic.
It's the most cryptic launch of a new domain I've ever seen.
Um, but if you do go to chat.com, you will see, of course, uh, right now, the ChatGPT-4o, uh, interface.
So there you go.
Andrey (23:27):
Right.
Yeah.
To emphasize, $10 million, we don't know if that even is the right ballpark.
That's just based on what was previously paid.
You would expect it to be, you know, around that, maybe. Next up, a more serious story, and it is that the Saudis are planning a $100 billion AI powerhouse to rival the UAE tech hub.
(23:50):
So this is Saudi Arabia, of course, and it's planning this artificial intelligence project to, yeah, pretty much develop a technological hub to rival that of the United Arab Emirates.
This will be used to invest in data centers, startups, and other infrastructure.
Uh, the initiative, the project, is called Project Transcendence, which is pretty fun.
(24:14):
Not, not dystopic at all.
Yep.
Well, you know, pretty ambitious, you could say.
And of course this will also be used to recruit talent to the region, which I'm guessing is perhaps not quite as prevalent there as in the U.S. or elsewhere.
So yeah, we, we've covered in the past how the UAE has invested significantly.
(24:36):
There've been developments from the region, like with the Falcon models, that were pretty notable at the time.
Don't know that we've had too much to cover in recent times from the UAE, but, uh, certainly it's true that, uh, these countries are trying to invest and be a player in the space.
Jeremie (24:54):
Yeah.
I mean, I think the, the biggest kind of recent stuff with the UAE has been infrastructure, kind of structural stuff, with G42 and the questions around, you know, can they decouple from Huawei
technology and Chinese tech, and, you know, the Department of Commerce getting involved there.
So really the question about where, where's the future of, say, AGI-training-run-scale data centers, where is that going to be?
(25:15):
And, and this idea that, you know, the UAE has this massive energy advantage, and capital, which is a big part of the reason why so many people are
interested in it as a, as a hotbed, as a place to build out this infrastructure.
Um, this is Saudi Arabia basically saying, Hey, waita minute, uh, we're a giant oil rich nation with.
Deep concerns over how much longer that oil is going to hold up and be viable.
(25:39):
And so they're looking for ways to diversify out of that industry.
And well, guess what?
Oil comes with, you know, that awful lot of, uh, of energy and, and that's, that's great.
So it gives them a lot of the ingredients they need, again, the money and the energy, to potentially seed something like this.
They already have sort of similar structures, let's say, to Project Transcendence.
There's a, a company, sort of, um, a state-backed entity called Alat.
(26:02):
Uh, that's a fund that does sustainable manufacturing.
It's got a hundred billion dollars in backing.
That's about the order of what's speculated could be associated with Project Transcendence.
We don't know yet how much actually will be forked over.
Um, but there are discussions with potential partners, uh, which include, I think I saw Marc, or sorry, Andreessen Horowitz.
Yeah, that's right.
(26:22):
Um, yeah.
So apparently a16z is talking with, uh, this, um, the Public Investment Fund, which is sort of the, the state, uh, kind of entity that would be overseeing all this.
Um, so that's, it's kind of interesting.
I mean, a Western private, uh, private actor looking at that. Apparently the, the, uh, fund itself is maybe growing to as large as $40 billion in commitments, again, aiming for that 50
(26:45):
to a hundred billion in total, which would be, which would be pretty, uh, pretty impressive.
But keep in mind, that is like about a year of Microsoft infrastructure spend.
Um, and the, the challenge here is that the buildout for this is, is slated for, like, you know, 2030. There are a whole bunch of problems right now plaguing Saudi Arabia on this front as well.
They've seen an overheating economy that's now causing them to claw back some of their previous commitments, um, to, to do similar, uh, buildouts in other
(27:11):
tech sectors too, um, including semiconductors and, and like smart power.
You know, smart everything basically.
Uh, so, you know, now there's a little bit of uncertainty about the future of some of those projects.
This one certainly has a lot of, uh, buzz around it.
So, you know, see, see where that ends up going.
Um, and, and by the way, I did a little digging. What kind of history does Saudi Arabia have in the LLM space?
(27:34):
I was not tracking this, but there was a 7 billion parameter model.
It's the only one I've been able to find so far,but, um, for, you know, take it for what it's worth.
Uh, there's a tech company called Watad that apparently built this model called Mulhem, and it was a Saudi Arabian domain-specific LLM that was trained exclusively on Saudi datasets.
So a bit of a scaling issue there in terms of getting beyond that.
(27:56):
But um, uh, there you go.
So they have a, you know, a small footprint in this space, obviously hoping to attract
talent, which is going to be a really, really important resource.
Um, and I think that that's going to be a challenge for,for both, um, Saudi and frankly, the, the UAE as well.
Um, at least on the model development side, theinfrastructure side, I think might be a bit of an easier play.
Andrey (28:15):
Yeah.
So good call out there.
This is saying a backing of as much as a hundred billion, and this is a "people familiar with the matter" kind of article.
So yeah, not too many concrete details there.
After the lightning round, the first story is again on OpenAI, but this time it's about hardware.
And it's that Meta's former hardware lead for Project Orion is joining OpenAI.
(28:40):
So this is Caitlin Kalinowski, who was the former head of Meta's AR glasses team, and has also worked on VR projects, uh, and also worked on MacBook hardware at Apple.
She is now joining OpenAI, seemingly to focus on robotics and partnerships to, uh, integrate AI into physical products.
(29:03):
We covered, uh, pretty recently how, uh, OpenAI did start recruiting for robotics positions, with the job descriptions having to do with integrating ChatGPT into robots.
We did see, uh, Figure, the developer of a humanoid robot, showcase their robot working with ChatGPT, having conversations and being told to do stuff.
(29:27):
So perhaps this, uh, this recruitment points to OpenAI wanting to do more of that.
Jeremie (29:32):
There's a lot of reading tea leaves, especially this week with OpenAI and its hires. You know, there's a, so, so apparently part of the speculation in this article is
that Kalinowski, um, is there to kind of, to work with LoveFrom, her old boss Jony Ive's firm.
We've talked about Jony Ive, um, partnering.
So he was the designer, of course, of the iPhone.
(29:54):
Um, now he's been brought on board with OpenAI, uh, to launch, as he put it, a product that uses AI to create a computing experience that is less socially disruptive than the iPhone.
Um, so I, I couldn't quite interpret what he was saying there. Was he saying it's going to be less, less horrible socially than the iPhone was, or it's going to be less of a
game changer than the iPhone was, um, probably he meant the former.
(30:18):
I'm not sure.
But anyway, uh, so apparently she'll be back working with him.
So that's sort of a, a natural, a natural partnership there.
Um, she has a lot of experience doing design at Apple as well.
Really, really unhelpful, I will say, of OpenAI to have two separate media threads that involve the word Orion, because, uh, there's this model we'll talk about, right?
(30:39):
The speculative model, we talked about the, sort of, rumored model Orion, and now you have the former Orion lead from Meta, different things, coming to OpenAI.
I really wish that they would keep their headlines a little bit, a little bit straighter, but.
Andrey (30:50):
Yeah.
And why, why Orion? You know, be a little more original, okay, in your, uh, project names.
Also worth mentioning, OpenAI did acquire a company building webcams earlier this year, I believe.
So could play into that.
We don't know.
This is just, uh, we don't know what they're doing here.
Jeremie (31:09):
It's, it's also an interesting about face, right?
Cause like they, they did, they disbanded their entire robotics team.
This was like four years ago, and now they're really rebuilding it.
But it does seem that the new robotics team is a lot more, um, market focused, like product focused, and so that in itself is, is sort of interesting.
You know, there are pros and cons there.
They'll get a lot more real world feedback by havingtheir systems out there and, and more interesting data.
(31:31):
But, um, yeah, anyway, so, uh, the structure of OpenAI continues to tilt towards more and more of a product-oriented org.
Andrey (31:38):
And just one last story on OpenAI.
This one is, I guess, a fun one as well.
Uh, and it is that OpenAI accidentally leaked access to the upcoming o1 model to anyone by going to a certain web address.
So this was accidentally leaked, uh, in the sense that users could access it by altering a URL.
(32:02):
for a brief period of time.
It was shut down after two hours.
I think maybe when people were aware of it or something.
So we have the preview model of o1 that you can use, but still we don't have access to the full o1 version.
Now, yeah, people were able to play around with it.
OpenAI actually confirmed that this was the case, uh, and said that, uh, there was not too much access since this was resolved.
(32:30):
So people did play around with it.
And as you might expect, they did say that it was pretty impressive.
Jeremie (32:36):
Yeah.
Well, OpenAI at least said that they were preparing limited external access to the OpenAI o1 model and ran into an issue.
So I guess in the process of trying to give people, you know, maybe special links to access it.
Um, it, it leaked in that way.
I still think so.
So by the way, some of the demos are kind of interesting.
Uh, there's a classic one where, you know, you have this like image thatis an image of a triangle and it's subdivided with a whole bunch of lines.
(33:02):
And then those lines form sub triangles within the image.
And then you ask how many triangles are there in the image?
Um, standard multimodal LLMs really struggle with this.
Uh, in fact, the preview version of o1 struggled with this and got the answer wrong.
The new version did not.
Um, so, you know, one of these little things where, you know,maybe a bellwether, uh, eval or something like that, who knows.
Um, but I think one of the most interesting aspects of this, apart from the fact that it teaches us quite a bit about, um, OpenAI's continued struggles with security, it must be said.
(33:32):
Um, you know, this is, this is an organization that, uh, explicitly has said that they are trying to prevent people from seeing the full reasoning traces of o1 because that is
critical intellectual property for them. Well, guess what, this o1 version, the full o1 version, which was leaked to begin with, also leaked out a full chain of thought when it
was asked to analyze, in one case, a picture of a recent SpaceX launch, and then other things in other cases. So for this sort of critical, um, uh, competitive secret, really.
(34:03):
And that's what it is.
Uh, the reason OpenAI didn't want to release, uh, those chains of thought initially was precisely because they were concerned that those chains of thought would, would be
really valuable training data for people to replicate what is so precious about this model series.
And so, you know, here they are kind of leaking it out themselves with this, uh, you know, haphazard launch.
(34:24):
So it doesn't really inspire a lot of confidence in OpenAI's security approach, their philosophy, really frankly, the level of effort that they're putting, uh, into this.
I know it sounds like a small thing, but when you're dealing with, you know, the stakes as they may potentially present themselves in the future for national security or otherwise, like,
this is not a small screw-up, um, and it could have been mined.
If you imagine it's not an individual who's accessing this, it's an AI
(34:47):
agent or something, and it's collecting, you know, using the opportunity to collect a bunch of training data, not saying you could
do a ton of it in that time, but this is an important vulnerability.
And, um, anyway, so, uh, kind of, uh, kind of amusing and a little disappointing, especially given that OpenAI has made such a big public, um, show of, of trying to get into the security game more.
Andrey (35:08):
And just one little caveat with regards to the full chain of thought.
Uh, we don't know for sure if that's the case, uh, one Twitter user reported seeing it, uh, but that may or may not have been the full chain of thought.
It was just a detailed response that did, uh, include some of the reasoning
Jeremie (35:27):
steps, so yeah, no, that's fair enough.
It did look different enough.
Yeah, you're right.
It did look materially different enough, um, from the sort of standard reasoning trace that's put out, and similar enough to the reasoning traces that were shared,
that OpenAI did share right when they launched, that it's like very suspiciously like
Andrey (35:45):
It seems like at least it's similar to what it's doing internally.
Yeah, yeah.
And one last story, NVIDIA is once again even more valuable than before.
This time it is the largest company in the world.
It has surpassed Apple on Tuesday.
I don't know what happened on Tuesday, I guess we'll
(36:08):
find out.
So the shares rose 2.9%, uh, leading to a market capitalization of $3.43 trillion, ahead of Apple at $3.3 trillion. And for reference, Microsoft is at $3.06 trillion.
(36:28):
Uh, for reference, NVIDIA has gone up by, uh, more than 850 percent since the end of 2022.
So yeah, still an insane story of NVIDIA's rise.
Jeremie (36:43):
It's sort of funny because it's, uh, like all my friends at the labs, like not tomake it a whole stock story, but a very, very, uh, big wave of people who went in hard
on, on NVIDIA from the frontier labs, like in the sort of like, uh, 2021, 2022 era.
And, um, uh, you know, you think about the revenues they're making, plowing it into NVIDIA, and now that's kind of 10x'd in value.
(37:08):
I, yeah, anyway, there's a conviction about where this all might be going.
We're not giving stock advice on the show.
Don't invest based on our stock advice.
Um, but, uh, yeah, certainly AI scaling has been good to NVIDIA.
Andrey (37:21):
Yeah.
I will say, I remember when I was in grad school, like in 2017, 2018, I was like, Oh wow.
NVIDIA is really doing good because of all of this deep learning stuff.
And their GPUs being the backbone of deep learning, which is a big thing in AI.
Even at the time, I was like, I wish I had money to invest, but I did not, being
Jeremie (37:39):
a poor grad student.
So, uh, well, and Jensen saw that in like 2014, 2013, right?
Like he, he has been positioning NVIDIA and the whole CUDA ecosystem for this for a long time.
And yeah, it's pretty wild.
Andrey (37:54):
Moving on to projects and open source.
The first story is about Nous Research, which we've covered a couple times, and them launching a user-facing chatbot.
So this group has previously released Hermes, specifically Hermes 3 70B in this case.
It's a variant of Llama 3.1, and Nous, Nous Research,
(38:21):
one of their big trademarks is these unrestricted models.
So having free access for all, doing completely unrestricted ability to chat, so less safety. Uh, this one, uh, the article writer at least did find that it did, uh, refuse to do certain
things, like go into how to make drugs, although according to, uh, Nous, this is not from them.
(38:47):
So they didn't add any guardrails to this user facing chat bot.
Some of it was already baked into the model previously.
Jeremie (38:56):
Yeah, I do find this interesting.
Like, um, there's a certain, um, eagerness to do like fully, fully, uh, no guardrails.
Like, I don't think even, even xAI, um, uh, or sorry, even the platform X,
through Grok and kind of xAI,
they don't pretend to, like, be trying to do a fully no-holds-barred thing, right?
(39:17):
They're like, we will adhere to the law and, uh, and not producethings like, you know, child, child pornography or whatever else.
Um, so same, same things happening here.
And, and Nous is interesting because they are especially into this thesis in, what I interpreted earlier as, like, a more extreme way.
Um, but here they're, they're basically saying like, Oh no, like,of course, of course we have safeguards on the actual model.
(39:40):
Like, of course we try to prevent it from, you know, from doing really, reallybad things like helping you make illegal narcotics, like meth, like naturally.
Um, so anyway, the, the model, as you'd expect, has been jailbroken. Pliny the Prompter, um, very, uh, very quick on the case as usual, uh, finding
a really powerful exploit to basically get through everything.
(40:01):
Um, you know, that's.
It's only interesting.
I mean, we, we, I'd love to do a deep dive on Pliny the Prompter's methodology and, and approach.
Cause there's some fascinating stuff there, but, um, Nous, really interesting to note that they're even launching this, right?
This is not a new model.
It is just a chat interface.
So they are trying to play in that space as well.
Um, yeah, so.
(40:21):
We'll see where it goes.
I mean, I don't know if they're going to be charging for this stuff at some point or how that'll play out, but they are really into the, you know, make
it available for everybody up to and including training methodology, right?
We covered their DisTrO optimizer a couple episodes ago. That, um, anyway, is meant to make it possible for, like, people to pull off massive training runs
distributed across basically the whole world between GPUs, that type of thing.
(40:43):
So, uh, anyway.
Andrey (40:43):
That's right.
And this is, I suppose, part of a platform, Nous Chat.
So it's very much like a ChatGPT-type interface.
You log in, you have a text prompt window.
It has a fun kind of visual style to it.
A little more like, I don't know, old Windows or a terminal.
(41:04):
It looks a little, I don't know, nerdy.
And one fun thing about it that is kind of interesting is you do have access to the system prompt and you can modify it directly, which is not the case
with ChatGPT.
So just to read a bit, the system prompt that is here by default is: you are Hermes, an AI to help humans build, create, flourish, and grow.
(41:28):
Your personality is empathetic, creative, intelligent, persistent, powerful, self-confident, and adaptable.
You communicate informally and in succinct responses that feel just like another human, et cetera, et cetera.
So, uh, I don't know, neat that they do provide access to that and you can configure it.
(41:48):
Next up, we got FrontierMath, a new benchmark.
So this one is crafted by over 60 expert mathematicians from top institutions and has original, unpublished problems across various branches of modern mathematics,
meaning that you shouldn't be able to find it on the web and learn on it.
(42:12):
So compared to existing benchmarks like GSM8K and MATH, which have simpler problems and do have those problems out there on the web,
here you have problems that require deep theoretical understanding and creativity.
And as a result, things like GPT-4, Gemini 1.
(42:32):
5 Pro struggle and solve less than 2 percent of the problems.
I believe there was a quote from Terence Tao, one of the people involved, that this should be challenging for models for at least a year or a couple of years.
Jeremie (42:48):
Yeah.
And they've got an interesting framework that they, so it's not just the benchmark, right?
They're coming out with a whole evaluation framework that's all about automated verification of answers.
Part of that is to prevent, um, uh, guessing.
So they, they want to prevent LLMs from being able to succeed just by kind of throwing out a guess and doing well.
Um, so they set up these, uh, these questions so that the, the responses, the correct answers, are deliberately complex and non-obvious, uh, to
(43:17):
reduce the chances of, of guessing getting you to where you want to go.
Um, they're also designed to be the kinds of problems where it's not just a question of, you know, it takes me a really long time to find the answer to this question, but I can do it through relatively straightforward reasoning, right?
So it's not like an undergraduate physics question, for example.
(43:38):
Um, it's also not like, uh, some of the, you know, GPQA questions, like the graduate-level, um, question answering questions, which sometimes you can answer in one shot,
like without thinking. You need to have the expertise, but if you have it, in some cases in that data set, you can just go ahead and respond without thinking too, too much.
(43:58):
They're trying to combine those two things together.
They want it to be like really, really hard and also require hours, if not days, as they put it, of human,
uh, thought time to, uh, to solve.
So you can really see, I mean, like everybody keeps saying this with newbenchmarks, um, if a model can solve this, then, then it's going to be AGI, right?
(44:18):
Only AGI will be able to solve this.
The problem is, every time, you know, we keep seeing these new benchmarks come out, there keeps being a trick, you know, some way to make models that do really, really well at it.
Occasionally those tricks actually have broader implications for AGI, kind of, for the spillover in general knowledge.
Um, but, uh, you know, that can happen quite often, but they, they certainly don't require the full kind of AGI that, uh, that some people think they might.
(44:42):
So this one, yes, we're at 2 percent right now, success rates for cutting-edge language models like Claude 3.5 Sonnet, um, you know, Gemini 1.5 Pro, all that stuff.
Uh, but, um.
Yeah.
Uh, unclear what's actually going to, going to get them there.
Is it a better agentic scaffold?
Uh, is it a better trained foundation model?
You know, what is it?
It's, it's, it's going to be interesting to see what actually ends up cracking this, this metric.
Andrey (45:05):
Pretty impressive to see, or at least a sign of the times, you could say, that now
people are developing these absurdly difficult things that most humans couldn't even try.
Like they have some sample problems.
Uh, this one that is in the paper, they say is of, uh, low to medium difficulty.
Just to read the problem, it's: construct a degree 19 polynomial p of x in C of x such that X has at least, has at least three, but not all linear, irreducible components over C.
(45:35):
Choose p of x to be odd, monic, have real coefficients and linear coefficient negative 19, and calculate p of 19.
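Since that's hard to parse read aloud, here is the problem written out; this assumes the phrasing matches the sample problem printed in the FrontierMath paper, including the definition of X, which didn't survive the audio.

```latex
% Our rendering of the sample problem (phrasing assumed to match the FrontierMath paper):
Construct a degree 19 polynomial $p(x) \in \mathbb{C}[x]$ such that
$X := \{\, p(x) = p(y) \,\} \subset \mathbb{P}^1 \times \mathbb{P}^1$
has at least $3$ (but not all linear) irreducible components over $\mathbb{C}$.
Choose $p(x)$ to be odd, monic, with real coefficients and linear coefficient $-19$,
and calculate $p(19)$.
```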
So I don't know what that means.
And the, uh, solution they provide in the paper is like an entire page with a bunch of references to various, uh, theorems and so on.
(45:57):
So this is like hardcore math over here, and I suppose it's not surprising that current LLMs, uh, can't, uh, beat it just yet.
Jeremie (46:05):
Yeah, and you can really see in that, that problem phrasing there, the, the layering on of sequential requirements that makes it harder to guess, right?
You can't just, like, one-shot that, um, even with a guess; like, you'd have to guess multiple things, right?
Which reduces the chances that you get a, an anomalous result.
So it's all meant to make it harder.
Automated, automatically, uh, evaluate, evaluatable.
(46:27):
Geez, having a, having a hard time saying
Andrey (46:29):
the words.
And last up, we do have a new open source model.
This is Hunyuan-Large, an open source mixture of, uh, experts model with 52 billion activated parameters, by Tencent.
So this has 389 billion total parameters.
(46:49):
And it's pretty beefy and impressive.
So it can process up to 256, uh, 256,000 tokens, and does beat Llama 3.1 70B on various tasks like logical reasoning, language understanding, and coding.
Seems to be, uh, somewhat comparable to Llama 3.1
(47:13):
405B.
So certainly seems like Tencent is trying to flex the muscle and showcase their ability to build this scale of model.
Jeremie (47:26):
So one of the, one of the interesting things about this paper is, so they present a whole bunch of scaling laws and they share, you know, their thoughts about, like, you know, how
many, um, tokens of text data and how many parameters and all that.
So when you, when you do the math, at least by, by my math, um, uh, which, uh, Claude is veryhelpfully helping me with, uh, we get to a compute budget of about 10 to the 21 flops, right.
(47:54):
And compute budget is also something that, you know, it's good to be interested in when you see a Chinese model, because one of the things
that they're really constrained by is US export controls on hardware.
And they find it really hard to get their hands on enough hardware to train these models.
So here we have 10 to the 21 flops.
So for reference, when you think about a GPT-4 class model, a Llama, a Llama 3 400B class model, you're looking at training budgets there of about 10 to the 25 flops.
(48:20):
So we're talking 10,000 times bigger than,
uh, is that right?
Yeah, 10,000 times bigger than this model in terms of compute budget.
So, so I find this really weird.
They claim that this model is on par with Llama 3 405B.
I may be missing something in my calculations.
If somebody like, if you can spot this, like please do.
(48:40):
Uh, this seems to me to be, uh, very much a stretch.
This seems, very frankly, like, implausible.
I must be missing something or the paper must be missing something.
But, um, if that is the compute budget, then, thenthey're doing something really janky, really weird.
And, uh, and that would be the headline, like, if the actual budget was that, but again, um, yeah, Llama with a 10,000 times greater
(49:04):
training budget.
And, uh, and here they're, um, uh, they're saying that it performs on par with Llama 3.1 405B.
So that doesn't make any sense to me.
Um, would, would love to, uh, Yeah,
Andrey (49:16):
it seems that maybe there's a typo.
Maybe we didn't quite run the equation right.
They do say they trained for 7 trillion tokens, and there are 52 billion activated parameters, which would mean that it shouldn't be that
different on that order of magnitude.
So lots of details in the paper; they do talk about the architecture, the number of layers, the attention heads, the type of attention used, which is also the case with Llama.
(49:44):
So these kinds of details on the nitty gritty of how this is implementedalways, I think is useful for pretty much everyone working on LLMs.
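As a quick sanity check on that compute discussion, the standard back-of-envelope rule is C ≈ 6·N·D, with N the activated parameter count and D the training tokens; plugging in the numbers quoted here (plus the commonly reported ~15.6 trillion tokens for Llama 3.1 405B) gives roughly 2×10^24 FLOPs for Hunyuan-Large versus roughly 4×10^25 for Llama 3.1 405B, a gap of around 17x rather than 10,000x.

```python
# Back-of-envelope training compute with the standard C ~= 6 * N * D rule
# (N = activated parameters, D = training tokens). Inputs are the figures
# quoted in the episode plus commonly reported Llama 3.1 405B numbers.

def train_flops(n_params, n_tokens):
    return 6 * n_params * n_tokens

hunyuan = train_flops(52e9, 7e12)         # 52B activated params, 7T tokens
llama_405b = train_flops(405e9, 15.6e12)  # 405B params, ~15.6T tokens

print(f"Hunyuan-Large  ~ {hunyuan:.1e} FLOPs")     # ~2.2e+24
print(f"Llama 3.1 405B ~ {llama_405b:.1e} FLOPs")  # ~3.8e+25
print(f"ratio ~ {llama_405b / hunyuan:.0f}x")      # ~17x, not 10,000x
```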
And on to research and advancements, we begin with some work from Google and some affiliates called Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA.
(50:09):
So this is a pretty novel, pretty interesting technique for getting more out of tiny models.
As we've seen, we've made more and more gains in the space of one and two billion parameter models.
And this one introduces the notion of recursive models.
What that means is they train, uh, like a vanilla transformer has N layers, right?
(50:32):
And each layer is distinct.
What they do in this paper is say that you can take a set of layers and then basically stack them again and again.
So you have your, say, two layers a few times in a row.
And just by doing that, you're able to, uh, still go to asmall size, but retain the performance of a larger model.
(50:56):
And that's the recursive part of the title of the paper.
The relaxed part there is that,
while they do repeat the, um, layers a few times, they still apply LoRA to differentiate them slightly across layers.
So that, I think, is a neat little technique showcasing continued progress in the space ofbeing able to really squeeze out all the performance out of, uh, less and less parameters.
Jeremie (51:26):
Yeah, this is a really interesting paper for a
lot of reasons, including the hardware interaction here.
But, um, for, for sort of intuition building, like, I found this really weird when I read it, to be honest. I was like, how? I wasn't familiar with the literature around,
cause there is some, um, around, I guess, what they're calling recursive transformers.
People have done some little experiments.
Right.
Andrey (51:45):
And then actually, just to call this out, uh, it might be confusing.
So recursive, going back a little while, like, there has been research on recursive networks, different from recurrent.
So recursive is different because you're not kind of updating a hidden state.
There's no like, uh, time sequence element here.
(52:06):
It's really just, you have one input and you pass it through the same neural network several times to get to a better answer.
So you take an input, you pass it forward to get an output, you put that output back through the same set of weights.
And that's what it means to be recursive.
And yeah, it's been known for a little while that it actually is possible to train neural nets to be better after several recursive passes, several passes through themselves.
(52:35):
And, uh, yeah, I'll let you, Jeremy, take over.
Jeremie (52:38):
Yeah, no, but, but that fact itself, right?
That, that's something that I was not aware of going in myself.
And, and it's, it struck me as quite counterintuitive, right?
You, you feed the same data, as in you put data into a model, at layer one, and then you make it go through layer one.
And then instead of going to layer two, you may go backthrough layer one and over and over and over again.
(53:00):
And you get a better result out.
And.
You know, I was trying to build some intuition around this.
Best I could tell is like, so reading a book twice, right?
You're, you're kind of doing the same thing, even though you're usingthe same, um, algorithm that are the same, the same layers and all that,
uh, you're able to extract more and more information with each pass.
And so this is essentially the same principle.
(53:22):
Basically you're chewing on the data more.
Uh, you can think of it as a way of just expending more compute in the process of chewing on that data.
If, if you want to compare it to just like, uh, feeding it through just the layerone time, now feed it through multiple times, you get it, you get a better result.
Um, so one of the challenges is, sorry, let's talk about the advantages.
First, the advantage is, uh, you are copy-pasting the same layer over and over, which means you don't need to load, you know, uh, I don't know, an 8 billion parameter model;
(53:51):
maybe you get to load a 4 billion parameter model if you reuse every other layer, right?
Or, um, uh, anyway, you know, you can keep playing games like that, where you have a layer stacked like three times in a row, the same layer, and then
a different, you know, the next layer, and copy that one three times in a row.
Or it could be all the same layer.
There are all those configurations that are possible.
And so, um, one of the advantages here is it cuts down on theamount of memory that you need to use on your chip, right?
(54:17):
This is really good for memory usage.
Um, you still need to run the same number of computations, though; even though your layers are identical, your weights and parameters are identical,
your data, as in, you know, the embeddings of that data, are changing.
So you still have to run those calculations.
So the logic cost, the number of flops, the flop capacity ofyour hardware is still, you know, needs to be utilized intensely.
(54:42):
There is a way that you can even get an advantage on that level, though, um, because so much of your, your computation looks the same, it makes it easier to parallelize it.
So they, they have a section of the paper on continuous depth-wise batching, where they're talking about, okay, how can we leverage the fact that the layers are identical to make the actual logic
(55:03):
less demanding on the chip, which is, which is really cool.
But the, the really big boon here is for memory usage.
Cause you're literally, you're functionally cutting down on the size of your model, right?
Pretty dramatically.
Um, so that's, that's really cool.
It's, it's such a dead simple method.
There is this technique that they're using, uh, that seems to have worked best in terms of deciding which layers to copy-paste, uh, that they call their stepwise method.
(55:25):
This was the, the one that worked best.
So basically they would take, if you have, I don't know, like a 20-layer transformer, um, they would take every other layer and copy it once.
So, you know, take layer one, repeat layer one one time, right?
Then take layer three, which would be the next one.
'Cause you get layer one, layer one, then layer three, layer three, then layer five, layer five, layer seven, layer seven, all the way up to 20.
(55:48):
And that's kind of the thing that they found worked best.
Uh, the intuition behind that just being that, hey, there was prior work that showed that this worked.
So a lot of this is just sort of janky engineering, but still a really interesting way to, again, play with hardware, see what we can do with chips that have crappy memory but maybe good logic. You know, it's unclear which chips would necessarily fit into that category once they use this continuous depth-wise batching strategy, but,
(56:18):
but really interesting and a great way to get more out of your, yeah, out of your AI hardware.
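For listeners who want to see the shape of this, here is a minimal, illustrative sketch in PyTorch of the layer-tying idea as described above; it is not the paper's actual code, and the block structure, sizes, and the stepwise-style schedule are all assumptions for the example.

```python
# Minimal sketch of a recursive (layer-shared) transformer forward pass.
# A 6-pass "virtual" depth is built from only 3 distinct blocks, so weight
# memory is roughly halved while the number of computations stays the same.
import torch
import torch.nn as nn

class Block(nn.Module):
    """Stand-in for a transformer block (attention omitted for brevity)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        return x + self.ff(self.norm(x))  # residual connection

class RecursiveTransformer(nn.Module):
    def __init__(self, d_model: int = 64, n_unique: int = 3, n_repeats: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList([Block(d_model) for _ in range(n_unique)])
        # Stepwise-style schedule: [0, 0, 1, 1, 2, 2] -- each kept layer is
        # visited twice in a row, mirroring the "layer 1, layer 1, layer 3,
        # layer 3, ..." pattern described above.
        self.schedule = [i for i in range(n_unique) for _ in range(n_repeats)]

    def forward(self, x):
        for idx in self.schedule:
            x = self.blocks[idx](x)  # same weights, new activations each pass
        return x

model = RecursiveTransformer()
out = model(torch.randn(2, 10, 64))  # (batch, seq, d_model)
print(out.shape)  # torch.Size([2, 10, 64])
```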
Andrey (56:23):
This paper has quite a bit, uh, to it, a lot of details that are interesting.
So they do use a stepwise strategy, uh, initially, but when they add this other trick of LoRA for these layers, to be able to adapt them slightly for different positions, uh, they do a slight modification to the stepwise method where they average two layers.
(56:46):
So like layer one is an average of layers one and four, and then the next one is an average of layers two and five.
Just empirically, they found this works better, and you do need to, uh, they say, uptrain. So you need to train the initialized model for a little while to get it to work well, but they do say that you don't need to train it very much.
(57:07):
With just like a basic 15 billion tokens of uptraining, a recursive Gemma 1 model outperforms even full-size pretrained models like Pythia and TinyLlama.
So yeah, it's quite interesting, and we'll be seeing, I guess, if this gets adopted in practice.
(57:28):
And,
Jeremie (57:28):
and, um, I don't know if we talked about the LoRA adapter role in this kind of conceptual structure, but maybe just worth emphasizing: when you lock in these parameters and you're just repeating the same layer over and over,
um, you might want to give your model a little bit more degrees of freedom than that, a little bit more of an ability to kind of adapt to the new problem domain that you're going to be training it on.
(57:53):
And that's really where the LoRA adapters come in.
It gives the model a little bit more room to stretch itself, right?
Hence the "relaxed" qualifier here in relaxed recursive transformers.
You're giving it a few more degrees of freedom to kind of modify itself without the constraint that all these layers have to be exactly the same.
So that's kind of the intuition.
Andrey (58:12):
Yeah, right.
So LoRA, also, for some reference, is a way to, uh, efficiently change a bunch of weights by tweaking a smaller set of parameters you could basically reduce it to.
So that's the idea here: you're still sharing most of the weights, but you update a few parameters that make the repeated layers a little more distinct.
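To make the "relaxed" part concrete, here is a small, hypothetical sketch of per-repeat LoRA adapters on a tied weight matrix; again, this illustrates the idea rather than the paper's implementation, and the rank, sizes, and initialization are arbitrary assumptions.

```python
# Sketch: one shared weight matrix, "relaxed" by a small low-rank (LoRA-style)
# correction for each repeated use, so repeats can differ slightly without
# storing a full extra copy of the weights.
import torch
import torch.nn as nn

class SharedLinearWithLoRA(nn.Module):
    def __init__(self, d_model: int, n_repeats: int, rank: int = 8):
        super().__init__()
        self.shared = nn.Linear(d_model, d_model)  # tied across all repeats
        self.A = nn.ParameterList(
            [nn.Parameter(torch.randn(d_model, rank) * 0.01) for _ in range(n_repeats)]
        )
        self.B = nn.ParameterList(
            [nn.Parameter(torch.zeros(rank, d_model)) for _ in range(n_repeats)]
        )

    def forward(self, x, repeat_idx: int):
        # Shared weights plus a cheap, repeat-specific low-rank update.
        return self.shared(x) + x @ self.A[repeat_idx] @ self.B[repeat_idx]

layer = SharedLinearWithLoRA(d_model=64, n_repeats=2)
x = torch.randn(2, 10, 64)
h = layer(x, repeat_idx=0)  # first pass through the tied weights
h = layer(h, repeat_idx=1)  # second pass, slightly "relaxed" by its own adapter
```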
(58:35):
And onto the next research, we got applying the Golden Gate Claude mechanistic interpretability technique to protein language models.
And this is not a paper, actually; this is more of an open source project that looked into the idea of, uh, applying the same technique that, uh, we've
(58:56):
covered, I believe, a few months ago, where we had sparse autoencoders that can be applied to LLMs to get interpretable features.
So, uh, the famous example, I guess, is the Golden Gate Bridge feature in Claude: you can see that there is this kind of notion, a concept within Claude, that gets activated for certain inputs.
(59:24):
And that is done via the sparse autoencoder technique, which compresses the outputs of certain layers in an LLM and then finds regularities at a high level.
So this work was applying that same technique to, uh, a model specialized in protein prediction, I guess, a protein language model.
(59:46):
And, uh, they found some interesting features, uh, in this context.
And I think, uh, Jeremy, you read more into it, so I'll let you go ahead and take over.
Jeremie (59:55):
I mean, I really like this, um, this paper, and for context, the SAE, the sparse autoencoder, is a bit of a darling of the AI interpretability world, especially among folks who care about loss of control scenarios.
And like, is my AI trying to plot against me, trying to scheme, as, believe it or not, the technical term is.
Um, so the idea here is, yeah, you have, somewhere, so let's pick a middle layer of our transformer.
(01:00:21):
And we'll pick specifically the residual stream.
So the residual stream is basically the part of the circuit in the, I shouldn't say circuit, the part of the architecture that takes, um, whatever the activations were from the previous layer and just copy-pastes them into the next one.
It's a way of preventing the information from degrading as it gets propagated through the model.
(01:00:42):
But anyway, uh, essentially pick a slice of your transformer, and you feed the model some kind of input, and you're going to get activations at that layer, right?
Now pick those activations and use them.
As the input, okay, you're going to feed them to a model called a sparse autoencoder.
The sparse autoencoder is going to take those activations, and it's going to have to represent them using a small set of numbers, like a compressed representation.
(01:01:10):
So, you know, maybe, as a cartoonish version of this, say you have 10,000 activations.
Um, then you want to compress them down to, like, a 100-dimensional vector, right?
So that's what the sparse auto encoder is doing.
It compresses them.
And then from that compressed representation, it's then going to decompress them and try to reconstruct the original activations.
And the loss function it uses is usually something like the difference between the true and the reconstructed activations.
(01:01:37):
So basically it just gets really good at compressing these activations down to a smaller representation.
It turns out, and Anthropic found this, that when you do that, the individual entries in that smaller compressed representation end up correlating with human-interpretable features.
Uh, so for example, the idea of deception might be captured by one or a small number of those numbers.
(01:02:04):
Um, the idea of a molecule might be captured, you know, in the same way.
And so this is basically just meant to be a way of taking this very complicated thing, all the activations in this residual stream, and compressing 'em down to a manageable number of numbers that we can actually
get our arms around and start to interrogate and understand and interpret, right?
(01:02:25):
So that's kind of part of the hope of the alignment game plan: that we'll be able to use this to understand the thinking, in real time, of AIs that are potentially dangerously advanced.
That's the theory.
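As a rough illustration of the setup Jeremie just described, here is a tiny sparse autoencoder training loop in PyTorch. It is a sketch, not Anthropic's or this project's code; one note is that in practice the feature dictionary is often larger than the activation dimension but pushed to be sparse, rather than a strictly smaller bottleneck.

```python
# Tiny sparse-autoencoder sketch: encode residual-stream activations into
# feature activations, decode back, and train on reconstruction error plus an
# L1 sparsity penalty on the features.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_act: int, d_feat: int):
        super().__init__()
        self.encoder = nn.Linear(d_act, d_feat)
        self.decoder = nn.Linear(d_feat, d_act)

    def forward(self, acts):
        feats = torch.relu(self.encoder(acts))  # non-negative feature activations
        recon = self.decoder(feats)
        return recon, feats

sae = SparseAutoencoder(d_act=512, d_feat=4096)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
acts = torch.randn(1024, 512)  # stand-in for activations collected from one layer

for _ in range(100):
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()  # recon + sparsity
    opt.zero_grad()
    loss.backward()
    opt.step()
```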
Um, a lot of interesting success has been found there, including on steering the model's behavior.
So if we do something called clamping, we pick one of those numbers in that compressed representation.
(01:02:45):
And let's, let's say it's the number that represents banana or encodes the idea of banana.
We crank up its value artificially.
And then we reconstruct the activations.
We can then get the model, based on those activations, to generate outputs that are tilted towards banana, whatever that means.
Maybe it talks a lot about bananas or something like that.
That was the Golden Gate Claude experiment, right?
(01:03:07):
So they found the entry that corresponded to the Golden Gate Bridge.
They clamped it to give it a really high value.
And then that caused the model to yap on about the Golden Gate Bridge.
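And the clamping intervention itself is conceptually just this; a sketch that reuses the hypothetical sae from the previous snippet, with a made-up feature index.

```python
# Sketch of "clamping": encode the activations, pin one feature to a large
# value, decode back, and let the model continue its forward pass from the
# steered activations.
import torch

def clamp_feature(sae, acts, feature_idx: int, value: float = 10.0):
    with torch.no_grad():
        feats = torch.relu(sae.encoder(acts))
        feats[..., feature_idx] = value      # crank the chosen feature way up
        return sae.decoder(feats)            # steered (reconstructed) activations

golden_gate_idx = 1234                       # hypothetical feature index
steered = clamp_feature(sae, torch.randn(1, 512), golden_gate_idx)
# `steered` would replace the original activations at that layer, tilting the
# model's outputs toward whatever concept the clamped feature encodes.
```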
So here, the question is going to be, will we find the same thing if we work on, uh, transformers that are trained on bio sequence data? And they pick a model that was developed by this company, sorry, EvolutionaryScale, the company that's made the ESM series of models.
(01:03:32):
So we covered ESM-3 many months back, fascinating model.
Uh, it was the first ever bio sequence model, by the way, to meet the threshold of reporting requirements under Biden's executive order back then.
Um, so it was a really, really big model.
What they did was they took a smaller model, ESM-2, that that company had built, and they played the same game.
Can we pick a, you know, a middle layer of, of that transformer?
(01:03:55):
Um, you know, build a sparse autoencoder and can we recover human interpretable features, right?
Can we find features that correlate with, in this case,uh, common structural components or facets of biomolecules?
A common example here would be the alpha helix.
So, sorry, if you put amino acids together, certain kinds of amino acids, when you string them together to form a
(01:04:20):
protein, they'll tend to form a helical structure called an alpha helix.
The other, um, secondary structure that they sometimes form is called a beta sheet, or beta pleated sheet, or whatever.
There are all these different structures that these things will form depending on the kinds of Lego blocks, the kinds of amino acids, that you string together.
They all have slightly different charges.
(01:04:41):
So they attract and repel in these nuanced ways.
And it's notoriously hard to predict what the actual structure is going to be. But here, using this technique, they're able to find, okay, we actually have in our SAE, in that reduced
representation, some numbers that correlate with, oh, there's going to be an alpha helix here, a lot of alpha helices, or, you know, beta hairpins or whatever else.
(01:05:03):
And so that's interesting from an interpretability standpoint: we can understand a little bit more what goes into making these proteins take the shapes they do. But then they also found
that by modifying the values in that compressed representation, by doing this clamping thing and artificially, you know, let's say we enlarge the value of the alpha helix number,
(01:05:25):
you could actually prompt the model to output sequences that would have more alpha helices.
And so this is kind of interesting from a protein design standpoint, right?
It's the first kind of tantalizing hint.
Uh, well, maybe not the first, but, you know, bucket it with AlphaFold as a series of tools that could allow us to better understand how proteins
fold and actually come up with designer proteins with certain structural characteristics that otherwise would be really, really hard to design.
Andrey (01:05:53):
And onto the lightning round, we begin with a pretty fun blog post, From Naptime to Big Sleep: using large language models to catch vulnerabilities in real-world code.
So this is by Google's Project Zero.
And this is a team that's been around for a while, since 2014, dedicated to finding so-called zero-day vulnerabilities.
(01:06:16):
So, vulnerabilities in code that aren't yet known or out in the wild, that hackers can then exploit without there being protections for them.
They have previously had this Project Naptime, evaluating offensive security capabilities of large language models.
They had a blog post several months ago where they introduced this framework of large language model assisted vulnerability research and demonstrated the potential
(01:06:45):
for improving state-of-the-art performance on the CyberSecEval 2 benchmarks from Meta.
That was a little while ago, and now Naptime has evolved into Big Sleep, where Google Project Zero is collaborating with Google DeepMind.
And in this blog post, they announced a pretty exciting result from this Big Sleep agent, this LLM that's optimized for helping with, uh, I guess, vulnerability
(01:07:13):
detection: they discovered, via this agent, an unknown, real vulnerability in a major project, SQLite, and reported it, and the developers fixed it.
So to their knowledge, this is the first time an AI has been used to find a real-world vulnerability like that.
(01:07:35):
And this blog post goes into a whole lot of detail on the vulnerability, which seems to be, you know, a somewhat tricky case, not some sort of trivial discovery, so to speak.
So very exciting for implications of being able to fight
Jeremie (01:07:50):
hackers with AI.
Yeah.
And also a warning shot that, hey, AI can now actually discover real-world vulnerabilities.
It's always a double-edged sword with these things, but, um, yeah, that's been a big, uh, question mark, right,
in the debate over AI and what risks it might pose. You know, I've had debates with people where they'll say, well, you know, we haven't seen an
(01:08:13):
AI system actually successfully discover cyber vulnerabilities in real world systems.
And so therefore, et cetera. Um, now that we have, I mean, I wonder what the implications may be. But there have been
pilot studies, we've talked about a couple. First it was one-day vulnerabilities, where the exploit has already been logged somewhere,
and now you're just getting an AI agent to exploit it.
(01:08:34):
And then zero-days, which is, you know, really figuring out, without knowing whether there is even a vulnerability, finding one from scratch, in kind of more toy settings.
This is the real world though.
This is finding one in a, I mean, SQLite is a very, very popular library.
Um, and this is an interesting, uh, an interesting bug, an interesting exploit.
It's a null pointer dereference, which essentially is, you have a pointer that points to memory addresses, and this vulnerability allows you to control what it points to.
(01:09:03):
And so this allows you essentially to have some control over what gets written to or read from memory, and that could, in principle, allow the attacker to pull off arbitrary code execution.
And, um, essentially, you know, if you just point the pointer to some specific, you know, buffer space, or some adjacent memory, you may be able to
(01:09:26):
actually like draw that, you know, pull that data in and use it for whatever purposes.
So there's a lot there. Besides that, there's just making the application crash, right?
You just have to have a like fucked up pointer or something, and it just won't work.
Um, so all of that is kind of interesting. Uh, they go into how this thing works, and it is, I think, quite an interesting improvement over current techniques, like the best techniques we
(01:09:49):
have right now, which include things like fuzzing, where you basically just throw everything and the kitchen sink at your application, at your software, and just see if anything
breaks. Um, this is a much smarter approach, obviously, uh, powered by a thinking AI system.
Um, so yeah, pretty cool.
Um, and, uh, this was by the way, a bug that did remain undiscovered after 150 CPU hours of fuzzing.
(01:10:12):
So people had tried the standard techniques on this many times over.
Um, makes sense.
It is a popular library.
But, uh, those techniques failed, whereas this AI-powered run succeeded.
Andrey (01:10:23):
And one more story in this section.
This one, not about progress, but rather about a lack of progress and some unknowns.
Uh, so it's about OpenAI.
There's been a report from The Information that they're working on new strategies to deal with an AI improvement slowdown.
(01:10:43):
So OpenAI has been working on something like a GPT-5; uh, the upcoming model has been codenamed Orion, and there you go.
That's the reference to Orion from before.
And the report is saying that it seems to not be showing as significant an improvement over its predecessors as in previous iterations.
So in the leap from GPT-3 to GPT-4, there was a massive improvement.
(01:11:08):
GPT-3 was pretty impressive.
GPT-4 was much more impressive.
And GPT-4 now is, oh, I don't know, what, two years old? No, a year and a half old.
It's been a while since GPT-4 came out, and we haven't had that kind of leap since, uh, except maybe you could argue with o1, with the introduction
(01:11:32):
of inference-time compute, we saw some pretty significant qualitative gains.
Regardless, this report from The Information is saying that the commonly used standard trick of more data, more compute, more scale may not be as effective as it was previously.
(01:11:54):
So, uh, of course we are facing a scarcity of new training data.
That's one of the issues: most of the internet has already been sucked up.
And there is reportedly a new foundations team within OpenAI looking at possible alternatives to just scaling, like doing more in the post-training phase, uh, doing more with synthetic data from AI models, et cetera.
(01:12:20):
Now, OpenAI has, uh, not commented on this and has previously said they have no plans to release Orion or anything like GPT-5 this year.
So I guess you can take it with a grain of salt, but also it's maybe not super surprising.
Jeremie (01:12:35):
Yeah, I think this is such an interesting part
of the debate and the question over scaling, right?
So there's a question as to whether, so when we look at scaling curves, what we're typically talking about is how well the model's next-word prediction accuracy improves with more data, more compute, right? And model size.
Um, the challenge is, that doesn't tell you how that improvement in next-word,
(01:13:00):
Sorry.
next-word prediction accuracy does not necessarily tell you how generally useful a model is, you know, how good it is at reasoning, other things that we might actually care about.
And so you've got this very robust scaling law that tells you that the model is getting better at predicting next tokens.
Um, but, but then uncertainty about the, the value that's being created in that process.
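For reference, one common way these training-time scaling curves get written down is the Chinchilla-style fit below; the exact constants are fitted per model family, and, as Jeremie notes, a smoothly falling loss here doesn't by itself tell you how useful the model is downstream.

```latex
% Chinchilla-style parameterization of the pretraining loss:
%   N = parameter count, D = training tokens, E = irreducible loss,
%   A, B, \alpha, \beta = fitted constants.
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```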
So that's one dimension of uncertainty, without knowing what's going on with Orion, what the training data looks like, what it's intended to do.
(01:13:25):
Like, is this another reasoning system?
It seems like it's not supposed to be, but there's a lot of fog of war here.
Um, without knowing that, it's hard to know whether what I've just described here is an important part of the uncertainty, or whether it's, you know, like a reasoning model and the inference stuff isn't working out. From what I've seen, it seems more likely to be the former, that this is really meant to be
(01:13:47):
a beefy pretrained, you know, GPT-5 type model, as opposed to o1, which was, you know, putting, I mean, I really don't want to say bells and whistles, it's way more than that, but it certainly is
leaning more towards the inference-time paradigm.
And that was, that, that's the big leap there.
You know, we have separate inference-time scaling laws now to ride as well, that complement the training-time scaling laws.
(01:14:09):
So that may well be enough to do some really interesting things.
But, um, yeah, there's a whole bunch of interesting, you know, gossip about OpenAI in here. Apparently, back when Orion was about 20 percent of the way through its training run,
um, Sam was really excited about it and was talking internally about how this would be a big deal.
That's when he hyped it up.
It seems that that hype has failed to materialize, and that's really what's kind of at issue here.
(01:14:31):
Um, there's also questions about what hardware this stuff is being trained on.
Like, what is this training run?
I'm guessing it's the H100 fleet that OpenAI is running right now to train this.
Um, at what scale, like, what are they really pushing in terms of scale?
We don't know.
Really hard to know.
Um, and just more generally, because they are setting up this foundations team to explore kind of deeper questions.
(01:14:53):
Now, you know, if the default path to scaling, the engineering path, we'll call it, right, where you just, you know, build faster horses,
If, if that doesn't, doesn't work, what do we do instead?
Right.
That's the big question.
I think in this instance, um, OpenAI has really, and quite ironically, put itself in a difficult position over the last few years.
(01:15:13):
Right.
They've bled off, I think it's fair to say, all of their best, or not all, much of their best algorithmic design talent.
Right.
So Ilya Sutskever has left.
Um, we've seen, you know, the safety team, Jan Leike, uh, we've seen, um, yeah.
Anyway, like basically a huge, huge amount of talent, including product talent.
We had, uh, Barret Zoph leave recently too.
(01:15:35):
There are like really, really good folks who have gone, in many cases to Anthropic.
And so if it is the case that we're moving from a domain where it's about exploiting a paradigm, um, in other words, doing really good engineering and getting scaling to work really well,
to a paradigm where we're looking for new ideas instead, where that's the main bottleneck,
then you might anticipate talent being the main limiting factor, in which case Anthropic starts to look really interesting, right?
(01:16:01):
You've got a lot of companies now that could be, that could be competing here.
Meanwhile, OpenAI is hamstrung by a relationship with Microsoft that is currently tepid at best. In the recent investor communications,
Microsoft did not refer to OpenAI in the future tense at all, right?
That is a big, big change.
Um, so.
As that starts to happen, is OpenAI forced to work with companies like Oracle to develop infrastructure because Microsoft apparently isn't meeting their needs?
(01:16:26):
There's tension there too.
Like, this starts to become really interesting for Sam, and he's got to find a way to square this circle.
He's got to find a way to keep raising money.
He's got to find a way to keep scaling for what, what that's worth.
And then he's got to retain talent.
Um, it would be interesting if this turned into a very significant, uh, structural challenge for OpenAI, if they've doubled
(01:16:47):
down too much on scaling, but again, this is all speculative.
We don't know until the models start dropping.
And frankly, I think when the, uh, Blackwell series of GPUs comes online and we get those big clusters running next year,
I mean, look, everybody I know in the space really expects big performance improvements from the early tests they're doing.
I suspect we'll be looking back on scaling as, like, yep, that was a real thing all along. But, um, if not, the implications for OpenAI at least are interesting.
Andrey (01:17:14):
That's right.
And also worth noting, this is not unique to OpenAI, right?
It's an open question in general whether it is even doable to keep scaling, in part because of training data running out; that was a speculation for a while. And just to paint a high-level picture, right?
What scaling means is, GPT-3 was around 175 billion parameters. GPT-4, we don't know, but the speculation or the rumors were that it was
(01:17:43):
closer to 2 trillion total parameters, but it was a mixture-of-experts model, so there was some smaller set of activated parameters.
And so GPT-5, or whatever the next model is, Orion, you could say maybe would have 10 trillion total parameters or 20 trillion, you know, that kind of jump in size.
And the speculation is, well, if you do that same kind of move from GPT-2 to GPT-3, GPT-3 to GPT-4, and just add more weights, add more scale, add more training, will you get that bigger jump?
(01:18:15):
Right now, it's unclear, right, and this report is basically claiming, or seems to claim, that maybe it's not quite as successful as it has been in the past, but it remains to be seen.
Jeremie (01:18:28):
Yeah, I think, uh, you know, worth noting, though, on the data scarcity side,
like, there is eventually a data wall, presumably, unless synthetic data carries us over.
The data wall, though, is expected to kick in about an order of magnitude of flops, of training compute, further than, for example, power constraints.
Um, and right now we're not even close to power constraints in our current runs.
(01:18:51):
Like, we're seeing, you know, 10 to the 26 flop runs next year, probably shading into 10 to the 27.
Um, that's still like two orders of magnitude before you hit even the power constraints on the grid.
So right now I don't think that data scarcity is actually the thing driving the limited capabilities here.
I think there's something sort of,
(01:19:14):
something else going on here, and we'll presumably have to wait and see.
Um, that's part of the reason why I'm curious, you know, what happens at that next beat, when we get the Blackwell clusters online, when we start to see the hundred-thousand GB200 GPU clusters running, like,
Do you then see the transcendence to use the Saudi Arabian terminology for this?
(01:19:35):
Do you start to see that kind of improvement?
I don't know.
But, uh, I think, yeah, there's a lot of experimentation, with many billions of dollars, that will be run, uh, to find out.
Andrey (01:19:46):
That is being run.
Yes.
You know, this is a big question and I guess we'll find out.
Alrighty, moving on to policy and safety, and as promised, we are going to talk about Donald Trump's victory in the presidential election in the U.
S., and in particular, what it means for AI.
No political commentary from us, even though as a U.
(01:20:08):
S.
citizen, I have some opinions.
But regardless, Donald Trump is going to return to the White House.
And there's not a ton we know about specifics of what might happen, but we do have some pretty decent ideas as to at least some of what will happen.
So for instance, we do know Trump's administration is presumably going to repeal President Biden's executive order on AI, which we've covered plenty on the podcast.
(01:20:36):
This is a very big, uh, order, not a law.
So, because this was just an executive order, the Trump administration could just cancel it, more or less.
Now, there might be, uh, retention of some of the features of that.
It might be revising it rather than fully canceling it.
(01:20:57):
Uh, but it does seem likely, at least.
We don't know for sure.
Certainly there'll be revisions to that.
And then of course, we know that Trump loves to fight with China, and that's been an ongoing situation in the US for a while, so there will probably be more about that. But Jeremy, you're the policy guy, so I'll let you do more
(01:21:18):
of the talking here.
Jeremie (01:21:20):
Yeah.
I mean, I used to think of myself as a tech guy, I guess.
Uh, yeah, I guess half, half and half a bad.
Yeah.
No, look, I think, um, it's funny because the policy universe I live in is the national security policy universe.
Um, to the extent that I live in the policy universe.
And, um, I think that there are a lot of people in the kind of, uh, general AI safety world who are really concerned about a Trump administration.
(01:21:45):
And I actually think that a lot of those concerns are quite misplaced.
Like, I think this is a misreading of, um, of what we need and where we are.
So just for, for context, we've seen Trump on various podcasts.
That's all we have to go on, by the way.
And this article, uh, goes in depth into comments that Trump's made on.
There's
Andrey (01:22:04):
been no promises, uh, no guarantees.
So this is kind of reading tea leaves and guessing based on various comments.
Jeremie (01:22:10):
Yeah, exactly.
Exactly.
And so you've got, you know, Trump has, rightly in my opinion, described AI as a superpower and has called its capabilities alarming.
Um, he's also referred to China as the primary threat in the race to build advanced AI, which I think is also correct.
Um, and then you've got this interesting question as to, you know, cabinet staffing; like, Elon is a massive influence in the cabinet.
(01:22:33):
Or I should say, sorry, on the transition team and broadly on the team; I don't know that he'll be in the Cabinet officially,
'cause he's kind of busy with a lot of companies. Um, but he's, you know, obviously a massive influence.
Very concerned about everything from weaponization to loss of control.
A lot of, uh, good quotes from Dan Hendrycks in this article, who advises Elon quite a bit.
(01:22:54):
And, um, then the question is that's Musk.
Uh, that's Elon.
You've got Vance on the other side, Trump's VP, obviously, um, who's expressed concerns in the past over closed-source AI entrenching the tech incumbents.
Now, I mean, I think this is a, it's a very rational concern to have, right?
Like, you don't want closed-source, uh, you know, pure plays only, and to not allow people to open source stuff.
(01:23:18):
I think that, you know, that is going to start to change inevitably, uh, as you start to see open source models actually getting weaponized.
And it's just going to become super obvious to all concerned.
And at that point, the administration clearly is preserving their optionality to, uh, go in that direction.
Um, at the same time, some big questions here remain around the AI Safety Institute, for example.
Uh, that was sort of a spin-off of the executive order; at least a lot of the bones were laid there.
(01:23:43):
Um, interesting question as to whether that remains. It is the case that most Republicans do support the AISI.
Um, it's a part of the broader American strategy on AI, and it's certainly a home for expertise.
Question as to whether Paul Christiano continues to run it.
You know, that's another degree of freedom they have.
They keep the AISI, but swap out, uh, Paul Christiano, who, you know, is the former head of alignment at OpenAI, who invented reinforcement learning from human feedback.
(01:24:07):
Um, so that, that would be an interesting, an interesting question.
Um, but then more broadly, the executive order, right?
The famous Biden executive order, 110 pages, it was the longest EO in living memory.
You know, I think there are a lot of components there that are, uh, probably likely to be preserved in a Trump administration.
Um, I think you'll see some stuff get scrapped.
(01:24:27):
Look, that EO did tons of stuff.
It talked about, you know, bias and civil rights and all kinds of stuff under the banner of AI.
Um, I think you could well see that get, you know, carved out, hollowed out. You know, Trump has said he's going to rip out the EO.
That will probably happen, but what it gets replaced with is really at issue here, right?
How much of the national security stuff gets preserved?
(01:24:49):
I wouldn't be surprised if we end up seeing an awful lot of that stuff still in there.
Um, and, uh, anyway, there are all kinds of questions as well about, uh, you know, what we do on the energy infrastructure side.
We have a ton of work in the United States to do to get energy back on the table.
Like we have forgotten how to build nuclear plants.
We can't build them in less than 10 years, right?
(01:25:10):
We like, we need a way to keep up.
We just talked about the power bottleneck and how that kicks in at about 10 to the 29 flops.
Well, that's coming.
Like that's, that's the training runs of like two, three years from now.
Um, if it takes you 10 years to build a nuclear plant, then, like, you've got to change something pretty fundamental.
We need to get natural gas online.
We need to get geothermal, potentially, um, and a lot of these things align with the kind of Trumpian school of thought.
(01:25:33):
So, making sure AI gets built here. The questions are all going to be around, you know, what about things like loss of control?
What about things like weaponization and open source?
Those are the big question marks.
And right now, again, it's an administration that's positioned itself very, very openly, very flexibly.
Um, and, uh, and you know, the, the China angle I think is, is a very bipartisan piece too, right?
(01:25:56):
So I don't think we're going to see all the export controls that have been put in place get ripped out.
I think those are actually going to be bipartisan and maintained. Where we might see a change, um, would be the Trump administration maybe focusing more on enforcement.
Right.
We've covered a lot the leakiness of these export controls under the current administration. It would be great to see, you know, actual loopholes closing as fast as loopholes are opening.
(01:26:18):
And that's something you could see.
So, um, you know, one last kind of meta note here: the uncertainty that we see around what might come out of the Trump administration here reflects uncertainty in the technology, but it also reflects the classic kind of Trumpian move
of maintaining uncertainty for the purpose of negotiation leverage, right?
You see this with tariffs and all the discussion around that. The threat has to be credible so that it actually leads to leverage internationally. Something that we've seen
(01:26:45):
other administrations struggle with, anyway, is, like, if you're speaking softly and you're not carrying a big stick, then people will not take you seriously.
And to the extent that there's a lot of negotiation to do with China on this issue, you may actually want to negotiate from a position of strength.
And for that, you need to have the energy leverage and, uh, and other things.
So I think big, big questions around the AISI, big, big questions around what the focus is on open source and, um, on loss of control.
(01:27:13):
But with Elon there, um, I think there's, uh, a lot of room for positive stuff to happen, potentially, on the safety side.
So yeah, I think the, the, the story is again, much more positive.
A lot of the people who I know in the kind of AI safety world, um, seem much more concerned about this.
And I think part of that, you know, it may just reflect a concern over, you know, frankly, politics; like, some people just
(01:27:41):
don't want, they don't want this administration and that's part of it.
But, um, right now it's unclear and, and the, you know, we just got to wait and see.
I think there are some really good policies that have been put forward, uh, generally on the energy side and elsewhere.
So wait and see is, is the best approach probably.
Andrey (01:27:56):
Right.
Yeah, that's generally my impression.
Also, this article goes into it and basically lays out that picture.
It doesn't seem like there's any obvious big overturnings of what's been happening.
There's going to be a lot of tweaks, presumably.
Similarly with the CHIPS Act, which was one of the major moves during the Biden administration: Trump has been somewhat critical of it, but
(01:28:18):
it's unlikely that the Republican Congress and Trump will repeal that act.
They might kind of revise it, but it does seem more likely that it will stay in place and continue being a factor.
So that's, I guess, this article's summary and our best guess at the implications of a Trump presidency for AI.
(01:28:40):
We will have to wait and see what happens in practice.
And speaking of evading AI sanctions from the U.
S., the next article is Fab Whack-a-Mole: Chinese companies are evading U.
S.
sanctions. And this is a bit of an overview, I suppose, so it's talking about the need for AI competitiveness, covering the sanctions, and talking about how
(01:29:08):
companies such as Huawei are exploiting various loopholes to acquire advanced semiconductor manufacturing equipment, which is then enabling them to build large AI clusters.
So again, Jeremy, I'll let you take over on this one, since this is your wheel
Jeremie (01:29:26):
house.
Oh, yeah.
Well, I thought that's okay.
So I will always shill, uh, SemiAnalysis any chance I get. Uh, SemiAnalysis is an amazing newsletter.
If you're into AI hardware stuff, hardware stuff in general, I should say, uh, go check them out.
The blog posts are really technical.
So unless you kind of know the hardware space, it's tough to justify a subscription if you're not getting all the value out.
(01:29:50):
But, um, they are, if you're in that space, I mean, you're probably already subscribed.
These guys are amazing.
Um, so this is, yeah, a report on the, uh, really difficult enforcement challenges that are facing the Department of Commerce and BIS as they look to enforce their export controls on AI chips.
But, um, I just want to give you, uh, an excerpt from this report.
(01:30:14):
Uh, they're talking about, um, SMIC, which is
China's answer to TSMC, obviously.
Uh, so they produce all of China's leading, uh, leading nodes on the hardware side.
So they say sanctions violations are egregious.
SMIC produces seven-nanometer-class chips, including the Kirin 9000S mobile SoC, system on a chip, and the Ascend 910B AI accelerator.
(01:30:36):
Two of their fabs.
Okay, two of their fabs are connected via a wafer bridge.
Okay, so a wafer is this big circular thing made of silicon, and that's what you etch your circuits into.
And anyway, this is the starting point for your fab process.
Um, so two of their fabs are connected via a wafer bridge, such that an automated overhead track can move wafers between them.
(01:31:01):
For production purposes, this forms a continuous clean room and effectively one fab.
But for regulatory purposes, they're separate.
One building is entity listed by the U.
S.
In other words, one building is owned by an entity that's on a blacklist.
You're not allowed to sell advanced AI logic to them.
Um, because of national security concerns, whereas the other one is free to import these, like, you know, dual-use tools, and it claims to only run
(01:31:28):
legacy processes and yet they're connected by this physical fucking bridge.
Like this is how insane it is.
You basically have one facility, and we're just going to trust China and SMIC that they're not sending a wafer right when it should be going left, type of thing.
That's the level things are on.
They go into detail on stuff that we've been tracking for a long time.
(01:31:50):
So, um, there is a fab network that is being run and orchestrated by Huawei, where they spin up new, um, subsidiaries basically as fast as they can to evade US export controls. Right now, US export controls work on a blacklist basis.
So you basically say, okay, we're gonna name new entities and organizations you are not allowed to sell advanced semiconductor manufacturing equipment to, um, and we try to keep that list fresh.
(01:32:15):
Well, Huawei is just gonna kind of create new, spawn new entities as fast as they need to.
And they have this vast network now, um, that is basically moving Huawei into the center of what you might think of as China's, maybe, AI ambitions.
Like, if you start to think about what is, not even the OpenAI of China, but what is the coordinating entity for a lot of China's big-scale AI work, it is increasingly Huawei, both on hardware and software.
(01:32:38):
So, uh, there are all these pushes to get Huawei looked at, and all this. And what this report argues for, and I think it's quite sensible, is, uh, you need to start to think about tightening your export control requirements in a broader way.
So instead of just saying, oh, look, we've got a blacklist and we're going to try to keep that blacklist fresh,
(01:33:00):
um, instead using, let's say, a wider range of tools to require that any material that is at all US-fabricated anywhere in the whole supply chain can't be shipped.
So even if you're at ASML and you're building something that has any component of US technology in it,
if you ship that to China, that's a no-no. These broader tools are becoming necessary just because otherwise you're playing this whack-a-mole game that you're destined to lose.
(01:33:26):
And at this point, the stakes are just, just way, way too high.
So, um, by the way, I say this, you know, SemiAnalysis, they are AI accelerationists, uh, in their bones, right?
Like, they are not kind of AI-safety-pilled, uh, as far as I can tell; it's quite the opposite.
And here they are saying, no, no, like, we need to fucking ban, uh, the export of this hardware to China in a very robust and unprecedented way.
(01:33:48):
I think this makes all the sense in the world.
If you believe this is ultimately dual use technology, then that's what you got to do.
Like we can't be updating blacklists every 20 minutes.
Andrey (01:33:57):
And just a couple more stories.
The next one is, uh, very much related to that previous one, actually an example of sanctions violations.
So the story is that the U.
S.
has fined the company GlobalFoundries for shipping chips to a sanctioned Chinese firm.
So this is, uh, a $500,000 penalty on this New York-based company, GlobalFoundries.
(01:34:22):
It's the world's third largest contract, uh, chip maker, and it has shipped chips without authorization to an affiliate of, uh, SMIC, the Chinese, uh, chip maker.
And this was 74 shipments of $17.1 million worth of chips to this company SJ Semiconductor, which is affiliated with SMIC.
(01:34:50):
Interestingly, this also says that GlobalFoundries voluntarily disclosed this violation and cooperated with the Commerce Department.
And there was a statement from the Assistant Secretary for Export Enforcement, Matthew Axelrod, that says, we want U.
S.
(01:35:10):
companies to be hyper-vigilant when sending semiconductor materials to Chinese parties.
And
Jeremie (01:35:17):
GlobalFoundries came out and said they regret, quote, the inadvertent action due to a data entry error made prior to the entity listing.
So a data entry error blamed for this look, probably true.
Uh, and this stuff is really difficult to enforce, especially when you have a very complex set of layered requirements and all this stuff.
(01:35:37):
Like, you know, the, the rules right now are not simple.
And that, that is a challenge for enforcement.
Um, you know, so, so maybe no surprise to see this is yet another, another kind of leaky situation.
Obviously TSMC had, had similar issues recently, right?
They accidentally sold some stuff to, uh, a Huawei affiliate.
But this is just what happens.
It's part of the reason why you just need stronger incentives, right?
(01:35:59):
If companies like GlobalFoundries are running processes that are subject to these kinds of errors, then that just implies, okay, they need to try harder.
The incentives just need to be stronger.
Um, you know, to kind of bump back to that SemiAnalysis report that we were talking about earlier, uh, one of the call-outs that they make is, you know, the industry side of this has been claiming that this would wreck
(01:36:22):
Uh, you know, tighter export controls would wreck industry and blah, blah, blah.
And they've actually been doing better, not worse, um, including, you know, decent sales to the Chinese market, um, in the last few years. This has
been an absolute boom time for them, in spite of increasingly tight export controls.
So the economic argument may be faltering a little bit here.
Um, but yeah, we're seeing in real time these holes kind of appear and, you know, get plugged. Like, this one will get plugged, and then there are going to be new holes.
(01:36:49):
It's this, yeah, never-ending game of whack-a-mole, again, to, uh, plagiarize the SemiAnalysis, uh, post title.
Andrey (01:36:55):
And last up, the story is that Anthropic has teamed up with Palantir and AWS to sell its AI to defense customers.
Uh, quite related to the story last week we had with Meta, uh, altering their license, their user agreement, to let defense in the U.
(01:37:15):
S.
use it.
Uh, now, this collaboration would allow Claude, the chatbot from Anthropic, uh, to be used within Palantir's defense-accredited environment, this Palantir Impact Level 6, I
don't know what this is, reserved for systems containing data critical to national security.
So Anthropic previously has, uh, you know, I guess, prevented use of its models, or at least precluded it in their agreements, for U.
(01:37:45):
S.
defense customers.
And, uh, per this article, and per what we discussed last week, this seems to be part of a general trend.
Jeremie (01:37:54):
Yep.
Anthropic, I have heard, has been really transparent, um, internally with their own teams about this and the deliberative process behind it.
I mean, I actually think, you know, you want an AI-safety-focused org to be working with the US government, to have
them understand what's going on, including in defense contexts.
Uh, and this is going to be for, yeah, in intelligence analysis, that sort of thing.
(01:38:17):
Um, so yeah, I mean, like, I actually think they're going to face a lot of flack for this.
I think this is a, a good move.
Um, and, um, and the Palantir partnership is, is actually going to be really important for them too.
Cause you know, selling into DOD is hard.
Uh, you want, you want to work with someone who really understands that process.
So, uh, yep.
This is, uh, another big boon for, uh, Anthropic, potentially, because that market is also just, it's really big, and it's what, you know,
(01:38:41):
you know, Anthropic needs to do to understand their customer.
They're a really big potential customer.
Well, and also for their own mission, they need to be able to integrate tightly with the US government, with
the national security parts of the US government, and all that stuff.
So, uh, yeah, we'll, we'll see where this goes.
And if we end up seeing more reporting about this, uh, this deal.
Andrey (01:39:00):
Yeah.
And then speaking of the government, this news also covers that, uh, Claude has come to AWS GovCloud, which is a service designed for U.S. government cloud workloads.
Wasn't aware there was a GovCloud, but that's neat.
So seemingly it's not just for military.
It's also just in general for use within the U.
(01:39:23):
S.
government.
And that will be it for this episode of Last Week in AI.
Once again, you can go to the episode description for links to all the stories.
Also go to lastweekin.
ai for those links and for the text newsletter.
We always appreciate your comments, your reviews, your tweets, all of those things.
(01:39:46):
But more than anything, we do appreciate you listening.
So please keep tuning in and hopefully enjoy this AI song.
That is not terrible.