All Episodes

April 14, 2024 105 mins

Our 162nd episode with a summary and discussion of last week's big AI news!

Read out our text newsletter and comment on the podcast at https://lastweekin.ai/

Email us your questions and feedback at contact@lastweekin.ai and/or hello@gladstone.ai

Timestamps + links:


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Andrey (00:10):
Hello and welcome to the latest episode of Last Week in AI, where you can
hear us chat about what's going on with AI.
As usual, in this episode, we will summarize and discuss some of last
week's most interesting AI news.
You can also check out our Last Week in AI newsletter at lastweekin.ai
for articles we did not cover in this episode. I am one of your hosts,

(00:30):
Andrey Kurenkov. I finished my PhD at Stanford last year
and I now work at a generative AI startup.

Jeremie (00:37):
And I'm your other host, Jeremie Harris. I am the co-founder of Gladstone
AI, which is an AI national security company.
We did the rounds, I guess, in the media recently, and that's been fun
dealing with all of that.
And yeah, by the way, I think we were talking about this.
I'm not sure if I'll actually do it this episode, but, with Udio, which just
came out, we were talking about using a song that we generated in Udio

(01:00):
as the podcast intro song. You may have just heard that for the first
time, which is kind of cool.

Andrey (01:03):
That's right. You may have. We'll see.
When I edit it, I will probably do it and that'll be quite fun.
So yeah. Good example of how these kinds of things are
actually going to be useful for people, I guess.
Right.

Jeremie (01:17):
I was dude, so impressed with all those tracks.
Like, I don't know, I've spent an ungodly amount of time playing with it
and should have been working, but yeah, it's it's amazing.
And then another thing, another weird piece of personal ish news.
So I was in New York City the last couple days, and I ran into, for the first
time, Jon freaking Krohn from the Super Data Science

(01:38):
podcast that we've been plugging for the last little bit.
And Sadie St. Lawrence, who does the Data Bytes podcast.
She's the founder of Women in Data.
And I, you know, I've known the two of them for a long time.
Jon Krohn especially, you know, we are big fans of Jon.
We call ourselves his Krohn-ies.

Andrey (01:57):
Haha.

Jeremie (01:58):
Anyway, so a friend of the show, Jon Krohn, and, yeah, he's taller
than I expected.
He said that I was, and this is interesting,
he said I was taller than he expected.
And the difference is, I thought he was, you know, at least six foot,
like six two. Small difference. But he looked at me like, I thought
you were, you know, much shorter. So I wonder, really, am I giving off
short guy energy?

(02:20):
I don't know what this is. Maybe. Andre, if you at some point, want to
guess my height, that'd be great. And I'm sure the listeners would
appreciate it, but we can get on with the show. That's that's it for my
personal update.

Andrey (02:29):
That's right. On with the show.
Just one quick disclaimer. We did miss last week.
Just scheduling stuff. Wound up not working out.
But we are back this episode.
We're going to cover some of the news you missed last week.
And there's been a lot of exciting news going on this past week,
a lot of stuff. So let us go ahead and just dive in,

(02:51):
starting with the Tools and Apps section.
And of course, we have to start with the big news here, which is,
as Jeremie mentioned briefly, Udio.
So there's been a lot of excitement in the music generation
space. We covered Suno and how they generate really,

(03:11):
really, you know, high quality, almost indistinguishable
from real songs. And now there's a new entrant in that space
called Udio that was founded just in December
by four former employees at DeepMind.
And they just came out with the model.

(03:32):
We just got some samples of the songs they produce, and
they are really good. It's, you know, again, really
hard to catch any sort of AI weirdness
in the tracks.
They just sound really good.
You can start with,

(03:55):
30 seconds, and then extend them, so they produce,
yeah, about two-minute songs, roughly.
And, there are a lot of backers for this one, you
know, some heavyweight investors, a16z and also some,
notable people in the music space like Common and will.i.am

(04:16):
and other investors.
So very much a big deal in the commercial music
space with this new competitor.

Jeremie (04:25):
Yeah, apparently so they've raised, and this is going back to just
a couple days ago, apparently raised $10 million, so that,
you know, seems small, but I guess that's pre-launch.
The product is super impressive. The one thing is we get so spoiled
so fast. I'm generating, like, incredible quality
music, like jazzy beats, whatever, with lyrics too.

(04:48):
Right, that are really impressive. That's one of the wild things about
this, is you get lyrics that almost always make sense. You get some weird
aberrations, and I ran into a couple where, you know, it sounds like
the lyrics are saying nonsense and maybe you hear people in the
background, and it kind of maybe gets stuck in a local rut
or something, but really, really impressive.

(05:09):
It takes maybe, in my experience, about five minutes to actually
generate the 30 seconds of audio.
And insanely, my brain got so hedonically adapted
to, like, I can do this music generation now.
And my immediate next response is like, wait, why
is it so slow? Like literally, like, easy, bro.
Like 20 seconds ago you couldn't do this.

(05:29):
Let's just give it a sec. So it's just wild how quickly we adapt.
But, yeah, super impressive product.
This is probably going to be a game changer for a lot of different
things. It's so hard to think of all the things, but just the ability to
make, you know, catchy beats that help you maybe memorize things for
educational purposes. This is one application I saw flagged on Twitter.
A lot of people were talking about, you know, the marketing, commercial

(05:52):
applications of stuff, but, you know, definitely valid to call this a
ChatGPT moment, I think, for, for, music generation.

Andrey (06:00):
Yeah. And the competition now, I mean, generally people say
the quality is similar between it and Suno, and, you know, you could say maybe
Udio is even a little cleaner.
So, there's now.
Yeah, two big competitors there.
And Udio is trying to distinguish itself a little bit by saying they're
aiming more to cater to musicians.

(06:22):
So they have this ability to kind of control a little bit more
what the generation is like, tweak it.
And so on. The article here in Rolling Stone
also mentioned that they were able to, or they found some,
examples of songs with vocals that sounded

(06:43):
a lot like the vocalist Tom Petty.
So there is, yeah, I guess, more flagging there, but it seems very
likely that both Udio and Suno are training
on copyrighted data, which is a bit of a gray area for
sure. And something that some people, as we'll get to,

(07:03):
are unhappy about.
But, that's just how I guess it is right now.
They are hitting the ground running, getting these models trained, and the
results are pretty mind blowing right now.

Jeremie (07:15):
Yeah. I'm also really curious about just, like, the business model and how
sustainable it ends up being. Again, you know, we've seen so many
companies like this get gobbled up as scale, you know, hits the next
beat. Right. What can GPT-5 do?
What can GPT-6 do? Do they eventually just end up being able to do
these things trivially, and the foundation, kind of the most general
purpose foundation model, eats the rest of the world?

(07:38):
Interesting question. Let's see how it plays out.

Andrey (07:41):
Yeah. And next up, the next story is Anthropic launches
external tool use for Claude AI, enabling stock ticker integrations, and
more. So Anthropic has launched the beta of
this tool use functionality, allowing Claude to use
third-party tools. Basically, API users can insert

(08:04):
simple code snippets in the API interface,
and then Claude will just go ahead and use them.
This is something that, of course, has been around for a while in stuff
like ChatGPT. So, yeah, this is
another, you know, rollout of features for Claude that makes it

(08:25):
more competitive. Apparently all Claude AI models can handle
choosing from over 250 tools with pretty strong accuracy.
So, yeah, Claude, really, you know,
based on its performance and cost, is, I think,
starting to be a pretty strong competitor to OpenAI.

Jeremie (08:45):
Yeah. When you talk about agents, agent-like models, agency, that sort of
thing, you know, the things that are preventing us from getting, you
know, to AGI or something like it through agents,
part of it goes through this ability to choose tools
and then use those tools really accurately.
You know, this article cites the over 90% accuracy in choosing tools

(09:06):
from, like, a list of 250 tools that Anthropic has, and
90% sounds really high.
If you have to chain the use of many tools in a row, and then you have
to actually use those tools correctly, you not only have to select them
correctly, you have to call them, you know, with the right sort of API
and the right arguments.
You know, that that could really quickly eat away at the overall

(09:29):
successful percentage completion of a, of a long series of tasks or
complex tasks. So I think that's actually something that is going to have
to change over time. It's clearly improved a lot.
Claude 3 is really, really good at this.
GPT-4 has just gotten a little bit better at this.
We might talk about that later today. But, you know, this is, I think, a
key metric if you're interested in tracking progress towards AGI is like

(09:50):
the successful completion of, you know, tool
selection and then tool use, because when you chain those things together in a
coherent way, you get to much more general purpose capabilities.
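To make that compounding point concrete, here is a quick back-of-envelope sketch in Python. The ~90% figure is the one cited above; the chain lengths are just hypothetical examples.

# Rough arithmetic: per-step tool-selection accuracy compounds over a chain of calls.
per_step_accuracy = 0.90  # the ~90% selection accuracy cited for Claude

for chain_length in (1, 3, 5, 10):
    end_to_end = per_step_accuracy ** chain_length
    print(f"{chain_length} chained tool calls -> ~{end_to_end:.0%} chance all are correct")

# 1 call  -> ~90%
# 3 calls -> ~73%
# 5 calls -> ~59%
# 10 calls -> ~35%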
And another thing to flag, too, is just to be specific about what's going on
here. We are no longer just relying on web lookup, which is
what, you know, Claude 3 would previously have had to resort to if you

(10:12):
needed to find out something about, I don't know, stock prices or
something like that. Right. So now there's a dedicated tool for that.
So you know, you have ground truth. The error that comes from the web
lookup stage is now hopefully no longer present.
You're relying instead on a firm tool, whose accuracy can be verified,
as they put it here at the source level. So, yeah, better for sort of the

(10:33):
verifiability, the accuracy of these, these models.
And, and then they have shown themselves to be capable of actually using
these tools, selecting these tools properly. So interesting.
Interesting. Next up here.

Andrey (10:43):
And I just want to flag this so that the article doesn't quite mislead.
There is a legacy tool use format that has been
in Claude. That one is not optimized for Claude 3,
and it's kind of, I guess, less API
friendly. You have to provide the tool definitions in the

(11:05):
prompt. Here, they added it as
a separate input. When you make an API call, you can specify the
tools as one of the inputs,
not in the plain text of the prompt.
So it's not entirely accurate that this is the first time this is coming
to Claude, but they very definitely expanded how it's used and made

(11:28):
it a little more, I guess cleanly formatted.
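As a rough illustration of what passing tools as a structured API input looks like, here is a minimal sketch based on Anthropic's documented format; the stock-price tool, its schema, and the model id are illustrative assumptions, and exact field names may differ from the beta described in the article.

# Minimal sketch (assumed shape; not Anthropic's exact beta at the time).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [{
    "name": "get_stock_price",                    # hypothetical tool for the stock-ticker example
    "description": "Return the latest price for a stock ticker symbol.",
    "input_schema": {
        "type": "object",
        "properties": {"ticker": {"type": "string", "description": "e.g. 'AAPL'"}},
        "required": ["ticker"],
    },
}]

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    tools=tools,                                   # tools passed as a structured input, not in the prompt text
    messages=[{"role": "user", "content": "What is Apple trading at right now?"}],
)

# If Claude decides to call the tool, the response contains a tool_use block with
# the chosen tool name and arguments; your code runs the tool and returns the result
# in a follow-up tool_result message.
print(response.content)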
And interestingly, I've been testing Claude a bit
more, and the API has evolved a lot since last year and has become a
lot closer to OpenAI's.
Yeah. So, again, it seems like they are really coming
out of just testing and towards trying to get people to adopt and use

(11:50):
Claude. And on to the lightning round with some quicker stories:
the first one is Building LLMs for Code Repair from
Replit, and this is covering how they are integrating
AI tools specifically for code repair.
So basically fixing bugs.
Apparently it's using a mixture of source code and

(12:14):
natural language stuff. And the company says
this tool will aim to,
yeah, pretty much fix issues that come up in your code.

Jeremie (12:26):
Yeah, this is coming from Replit, of course; we've covered them a lot.
On the company: their CEO, Amjad Masad, so Replit is a Y
Combinator company, and he's really famous in the YC ecosystem,
just because Replit is such a kickass company in terms of its growth.
I think Paul Graham himself is, like, personally invested in them.
This is, also partly a play for AGI.

(12:48):
Like he's kind of indicated his interest in turning this into a bit of an
AGI play. And so what they're doing here is, is leveraging
a kind of data that they have that, you know, few other people have and
they essentially have these long kind of histories.
They don't just have the code bases of the people who develop on their
platform. They also have, like the history of, of the

(13:08):
edits that have been made to that, to that script.
And so they're, they're leveraging that in their training process.
This model is built off DeepSeek Coder, which is
sort of an open source, instruction-tuned model that has been really
effective in the past, specifically for code.
So, you know, this is an open source thing in a sense.

(13:30):
And they are being quite open about the techniques that they've used, to
fine tune it. They used eight H100 GPUs.
They, you know, tell you the kind of
number of shots that they use in their training process.
It's actually like fairly open and it reflects their interest in the
open source movement more broadly. So yeah, definitely an interesting

(13:51):
result. And we'll have to keep track of Replit as well and see if they
keep moving more and more towards the center of this sort of
middle tier of companies, where I think you have
Meta arguably leaving that pack,
but certainly companies like Mistral and Cohere,
where it's like, yeah, they're building impressive models, maybe not on

(14:12):
the frontier, but definitely contributing to open source in important
ways.

Andrey (14:16):
Next up, early reviews of the Humane AI Pin aren't
impressed. So we've covered the Humane AI Pin, which
is this little wearable kind of square thing that you can talk to.
It also has a projector and a camera and costs $700.
So the idea here is like a new type of hardware that is,

(14:38):
AI first. And that can in some ways replace your phone.
Yeah, you can tell it, you know, to do stuff and it will use AI to
intelligently do it. It has been released, it started being
sent out, and people started reviewing it.
And the consensus, at least according to this article, is that
it, maybe, could use

(15:00):
a bit more work. It seems like it's often slow to respond.
It is a little bit buggy, so sometimes it seems to
do the wrong thing. If you, you know, tell it to play a song, it may
instead talk to you about its instructions regarding that.
If you ask it for the weather, it could take 10 seconds to reply

(15:21):
things like that.
So in general, it seems like maybe not too surprising given that this
is a first generation and it's just rolled out.
Of course, the company is saying that there are a lot of updates slated to
improve it, but I guess worth highlighting because
there is some excitement around these AI-driven

(15:42):
devices as a new category of hardware.

Jeremie (15:45):
Yeah, we've seen the Rabbit R1. We know OpenAI is working on potentially
their own thing with, you know, the Jony Ive project.
So yeah, another entry,
let's say, in that long list, or growing list, of hardware projects.
I'm curious if this ends up taking off. It's a weird form factor.
It's not. It's not what I would have expected.
Apparently. You like. Do you ever watch Star Trek?

(16:07):
You're, like, meant to tap it like a communicator badge.
You know, that they have on their chests.
Basically, you just hit a button and then you ask it.
Whatever. It's a little bit, I don't know, it's a little bit unusual.
Doesn't strike me as the kind of thing that that I would design, but,
hey, that's why I'm not getting paid billions of dollars to do this.

Andrey (16:22):
So next up, Microsoft 365 Copilot
gets a GPT-4 Turbo upgrade and improved image generation.
And that is pretty much the whole story:
Microsoft now has priority access to GPT-4 Turbo for
business subscribers.

Jeremie (16:40):
For sure, and maybe the buried lede here is that,
yeah, GPT-4 Turbo is now out; OpenAI has put that out there.
It's got a lot of features that are similar to the old GPT-4,
or the kind of prior version. It's got a 128,000
token context window, so that hasn't grown, but it does have a more

(17:03):
up to date knowledge cut off. So December 2023, as opposed to April
of last year, which was the previous knowledge cut off.
So that's helpful. It's also been reported to have,
well, OpenAI claimed that it has, like, way, way better reasoning capabilities.
It's been reported to have somewhat better reasoning capabilities on
simple problems; on complex problems, more advanced problems,
that's where it shines, it seems.

(17:24):
And there have been people who've kind of suggested, hey, maybe there's a
little bit of metric hacking here where, you know, OpenAI is like getting
really used to the benchmarks that they're trying to hit and maybe over
focusing to some degree. Some people suggested that on those benchmarks.
And so you don't, as an everyday user, tend to see the value as much.
I don't know that that's necessarily the case.
It seems to me that the focus here really is on that more advanced

(17:46):
reasoning. And so for most practical use cases, you're not going to be
leaning on those abilities. Maybe that's why a lot of people just haven't
seen it materialize, in their kind of experiments with it.
But, yeah, on a benchmark basis, it does do better.
And, it's what OpenAI needed, right?
To climb ahead on all those leaderboards.
Claude 3 Opus came out, Google's Gemini nipping at their heels.

(18:08):
Right. So they had to find a way to get back on top.
Make no mistake, though, this is not the latest and greatest model that
OpenAI has, right? They internally absolutely are going to have more
advanced models that they are not yet releasing, that they're testing,
refining and so on. But if you if you think about what the situation
looks like for OpenAI, they're sitting back waiting for the next model to

(18:28):
kind of beat them on the leaderboard and then saying, all right, you
know, let's let's just go reclaim that number one spot.
That at least seems like it could very well be the play here, but, hard
to be 100% sure. That's at least my suspicion at this point.

Andrey (18:41):
And one last story for the section.
AI editing tools are coming to all Google Photos users,
and that's pretty much the extent of it: these features, such as Magic
Eraser, Photo Unblur, and Portrait Light, used to require a
subscription to Google One.
Now they will be coming to all Android users

(19:02):
and extending to more devices.
So I think it's interesting to highlight just because this is
now one of the ways to compete in the smartphone space, to provide
AI features, and more and more phones are starting to get, especially for
photo editing, things that are AI powered.
And on to Applications and Business, starting with Google announces the

(19:25):
Cloud TPU v5p, its most powerful AI
accelerator yet. So a TPU, or tensor processing unit,
has been something that Google has worked on since 2016.
And as the title says, we are now on v5p,
which comes in pods of 8,960 chips

(19:47):
and now has the fastest interconnect yet at 4800
gigabits per second.
So lots of claims here, saying that these are faster than the
v4 TPUs, featuring apparently a 2x improvement
in FLOPS and a 3x improvement in high-bandwidth
memory. So, yeah, pretty

(20:10):
clear that they are pushing forward in this front.
They do say that apparently Google DeepMind and Google Research
users have observed speedups on
their large language model training workloads, which is, of course, pretty
significant because a lot of that is presumably happening at DeepMind.

Jeremie (20:30):
Yeah. And this really is a scaling play,
right? So the v5p TPU, and by the way,
just as background, TPU as opposed to GPU, right?
The TPU is kind of a Google-only architecture.
The GPU is the graphics processing unit; the TPU is a tensor processing
unit. It's designed explicitly for AI workloads from the ground up,
and it takes advantage of certain, anyway,

(20:52):
so certain properties of the kind of matrix multiplication process
that just run a lot faster on their architecture.
The previous version, the v5e,
Viperlite was the kind of codename for it.
This new one is Viperfish.
So the distinction between them, in addition to the basic specs of

(21:14):
like interconnect and FLOP performance, is the
connectivity, and the scalability of that connectivity.
So, you know, Andrey, you talked about this idea that it can connect into a pod of
basically, like, almost 9,000 other units.
So this is a big difference from the previous version that could connect
to, you know, 256 chips at a time.

(21:37):
This one, you can connect up to 64,
and then with the rest of a pod of almost 9,000.
So it's just got a lot more of an upper bound in terms of what it can
accommodate. And that's going to matter way, way more as Google
moves more and more in the direction of super, super large scale training
runs. Right. The big differentiator when you look at what Google has done

(21:58):
in its latest papers, I mean, they're almost like a hardware,
well, not a hardware company, but their advances are
disproportionately driven by hardware, partly because of Google's
outrageous scale. They have way more scale than Microsoft.
Easy to forget when we look at OpenAI's progress, which is driven by
their access to Microsoft hardware. Google has way, way, way more.

(22:19):
So when you actually look at, like their most recent papers, usually
the big breakthroughs have to do with like, oh, we just figured out this
new way to connect way more GPUs than ever before.
Famously we talked about this on the podcast, but like connecting GPUs
from across different data centers and getting them to do training runs
together. So really is kind of a hardware focused effort.
It's not that OpenAI and Microsoft don't, but, you know, Google

(22:41):
just does it at a whole other level of scale. So, this is a reflection of
that focus. And we'll see what kinds of training runs.
It actually empowers.

Andrey (22:49):
That's right. And I think the fact that DeepMind researchers have access
to this, just kind of made me think of like they
do also benefit from their ability to experiment at scale.
So, for instance, we've covered recurrent architectures like
Griffin, and in these papers they usually do have, you know,
2 billion parameter, 8 billion parameter models that they present results

(23:12):
for. And as we'll get to in a bit, they actually now have released a
model based on that research.
So having access to this very powerful hardware and being
able to run these training runs so fast with large language models
is a pretty
big advantage for their R&D efforts, for sure.

(23:32):
And the second main story is also about hardware.
I figured we could combine the two.
So this one is about Meta, and it has also
unveiled a new version of its custom AI chip.
This is the Meta Training and Inference Accelerator, or MTIA,
and they now have a successor to the first version, which was
from last year.
(23:56):
from last year.
And as with the TPU v5p, of course, there are
a lot of claims here on its performance.
So, for instance, they say that it delivers up to three
times better overall performance compared to MTIA v1.

(24:17):
And they say that this is especially good at
running models for ranking and recommending display ads on
its platforms. Apparently they're not using this for training yet, and
they are just starting to roll it out in 16 of their
data center regions.
So definitely not as far along as something like TPU.

(24:40):
But it now showcases that Meta is continuing to push in this direction,
and, I guess, also trying to get a bit of the edge that Google
has as, you know, one of the only players that really has its own chips.

Jeremie (24:51):
Yeah, and Meta is kind of playing catch up here on hardware as well as
software. Right. Like you mentioned, the TPU at Google being much, much
more mature. I mean, they've been on this for, you know, the better part
of a decade now. You know, Microsoft as well, with their Athena chip
that they're designing. This is basically all a bunch of people saying,
hey, whoa, wait a minute, Nvidia like, your profit margins are insane.

(25:11):
You're able to charge extraordinary prices for your GPUs.
That's starting to change, by the way, and for very interesting reasons.
But increasingly, you know, there's a desperation
to find other choices, and a sense
as well. By the way, we talked about this almost a year ago when ChatGPT
came out and we were talking about the margins, and like, where does

(25:33):
the profit end up getting stuffed in the stack?
Right. What are the actual parts of the generative AI stack that are
profitable long term? There's an argument that says it's not actually the
model developers, because that ends up being commoditized.
Certainly at the open source level, there's so much competition.
Like, you're not going to make money by making the best open source model
of the week, right? People, the cost of switching between platforms is

(25:54):
just so high. But at the same time, like, you just can't
justify it. So you're not going to appeal to people over data quality at
that stage, and those advantages end up being defensible for such short
periods of time. And so a lot of, well, my own personal
thesis, and I think this is now bearing out, is, you know, the profits are
to be made at the hardware level. The big, heavy capital expenditure

(26:15):
bottlenecks are in, you know, semiconductor fabrication.
So I talk about that a lot on the show.
But also in the design of new-generation chips.
And that's where meta is desperately trying to catch up again.
They're behind Microsoft. They're behind, you know, potentially Amazon as
well. You look at Gaudi 3 that's coming out too. So a lot of activity
in this space and it's not clear who's going to win.
But if you're going to be a hyperscaler in the next, you know, five, ten

(26:39):
years, not having a homegrown, chip design
play is probably not going to be an option.
There's just too much profit to be made at that level of the stack.
So I think that's really where this starting to happen.
And then we're even seeing some of the chip developers, the fabs, like
for example, Intel try to venture into the design space as well.
So everybody's trying to climb on different levels of the ladder to own

(27:00):
more and more of that stack. And I think we end up seeing a lot of fully
integrated stacks, within the next, well, a couple of years, it's going
to take a while for these things to get off the ground. But this play by
Meta is just sort of like another one in a long series of FAANG companies
making similar bets.

Andrey (27:16):
And moving on to the lightning round. Actually, why not start with that story
that you mentioned with Intel. The story is that it has unveiled its new
AI accelerator, the Gaudi 3 chip, which is meant to
enhance performance in training AI systems and inference.
They say it will be widely available in the third quarter.

(27:37):
So it's just about unveiling for now.
They claim that it will be faster and more power efficient than
the Nvidia H100, saying that it apparently trains
certain types of AI models 1.7 times more quickly
and runs inference 1.5 times faster.

(27:58):
So some pretty big claims, with the H100 being one of the leading
chips that people use, pretty much the leading accelerator
so far, until, of course, the new Nvidia one
becomes the main one. So, yeah, Intel is really coming
for Nvidia with this one. And it'll be exciting to see if they're able to

(28:19):
compete.

Jeremie (28:20):
Yeah, I think this is a good opportunity to call this out and start to, like,
maybe build, you know, some of our listeners' intuition about what matters
when you look at hardware. And these big announcements that say, oh,
well, we've got this thing. It's going to be better than the H100.
Yeah. The big question you always want to ask is, when will the
production run actually be scaled?
Right. When are we going to start to see Gaudi 3 chips coming off

(28:42):
the production line in quantities that actually matter?
Because, remember, a lot of this stuff is bottlenecked by semiconductor
fab capacity over at TSMC.
How much capacity has Intel bought out for that production run?
How fast can they get those, get the designs kind of finalized, shipped
then and then fab and packaged and sent back.
So right now, the problem that Intel has is, by the time the Gaudi 3

(29:05):
comes on the market in a meaningful sense, in large enough quantities to
make a difference, you know, Nvidia will already have juiced out
the H100. They're already on the H200.
The B100 is going to be imminently coming on the market.
You know, the window for profitability here
looks like it actually may be fairly narrow.
So even though, yeah, it's impressive for Intel, if true, they'll

(29:27):
be hitting, you know, pretty solid performance relative to the H100,
it may not actually matter all that much by the time they can actually
get this to market. Still, it's important for Pat Gelsinger, the
CEO of Intel, who's betting big on this kind of big hardware play.
Really, really important for them to flex those muscles and start getting
better at this. So this is a good step forward for them.

(29:50):
But, you know, they absolutely need to catch up.
Gelsinger, by the way, did not give pricing for this new chip.
He said it would be very cheap; he said "a lot below", were his words, the
cost of Nvidia's current and future chips.
So, you know, we'll have to see.
He said that they provide a really good answer and an extremely good, he
said, total cost of ownership.

(30:11):
This is the total cost of owning the chip over its lifetime: the cost of running
it, you know, maintaining it, buying it in the first place.
That's the thing you really care about, right? What is the total cost of
ownership versus how much profit can I make from the chip if it's
running, you know, at a reasonable level of use,
let's say, over its lifetime.
And, so the claim here is, yeah, this is going to be a really, you know,

(30:32):
dollar-efficient option, but we don't know how it'll stack
up. Not against the H100, but, again, the relevant thing at that point may
well be the Blackwell line; it, you know, may well be something much
more powerful. So we'll just have to see.

Andrey (30:46):
Next up, Adobe is buying videos for $3 per minute to build
AI models. So apparently they're offering
$120 to their network of photographers and artists to submit
videos of people performing everyday actions or expressing emotions.
They're, yeah, asking for these short clips with

(31:08):
people showing emotions, and then different anatomy,
and also interacting with objects.
The pay per submission on average apparently works out to
$2.62 per minute, but could also be more. So,
showcasing that Adobe is committed to building an AI

(31:29):
video generation model, and that they are still committed to their very
safe, conservative roots.
Or you could say tactical root of not leveraging potentially
copyrighted data, instead gathering data that they own and can
safely train on.

Jeremie (31:46):
Yeah, it's actually really interesting. They are implicitly placing bets
essentially against what OpenAI and other companies are doing.
Right. To the extent that OpenAI, for example, is just like training
straight off of YouTube, which was the allegation that was brought forth,
we talked about this on the last episode. I think, you know, if that
ends up not being kosher, well, then, you know,

(32:07):
the advantage goes to Adobe. But if the opposite is true, then
Adobe's kind of wasting its resources here to some degree.
Not that this is going to be a huge spend.
Yeah. Interesting that Adobe's doubling down on this.
Obviously, they were really, I think, the first company to bring in
indemnification offers for their users.
Right. Saying, you know, we are so sure that you're not going to face
copyright issues by using the output of our models because they were

(32:29):
generated on our own proprietary content, that we are going to defend you
in court if anything happens. So, you know, this is them doubling down,
presumably on that dimension.
They're trying to make that their differentiator. I think this is a
really good play. I mean, it's, it definitely is a legit differentiator.
They forced other companies to try to catch up and do similar things.
So yeah, these kind of pay to play data sets, is something we may

(32:53):
see more and more of in the future.

Andrey (32:55):
And speaking of data and getting access to data,
the next story is OpenAI transcribed over 1,000,000 hours
of YouTube videos to train GPT-4.
So this is reportedly what happened.
This was just a bit of info that came out, and the
headline said it all. Apparently they had to transcribe

(33:19):
a lot of YouTube videos to get more useful data to train
with. And of course, if that is the case, Google has stated that it
would be against their policies.

Jeremie (33:30):
Yeah. And we called this out at the time that Whisper came out, you
know, this famous speech-to-text model that OpenAI put together.
But, you know, it was pretty clear at the time that this could well
be a strategy to kind of find new sources of data, to collect new sources
of text data, basically, as they were starting to run

(33:50):
out of the text, they could crawl on the internet.
It looks like this was kind of legally questionable behavior,
but OpenAI's position was that it was fair use.
This is all a gray area, so nobody really knows, the answers to these
these thorny legal questions about what is and isn't fair use in this
context. Notable, though, that OpenAI President Greg Brockman was
personally involved in collecting the videos that were used.
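The reporting doesn't describe OpenAI's internal pipeline, but as a rough sketch of what large-scale audio-to-text transcription looks like with the open-source whisper package (the checkpoint choice and file name here are hypothetical):

# Minimal sketch of audio -> text with the open-source `whisper` package
# (not OpenAI's internal pipeline; the file name is hypothetical).
import whisper

model = whisper.load_model("base")                     # larger checkpoints trade speed for accuracy
result = model.transcribe("downloaded_video_audio.mp3")
print(result["text"])                                  # plain-text transcript, usable as training text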

(34:13):
So this is very much like, you know, all the way to the top type thing.
Now, what was interesting about this, too, is Google was reached out to
for comment on this story.
And, you know, OpenAI claims that they respect robots.txt files, which
are these kinds of files on a website that tell crawlers, like the one
that OpenAI presumably, or may, have been using, what

(34:36):
they can and can't do on the site. And the Google spokesperson said,
well, you know, both our robots.txt files and our terms of service
prohibit unauthorized scraping or downloading of YouTube content.
And so the implication is if that's what happened here, like
that is a violation.
So anyway, it seems like this, this may well have been, a thing, but,

(34:58):
Google itself has actually, it's worth noting, trained its models on
YouTube content. The spokesperson said this; they said it was in
accordance with the terms of their agreement with YouTube creators,
which of course makes sense, but still kind of worth noting.
There's an asymmetry, if that's the case, in terms of legal exposure here:
Google is able to use YouTube content, OpenAI can't, or may not

(35:19):
be able to. They're doing it anyway, but they may or may not be legally
allowed to do that. So kind of an interesting way in which the leaders
sit with their AI training runs these days.

Andrey (35:27):
Yeah. And it it really makes me wonder, like, when will we finally
know? Wow. Yeah.
Because it's been going on for a while.
You know, we know that image generators, of course, use copyrighted data.
And I guess the legal process will drag on for a while with
all of these things. But, at some point, right, this question

(35:47):
will have to be answered of, can you use this data, is the fair use argument
legit? We still don't really have an answer.

Jeremie (35:55):
Yeah. Maybe lawyers who listen to the show, I know there are a couple at
least, please let us know if you're tracking anything.
I know there's one, actually, who reached out to me.
I gotta get back to him, actually. But he gave me a rundown on this a
couple months ago, and there's some cases that seem like they're starting
to kind of create that, you know, the precedent that

(36:17):
might establish the answer to these questions, but that was a few
months ago, so who knows. Kind of interesting to track.

Andrey (36:24):
Moving away from that sort of thing.
We have a story. Waymo will launch paid robotaxi service in Los Angeles
on Wednesday. So that's it.
They have been offering free tour rides in Los Angeles over the past year.
They received regulatory approval to expand to a paid service
just last month.
And they are going to start rolling it out to the over 50,000

(36:49):
people on the waitlist to use the service.
And they'll currently just cover a 63 square mile area from Santa
Monica to downtown L.A..
So yeah, exciting to see them starting to expand
the paid service to another city.
They have primarily been in San Francisco so far, and they are also testing

(37:11):
in Phoenix.

Jeremie (37:13):
Yeah, you can finally go to LA, Andrey.

Andrey (37:15):
Yeah, because, of course I will not go anywhere else in Waymo.

Jeremie (37:19):
That's right. Your no-human-drivers policy is intact.

Andrey (37:23):
A couple more quick stories.
This next one is OpenAI removes Sam Altman's ownership of its
startup fund. We covered this a little while back when there was reporting on
this weird ownership structure, where Sam Altman was the owner of
the startup fund out of OpenAI.
Well, I guess after that came to light, they decided to go ahead and change

(37:43):
that up. And that is official now:
there was a filing with the SEC that
it is no longer the case.

Jeremie (37:52):
Yeah. This is like, you know, if you're thinking, hey, this seems super
weird and you've never heard of a situation where somebody
was just, like, put in charge of an entire fund, and apparently
the intent was not for them to just keep it, but like, they're just being
trusted to hold on to the bag until the situation could be worked out.

(38:13):
Yeah. That is really weird.
That's weird. I've never run into anything like this in my
whole life in Silicon Valley. I mean, this is highly unusual, but
everything about OpenAI is highly unusual, right?
They had this weird capped-profit structure,
this weird nonprofit board. So, you know, I'm not a lawyer.
I'm not sure what might justify this.
But part of me is tempted to say, like, this is.

(38:36):
Yeah, it just seems like something that maybe should not have been
done this way. It's a $175 million fund,
so large, but, you know, relative to OpenAI's valuation, relative
to how much they've raised, relative to how much they make, not actually
all that big. But it has now been moved over to Ian Hathaway, who was a
partner at the fund since 2021.

(38:56):
This is according to that filing.
And Sam is no longer going to be a general partner at the fund either.
So it seems like, I'd be more curious to dive in specifically
to understand what his exposure is to the upside from those investments.
Is there any? Like, how does that work?
But in any case, at this point, a spokesperson from OpenAI is just
saying, like, look, the general partner structure was

(39:20):
never supposed to be the way this went. Long term, it was always a
temporary arrangement, and that they hope that this change
provides further clarity, which, as ever with the intrigue
of the OpenAI board, or now the OpenAI fund,
it seems like further clarity is the last thing that the resolution seems
to bring. I think, you know, I don't know, I kind of feel like

(39:43):
for something that's important, a bit more clarity would be nice.
And, it seems like they're not keen to share, so I guess we'll just have
to see.

Andrey (39:51):
And on to Projects and Open Source, with the first story coming back
to Mistral, one of our favorite releasers of models.
And this headline, I guess, is fairly accurate:
Mistral AI stuns with surprise launch of new Mixtral
8x22B model.

(40:13):
Stuns is a big term, but this is a
pretty big deal. They've launched this 8x22B,
so that's coming up on their previous model,
which was, I think, 8x7B.
And so this is the biggest model they've released so far.

(40:33):
It has 176 billion parameters and a context
of 65,000 tokens.
And it of course outperforms the previous 8x7B
model, which was already really quite performant,
and of course also LLaMA 2.
And in addition, the model has an Apache 2.0 license.

(40:54):
So once again, pretty much do whatever you want with it:
no restrictions for commercial use, no restrictions at all.
And so probably, you know, this just happened.
So we don't have too much information, but it's probably
a top-of-the-line open source model at this point for sure.
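As a rough back-of-envelope on what a model of this size takes to even load, assuming 16-bit weights and the ~176B parameter count cited above (the 80 GB GPU memory figure is a common H100/A100 size, used here just for illustration):

# Back-of-envelope for hosting a model this size.
total_params = 176e9          # 8 experts x 22B, as reported; shared non-expert weights may make the true total lower
bytes_per_param = 2           # 16-bit (bf16/fp16) weights

weights_gb = total_params * bytes_per_param / 1e9
gpus_needed = weights_gb / 80  # 80 GB is a common H100/A100 memory size

print(f"~{weights_gb:.0f} GB of weights alone")            # ~352 GB, before KV cache and activations
print(f"~{gpus_needed:.0f} x 80 GB GPUs just to load it")  # roughly 4-5, hence 'not on your laptop'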

Jeremie (41:14):
Yeah. You know, Mistral keeps doing this and coming out with, like,
the next big thing. This is really impressive.
One thing to note, I mean, it is a big honkin' model, right?
22 billion parameters
for each expert, there are eight experts, 176 billion parameters
total. This is not the thing that fits on your laptop, right?

(41:34):
It's a 281 gigabyte file just to download.
So you're going to need a lot of horsepower, a lot of actual AI
hardware to run this, which, you know, raises the usual
random questions like, what is open source,
really? Right. Again, we talked about this last episode, but, like, is
there a sense in which, if you release a model that is so big that
no one can use it without, like, you know, tens of thousands of dollars

(41:57):
of advanced AI hardware? Like, how should we think about that fitting in
the open source, ecosystem?
I mean, this to me, I don't know, I think it's a little bit
pedantic, but it is being asked. I think the real answer here is like,
well, they made the model. What do you want? It's open source.
The weights are available. This does have, by the way, a 65,000
token context window.

(42:18):
So this is a lot. I mean, you could fit a book in that, that is,
I'm trying to think, I think that may be the largest context window
available in open source right now.
I can't easily think of another, and it is comparable to,
or actually outperforms, GPT-3.5 on a number of benchmarks.
So it's, we're barreling towards a world where we have GPT

(42:41):
four equivalent models out in the open and, where the context windows
are getting long enough that, you know, you start imagining these things,
doing some really impressive sort of task chaining.
I think that this is probably going to be another big bump for, you know,
the agent, agent like models and agent systems,

(43:02):
because that, you know, that task coherence length is indexed to context
window size to some degree, and certainly to scale and overall
capabilities at reasoning, which this thing very much seems to have. So,
impressive result from
Mistral, as I like to say, they are a French company.
Yeah. Is that it? Again?

Andrey (43:20):
Yeah, they came in strong.
And I just looked this up out of curiosity on the Hugging Face
leaderboard, and they are topping the benchmarks,
you know, beating a lot of the other open source models
like Qwen and so on.
So, as before, a pretty performant

(43:42):
model and one of the biggest ones. I think Grok is still bigger,
if I remember correctly. But yes.
Yeah, I think that's it.

Jeremie (43:51):
300, some billion if I recall. Yes.

Andrey (43:53):
Yeah. That's right. Next, some smaller models
now coming from Google.
So Google has announced some new additions to its
Gemma family of lightweight open source AI models.
The first one is CodeGemma, which, as per the title, is meant
for coding and was trained on 500 billion tokens of

(44:16):
English data from web documents and code.
And apparently it performs quite well.
It doesn't beat the best models out there like DeepSeek
Coder, but it is quite fast to do
inference on and gets pretty good results,
pretty good numbers. The more interesting one is RecurrentGemma, which

(44:38):
is an efficient model with a recurrent structure, linked
to this Griffin paper that we discussed a while back.
Griffin mixes gated linear recurrences with efficient
local attention. And yeah, not too much information
on this one.
They basically say that it has

(45:01):
the nice scaling property of being able to basically scale
linearly, so throughput on longer sequences
degrades a lot more slowly than with non-recurrent architectures.
They do say that it doesn't seem quite as performant;
in the announcement they don't highlight performance especially,
so it's not too clear, you know, how well it performs.

(45:22):
But they are saying that this is released primarily for research
purposes and for people to build upon.
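For intuition on why a recurrent model like this keeps throughput flat as sequences grow, here is a toy gated linear recurrence in Python. This is just the general shape of the idea, not the exact RG-LRU equations from the Griffin paper, and all sizes are made up for illustration.

# A toy gated linear recurrence (the general shape behind Griffin / RecurrentGemma,
# not the paper's exact equations).
import numpy as np

rng = np.random.default_rng(0)
d = 8                                    # toy hidden size
W_a, W_x = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def step(state, x_t):
    """One token: the state is a fixed-size vector, so memory/compute per step is O(d),
    independent of how many tokens came before, unlike attention's growing KV cache."""
    a_t = 1.0 / (1.0 + np.exp(-(W_a @ x_t)))   # per-channel gate in (0, 1)
    return a_t * state + (1.0 - a_t) * (W_x @ x_t)

state = np.zeros(d)
for x_t in rng.normal(size=(1000, d)):   # 1,000 tokens, constant memory throughout
    state = step(state, x_t)
print(state.shape)                       # (8,) -- the whole "context" lives in this vector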

Jeremie (45:28):
Yeah. And again, you know that that focus on throughput really, really
important on a number of, of levels.
One of course, it's it's more of that kind of focus on hardware and like,
let's get, you know, these systems to be able to use our hardware as
efficiently as possible, kind of get those tokens pumped through really,
really matters in code, especially because when you're talking about, you know,
agent-like models that can do useful things, very often, you know, a

(45:52):
coding model is the kind of model you're using for that, right?
You want to build apps, you want to do, you know, interact with websites
in certain ways. So the kind of robustness of the logic, but also the
ability to do inference really fast and efficiently by having a lot of
throughput, that's going to allow you to do more thinking,
if you will, at inference time, which is exactly what agents are all
about. Right? Agents are just a way to move compute resources from the

(46:16):
training phase to inference
time, to reallocate your budget.
And, and often give you a big lift as a result.
So getting really good throughput means you can do complex tasks in short
periods of time, which matters for things like user experience, and
anyway, and experimentation and all that good stuff.
So, yeah, really interesting that the recurrence play, something that

(46:37):
we've definitely seen before. And when you think about state space
models like Mamba, like recurrence definitely is philosophically
aligned with that. That's very much what those are about.
So, you know, all of these ideas seem to be bubbling up more
and more to the surface, both in terms of throughput but also like
context window expansion and indeed all the way to, like, infinite

(46:59):
context windows we may end up talking about.
So yeah. A very, very Google result here.

Andrey (47:04):
And just one more story in the section.
And this one is, pretty different.
And pretty interesting. It's Aurora-M, the first open source
multilingual language model red-teamed according to the US executive
order. This is a 15-billion-parameter model trained
on English, Swedish, Hindi, and so on.

(47:25):
Trained on over 2 trillion tokens.
So pretty significant.
Apparently it started off from StarCoderPlus and then was trained
on some extra data.
And as per the title, I guess the big deal was that this was rigorously
evaluated across various tasks and languages.
They say that the model was fine-tuned on human-reviewed safety

(47:47):
instructions, and that apparently has aligned it with the
red teaming iterations and specific concerns articulated in the
Biden-Harris executive order.
Not too much more to it. You know, we haven't seen people kind of calling out
the idea that their model has been developed in accordance
with executive orders, so I guess they are trying to stand out with that

(48:10):
pointer, and it'll be interesting to see if others like anthropic and
so on, will also start highlighting that,
capacity of their models.

Jeremie (48:20):
Yeah. No, you're absolutely right. Like, I get
some vibes from this that, I think,
they really want us to focus on that
US executive order red teaming piece.
The model itself, you know, it's nothing to write home about.
It's a 15 billion parameter model.
LLaMA 2, which by now is, I don't know how many months old, but it's a,
you know, pretty old model, at 13 billion parameters outperforms

(48:43):
this 15 billion parameter model on most benchmarks.
Right. But of course, that's not what this is about.
It's really about that red teaming piece, really about showing that open
source models can adhere to this kind of new regulatory
framework that's being proposed. So that's kind of interesting.
I will say, one notable omission,
you know, they do a whole bunch of evaluations for what's known as

(49:07):
the CBRN portfolio: chem, bio, radiological,
and nuclear risk. So, interestingly, they call it the CBR,
they call it CBR, which I've never seen before, but CBRN is usually
what this is called. So they have a bunch of tests there.
One thing to flag is you cannot, cannot, cannot do these tests in the
open source thoroughly enough.
There are always, like, if you think about the level of access you

(49:30):
would need to actually know what the right evaluations even are to run
for nuclear risk, for biological weapons risk.
This is not the sort of thing that you tend to know as an AI
developer, or certainly in open source.
And if you did, that would come with its own risks.
So there's always this challenge when it comes to evaluating open source
models. And this is an open problem. Like how do you come up with evals

(49:53):
that are, that are safe to even publish.
Right. So, you know, just, I guess, as a baseline
background piece of information, these are going to be leaky in some
sense. Still an important contribution.
The eval dataset consisted of 5,000 red teaming instructions: 4,000
they pulled from Anthropic, 1,000 they made themselves, and they cover

(50:14):
that whole smattering of concerns. But one interesting omission: nothing
in these evals about self-replication or self-propagation,
right? This idea that models may be able to, you know, replicate
themselves or whatever, which is usually associated with the loss-of-control risk
scenarios. That, by the way, is in the executive order.
The fact that it's absent here, I thought, was actually really

(50:35):
interesting in some sense. This is not a complete test;
it's not a complete red teaming according to the executive order, it's
missing a fairly significant and important set of components, so
not sure why that happened. Self-replication, propagation, much more
difficult set of evals to design and execute.
So, you know, maybe understandable. They want to wait for later or not do

(50:57):
it. But, I thought that was kind of interesting because it is the sort of
thing that, you know, will have to change over time if we're going to
start to full on adhere to these, executive orders.

Andrey (51:06):
One last interesting thing to mention is that this is coming out from a
huge group, a collaboration of people from across
33 different institutions, led by the Tokyo
Institute of Technology and by a person at the MIT-IBM
Watson AI Lab. So a real mishmash of universities,

(51:29):
groups and so on, I guess, contributing data and contributing to the red
teaming. Definitely, as you said, if you look at the paper, it
doesn't outperform LLaMA 2 13B on benchmarks.
It's not, you know, the top-of-the-line model
as far as open source models go.
But still worth appreciating that

(51:52):
they put out a paper, they put this out in the open, and they
did kind of make a point of saying that they
went through this route of red teaming
according to the U.S. executive...

Jeremie (52:04):
Order, it'd be great to see more stuff like this in the future for sure.

Andrey (52:08):
Yeah, and on to the Research and Advancements section, with the first
story being from DeepMind, and I think one of the big ones
from last week. It is Mixture-of-Depths: Dynamically Allocating Compute
in Transformer-Based Language Models.
So we know about mixture of experts.
It's where you have a model that, as it does its forward

(52:30):
pass, can essentially pick different branches to go down and
say, you know, for this output I will have these
subset of weights be active and they will produce
the output. And this paper extends that in a
slightly kind of similar but different direction where instead

(52:52):
they say different inputs can choose to skip
layers in the network. So instead of saying, I will go to this branch
and not this one, they'll say, I will skip this layer and
just, you know, use less compute to produce the output.
And it works, you know, fundamentally similar to mixture of experts.

(53:14):
They say that, you know, you can train the
model to select, you know, it ranks which tokens
to keep and which tokens to skip.
And, as with mixture of experts, what you end up with if you
use this, especially if you combine it with a mixture of experts, is the

(53:34):
ability to get the same loss for less cost.
So using fewer FLOPs you can achieve
the same loss. And so this to me seemed pretty exciting given how
significant mixture of experts has been with things like Mixtral
and, you know, allegedly GPT-4. It seems like mixture of depths

(53:56):
can do that as well.
And the two can be combined to have even more benefit.
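Here is a toy sketch of the routing idea, assuming a simple top-k router per layer; shapes and details are illustrative rather than the paper's exact implementation.

# Toy mixture-of-depths routing: per layer, only the top-k scoring tokens
# go through the block; the rest skip it via the residual stream.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, capacity = 16, 32, 4          # only 4 of 16 tokens get full compute per layer

def expensive_block(h):
    """Stand-in for the attention + MLP computation applied to the selected tokens."""
    return np.tanh(h @ rng.normal(size=(d_model, d_model)) * 0.1)

def mixture_of_depths_layer(h, w_router):
    scores = h @ w_router                        # one scalar routing score per token
    top_k = np.argsort(scores)[-capacity:]       # tokens that will be processed this layer
    out = h.copy()                               # everyone else just rides the residual stream
    out[top_k] = h[top_k] + scores[top_k, None] * expensive_block(h[top_k])
    return out

h = rng.normal(size=(seq_len, d_model))
for _ in range(4):                               # a few layers; block FLOPs scale with `capacity`, not seq_len
    h = mixture_of_depths_layer(h, rng.normal(size=(d_model,)))
print(h.shape)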

Jeremie (54:04):
Yeah. And I think, again, I mean, you see
Google focusing so, so much on how can we optimize for the
use of our hardware, right. And we're not seeing the same
publications coming out of OpenAI, probably just because this is being
done internally, we don't hear about it, but just the fact that Google
has so much hardware means that naturally they're going to orient towards

(54:26):
this really, really hard. And it's an interesting result.
It's an important one. You know, alternatives to this historically have
focused on, methods like early exiting.
Basically, you think of your input that gets fed to your
model, and it goes through layer by layer by layer.
And then eventually for a given token, the model might decide, okay, you

(54:46):
know, it's not worth investing more resources into massaging this one
further. So I'm just going to route that straight to my output.
Right. It could do early exiting for that token.
In this case, it allows you to essentially, like,
instead of early exiting, where you have to just make a binary decision, either
you continue down the rest of the model to the next layer, or you

(55:07):
leave and you can't interact with any of the future layers,
This allows you to skip specific layers, and they speculate in the paper
that this might actually be very desirable for scaling reasons that they
don't go into much detail on.
So kind of interesting, they do highlight that they're able to improve
by 1.5% on essentially

(55:29):
the log probability of the training objective.
So, basically, let's say the error or the loss
function you're using to train the system, while using the same number of
training FLOPs. So same amount of training compute, and you get a 1.5%
improvement in the loss function, which is not necessarily tied in a
transparent way to performance, but it's certainly indicative, they're

(55:52):
able to train, models to parody with
other Transformers sort of vanilla Transformers, if they use the same
amount of training compute as well.
In a, in a context where they can save upwards of 50% of their
compute at inference during a forward pass.
So it makes it a lot cheaper to actually run, not just train.
So these are a lot of really interesting, advances from,

(56:16):
DeepMind. Obviously we saw their heavy-duty mixture of experts paper
come out a couple of weeks ago. We covered it, where, you know,
the token would be routed to a path, like,
anyway, it was like a more augmented version of mixture of experts.
This is instead breaking it down, as you said, Andrey, layer by layer,
thinking of each layer almost as a kind of submodel.

(56:37):
So they're definitely exploring a lot of, a lot of acrobatics with their,
their parallelization schemes.
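To make the routing idea concrete, here is a minimal sketch of how per-token layer skipping could look in code. It is our own illustration under stated assumptions (the module names, the top-k capacity fraction, and the score-weighted residual are ours), not DeepMind's implementation.

```python
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    """Per-token layer skipping: only the top-scoring tokens get this layer's compute."""
    def __init__(self, d_model: int = 512, n_heads: int = 8, capacity: float = 0.5):
        super().__init__()
        self.router = nn.Linear(d_model, 1)    # scores each token: process here or skip
        self.layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.capacity = capacity               # fraction of tokens this layer will process

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        scores = self.router(x).squeeze(-1)            # (batch, seq_len)
        k = max(1, int(self.capacity * x.shape[1]))    # per-layer compute budget
        top_idx = scores.topk(k, dim=-1).indices       # tokens routed through this layer

        out = x.clone()                                # skipped tokens pass through untouched
        for b in range(x.shape[0]):
            sel = top_idx[b]
            processed = self.layer(x[b, sel].unsqueeze(0)).squeeze(0)
            # gate by the router score so the routing decision receives gradient
            gate = scores[b, sel].sigmoid().unsqueeze(-1)
            out[b, sel] = x[b, sel] + gate * (processed - x[b, sel])
        return out

# Usage: stack MoDBlock modules in place of ordinary layers; halving `capacity`
# roughly halves the attention/MLP compute spent at that depth.
x = torch.randn(2, 16, 512)
print(MoDBlock()(x).shape)  # torch.Size([2, 16, 512])
```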

Andrey (56:43):
That's right. I will say, one thing I found disappointing in the paper is
they didn't really compare.
They do cite and discuss, you know, related prior work and then
similar ideas. But there's no comparisons here.
They just, show results on using this exact technique.
So it's a little hard to say whether similar ideas that have been proposed are as

(57:05):
promising as this. But of course, the exciting bit also is that they
are evaluating this at scale.
So they scale up to 3 billion parameters, train models and
do show pretty significant benefits.
So, you know, regardless of whether maybe something like this existed
or something similar, here we see that,

(57:28):
you know, this general idea seems like it potentially
could be used, together with mixture of experts to keep
scaling, which is, of course, probably what people intend to do.
And on to the next main paper, this one from Google, not DeepMind,
but I guess also Google.

(57:51):
The paper is Leave No Context Behind: Efficient Infinite
Context Transformers with Infini-attention.
And the idea is that they propose a way to scale
transformer-based LLMs to infinitely long inputs with bounded
memory and computation. That is done by incorporating a compressive

(58:12):
memory into a standard attention mechanism that includes both
local attention over the current segment and long-term linear
attention. And they compare to some other
variants of this. So there was a paper called Unlimiformer, also
from last year, and variants elsewhere.

(58:32):
So people have done that with kNN lookups and variations.
Here, they, I guess, are different in that they focus
on the compression at every time step, and they
show, according to their evaluations, that relative to these alternative
ways of achieving essentially the same thing, they are able to do better

(58:53):
on things like 500K-token-length book summarization.
And they evaluate this with 1 billion and 8 billion parameter
LLMs. So yeah, another, you know, exciting
and potentially useful way to vary the
architecture of a transformer coming from Google.

Jeremie (59:14):
Yeah, I actually thought this scheme was surprisingly simple.
And also, I haven't heard people say this, and maybe it's wrong,
but it reminds me a lot of a
state space model. Like, there's a lot here that kind of
philosophically is like that.
So, okay.
In a standard vanilla transformer, you take in your inputs,

(59:37):
let's say one input at a time, one sentence at a
time, one whatever at a time.
And you're going to, you know, train this thing to do text
autocomplete on the sequence that you fed it, essentially.
That's kind of the training process. And okay, that's all there is
to it. The model's weights get adjusted as it learns more

(59:57):
context from doing that, and then eventually gets good at writing.
But each time it looks at an input, it is just looking at that input.
It's just looking at, you know, whatever the sequence is that's just been
fed to it, or, as they put it in this paper, the segment that's just been
fed to it. In this case, though, it's kind of like if it's reading a long
document, it'll, you know, read whatever

(01:00:19):
the segment is that it's been fed.
And then, as it proceeds to the next segment, it keeps a
compressed representation of all the stuff it's read before.
It kind of maintains that in a sort of memory.
It's, again, a compressed representation built from the keys and values, if you're
familiar with the architecture.
And then it combines that with the input from the current

(01:00:43):
segment that it's reading using a vanilla transformer, and
glues them together, basically concatenates those two things.
So you have the memory piece and you have the immediate segment that
you're looking at. And then based on those inputs, you do your final
prediction for the next word, or whatever it is, the next
token, whatever you're predicting.
Again, this strikes me as being very state space model.

(01:01:07):
Like, you essentially have this explicit memory that's being updated on a
regular basis as you move from one segment to the next in the text, and
that gets combined with your, like, your immediate sort of short term
memory focus, or kind of causal memory focus, if you
will, of the thing that you're looking at in the moment.
So, yeah, I, I thought this is a really simple idea.

(01:01:30):
It seems to work really well. And it's no coincidence, I don't
think, that this is coming out
after Google Gemini, where we started to learn about, like, these infinite
context windows. You know, we know Google has a research-only version of
a transformer of some kind that can do up to 10 million tokens, and in fact
more, of context window size. So, you know, those two ideas

(01:01:52):
may be related, right? This may be the way that this is done.
We talked about another hypothesis, that maybe this is ring attention, as
well in a previous episode. It's a bit unclear which
possibilities are being deployed here. But, actually,
so I saw this, there's a great video where AI Explained covers
this at a high level. He doesn't go into the architecture like we

(01:02:13):
just did, but, he, he highlights that one of the authors of this paper
is actually one of the authors of the Gemini paper as well, that came out
with a sort of very large context window, which sort of leads one to
suspect that this may actually be the thing that's powering those very
large context windows. Nothing known for sure, but there it is.
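For listeners who want to see the shape of the idea, here is a rough sketch of the compressive-memory mechanism described above. It is our own simplification under stated assumptions (the ELU feature map, a fixed mixing weight, and a single head are ours), not Google's code.

```python
import torch
import torch.nn.functional as F

def feature_map(x):
    return F.elu(x) + 1.0  # keeps features positive, as in linear attention

def infini_attention(segments, Wq, Wk, Wv, beta=0.5):
    """Process a long input segment by segment, mixing local softmax attention
    with a readout from a compressive memory of everything seen so far."""
    d = Wk.shape[1]
    memory = torch.zeros(d, d)   # compressed summary of past keys/values
    norm = torch.zeros(d)
    outputs = []
    for seg in segments:                         # seg: (seg_len, d_model)
        q, k, v = seg @ Wq, seg @ Wk, seg @ Wv
        # 1) local attention within the current segment
        local = F.softmax(q @ k.T / d ** 0.5, dim=-1) @ v
        # 2) read from the compressive memory with the same queries
        sq = feature_map(q)
        mem_read = (sq @ memory) / (sq @ norm).clamp(min=1e-6).unsqueeze(-1)
        # 3) mix the long-term and local streams (the mix is learned in the real model)
        outputs.append(beta * mem_read + (1.0 - beta) * local)
        # 4) write the current segment into memory before moving on
        sk = feature_map(k)
        memory = memory + sk.T @ v
        norm = norm + sk.sum(dim=0)
    return outputs

# The memory stays (d x d) no matter how many segments are fed in,
# which is what bounds memory and compute for arbitrarily long inputs.
d_model, d = 64, 64
Wq, Wk, Wv = (torch.randn(d_model, d) * 0.05 for _ in range(3))
segs = [torch.randn(32, d_model) for _ in range(4)]
print(len(infini_attention(segs, Wq, Wk, Wv)))  # 4 segment outputs
```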

Andrey (01:02:32):
Could definitely be related. And again, it's worth mentioning that this basic
kind of idea is not new.
There are prior papers, for instance AutoCompressors, RMT, Memorizing
Transformers, the Compressive Transformer.
The idea of compressing what you've computed so far

(01:02:54):
and using it further down the pass is not new.
So I think the details are really more in how the compression
is done and the kind of specific math behind it.
But regardless, it's again coming from Google.
They evaluate at scale and they show pretty decent

(01:03:14):
improvements on the prior state of the art.
So again, another kind of idea worth keeping in mind, this idea
of compression over infinite context and kind of adding recurrence
to our LLMs, which is not dissimilar from Griffin or Mamba,
as you said. Next up, moving on to the Lightning Round with some
faster stories. The first one is Octopus v2: On-device

(01:03:38):
language model for super agent. And the gist is
that they have a method that enables a model with 2 billion
parameters to outperform GPT-4 in terms of accuracy and latency,
and reduces the context size by 95%.
Yeah, it reduces latency in particular to a level suitable

(01:04:00):
for deployment across various edge devices in production environments.
And, yeah, it's all about that on-device LLM
kind of stuff.

Jeremie (01:04:10):
Yeah. And actually, what they're doing here that's a little bit
different, or one of the key differences, is they represent functions
that they want this model to call on as their own tokens;
that is, they give them their own specific tokens during training.
So they're going to, you know, pre-train this model or take a pre-trained
model. In this case, they're taking Google's Gemma, I think a 2 billion

(01:04:31):
parameter version of that model. And they're going to give it a little,
little extra training, a little bit of fine tuning.
They're going to train it on a data set in which tools
are given, again, their own tokens.
So when we talk about tokens, right, we're talking about usually parts of
words or whole words that are essentially part of the dictionary, the
fundamental list of, foundational entities

(01:04:53):
that this model is able to reason about.
Right? So it's, you know, syllables or characters sometimes, as
it was in the old days, and or whole words, whatever that may be.
So they're actually going to, instead of having the model like, spell out
the function name, for example, that it might have to call, they're going
to assign the whole function one token.
So this is one thing, one idea that the model has to hold in its head to

(01:05:16):
call on. And that gives it a level of concreteness that makes it much
more reliable. Like the the model no longer has to solve many different
problems at the same time. To call a function properly, it just has to
like reach for the right token instead of spelling out whatever the
function is, or reasoning about what its characters might look like.

(01:05:38):
And so, yeah, they end up integrating a two-stage process
that is usually used by an agent to call a function.
So this is usually: one, pick the right tool,
usually using a classifier; and then two, generate the
right parameters to call that function, to use that tool.
And normally that's treated as two different steps.

(01:06:00):
For the first one you'll use, like, a classifier to solve for that,
and then for the second one maybe you'll use, like, a language model.
They're integrating those two into one step thanks to this
tokenized function calling technique.
They're able to generate the parameters for the function call at the same
time. And that allows the problem to be solved in a way that sort of
leverages advantages that only become apparent if you can hold the whole

(01:06:22):
problem in your head at once. So I was thinking about how to
explain this a little bit. One way to think about this is, sometimes,
when you're picking the right tool for a job,
you want to think a little bit ahead about how you might
creatively use that tool, and only if you think of both the
tool and how you want to use it do you fully have the option

(01:06:45):
priced out? Right. So like a, you know, a hammer may may
not seem like the right tool for the job, but if you think about a clever
way to use it, you might be like, oh, that actually is better than, you
know, pair of scissors or, or whatever.
So the same idea applies to these APIs.
You know, if you're thinking both of which API you're going to use, which
which function you're going to call, and the arguments you're going to
feed to it, you might kind of spot opportunities to use other functions

(01:07:09):
in unorthodox ways. So this leads to performance improvements
that are quite noteworthy.
There's also just much better inference time.
For these systems, they're able to get a lot more throughput
with Octopus than with, say, Llama 7 billion or

(01:07:29):
GPT-3.5 using retrieval augmented generation.
So a pretty impressive result.
And it's a bit of a simple tweak, right?
It's just this idea of, instead of spelling out the function
names, let's give them their own token.
But if you're interested in building AI agents, this may just be a quick way
to avoid a small percentage, a small fraction, of the errors
that otherwise might compound over the course of a complex interaction

(01:07:52):
and cause your agent not to work.
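To make the functional-token trick concrete, here is a small sketch of what the setup could look like with the Hugging Face transformers library. The model ID, the token names, and the example format are illustrative assumptions on our part; the paper's actual training recipe is more involved.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

BASE_MODEL = "google/gemma-2b"          # assumed base model, per the discussion above
FUNCTION_TOKENS = ["<func_take_photo>", "<func_send_email>", "<func_set_alarm>"]  # hypothetical tools

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Register one dedicated token per callable function, then grow the
# embedding matrix so the new tokens get trainable vectors.
tokenizer.add_special_tokens({"additional_special_tokens": FUNCTION_TOKENS})
model.resize_token_embeddings(len(tokenizer))

# Fine-tuning examples then pair a user request with the functional token
# plus its arguments, so tool choice and argument generation happen in one pass.
example = (
    "User: wake me up at 7am tomorrow\n"
    "Assistant: <func_set_alarm>(time='07:00')"
)
print(tokenizer.tokenize(example)[-8:])  # the function shows up as a single token
```

The point of the sketch is only the vocabulary change: once a tool is one token, selecting it is a single next-token prediction rather than a multi-token spelling problem.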

Andrey (01:07:54):
Next up, Bigger is not Always Better: Scaling Properties of Latent
Diffusion Models. So this is looking at scaling for
image generation in particular, which is nowadays usually done with latent
diffusion models. And there are various results in this paper regarding
scaling. The key one they highlight is that, interestingly,

(01:08:16):
if you sample images at the same cost,
so if you take a comparable number of steps when accounting for the
model size, the actual image quality output
could be better from the smaller models.
So if you don't expend more compute
when you use a larger model, the smaller models apparently are kind

(01:08:39):
of potentially more efficient in their computation.
And the paper highlights some potential ramifications in terms of being
able to improve efficiency, when, for instance, distilling
larger models into smaller ones.
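As a back-of-the-envelope illustration of that trade-off: under a fixed sampling-compute budget, a smaller model simply gets more denoising steps. The numbers below are made up purely for illustration.

```python
# Illustrative arithmetic only: FLOP counts are invented, not from the paper.
def steps_within_budget(flops_per_step: float, budget: float) -> int:
    """How many denoising steps fit in a fixed per-image compute budget."""
    return int(budget // flops_per_step)

budget = 1e15                 # total FLOPs we allow per generated image
small_model = 1e12            # FLOPs per denoising step, small model
large_model = 4e12            # FLOPs per denoising step, 4x larger model

print(steps_within_budget(small_model, budget))   # 1000 steps
print(steps_within_budget(large_model, budget))   # 250 steps
# The paper's finding is that, at equal sampling cost, the extra steps available
# to the smaller model can yield comparable or better sample quality.
```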

Jeremie (01:08:53):
Yeah. Full disclosure, I have not read this paper.
Actually, I'm really curious about it. Just because, you know, scaling is
so central.

Andrey (01:08:59):
Scaling, of course.

Jeremie (01:08:59):
Yeah, yeah. That's it. So, no, curious to check it out,
and how specific it may or may not be to latent diffusion models.
Like, I don't know if this is dependent on that architecture, but yeah,
something I'll definitely be diving into.

Andrey (01:09:13):
Yeah. It's a very empirical study.
They have, you know, a bunch of image outputs.
We do, of course, have quantitative metrics for image quality.
So it's a, you know, compared to something like language models,
I think in latent diffusion models and image generation, we haven't had
as much research on scaling.

(01:09:34):
So this has some interesting insights.
And of course, they, you know, show that as you scale, you
get better outputs, similar to language models.
And moving on to the last paper, which we'll try to get through
quickly, though I might wind up discussing it for a little while.
The paper is Many-shot Jailbreaking, coming from

(01:09:57):
anthropic with a few collaborators, such as University of Toronto,
Vector Institute, Stanford, Harvard.
And the short version is they present
a new kind of way to jailbreak, to make their
language models do things that you don't want them to do.
So, for instance, language models are meant to not answer questions like,

(01:10:20):
how do I make meth? But there are many ways to kind of get around
it and fool them into answering questions like that.
And this many-shot jailbreaking approach is
pretty straightforward. Basically, you just start by
giving it a lot of examples of it doing the wrong thing in the prompt.

(01:10:42):
So you start by saying, how do I hijack a car?
Then you yourself provide the answer.
How do I steal someone's identity?
Provide the answer. Eventually, after a lot of these,
you insert the actual question you want the language model to answer.
And because it has in-context learning, because it picks up

(01:11:02):
on the pattern of the prompt, it
will go ahead and be jailbroken and respond.
So they highlight this, potential way
to jailbreak, especially for larger models that are better at
in context learning. And they look into mitigation.

(01:11:22):
Training against it is, you know, somewhat effective, but really it's hard
to mitigate because it is a function of in-context learning.
So you may need to do something a little more
involved, like classifying or trying to catch the prompt,
rather than just tuning the model.

Jeremie (01:11:41):
Yeah, I really like this paper, partly because I think what it reminds
us of is this complex dance and interdependency between
capabilities of an AI system and its alignment.
Right. Like this really does show us.
Yeah. Like it's just learning in context, like that's what it's supposed
to do. You try to train to refuse these, you know, dangerous prompts, but

(01:12:03):
then you give it enough examples of it, not refusing those dangerous
prompts, and then it'll just like, start to not refuse them again.
It's just learning in context, as you said, that's just capability.
And yet it manifests as a misalignment.
And so this tells us that there's something kind of deeply wrong with the
way that we're training these models, if we actually want them to behave
in the way that we want. More training, as you said, is not the

(01:12:26):
solution. Right. What ends up happening is and they talk about this in
the paper, you know, you could take the approach to say, okay, well, you
know what? I'm just going to train it by showing it a long chain.
And they do like 256 examples of dangerous requests.
And then the execution on those dangerous requests and then,
on the next one, you know, put a rejection, have it say no, even at that,

(01:12:48):
I'm not going to give you, give you what you're looking for.
So essentially, like, you can try to say, okay, well, it's not resilient
to maybe three, shots of jailbreaking, like, if
I, if I give it three examples of, of it responding to a query that it
shouldn't maybe it it gives you the answer, then you're like, fine, okay.
So I'm going to train it by giving it, you know, three of those things.

(01:13:11):
And then the fourth make it a refusal.
But what ends up happening is you just keep pushing that.
You just have to go further and further and further. And sure, you might
do this with 256 examples, but then on the 257th through the 300th or
whatever, it's, you know, just give it enough in-context examples
and eventually it will pick up on the pattern and fold.
And that's really where they get into this idea that more training, more

(01:13:33):
fine-tuning is maybe not the answer here; that seemed not to work super
well. And they did have more success with methods that involve
classification and modification of the prompt before it was passed to the
model. So in a way you can interpret this as sort of a
patch, right? You're not actually solving the problem with the model.
You're actually bringing in auxiliary models to review the inputs.

(01:13:55):
And if you accidentally let one of those inputs through, the same problem
will kind of present itself again. So, yeah, I thought this is
fascinating for what it tells us again, about that connection between
capabilities. Like, don't get mad at your model if it picks up on the
freaking pattern that you put in the prompt, that's what it's supposed to
do. But like, we don't really know how to how to solve for that.

(01:14:16):
And, anyway. Oh, sorry.
A last kind of philosophical note there is.
We've also seen how there are there's a potential sort of
like equivalence in a sense, or exchange rate, between the compute used
during fine-tuning and the compute used at inference.
Given enough, of a long prompt, enough context,
the impact of that context may actually be fundamentally the same as

(01:14:40):
the impact of fine-tuning. So it is almost as if people are fine-tuning
away the refusal to respond to certain queries.
That is one interpretation of what's actually going on here.
And if that is true, again, it is a pretty fundamental thing.
If we want our prompts to not affect the behavior of the underlying
model, well, that's going to prevent it, prevent the model from
displaying any useful capabilities.

(01:15:02):
So anyway, fascinating paper big up anthropic for another
another great piece of work.
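To give a flavor of the prompt-screening style of mitigation discussed above (rather than the jailbreak itself), here is a toy sketch. The heuristic, the threshold, and the regular expression are our own placeholders; a real deployment would use a trained classifier rather than pattern counting.

```python
import re

MAX_EXEMPLARS = 16   # arbitrary threshold chosen for this illustration

def count_qa_exemplars(prompt: str) -> int:
    """Count lines that look like in-context Q/A demonstrations."""
    return len(re.findall(r"(?mi)^\s*(?:user|q|human)\s*:", prompt))

def screen_prompt(prompt: str) -> str:
    """Reject or flag prompts stuffed with an unusually long list of exemplars
    before they ever reach the model."""
    if count_qa_exemplars(prompt) > MAX_EXEMPLARS:
        raise ValueError("Prompt rejected: too many in-context exemplars.")
    return prompt

# screen_prompt(user_prompt) would run in front of the model as a filter,
# which is why Jeremie describes this family of defenses as a patch rather
# than a fix to the underlying training problem.
```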

Andrey (01:15:07):
And moving on to policy and safety.
The first story is Schiff unveils AI training
transparency measure. This is about, Representative Adam Schiff from
California. And this is brand new legislation called
the Generative AI Copyright Disclosure Act, which would
require organizations to disclose

(01:15:31):
whether they used copyrighted data.
So they would need to submit a notice to the Register of Copyrights
with a detailed summary of any copyrighted works and the URL
for any publicly available material.
Apparently, the bill would require the notice to be filed no later than 30
days after the AI system is available to the public for use,

(01:15:52):
and it would also apply retroactively to AI systems already publicly
available. So that seems like a pretty
spicy bill that is being proposed here, one that would, you know, require
AI companies to disclose when they use
copyrighted data, which, you know, for now is not at all the case.

(01:16:15):
And the Register of Copyrights would be required to publish an online database,
available to the public, with all the notices, so we would know
what copyrighted works people used.
It's hard to say if it'll pass, I guess, but it seems exciting to consider
what would happen if it does.

Jeremie (01:16:35):
Yeah. Well, it's one of those careful-what-you-wish-for things,
right? Because you were talking earlier about, you know, how courts come
down on this whole copyright thing. And,
well, this is, I guess, one step in that direction.
Interesting to note that, you know, this does not necessarily say you
cannot use copyrighted data, right?
This is saying you just have to let us know, which is sort of,

(01:16:56):
let's say, a step before that. So in that sense,
yes, a spicy bill, in the sense that it is asking for something.
But, you know, I think realistically, given the scope of capabilities
we've seen here, we literally just opened this episode with an AI-generated,
totally plausible-sounding jazzy intro
song that, you know, comes from a capability that could automate

(01:17:18):
away a crap ton of jobs. Sora.
You know, like, think about all the changes we have before us, and we're
almost certainly underestimating the magnitude of the changes that are
going to come.
There are a lot of spicy bills that ought to come forward if we're going
to sort of meet the challenge of these major changes that are going
to unfold in society. So in that sense, you know, stuff like this, you

(01:17:39):
know, it, I think it's it ought to be considered in the context of that
broader technological change. I think superficially, this seems to
make sense. You know, at least have people disclose the training data
that they are using, or at least if that training data uses copyrighted
material. I don't see anything here that specifically says you have to
tell us about what your training data is, which itself would be a very

(01:18:02):
difficult thing to ask for, right? Companies are not going to be in a
hurry to reveal what the training data are.

Andrey (01:18:07):
Well, it seems a little ambiguous.
It says a detailed summary of any copyrighted works used,
and the URL for any publicly available material; that is what's in the
article. So it seems like maybe you would want to be
able to be a little,
yeah, a little specific.

Jeremie (01:18:28):
Yeah. Okay. I was interpreting that as meaning, you know, a detailed
summary of any copyrighted works used.
So if you're going to use copyrighted material, essentially the tax here is:
look, you're using copyrighted material,
you have now abdicated your right to do that in secret.
That's right, you know, you've got to be open about that.
But to the extent that you're using openly available stuff that is non

(01:18:51):
copyrighted, then like totally kosher, you know.

Andrey (01:18:54):
Yeah. Yeah that's true. Yeah.

Jeremie (01:18:56):
Yeah. So it's a bit of a battle, but you're right.
Like, I don't know how you square the circle
around this stuff while respecting the copyrights of companies,
or even just respecting the companies' ability to litigate
it. Like, we right now are in a situation where, you know, The New York
Times or whoever it is has to discover, has to find a clever

(01:19:16):
way to to demonstrate that, you know, some company has trained on their
data and that itself, you know, it's unclear if that's actually a
reasonable situation to put a publication like, you know, The Wall
Street Journal or The New York Times or Fox News or whatever, like,
should they really have to do the heavy lifting to prove that you did
this, let alone having to do the legal battle to establish whether

(01:19:39):
or not that is okay in the first place.
So I think it's it's interesting. It's also like, I don't know, I again,
not a lawyer, but for people to even know that they ought to be
considered as part of a, like a class action lawsuit.
If that happens in this context, like I would need to know that in fact,
my rights, my copyright has been infringed upon, or at least that my
copyrighted material has been used to even figure out if I should join

(01:20:01):
the class. Again, super.
Not a lawyer, but there is a sense in which a new kind of information,
could be made available to the public that that would be helpful for
people to defend their rights to the material they make. You know,
I hadn't read the bill, so I don't know if this is overall a good idea or
not, but just the high level picture.
It feels like the kind of debate that should be happening.

(01:20:23):
At least that's how it strikes me.

Andrey (01:20:25):
Next story, kind of a spicy, dramatic one
that I knew was going to be exciting.
The headline is: Linwei Ding was a Google software engineer.
He was also a prolific thief of trade secrets, say
prosecutors. So, apparently this person is facing
federal felony charges for allegedly stealing 500 files containing

(01:20:48):
Google's AI secrets and marketing himself as an AI expert to Chinese
companies. And this, of course, plays into
a larger question or, I guess, topic of
copyright theft and intellectual property.
There is a bit of a history of Chinese companies going after intellectual

(01:21:09):
property of the US, for decades now, especially
when it comes to tech. So, this, plays into
that. And yeah, it's, I guess if you're Google, you have to be careful
now for that sort of thing, for your trade secrets.
Yeah.

Jeremie (01:21:24):
I think a lot of these, AI companies really are going
to have to mature quickly on this stuff, you know?
The reality is that they've gone from iterating on
prototypes for the last decade to all of a sudden they're building
artifacts of profound national security importance.
And that didn't happen sort of in a way

(01:21:45):
that would have been transparent to them.
People are unclear about how important these things are from a national
security standpoint, but as far as China is concerned, they absolutely
seem to be, and there are, you know, indications that China's trying
to exfiltrate models and things like that.
So, you know, this is something to be concerned about.
The backstory here is kind of interesting.
You know, Linwei Ding himself spent months, it's said, at a time in

(01:22:09):
China, despite living in the US and supposedly being full time as a
software engineer in Google's San Francisco area offices.
And he apparently had people like doing the equivalent of, putting in,
like, punch cards for him so that it would seem like he was actually at
work while actually being, you know, overseas doing, doing whatever else.
So he stole 500 files containing, as they put it, some of Google's most

(01:22:32):
important AI secrets. I tried to dig this up.
I seem to remember a tweet from a Googler saying, hey, you know, I
want to clear the air on this.
Linwei Ding was actually, I don't know, sharing some, like, internal
stuff about, I don't know how we're thinking about.
I don't know if it was more on the ethics side or whatever, trying to get
input from Chinese companies for some reason.

(01:22:54):
Anyway, it seems, at least based on this article, the claim seems to be
that that is not the extent of what was stolen.
There seems to be some, like, important technical secret sauce there.
And so his home was searched by the FBI.
And it was apparently just days before, he was about
to board a one-way flight to China; he was arrested at that point

(01:23:16):
in March, and they're federal felony charges.
So, yeah, this strengthens the argument, which certainly
I have made and a lot of people in the space have been making, that
progress at frontier labs, to the extent that it is not secured,
is Chinese progress.
We've heard this from folks at, top AI labs when we speak to them,

(01:23:39):
where they're talking about the security situation there often, often
comes up. There's a running joke, as we said in our report, actually,
that, you know, one of these labs is like China's top AI lab because
probably their stuff is being stolen all the time.
That's what we were hearing. Not always a great sign.
This certainly doesn't help, in terms of, in terms of that narrative.
So, yeah, we'll we'll have to see what the next moves are, what the

(01:24:00):
evidence is that gets surfaced. Oh, yeah,
there's one last thing I want to highlight here.
A quote in the article says the indictment said Google had
robust network security, including a system designed to monitor large
outflows of data, but Ding circumvented that by allegedly
copying data from Google source files into the Apple

(01:24:21):
Notes application on his Google-issued MacBook laptop, converting
the Apple Notes into PDF files and uploading them from the
Google network into a separate account. So this is not, you know, this is
not rocket surgery, right? This is just what you
would do to get away from using Google products, Google networks and so
on. Fairly intuitive. And there apparently was not a measure put in

(01:24:43):
place to prevent this from happening. Not the kind of thing you would
expect from, you know, nuclear security, you know, chemical,
you know, like biological, chemical weapons research facilities, things
like that. To the extent that we think these things may be on a
trajectory like that, which, you know, Google certainly seems to think
publicly, you know, suggests the need for these sorts of things.

(01:25:04):
These are not bad actors. This is a very challenging and thorny
problem. But it certainly means that there's a lot of growing up that
needs to happen internal to Google and,
I'm sure, a lot of the frontier labs, you know, to deal with
these kinds of risks.

Andrey (01:25:19):
That's right. And in the article they go a little bit into
what was seemingly stolen. They have a professor comment on it,
and apparently the technology secrets dealt with the
building blocks of Google's supercomputer data centers
and had a mix of information about hardware and software, including

(01:25:40):
potentially chips, which of course would be very dramatic.
So, yeah, still, early on, it's not clear if
the data, or secrets were actually distributed.
It's not clear if, the person will be,
in jail for a long time. Apparently, he could face up to ten years in

(01:26:01):
prison. So, yeah.
Definitely a dramatic story, and it highlights
what a big deal this is. I bet this sort of stuff happens.

Jeremie (01:26:11):
All right, moving on to our lightning round. We have responsible
reporting for frontier AI development.
This is, an AI policy paper, coming out of, a research collaboration
that includes, folks from the University of Toronto, the Vector
Institute, which are kind of linked together, Oxford, and the Center for
Security and Emerging Technology. Also Google DeepMind, MIT.
It's a really, really broad group.

(01:26:33):
Some very, very respectable researchers there.
Lennart Heim, I see, is there on the list of names, too, and Gillian Hadfield.
It's really a very knowledgeable group of folks
looking at what reporting requirements we ought to set up.
When we think about advanced AI development with increasing levels of
risk, trying to figure out how to get information from AI developers

(01:26:54):
in a way that balances the need for intellectual property protection.
But, you know, also this need to inform policymakers and
regulators so they understand what's going on and have
the full risk picture. So they have this great table where they summarize
the kind of data that they think ought to be reported, the category of
people who should be receiving it. And there's a lot of finessing and

(01:27:18):
trying to trying to figure out what the borders and boundaries should be
between, sort of like the, you know, who should be able to have access to
what obviously a lot of sensitive information.
One of the things that, you know, that they say here is that
sharing the models themselves, I thought this was interesting.
They don't actually propose that. So they're proposing reporting on a

(01:27:38):
wide variety of different things, you know, risk assessments, you know,
ideas about, anticipated applications, current applications, that sort of
thing. What they don't propose is actually sharing the models.
Yet I like I find this interesting.
I'd be curious to to hear why specifically they went in that direction.
Obviously very high IP bar, but our own research suggests that that may

(01:28:00):
be something that, you know, that you actually ultimately do need.
This may be a stepping stone, but as they put it themselves, you know,
aviation regulators are authorized to conduct sweeping investigations of
new aircraft technologies, while financial regulators have privileged
access to cutting edge financial products and services in order to assess
their anticipated impact on consumers and markets.
You know, the fully analogous case here, I would think would be to allow

(01:28:22):
regulators to directly red team models themselves, and maybe even hand
over model weights temporarily in a secure setting.
Obviously a very different risk profile associated with that, a lot of IP
risk. But I'm always interested in where people draw that line, and
I'm sure they'll have very interesting reasons for that. But I just wanted
to flag that. It's kind of an interesting question.

(01:28:43):
And the last piece I'll mention here is, they look at a mix of like
voluntary measures and regulatory measures.
So what would it take if it's just voluntary?
What do you think you can get these labs to agree to just on a voluntary
basis. And, you know, they look at things like, okay, we'll disclose
information only to developers or only to government.
Don't share it with other developers, obviously for IP reasons.

(01:29:05):
And, this was interesting, have anonymized reporting to avoid
reputational risks. Right, if you have a frontier lab that comes
out and says, oh shit, our thing can, like, help you, I don't know, design
bioweapons or something, there's reputational risk that is then borne by
that entity. And they may not be keen to just take that on.
Right. If it's a purely voluntary thing.
So, you know, maybe you guarantee a anonymized reporting and just say,

(01:29:29):
hey, a developer has flagged this.
And then on the regulatory side, anyway, if you can have this be
formalized in regulation, what happens?
They look at what consequences to bring in if, labs don't report
and, anyway, safe harbor measures if, if they do report dangerous
capability. So like you, you can't basically be more liable or have

(01:29:50):
more legal exposure for reporting these things.
You want to encourage that to actually happen.
Anyway, I thought it was a really interesting paper. If you're into AI
policy and the catastrophic risk piece, I thought it was nicely thought
out, well-organized and yeah, but all around fun read.

Andrey (01:30:04):
Yeah, not much to add to that. I think it's, it's a good summary.
So the next story is: U.S. government wants to talk
to tech companies about AI electricity demands and,
yes, nuclear fission and fusion.
So yeah, apparently the Biden administration is seeking to expedite
discussion with tech companies about their soaring electricity demands for

(01:30:26):
AI data centers.
The Energy Secretary, Jennifer Granholm, has highlighted the
increasing needs here, saying that AI itself
is not the issue, but that, regardless,
something needs to be done. And apparently the Department of Energy is
considering the idea of placing, quote, small nuclear plants near

(01:30:48):
tech companies, large data centers.
So, you know, we've seen some
considerations like this before; Microsoft, for example, has already invested
in nuclear fusion. Yeah.
So definitely more, you know, still a very tentative
kind of conceptual thing.
But an interesting thing to note is that potentially this could be

(01:31:12):
something that ends up being needed for AI: straight-up
nuclear power.

Jeremie (01:31:18):
Absolutely. And the big challenge, right, is that renewables offer you
such high variability in terms of power output,
you know, is the wind blowing, is the sun out, that sort of thing.
Whereas the requirements of large training runs are for very high baseload
power. Right. They just need, like, they eat a constant
amount of energy over whatever the period the training run is going on

(01:31:39):
for. So, that means that you need a very, very powerful
and consistent source of power.
And nuclear is exactly that.
And that's part of the reason why, you know, these big data center
infrastructure build-outs are often being paired with, you know,
concurrent build-outs of nuclear energy and other things.
Right now, power, like baseload power, is

(01:32:02):
the key bottleneck in the West.
It's not in China, right? China has tons of power infrastructure; they're
kind of looking at the opposite problem, where they're bottlenecked by
chips because of the export controls that the US and other countries have
imposed. But in the US, the issue absolutely is energy.
And that's why Sam Altman is so focused on fusion.
It's why he's invested in Helion Energy. It's why he's, you know, pushing
so hard for all these things. But, yeah, it's always interesting that, like,

(01:32:26):
the bottlenecks don't have to be the same in different geographies.
And ultimately the bottleneck is the only thing that's preventing you
from moving on to that next level. So the main thing you want to focus
on, if you want to scale more and in this case, energy just so, so
important.

Andrey (01:32:39):
Next up, on to a legal thing: in Washington state, a judge blocks use
of AI-enhanced video as evidence in a possible first-of-its-kind
ruling. So this is related to a man accused of
a shooting outside a Seattle-area bar in 2021.
And the defense lawyers sought to introduce a cell phone video enhanced

(01:33:01):
by machine learning software.
And the prosecution argued that the enhanced video
predicted images rather than actually reflecting the original video, and
called it inaccurate, misleading, and unreliable.
Therefore, yeah, the judge blocked the use of it.
And potentially this could have ramifications for, you know,

(01:33:25):
being able to use machine learning to enhance the clarity of videos
or photographs in, cases in the future.
Yeah.

Jeremie (01:33:33):
I'm always like, I'm obviously not a not a lawyer and don't know much
about this stuff, but I'm a bit of a legal nerd in terms of these
precedent setting cases.
And I thought the backstory here was kind of interesting.
So there's you've got a guy who's accused of having opened fire
outside a bar. This is in Seattle back in 2021.
He killed three people, wounded two.

(01:33:53):
And he wanted or his lawyers wanted to introduce cellphone video
evidence that, yeah, was enhanced by AI software.
This whole confrontation, by the way, was captured on the the video.
Right. So they have the original video, but they're trying to
enhance it in a way that, you know, the I'm sure the defendant would say

(01:34:14):
provides further context, but the prosecution is saying, it's
made up context, it's generative AI.
And so anyway, apparently the defendant turned to, a
guy who previously had no experience handling criminal cases but had a
background in creative video production.
And, so he used this tool by,

(01:34:35):
by Topaz Labs, who I'd never heard of before.
They say that they help people like creative professionals supercharge
video. That's sort of the generative AI play that they're doing.
And interestingly, Topaz Labs itself said, basically,
like, don't use our stuff for this shit.
Don't do it, don't do it, don't do it.

(01:34:56):
And what the defendants did was say, yeah, do it,
definitely do it. They went ahead and made that case and
said, look, we know Topaz Labs are saying don't do it,
and that you shouldn't rely on their stuff, but you should rely on it,
because we say so. And so the prosecutor's office, apparently,
is making the case that, hey, these enhanced images are, as

(01:35:17):
they put it, inaccurate, misleading, and unreliable.
So ultimately, that's where things stand.
The judge came out and basically said, look, this technology is novel,
and that's always an issue, right, when you're setting legal
precedent and it relies on, quote, opaque methods to represent what the
AI model thinks should be shown.

(01:35:37):
And as a result, he kind of threw this out as sort of black box
stuff. But it's interesting.
This is a really interesting question. You know, how do you assess the
the truth value of generative AI stuff?
Right. If it's just like a Bayesian model, we're all trying to do this
Bayesian inference. That's what a legal proceeding kind of is.
We're all trying to decide, like, does this meet our, you know, beyond a

(01:35:59):
reasonable doubt threshold or balance of probabilities threshold,
whatever the case may be. But but like arguably a model is kind of trying
to do the same thing in a technical way.
Still, as, as, my dad would often say there's a reason.
There's a reason that the threshold of probability in a criminal
proceeding is framed as beyond a reasonable doubt, and that they don't

(01:36:19):
just say, like, you got to be more than 90% sure, 99% sure.
There's deliberate ambiguity there.
And so you can't necessarily just have an AI system.
Yeah. Math out what the odds are of a certain thing.
I guess that would be contrary to the spirit of this.
I'm done nerding out. I thought, this is a really interesting story.
And, yeah, don't go shoot people and try to use generative AI to cover

(01:36:41):
your tracks.

Andrey (01:36:42):
Yeah, now we know. Don't do this.

Jeremie (01:36:44):
No, no.

Andrey (01:36:45):
And one last story: Trudeau announces $2.4 billion for AI-related
investments.
This is, of course, the Canadian government. Of the 2.4
billion, most of it, 2 billion, will go towards providing access to computing
capabilities and technical infrastructure,
and then another 200 million will be

(01:37:06):
dedicated to promoting the adoption of AI in sectors like agriculture,
health care and clean technology.
And then there are some other, details as to where the
rest of that 2.4 billion will go.
There's also Bill C-27
involved here, which is apparently the first federal legislation

(01:37:28):
specifically aimed at AI.
And it will update privacy laws and introduce new rules for high-impact
AI systems.
So yeah, Canada, we don't, I guess, often talk about it.
But there are some, you know, pretty significant research institutes for
sure out there. And, the government

(01:37:49):
is seeming to want to push that forward.

Jeremie (01:37:52):
Yeah. Bill C-27, and the AI and Data Act, which is contained therein,
is kind of like Canada's attempt to do something like the EU AI act,
but it does have a bit more bite in some, some interesting ways.
So anyway, it's it's a whole rabbit hole.
I think we might have talked about it previously on the podcast.
The, the big thing to me was actually a small thing buried

(01:38:15):
in this $2.4 billion. There is a plan to launch a $50 million
AI safety institute to protect against what it calls, quote,
advanced or nefarious AI systems.
And you know, who was at the announcement of this thing was Yoshua
Bengio. So I suspect the government's going to look to tap Yoshua to,

(01:38:35):
maybe oversee this. You know, he certainly has concerns around, loss of
control and weaponization of these systems.
He's emerged as the sort of like, de facto consensus,
expert guy internationally; you know,
Rishi Sunak tapped him to help lead their sort of consensus-building
operations internationally on AI risk.

(01:38:56):
And, it's interesting to see him implicitly endorse this with his
presence there. So kind of cool.
$50 million for an AI safety institute?
No, not a ton of money. But still, it means Canada joins the US
and UK in having its own AI safety institute, if in fact, this does go
forward. So I think that's kind of noteworthy.
You know, it was part of the vision that I think, I think it was Gina

(01:39:16):
Raimondo. The Department of Commerce said, you know, we want to have this
network of AI safety institutes around the world, and this certainly
would be consistent with that vision.

Andrey (01:39:26):
Just a couple stories left.
One in synthetic media and art that is worth highlighting.
We have a story about Billie Eilish, Pearl Jam, Nicki Minaj and
many others; 200 artists called for responsible AI
music practices. So they signed an open letter
issued by the nonprofit Artist Rights Alliance that

(01:39:50):
called for organizations to stop using
AI in ways that infringe upon and devalue the rights
of human artists. In particular, they have concerns
about AI models being trained on unlicensed music,
which, they say, is unfair.
So yeah, pretty significant statement there from very

(01:40:15):
famous and notable musicians, coming
just before what we started with, Udio and Suno.
So, a bit of awkward timing there in a sense,
but it certainly seems like what happened
with image generation, like, two years ago is now happening with music generation.

(01:40:37):
And it'll be interesting to see, because in music
there's a bit more organization, a bit more, you know, money and fame
in general, how that plays out compared to what happened with image
generation.

Jeremie (01:40:51):
Yeah. It's also just like, you know, this this last bit where they're
asking folks not to develop or deploy AI, music generation technology
that undermines or replaces human artistry or denies fair compensation
for artists work like, I don't I don't see how
you can ever do that and have these things be useful at all.

(01:41:12):
So like I this amounts to, I think, a plea for these things
just not to be used or certainly not to be open source.
That's guaranteed, right? There's no way that you can open source models
and be consistent with this. But then, yeah, more generally, even for
paid things like how would I? I don't even know how I would do this
without implicitly taking away an opportunity for an artist.
So, look, it's a thorny issue, and everybody's trying

(01:41:35):
to to solve these problems. You know, we had to we had to deal with the,
the WMD version of this with our, our action plan.
And this is the music industry's version of this.
How do you square the circle?
Hey, I wish I had the answers. In the music industry, this is pretty,
pretty tough.

Andrey (01:41:52):
And doing the highlights, one fun story before we close out.
This one is: OpenAI's Sora just made its first music video, and
it's a psychedelic trip.
So there have been quite a few examples of Sora videos being released in
past weeks; OpenAI started shopping it around and we saw some fun
short films.

(01:42:14):
This one is a music video for the song Worldweight
by August Kamp, created entirely with Sora.
And it's just a series of short
clips with various kinds of surreal elements and environments;
there's no narrative to it, it's just imagery

(01:42:37):
accompanying this kind of ethereal music.
So in a sense, they fit well together
as a sort of music video. You could see it working in this way.
Yeah, it's I think to me an interesting example of, you know,
a useful application of video generation where you could

(01:42:59):
use it especially for B-roll and for things
like music videos with kind of a loose, no-narrative structure, no
sort of overarching visual story, but
a mixture of clips that AI can then generate for you.
And, you know, it is worth highlighting that.
Yeah. Now you have a music video with a bunch of very cool imagery that

(01:43:22):
would have been much harder to develop otherwise without Sora.

Jeremie (01:43:25):
Yeah, absolutely. And I think you're absolutely right on the B-roll
piece. You know, I'm thinking about the handful of times I've had to use
B-roll or, you know, going to Pexels or whatever, you know, whatever the
websites are now, and the level of specificity
that you're often trying to get for the B-roll, that ends up being a
big blocker, right? You're trying to find, like, yeah, somebody's using

(01:43:45):
their their right hand to grab a drone and do such and such.
And this is exactly the sort of thing that allows you to bridge that gap.
Right? It's it's all sort of, it's not out of distribution.
It's it's very much, you know, it's been trained on a bunch of different
things, and you often want to just combine those elements.
The combinatorics is the big problem for, for, sort of free to download,

(01:44:06):
B-roll footage websites. So, yeah, I think this is a big challenge for
those platforms. Like, how do you like what's the response?
You know, do you have your own generative video thing?
And then is that a race to the bottom? I don't know, but new, new era for
that.

Andrey (01:44:21):
And with that, we are done with this latest episode of Last Week in AI.
Once again, you can find the articles we discussed here today at
lastweekin.ai. You can also reach out and give us
comments or feedback at contact@lastweekin.ai or hello@gladstone.ai,
which are in the episode description.

(01:44:42):
As always, we appreciate you listening and sharing and rating.
And be sure, more than anything, to keep tuning in and
listening.