Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Andrey (00:05):
Hello and welcome to the latest episode of the Last Week in AI podcast,
where you can hear us chat about what's going on with AI.
As usual, in this episode, we will summarize and discuss some of last
week's most interesting AI news, and you can also check out our last
week in AI newsletter at lastweekin.AI for a bunch more
articles that we will not cover.
(00:26):
I am one of your hosts, Andrey Kurenkov.
I finished my PhD at Stanford last year and I now work at a generative
AI startup, and this week we do not have our
usual co-host, Jeremie. He is on vacation, but we do
have a wonderful guest co-host.
Jon (00:44):
Hey, this is Jon Krohn.
I think this is my third time guest hosting on the show.
Andrey (00:49):
I think so, yeah
Jon (00:50):
But the previous two times I was co-hosting with Jeremie.
So it's the first time that you get Andrey and me, and I'm excited for this
one, as regular listeners will know or people who heard me
in those previous episodes.
I am a diehard listener to this podcast, and so
it's such a treat to be able to come on here, in addition to spending
(01:11):
as much time as I can every week listening to this podcast, I also host
the Super Data Science podcast, which is, as far as we can tell, the
world's most listened to data science podcast.
And, that's more of an interview show than a news show, if people
are interested in that kind of stuff. I'm also co-founder and chief data
scientist at a machine learning company called Nebula.
So, yeah, so that's me and Andrey.
(01:33):
Should we talk about the beers now? This reminds me of the
podcast that I used to love, and that inspired me to get into podcasting
about data science: a show called Partially Derivative.
And on that show,
they were always drinking beer on air.
And so they would open up every episode talking about what beer they're
(01:55):
drinking. And so in today's episode, that was
the sound of me opening my beer. But we
have a bit of a twist here. This is kind of like, I don't think I even
qualify as a millennial, but it's like a millennial twist on having a beer
on air, because we're drinking Athletic Brewing Company beers, both
(02:15):
of us. And these are basically nonalcoholic.
They have half a percent of alcohol, per beer.
Andrey (02:22):
Yeah. It just so happened that I had one open at my desk, and
Jon saw it and also got one out.
It seems we're both fans. I wish we were sponsored;
then this would be an ad, but it's not. It's just a fun little detail
for our listeners.
Before we get going with AI, we are enjoying
(02:43):
some nice, like, beer to keep us hydrated as we talk for
an hour and a half about all the latest AI news.
Jon (02:50):
If I can drag this on for, like, one more second,
Andrey. What flavor? What's the particular kind of Athletic Brewing that
you're enjoying today?
Andrey (02:56):
I got the Run Wild IPA, my...
Jon (02:59):
Favorite.
Andrey (03:00):
Variant. And, yeah, you know, I'm not sure if you could call
this healthy, but it's certainly better than soda or whatever
else you might want. I really like it.
Jon (03:12):
Yeah, I think it's delicious. I have this Emerald Cliffs, which
is a limited edition one. If you can still get this online, I highly
recommend it. It's supposed to be, I guess, like a Guinness.
And it's absolutely delicious.
I love it.
Andrey (03:24):
And just one more thing before we get into the content of the
show and all the news. As usual, I do want to call out and thank
a couple of reviewers.
We got a couple new ones over the last few weeks on Apple Podcasts.
There's one that said great podcast
and that they enjoy it a lot. There was Lone Wolf, also
(03:46):
saying great podcast, and they had some useful bits of feedback:
they think Jeremie sometimes repeats some details.
I'd say that's just because we say so much.
So thank you for the feedback. We'll try to incorporate it as we go
forward and so forth. And I guess, Jon, I'll
try...
Jon (04:05):
I'll try too. It's an interesting one, because if you paraphrase
the important points, it can end up really making it easier for listeners
to apprehend that particularly important information.
So it's a tricky balance to strike, I think, as a podcast host.
But good feedback for sure.
Andrey (04:22):
Yeah. All righty. Well, let's dive into the news, starting
as usual with the tools and apps section.
And the first story is about AI generated
music yet again, kind of one of the highlight trends
of the year, it seems. And this time it is coming from
ElevenLabs. ElevenLabs,
(04:44):
if you have been following it for a while, you probably know, is a
leading source of AI generated voice.
You can give it text and get really good, high quality voice outputs.
And now they have previewed, they have given us a
sneak peek of their work on AI generated music via
some posts on Twitter, or X.
(05:07):
So, not quite out yet.
Not clear when it'll be out, but they did post a few examples,
and the examples I listened to had really clean,
really convincing results. So, you know,
we now have two big players.
We recently discovered a new entry with Udio.
(05:28):
There's also now ElevenLabs, seemingly also entering,
or soon to be entering, into the space of music generation
with yet another model where, like, when I listen
to it, it's hard to tell it's AI generated.
So yeah, we are very much in the realm of generating
(05:49):
full songs where AI music is almost
indistinguishable from human music.
Jon (05:57):
With AI, Udio was the vendor that you guys used a few weeks ago to change
up your theme song, right?
Andrey (06:02):
That's true. Yeah.
Jon (06:04):
I was surprised that you guys haven't continued to have different theme
songs every week. I guess it would kind of take away from the branding of
it.
Andrey (06:10):
Yeah, I don't know. I have a soft spot for our current theme.
I kind of just like the sound of it, and it could be fun.
Maybe. Maybe even outro music,
AI-generated every time.
Jon (06:21):
I do like your intro music. It makes me...
It feels like I'm being lulled into calm
by an AI system that is going to take over the world
and my brain. It's like you're just being gently told: just relax, don't
worry, AI is coming.
It's going to destroy everything. Just relax. I love it.
Andrey (06:40):
Yes, it's a good way to kind of prepare for the onslaught
of AI news we are facing every week.
Jon (06:47):
Yeah. And so this is interesting, this ElevenLabs story.
I've just been digressing basically so far since I've been on.
But they're a big player in this voice generation space.
They closed an $80 million Series B three months ago
at a unicorn valuation, and at the Super Data Science Podcast, we've
actually experimented with using ElevenLabs to clone my own
(07:09):
voice, because we recently started doing these
"In Case You Missed It" episodes that happen once a month, and they recap
the most interesting parts of conversations from the preceding
month. And the production team was like, let's
make your life easier so that for at least one episode of the month, you don't
have to be involved.
(07:30):
But in the end, it's one of those interesting things where you're like, I
could do it, and it absolutely sounded like me.
But to make the transitions between those conversations
really good, I've got to script them anyway, and
then I might as well just record them myself. It takes like an extra
minute. So yeah, one of these interesting things: you guys were talking
(07:51):
last week about how in generative AI, the chat application
is kind of well-worn through, but there's lots of other areas.
It was GitHub, I think, specifically last week; you were talking about
how there's new ways that you can be using generative
AI. But it's interesting how it can get you so close,
but just be a little bit off from really, at least today,
(08:15):
saving you time in reality.
And I guess that's just going to get better and better and better as we
have more and more tools and we become better at, syncing them all
together and linking them all together and kind of having a web of AI
systems that act as agents, act as generators, and really do make
our lives easier and save us a lot of time.
Andrey (08:33):
Yeah. I think it's very much context dependent.
So, like, we're not at a place where AI can replace everyone at everything,
right? But as we will be talking about later, for ElevenLabs
and voice generation, there's now a prominent
example with Audible, where you do have audiobooks,
many of them, we'll get to this near the end of the show, being generative
(08:56):
AI now. And similarly, I would imagine with music it won't
replace popular artists, but for TV
shows, for movies, for things where you might want
to quickly generate a song, especially for indie
or, you know, less, financially backed
(09:17):
projects, like podcasts. Like podcasts,
yeah, exactly, exactly. So anyway, exciting news from ElevenLabs.
And you can go to the link in the description, as always, to actually
listen to the song. It's pretty impressive.
Next story. This one actually happened last week, but we
do want to make sure to touch on it since we didn't last week, and it is
(09:40):
about the mysterious gpt2-chatbot AI model that appeared
suddenly and confused
experts. So this was on the LMSYS Chatbot Arena,
which we mention quite a lot. It's where people can basically vote
on which chatbots are best and give them a score.
And it appeared and got
(10:00):
Pretty good scores. And everyone was wondering, is this from OpenAI?
Is this some kind of a new model they're working on?
There were no details at all as to where it's from,
but it did have pretty impressive reasoning abilities and
did pretty well, although it also had some weaknesses compared to GPT-
(10:21):
4. And all of this is happening as we
are waiting: OpenAI has announced that next Monday,
so next week, just a couple of days from when we are recording, they
have some sort of announcement coming, on Monday morning.
So people have been speculating that it's search, and OpenAI
said, no, it's not search and it's not GPT-5.
(10:45):
Is this gpt2 thing related?
Maybe it's some sort of extra small model that's really good and nobody
knows. But, it's interesting that this happened and,
yeah, it's a fun little story to be aware of.
Jon (10:58):
You've had a lot of shows recently on, well, not
overtraining exactly, but following the Chinchilla scaling laws that you guys talk
about on the show a fair bit.
There has been a trend recently towards taking small models
and training them a ton, with a huge amount of data beyond
what would be kind of the economically optimal choice
(11:21):
according to the Chinchilla scaling laws.
But you do nevertheless get really great results.
And so it would be interesting to see if this is in that space.
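For reference, the Chinchilla result people cite here is often boiled down to a rough rule of thumb along the following lines; these are approximate fits from the scaling-law literature, not numbers tied to any specific model discussed in this episode.

```latex
% Approximate Chinchilla rule of thumb: for compute-optimal training,
% the number of training tokens D scales roughly linearly with parameters N,
D_{\text{opt}} \approx 20\,N, \qquad \text{with total training compute } C \approx 6\,N\,D.
% "Overtrained" small models deliberately use D \gg 20\,N, spending extra
% training compute to get a smaller model that is cheaper to serve.
```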
This is, it's kind of, I don't know if this is a shameless plug.
You can pull this out, Andrey, if you want to when you're editing.
But the team that created this, we
(11:41):
had the head of that team, Professor Joey Gonzalez, on my show Super Data
Science in episode number 707, and he talks about LMSYS a lot, which
was new at that time. And yeah, it's a really cool way to be
comparing models in a way that is crowdsourced
and leveraging real human feedback.
It's definitely the biggest player for doing that.
(12:05):
I think the most well-thought-through
website for comparing chatbots in that manner.
Andrey (12:13):
Yeah. And these days, you know, as we see more and more
announcements of new models, we always mention that
with benchmark results, it's always a bit iffy.
Right? You don't really know if a model was potentially trained on some
data. You want to believe that people avoid all sorts
of hacks to get the best numbers, but in some cases there have
(12:37):
been some hacks, or at least some weird ways of reporting numbers.
So LMSYS and things like it are pretty much the best way to know
model quality, or at least one of the better ways.
And again, later on in the show, we'll have another interesting
bit of news with open source as to this topic.
Jon (12:56):
And it's more expensive, but definitely higher quality and more reliable.
Andrey (13:00):
And on to the lightning round with some faster stories.
First up, SoundHound AI and Perplexity partner to bring
online LLMs to next-gen voice assistants
across cars and IoT devices.
So SoundHound AI provides these sorts of voice assistants
and has announced this partnership with Perplexity,
(13:21):
Perplexity being an AI driven search engine that
has exploded over the past year.
So that's about all the details we need to provide.
Basically, you'll be able to query Perplexity to answer complex
questions.
Interesting to see Perplexity starting to do these sorts of
partnerships and expanding beyond just their own platform, providing AI
(13:44):
services to other platforms.
Have you used Perplexity at all,
Jon?
Jon (13:48):
I actually have not. I've seen demonstrations of it and it seems like a
cool tool. I am kind of embarrassed that I haven't.
As I'm sitting here, I'm like, man, I don't have a lot to say on this
story. And I really should have used Perplexity by now.
Andrey (13:59):
Okay, well, now you have some homework.
For your next interviews, you might want to do some research with it.
Because I've played around with it a little bit.
I'm the same, in that I haven't used it actively.
Some people say they use it every day, like Nvidia's CEO.
I will say I don't use it actively, but I have played around with it
and it's cool. It's a cool tool.
Jon (14:20):
I think, like, as we've spoken about this for 30 seconds more, my mind
has kind of shaken off some cobwebs and reminded me that, I think when I
first heard about it, maybe a year ago, I did do some searches and I was
like, well, it's a really nice UI, but
it didn't scratch any actual informational need for me
(14:40):
beyond what I could already retrieve very conveniently with
either a Google search or a generative AI search with one of the tools
that I was already using.
Andrey (14:50):
Next story: Stability AI sows gen AI discord
with Stable Artisan, and kind of weird naming there from VentureBeat.
But what it means
is that you will now be able to use Stability's image generation
from Discord via a Discord bot, similar to what
Midjourney has been providing for a long time, and this will be a
(15:12):
paid service. There's going to be a free trial, and then you do need to
buy credits. Very much the same as with Midjourney.
That's pretty much the story. And I guess for people who love using
Discord as a way of generating images, another tool you can use now.
Jon (15:27):
Yeah. I don't know why anyone would deliberately want that.
The weirdest thing for me about that is seeing other people and the images
that they're generating, or the content that they're generating alongside
yours. It's a weird shared experience.
Andrey (15:38):
Yeah, it's good for communities.
But as a, you know, individual making
content, maybe less so. But still fun to see these new developments.
Jon (15:49):
It's the exact opposite of Perplexity in terms of user
experience. It's like Perplexity has such a beautiful UI, and it's so
thoughtful with the user experience, and, yeah, with
doing everything in Discord, it's the exact inverse.
But I guess that's kind of the deliberate play there.
It's: let's get the models out to people as quickly as possible
(16:10):
and not focus on UX.
The interesting thing, I think, about focusing on UI, like Perplexity
does, is that it gives you the capacity over time to be improving.
If I went back to Perplexity today, I might say, wow, you guys
have really done a lot more here and it is so much more valuable to me.
Whereas, you know, with a kind of discord experience, that's not going to
(16:33):
change.
Andrey (16:34):
Next up, Apple will revamp Siri to catch up
with its chatbot competitors, and this is paired with a second
story that Apple is nearing a deal with OpenAI to put
ChatGPT on iPhones.
So they are apparently finalizing terms for a deal
to have ChatGPT features in Apple's iOS 18.
(16:57):
And alongside that, there also has been a major
reorganization within Apple, with a major focus on generative AI
and seemingly this idea of upgrading Siri with
generative AI capabilities.
You know, we've mentioned it before. It seems like a no brainer.
And I'm sure they'll get it going.
(17:19):
Pretty seriously. And there are some announcements, you know,
presumably coming in the next few months regarding this huge story.
Jon (17:26):
And it highlights the interesting dependencies between some of
the biggest players in Big Tech. The other one that comes to mind for me
immediately is the way that Google pays crazy amounts
of money to Apple to have Google Search be the default search
on the iPhone. And this similarly seems to me
like the kind of situation where surely Apple
(17:49):
would prefer not to be dependent upon Microsoft or ChatGPT
for this technology, but it's expensive, and
there's very little talent on the planet that can get you to the frontier.
Like, just a handful of labs around the world are at the frontier of generative AI, and
so it makes a lot of sense, pragmatically, I'm sure.
However, with things like scrapping the Apple Car, which
(18:12):
I was super excited about, I'd love to even just see what that was
going to look like. But with things like that project being scrapped
so that resources can be diverted to generative AI,
I wouldn't be surprised if we're not
too far away, a year or two maybe, from Apple being able to
do these kinds of generative AI projects on their own.
(18:34):
And I would bet that they will have a huge focus
on security and,
yeah, reliability, probably more so than being right off the cutting edge
of capability. I imagine Apple will be a little bit more conservative in
trying to make sure that they have. Yeah, the most secure, the most
reliable outputs.
Andrey (18:54):
And just one more story from this section:
Alibaba rolls out the latest version of its large language model to
meet AI demand.
So this is from Alibaba Cloud.
And they have released the latest version of their model, Qwen 2.5.
Some benchmarks say that it's comparable with GPT-4 in
some capabilities, like language and creation.
(19:17):
And Alibaba does say that there have been over 90,000
deployments by various companies.
So as always, be mindful that there's a whole ecosystem
over in China of people competing and trying to create
ChatGPT-like models. And on to applications
and business. And the first story is OpenAI and Stack Overflow
(19:40):
partner to bring more technical knowledge into ChatGPT.
Just last week, we talked about how OpenAI has been making a lot of
deals with news publications like the Financial Times, and now
there's been this announcement of OpenAI and Stack Overflow making
a deal. Stack Overflow
has been kind of a leading website for conversations regarding
(20:03):
programming and just technical details.
A lot of people go over to ask questions like, how do I do this, I
have this bug, etc., etc.
And as a CS student from like a decade ago, this
used to be a site you went to all the time, right?
They have a lot of data, and I'm pretty sure ChatGPT and all
(20:24):
these other models have already been trained on that data
anyway. So there's now a deal for OpenAI to have access
to the Stack Overflow API and be able to
use feedback from that community to enhance
their models. And Stack Overflow will be attributed within
(20:45):
ChatGPT, interestingly, similar to how ChatGPT will be attributing
things like the Financial Times.
So yeah, another play by them, another
potential way to be able to overcome the limitation
of having a knowledge cutoff in these models.
Right. That's one of the key limits of GPT is it
(21:06):
will always be trained up to a point in history of like April 2024
or whatever. So I think these sorts of plays speak
to that where we have integrations of new sources
and, you know, these kind of forums regarding different question and
answer topics.
These tools can still be useful for things that come out after
(21:29):
the training data cutoff.
Jon (21:30):
Yeah, I think I'm probably a bit older than you, Andrey.
And it's interesting that when I began programming, the only way
that I could get information on how to improve the code I was writing,
or even just learn things about code, was from a book.
Andrey (21:46):
Yeah. That's, from a while ago.
Yeah.
Jon (21:49):
So this is in an era where we were just starting to get dial-up
internet at home, and we definitely did not have internet at school.
So we would have PC labs, and you would go in and they would have
the class textbook, and you'd learn C++ or Java or whatever,
and you were just limited to your teacher's knowledge, your
(22:10):
classmates' knowledge, and whatever was in the textbook.
So definitely a different era there.
But then the internet allowed us, of course, to be able to do Google
searches and find information originally
more so in the documentation, kind of the official documentation for
a given programing language.
But then StackOverflow became invaluable.
(22:32):
It became the default, where I would get some stacktrace
and grab the error at the end of that trace, paste it into
Google. It takes me to Stack Overflow, and then I can just copy
whatever the solution is without even having to think about it.
And to underline the importance of Stack
Overflow, it's interesting how
(22:54):
Gemini from Google Cloud also inked a deal with
Stack Overflow in February. And you can see why: if you want to be getting
great coding suggestions, Stack Overflow has got to be
the best place on the internet by far.
That is 90% of the time or more where that stack trace, copy
paste into Google takes me.
And now it is really interesting.
(23:17):
That is an example of something we were talking about earlier in the episode,
about tasks like my voice being simulated for
an episode of mine once a month, where that didn't quite meet the mark
with generative AI for me.
Now, Claude 3 Opus is where I go every single time
when I have an error in my code, because it is so brilliant.
(23:39):
In the same way, if you're familiar with doing that in GPT-4, or, I
haven't used GitHub Copilot myself, but I'm sure it's the same kind of
situation where you get great feedback right away.
And the thing that is different about doing that
with a web search is that it is contextualized exactly
to your code, so it can just rewrite your code for you.
(24:01):
You don't have to change any variable names or anything.
Andrey (24:03):
It's true. I think for non-programmers listening, and I'm sure there are some
of you, it is very much the case. I think, like, there
have been stats that Stack Overflow usage or visits have dropped by
like 40%, just because ChatGPT and tools like
it have pretty much surpassed it in usefulness.
(24:24):
So now, going into the future, Stack Overflow will be
mainly useful for things that don't exist yet, for dealing with
new tools and stuff like that.
And just one more note on the story before moving on.
I think it is worth highlighting that an additional story
that came out after this one was that Stack Overflow has actually been
(24:45):
banning users who have, in a sense, rebelled
over this partnership. Some users have started trying to delete
their answers to prevent them from being used to train
ChatGPT. And, yeah, Stack Overflow has
tried to prevent that by banning users.
(25:05):
So as we've seen often with these sorts of deals,
to some extent with Reddit, not everyone is happy to have their
work, their, you know, contributions be used
to train AI models.
Jon (25:20):
It's an interesting problem, where I suspect the terms and conditions of
using the site give them the right to do that, but nobody reads those.
And then things change. On the other hand,
it's interesting to think, we have this...
people talk about us running into, what's it called,
Model collapse, where if we start losing
(25:41):
high quality real sources of data like
Stack Overflow, if people stop contributing to Stack Overflow because they
can just do everything in generative AI tools, does that mean that
we're going to have a big, negative impact on the quality
of generative AI models in the in the future, such as they completely
collapse? Andre, I don't know what you think about this, but I actually
(26:02):
I'm relaxed about this.
It seems to me like the AI systems have
the ability to evaluate their own outputs.
For example, I just said that my favorite generative AI tool at this time
is Claude 3 Opus, and that uses reinforcement learning from
AI feedback as opposed to from human feedback.
(26:25):
And that seems to be part of how it's been able to outperform GPT-4 and
Gemini. And so, I don't know, it seems to me like we're figuring
these things out. And the folks at the frontier labs
seem to have really clever ways of distilling down to the highest quality
training data one way or another, whether it's AI generated or human
generated. So model collapse, I'm relaxed about it. But what do
you think, Andrey?
Andrey (26:47):
I agree. I think, you know, there's a reason to be concerned,
because the internet is starting to get flooded with spam,
AI generated spam, as we've covered many times.
But to your point, I think these frontier
labs have, you know, the most brilliant people, basically.
Right? And they've been doing this for a while now and getting things
(27:09):
like Claude 3 off the ground.
So I think it's a problem that exists, and it's also
a problem that is solvable.
And the bigger problem, I guess, is the end
of data that is being approached, where
there is kind of an open question of, you know, how much
(27:30):
further can we push these systems. And on to the next story:
New Microsoft AI model may challenge GPT-4 and Google
Gemini. So Microsoft is building their
own large-scale language model that apparently is codenamed
MAI-1. And this one will have
500 billion parameters, approximately.
(27:53):
So this is much bigger than what Microsoft has been releasing with Phi,
their kind of smaller large language models.
This is more on par with things like GPT-4.
And I guess it is interesting because we know Microsoft has
a partnership with OpenAI.
They have been integrating OpenAI into Azure,
(28:14):
ChatGPT rather, into Azure.
They brought on Mustafa Suleyman from
Inflection, where they trained a large language model, and it seems
like he will be heading this development.
He also was, of course, at DeepMind.
So it seems like a way to kind of de-risk things from the Microsoft
(28:35):
side, where they don't have to rely on OpenAI to have their own large
language model. They will have it, and of course, will be able to
compete with Google and Meta as one of the players
that is able to develop these kinds of systems.
Jon (28:51):
Yeah, to help our audio-only enjoyers,
so people who aren't checking the story names. I guess, you know,
I wonder sometimes, as I'm listening to Last Week in AI, I have my iPhone open
in front of me and I'm scrolling to be, you know, keeping an eye on the
story names. And so, if you're not doing that:
(29:12):
phonetically, this model is "my one", but it's spelled M-A-I
hyphen one, for people who want to, like, track that for later on.
It's not like a...
Andrey (29:22):
Microsoft AI, M-A-I. Yeah, yeah.
Jon (29:25):
Yeah, yeah, exactly. And it is interesting to have Mustafa
Suleyman in there. Previously DeepMind co-founder
and, yeah, then CEO of Inflection, which seemed to
be doing really cool things. And I haven't been following that story very
closely as to why, it seemed to suddenly just...
everyone who was major there has gone on to Microsoft,
(29:48):
and this seems to be what they're working on. Maybe it's because of the kinds
of things that you guys are often talking about on this podcast related
to: if you want to be at the frontier, you've got to have access to a
ridiculous amount of cutting edge compute, and
Inflection just might not have had that. And so they figured, you know
what? Let's go over to Microsoft and do it there, since they've got
tons of compute. And the interesting thing, you already highlighted
(30:12):
this. I'm doing the thing that that
reviewer explicitly asked us not to do, but maybe
with a little bit of a twist or a little bit of additional commentary,
which is that it makes perfect sense to me that Microsoft would be trying
to do this stuff on their own without requiring OpenAI, because they
haven't acquired OpenAI. They've just taken a huge stake.
(30:33):
But this isn't their IP.
OpenAI can make deals with Apple like
we just talked about earlier in this episode.
And so Microsoft, you know, you could imagine a scenario where
if somehow Microsoft had created GPT-3.5
and released a ChatGPT-like tool, something of that quality, before OpenAI
(30:55):
had done it, that would have been an even better outcome for them than
needing to invest in an outside party and get the models
from them. So, yeah, you can totally understand why they
would be doing this kind of thing.
Andrey (31:08):
On to the lightning round. And we begin that with a raise
of $1 billion.
And that is from London-based AI startup Wayve, which
makes AI systems for autonomous vehicles.
This was led by SoftBank, who has thrown a lot
of money around. They previously raised around 300 million.
(31:31):
So it seems like a pretty big round.
And it's interesting because we have fewer and fewer players
in the self-driving space.
It seems like Waymo, Cruise, Tesla are really kind of leading the
pack there. But Wayve now has a billion dollars, so
they'll keep working on it.
Jon (31:50):
Yeah, I don't have anything else to add to this story other than it's
interesting to me that you have two of the biggest players in autonomous
driving named Waymo and Wayve.
Three of the five letters, the starting three, are identical.
I don't know, it's just, you'd think you'd want to differentiate a bit.
Andrey (32:07):
Yeah, well, it's a fun detail.
And the next story is also about self-driving cars.
This one, on the opposite end, is about a setback.
So Motional has delayed commercial robotaxi plans
amid restructuring.
The startup Motional is a joint venture between Hyundai and Aptiv,
(32:28):
and they are apparently pausing their commercial operations and
delaying the launch of their driverless taxi service until 2026
due to restructuring.
There will be layoffs.
We don't know too many details right now.
They have some autonomous taxi rides in
Las Vegas and deliveries in Santa Monica.
(32:49):
Those will be halted.
So, yeah, it seems like a pretty significant setback.
They did get an investment of $1 billion as well
from Hyundai, so they will continue to be active and working on
this. And perhaps this is kind of a strategic thing.
Hard to say. But regardless, another player in
(33:11):
the space trying for the same thing and I guess
delaying their plan to try and launch until 2026.
Jon (33:19):
Yeah. I don't know exactly what's behind this,
another setback in autonomous vehicles, but you could imagine
that, was it GM that recently had the big issues
with Cruise's safety? Yeah, exactly.
And so you could imagine at the board level or something at Hyundai,
them saying, you know, these kinds of things that we're seeing happening
(33:39):
at our competitors like GM, there's too much risk.
We've got to make sure that this is really buttoned up, or let's see what
happens with Waymo and the regulatory situation before, you know, we take
more risk in this kind of situation.
I don't know exactly what's going on, but you could imagine something like
that.
Andrey (33:54):
And one last story in this section: The Rise of Chinese
AI Unicorns Doing Battle with OpenAI.
This is a bit more of a summary story, not anything particularly news
related, but interesting to me.
It kind of highlights how there are now four Chinese AI startups, Zhipu
AI, Moonshot AI, MiniMax, and 01.AI, all
(34:16):
of which I'm sure we've mentioned in previous episodes, and all
of them have surpassed $1 billion in valuation
and are competing with the likes of OpenAI.
Some of them are not necessarily working on frontier
AI models. So Moonshot, for instance, focuses on digital assistants
(34:36):
to summarize text for students and office workers.
MiniMax is targeting the gaming market
with anime-themed characters. 01.AI,
of course, has been developing AI models.
So yeah, they are big players in China.
(34:57):
We may not hear about them a lot in US media,
but, definitely interesting to kind of keep it in mind
that this is happening in China.
Jon (35:09):
The only thing I have to add here is that the thing that got the biggest
chuckle from me on your episode last week was you saying,
Andrey, that China, the country, released this model,
you see, as if that's all that China does, release models.
Andrey (35:23):
Yeah. That's on me, excuse me.
And now on to the projects and open source section.
And the first one is the paper Prometheus
2, an open source language model specialized in evaluating other
language models.
This is a collaboration between six different institutions:
KAIST AI and MIT, various universities, LG AI
(35:47):
Research. And as per the title of the paper,
this is related to evaluating language models.
So they claim that they figured out a way to closely
mirror human and GPT-4 judgments.
Basically, you can have high correlations.
And unlike Prometheus 1, which was released last year,
(36:12):
where they focused in that previous model on basically giving a score
from one to five,
this one is greatly expanded.
So it can do pairwise ranking, which is what the Chatbot
Arena does. It lets you compare two different answers and pick which
one you like better.
And as with Prometheus 1, there is a lot of infrastructure
(36:36):
that goes along with this.
I'll just highlight how you kind of have user-defined evaluation criteria;
that is how they get scores of 1 to 5.
And in this paper they go a lot into how they expanded it,
so it now also supports pairwise rankings.
They also have
(36:57):
an expanded dataset with a lot of examples of pairwise
ranking and also verbal feedback on
outputs. And as per the title, once again they released
the models, code, and data, all open source.
So a pretty promising effort, I think, to
(37:17):
help with that evaluation challenge, where you can go
beyond benchmarks and use these sorts of LLMs to evaluate, you know,
which I think is becoming a pretty standard approach.
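To make the pairwise-judging idea concrete, here is a minimal sketch of how an evaluator model like this is typically used, with a user-defined rubric and two candidate responses; the prompt wording and the generate() helper are illustrative assumptions, not the exact templates or interface from the Prometheus 2 paper.

```python
# Minimal sketch of pairwise LLM-as-judge evaluation (illustrative, not the paper's exact setup).

def judge_pairwise(generate, instruction, response_a, response_b, rubric):
    """Ask an evaluator LLM which of two responses better satisfies a user-defined rubric."""
    prompt = (
        "You are an impartial evaluator.\n"
        f"Instruction: {instruction}\n\n"
        f"Response A:\n{response_a}\n\n"
        f"Response B:\n{response_b}\n\n"
        f"Evaluation criteria: {rubric}\n"
        "Write brief verbal feedback, then give your verdict as [[A]] or [[B]]."
    )
    feedback = generate(prompt)  # `generate` is any text-generation callable (hypothetical helper)
    verdict = "A" if "[[A]]" in feedback else "B"
    return verdict, feedback

# Example usage, with any model's text-generation function standing in for the evaluator:
# winner, feedback = judge_pairwise(my_model.generate, "Summarize this article...",
#                                   summary_1, summary_2,
#                                   "Factual accuracy and coverage of the key points.")
```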
Jon (37:30):
I can't overstate the importance of innovations like this.
As someone who manages a team that develops LLMs,
fine-tunes LLMs for particular use cases, and then deploys them into
production, it is...
you know, you talk about on the show all the time the issues with
benchmarks; those are so flawed that it
(37:51):
seems like you've got to be looking to something beyond that, like the
Chatbot Arena that we were talking about earlier. Interestingly,
and in a parallel,
the same faculty member at Berkeley, Professor Joey Gonzalez,
who was one of the faculty members behind that Chatbot Arena,
also in March of last year, in March of 2023,
(38:14):
was one of the faculty on the release of Vicuna,
the model that was a fine-tuned LLaMA model, and it outcompeted
LLaMA. And what they did for their evaluation, they said
at that time, was only partly scientific: they were
using GPT-4 to compare side by side,
(38:36):
let's say, how does our Vicuna do against LLaMA?
How does it do against GPT-3.5?
And so you have these head-to-head tests, similar to what humans are
evaluating in Chatbot Arena,
but you allowed GPT-4 to do it for the Vicuna evaluations.
And for us at our company, Nebula,
that is what we do most of the time. Because having
(38:56):
people, either your own customers, who you
might annoy if you have them kind of doing evaluations, if you bug them
with that kind of thing all the time, or even if you try to internally say,
like, you know, let's ask our business development team or our product
team to be evaluating this model that we built and compare it against GPT-4,
or compare two different versions of our model...
(39:17):
it is just so unscalable
to ask humans to evaluate your models, because
there are so many situations, as a data science team, on a daily
basis, even as we're checkpointing a model as we're
training it, where we want to know, are we overtraining and are we starting to
overfit, potentially?
(39:39):
Or have we reached a point where this model that we've trained is
mature? And so these kinds of computational pairwise
evaluations, being able to compare two models head to head, are so important
and are
the future of this space, if they aren't already today.
And so I really appreciate folks like the researchers behind Prometheus
(40:00):
developing this specialized model for evaluating other language models.
And you can bet that our company will be using this right away.
Andrey (40:08):
Exciting. Good to hear that
you will benefit from this. Next paper,
and this is yet another big
LLM being released and pushing the frontiers of
what released models can do. The paper is DeepSeek-V2:
A Strong, Economical, and Efficient Mixture-of-Experts
(40:29):
Language Model.
This is much bigger than the previous release from
DeepSeek. DeepSeek was previously
67 billion parameters.
This one has 236 billion parameters.
But as with other mixture-of-experts models,
(40:50):
the reason they say this is an economical and efficient model
is that, of those many, many parameters of the neural net, only
21 billion parameters are activated for each token.
And so when you look at a graph they have in this paper of
activated parameters versus performance,
they now claim kind of a top spot: in terms of
(41:14):
the number of activated parameters, you get really good performance on the
MMLU benchmark, which is kind of one of the leading ones that people
still use, and the claimed numbers are really good.
So it's roughly on par with Llama 3 70B
while using way fewer activated parameters.
And there's various details on how they do that.
They use multi-head latent attention and a mixture of experts.
(41:38):
They train on 8 billion tokens, with various tweaks,
and they actually released like a 30-page paper summarizing a lot of
details. So to me, a pretty exciting paper, just because
a lot of this is still a dark art, and this has a lot of details
on how it works, and they do release the model checkpoints on GitHub.
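As a rough illustration of why only 21 billion of the 236 billion parameters are active per token, here is a generic top-k mixture-of-experts routing sketch; this is a simplified, assumed example and not DeepSeek-V2's actual architecture, which also adds multi-head latent attention and other refinements.

```python
import numpy as np

# Generic top-k mixture-of-experts routing: each token is sent to only k experts,
# so only a fraction of the total parameters is "activated" for that token.
def moe_layer(x, expert_weights, router_weights, k=2):
    """x: (d_model,) activation for one token; expert_weights: list of (d_model, d_model) matrices."""
    logits = router_weights @ x                     # score every expert for this token
    top_k = np.argsort(logits)[-k:]                 # indices of the k best-scoring experts
    gates = np.exp(logits[top_k])
    gates /= gates.sum()                            # softmax over the selected experts only
    # Only the chosen experts run; all other expert parameters stay idle for this token.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top_k))

d_model, n_experts = 16, 8
experts = [np.random.randn(d_model, d_model) * 0.02 for _ in range(n_experts)]
router = np.random.randn(n_experts, d_model) * 0.02
out = moe_layer(np.random.randn(d_model), experts, router, k=2)  # 2 of 8 experts used per token
```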
Jon (41:59):
Yeah, I think I'm not 100% sure what I heard you say there, but it's,
8 trillion tokens.
I think I might have heard billion.
Andrey (42:05):
A trillion. Yeah, yeah.
Jon (42:07):
And yeah, that highlights,
it goes back to what we were talking about earlier with this overtraining.
So this rumored, whatever it is, I guess we might find out on Monday what
that gpt2-chatbot variant is.
And if that ends up being a small one that's, like, trained on a huge
number of tokens, this is another one of those scenarios where you get so
much out of a model by going far beyond what the Chinchilla scaling
(42:29):
laws would recommend.
And another interesting thing here is that the chart, as you're saying,
you know, has the MMLU asterisk already there, in
some world where MMLU is a perfect indicator of
how great a model is. It is interesting to me that for DeepSeek-V2,
they have done exactly the same chart.
(42:51):
Which you can see in the paper; it's figure one in the paper that's linked
to this episode. They created the exact same chart as
Mistral did for the Mixtral 8x22B release.
And the point that Mistral was making was, hey, look at us, we
occupy this top left corner of this
(43:11):
chart, where you have activated parameters on one axis and performance
on MMLU on the other axis.
And these two Mixtral models, 8x7B and 8x22B, occupy
the top left corner, and then DeepSeek-V2 comes along and says,
F you guys, we're going to occupy an even further corner in the
top left. And so yeah,
(43:33):
it's interesting to see how people position themselves relative to each
other. It does make a lot of sense.
It would be nice if we lived in a world where MMLU was 100%
reliable, because then this kind of release would mean
that you have a no-brainer to go for if you want to have the
best performance with an open source model, simultaneously without the cost
(43:55):
of running a Llama 3 70B in production.
Andrey (44:00):
That's right. And one more thing I'll mention:
they do have that "economical" detail in
the title of the paper. And that's because, in addition to this
efficiency, they also do say that they save about 40%
of training costs, reduce memory usage, and increase generation
throughput by almost six times.
(44:22):
So lots of tweaks to their previous approach,
lots of nice improvements.
And overall it seems like a really strong model. On to
the lightning round. First up, OpenVoice V2: evolving
multilingual voice cloning with enhanced style control and cross-lingual
capabilities.
I think we're probably gonna want to go quick here, so not too much to
(44:46):
add to that. This is coming from MIT and MyShell.ai and
Shanghai University, and, yeah, it's a really
nice model to be able to do voice cloning in multiple languages.
Jon (44:59):
Nice. Maybe you'll be using that soon alongside the AI-generated Last Week in AI
theme song. At least, maybe in your outro, you could be
spitting out "last week in AI" in Spanish, French, Chinese,
Japanese and Korean.
Andrey (45:14):
That is something we'll mention.
Actually, translation is a big use case for these kinds of things.
Yeah. Next up, Granite Code models, a family of open foundation
models for code intelligence.
This is coming from IBM.
We don't mention them too much, but they do a lot of stuff in AI.
And with this release, they trained these models on
(45:35):
116 programming languages, and they compare it to a whole bunch of other
open code generation models like Code Llama.
And as always with these news stories, they claim it's the best.
And they do release it under a super open license, Apache 2.0, for
both research and commercial use.
So, yeah, lots of progress still being made on LLMs
(45:58):
specifically for code.
And next we've got: Hugging Face launches LeRobot, an
open source robotics code library.
So Hugging Face made some news earlier this year, just a couple months
ago, by hiring from Tesla.
And the first fruit of that effort is now coming out
(46:19):
with this open source LeRobot package that they say
is basically meant to be sort of like their Transformers package.
Transformers is a package that is used very, very widely for
doing work in natural language processing with the transformer
architecture, and they want to position this as that type
of package, but for robotics, where it's a toolkit that is a
(46:41):
comprehensive platform that has, you know, support for
getting datasets, for simulators, for different types of robots,
for training models, for having pre-trained models,
just lots of stuff.
And, yeah, as someone who has done work in robotics
during my PhD, I think it's pretty exciting.
(47:02):
We really don't have this sort of unified,
kind of primary package; what people build on is kind of a
mishmash of different ones. People have sort of tried to do this over the
years, and Hugging Face getting into that ring and releasing this,
I think, has a pretty strong chance of supercharging AI research for a
lot of organizations.
Jon (47:24):
Nothing makes me more excited today than this
emergence, or this transfer,
of large language models, which have over the past few years made
unbelievable impact in software, now emerging into
hardware. The more that we see robotics being
(47:45):
able to be infused with LLMs, and these kinds of open source projects
make that easier and easier.
In the past several months, the single piece of news
that has impressed me the most is out of Covariant,
so Pieter Abbeel's company, where they
released RFM-1, Robotics Foundation Model 1, and
(48:08):
the specific idea behind that, similar to LeRobot,
is to be bringing LLMs into the physical world.
So, autonomous vehicles being another place that you could imagine LLMs
and AI being able to make a huge difference in the physical world, with
robotics there's eventually going to be an infinite amount
of real world capability
(48:32):
infused by AI. It's more expensive, and as you guys say on the show
time and again, hardware is hard.
So it happens a bit more slowly than software.
But when you're working with actual real world physical things
as opposed to just bits, there's, you know, a lot of potential there.
And I'm super excited.
Andrey (48:50):
And the last story for the section. Last week, we had no open source
stuff; this week we have a lot, which is kind of cool.
The story is Vibe-Eval: a new open
and hard evaluation suite for measuring progress on multimodal
language models, coming from Reka
AI survey release 269.
(49:11):
Ultra high quality image text prompts and ground truth
responses. designed to be difficult.
So on the 50% of a hard set, all frontier models
fail to arrive at a perfect answer, leaving
a lot of headroom for progress.
And they say what this is meant to compliment u
(49:34):
MMMU, which is multiple choice questions.
And here they have like a golden answer,
that is, the ideal answer.
And they have an evaluator, Reka Core, that gives you a rating
from one to five. So kind of similar to Prometheus in a way, but
specifically for multimodal AI rather than just chatbots. And
(49:56):
yeah, exciting. Again, lots of work being done on evaluation because it's
just something that is needed.
And now on to research and advancements.
And we start, of course, with I think the big news of
the past week, and that is AlphaFold 3.
As is so often the case, the major research news in the media,
and I think fairly so, also just amongst the community,
(50:21):
is coming from DeepMind, and this one is AlphaFold
3, the next generation of AlphaFold models.
So in the past they focused on protein structure prediction
and analysis. Now they expand it to other capabilities
within the kind of broader biomolecular interaction
(50:44):
analysis. So this is going way beyond my knowledge
base, but just reading out of the abstract, it is
a model that is capable of structure prediction on
complexes including proteins,
nucleic acids, small molecules, ions, and modified
residues. They say that it significantly improves accuracy
(51:07):
over many previous specialized tools.
So, far greater accuracy on protein-ligand interactions
and protein-nucleic acid interactions.
Again, I don't really know what that means, but probably pretty cool.
And the sort of exciting bit from an AI
(51:27):
researcher perspective is they do this by
kind of simplifying their approach.
So in the past, in AlphaFold 2,
it was a pretty complicated architecture with some
very kind of custom-built parts for the task.
It wasn't the sort of general purpose model
(51:50):
as we see with foundation models, where typically it's just kind of a
transformer with nothing specifically built for the task, and that's
why they're able to scale and do lots of things.
So in the paper, they go into how they essentially have simplified the
architecture and trained it on many tasks.
And despite removing these things that are,
(52:13):
you know, meant to make the model do
better at a specific task, they're still able to train it up to
be about as good.
One more detail: they do say they will not be open sourcing
this, unlike they did with AlphaFold.
This is done in collaboration with
(52:36):
the kind of more commercial
arm of Alphabet, Isomorphic Labs, which
is in partnership with DeepMind.
So it will be interesting to see if they try to commercialize it,
whatever. But yeah, exciting news once again from DeepMind doing
research on AI for science.
Jon (52:57):
Yeah. Google DeepMind, up until the ChatGPT
hubbub, hubbaloo, woohoo!
That's not really a word, but I think you know the word I'm trying to
say.
Andrey (53:06):
Yeah, yeah, yeah.
Jon (53:08):
Until that happened, I think it's safe to say that Google DeepMind
was unequivocally the world's leading AI lab.
And their approach to AI was different.
So folks like Ilya Sutskever at OpenAI
said, I think scaling up is, in and of itself, going to have all these
emergent capabilities. Let's just take this transformer architecture
(53:31):
and 10x the amount of data that we're using, the number
of artificial neurons in the architecture; let's 100x it, let's 1,000x
it. And that leads from GPT-2 to GPT-3 to
GPT-4, more and more emergent capabilities.
And so they're kind of taking that engineering-focused approach of
let's take this attention mechanism and scale it up really, really large.
(53:54):
See what happens. Google DeepMind might have been
surprised by how effective that scaling approach was, that
engineering approach, because what they were doing with
things like AlphaGo through to AlphaZero, they
were systematically
(54:15):
trying to make their way towards artificial general intelligence,
so an algorithm that has all of the learning capabilities of a human,
by starting narrow, like a single game, like Go, and then saying,
okay, let's see what we can do to have an algorithm that can not
only beat the world's best Go players, but also the world's best chess
players and the world's best shogi players.
(54:37):
And then a year later, let's add in Atari video games
alongside that.
And so their approach was to, you know, chip away,
gradually adding more and more capabilities
until eventually you reach this AGI algorithm that can do anything.
And so that seems to be the same kind of approach that they're taking here,
(54:59):
where with AlphaFold, AlphaFold 2, the
kind of press headline at that time was that they solved protein
folding, at least for some types of proteins.
And so similarly here, DeepMind has taken that same kind of approach that
we saw from them, going from a single game to multiple
games to games in completely different kinds of modalities.
(55:21):
You know, adding in board games and video games. Here, they started with
proteins, absolutely crushed that, and then said, okay, well, let's
see what we can do with other kinds of sequential molecules.
So proteins are made up of strings of amino acids,
and here they've said, well, let's see what we can do with strings of
nucleotides, which form DNA or RNA.
(55:43):
And then that gives you so much more training
data, a lot more diversity of data.
And just as we saw with the shift from, say,
AlphaGo to AlphaZero.
Andrey (55:56):
Adding in.
Jon (55:57):
More games, instead of having the kind of catastrophic
forgetting that plagued early attempts at generalization, we're seeing
more and more that cutting edge labs like DeepMind in particular,
are able to take these different, kinds of data, these different
modalities, and be able to have a single algorithm that outperforms
(56:18):
And so AlphaZero, which could play more
games than just Go, was better at Go than the specialized
AlphaGo model. And similarly, here with AlphaFold 3, we're seeing the
same kind of thing, where AlphaFold 3 is outperforming AlphaFold 2,
which was already the state of the art at protein folding, by taking into
account other kinds of information, by being able to model other kinds of
(56:41):
data like DNA and RNA.
And then in this case, you know, there's no obvious connection,
at least immediately. Like, maybe as you get closer to AGI, then there
is some kind of way that you can think about how being good at chess would
make you better at playing Pong on Atari or something.
But interestingly, here with this innovation, there are built
(57:03):
in these interactions because in a biological system, DNA
is interacting with RNA, which is interacting with protein.
And so right there you have, yeah, this generalization
providing quite a bit more utility.
And yeah it's really exciting to see.
Andrey (57:20):
And now on to the next paper.
Very different but also very exciting.
It is xLSTM: Extended Long Short-Term
Memory. This is coming from a variety of people and
organizations in Austria, with the last author being
Sepp Hochreiter, who is the developer of the LSTM.
(57:42):
So, a short history for people who may not know LSTMs but know
transformers: LSTMs are a form of recurrent neural
nets, and they were sort of the main thing being used in natural
language, basically until transformers, for a couple of years,
and they achieved a lot of cool things.
But architecturally, they were sort of abandoned because
(58:04):
they are kind of tricky to train and tricky to scale up compared
to transformers. They have some of these inherent limitations
architecturally. And so this paper, xLSTM, is presenting
a variation and extension of LSTMs
that aims to fix that.
In a sense, it introduces some pretty fancy
(58:27):
new modules.
So just to get into a little detail, they have sLSTM memory
cells that have memory mixing and exponential gating.
They also have mLSTMs that add the
ability to do more parallel training.
A lot of kind of nitty gritty details where
(58:51):
this extension would take probably half an hour to get through.
So unfortunately, we won't be getting into a lot of details.
But the end result of these architectural improvements is
that at least in the experiments on this paper, they show
pretty comparable characteristics to transformers in training.
So when training at 2.7 billion parameters
(59:15):
on a bunch of tokens,
they show kind of the golden outcome,
which is you can have a scaling law that shows that as
you go from not super many parameters to a ton of parameters,
you have a smooth decrease in perplexity on next-token
prediction. And in fact, when you compare the characteristics of
(59:39):
xLSTM to things like Mamba and Llama and other
architectures of neural nets that do language modeling,
in this paper they have the best performance
at every sort of scale of model.
Very much related to what we've been seeing with Mamba as far as
exploring alternatives to transformers that incorporate some of the
(01:00:02):
ideas from recurrent neural nets into
transformers, attention,
all of that. And potentially a big deal,
I mean, the results are pretty impressive.
And this is coming from one of the pioneers of
neural nets for natural language processing.
So it generated a lot of discussion and will
(01:00:26):
be very interesting to see if people continue to build on it.
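For listeners curious what the exponential gating mentioned above looks like on paper, here is a simplified sLSTM-style cell update; this is a sketch of the general idea with stabilization details omitted, not the paper's full formulation.

```latex
% Simplified sLSTM-style cell with exponential gating (stabilization terms omitted):
i_t = \exp(\tilde{i}_t), \qquad f_t = \exp(\tilde{f}_t) \ \text{or} \ \sigma(\tilde{f}_t),
\qquad c_t = f_t\, c_{t-1} + i_t\, z_t, \qquad n_t = f_t\, n_{t-1} + i_t, \qquad h_t = o_t \odot \frac{c_t}{n_t}.
% The normalizer n_t keeps the exponential gates numerically in check while allowing
% the cell to revise earlier storage decisions more aggressively than a sigmoid gate would.
```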
Jon (01:00:29):
Yeah, for sure. So, two German researchers, at least
one of them, Hochreiter, was involved in this paper.
But Hochreiter and Schmidhuber first published on LSTMs in 1996,
which is wild to think about, where we were at that time
in terms of neural network scale.
So that's the same era as LeNet-5, which was
(01:00:52):
the first ever commercial application of a neural network.
And so that was Yann LeCun, Yoshua Bengio, and others at AT&T Bell Labs at
that time, who created this LeNet-5 architecture that was
commercialized for the US Postal Service in order to be able to recognize
handwritten digits for zip codes,
so postal codes in the US, and so be able to add some
(01:01:12):
automation to routing of mail.
And that was kind of the peak
of neural networks for a long time.
We then went into an AI winter, where to get funding
for neural network research
you pretty much had to be in Canada.
No other federal government was supporting that kind of research.
(01:01:36):
Until the AlexNet moment, out of, again, Ilya
Sutskever and Geoff Hinton working on that AlexNet paper, and
Alex Krizhevsky, of course, the first author on that paper and the namesake
of AlexNet, which in 2012 brought kind of deep learning
into acceptance, that not only is it super powerful
once you scale up compute-wise, but it's also highly generalizable.
(01:02:00):
And so anyway, a little bit of a history lesson there to say that this
kind of architecture, the LSTM, dates back to before that
last AI winter, that last deep learning winter.
And so this kind of approach generally, from Hochreiter and Schmidhuber,
has been around for decades.
And it's interesting that they have been able to, with this paper,
(01:02:21):
make changes, because
if you tried to use an LSTM alone
before we had transformer architectures, in that Vaswani et al.
paper, "Attention Is All You Need",
before that, LSTMs were kind of the best way, the best
known way, of trying to link many words
(01:02:42):
together in a sentence or in a paragraph, and be able to make linkages
between those to be able to have some kind of attention.
The transformer completely blew the Lstm out of the water such
that I'm not aware of commercial applications of it today, but
hell creator Schmidhuber, other researchers have continued to push along
on this Lstm idea, and it's interesting to see that we're now getting to a
(01:03:04):
point, at least according to this paper. And again, as you already stated earlier this episode, as usual, it is the best on the benchmarks that they just selected. And so it's hard to know exactly where it stands. But, yeah, I mean, even if it's performing comparably to Transformers or to Mamba, these other kinds of state space models that are coming out now, it is exciting because
(01:03:28):
it would be great. And it wouldn't be surprising if in the coming months,
in the coming years, we're able to get architectures beyond the
transformer that allow us to have that same level of attention
capabilities that the transformer has without the really heavy compute.
Andrey (01:03:41):
That's right. And while, you know, the results are really impressive, there are some limitations worth noting, and they do note these in the paper. So some of the components of this architecture are not parallelizable computationally, due to some of the details of how they're implemented. They say they do have an efficient CUDA
(01:04:03):
implementation with optimizations.
But still, when you're deploying these things at scale, when you're trying to optimize the cost of compute, something like this may not necessarily be the best. But at the same time, contrary to transformers, this kind of architecture has constant memory complexity, as opposed
(01:04:26):
to a dramatic increase in memory usage with respect to the size of the input.
And that is similar to things like Mamba, which just have these better scaling characteristics in terms of how much compute you need as you try to input, you know, a crazy amount of text. So it will remain to be seen whether this
(01:04:49):
is like a new Mamba and everyone starts working on this and goes crazy, but it's definitely exciting in terms of still seeing potential successors to Transformers, at least in some contexts.
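To make the memory point concrete, here is a toy comparison, just an illustration of the asymptotics rather than anything from the paper: a transformer's key-value cache grows with the number of tokens it has seen, while a recurrent architecture like xLSTM or Mamba carries a fixed-size state regardless of input length.

    import torch

    d_model, seq_len = 1024, 100_000

    # Transformer-style inference: cache keys and values for every past token,
    # so memory grows linearly with sequence length (and attention compute
    # grows with it as well).
    kv_cache = torch.zeros(seq_len, 2, d_model)

    # Recurrent-style inference: one fixed-size state updated token by token,
    # so memory stays constant no matter how long the input gets.
    state = torch.zeros(d_model)

    print(kv_cache.numel(), "floats vs.", state.numel(), "floats")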
Jon (01:05:02):
Yeah, we're really trying to have our cake and eat it too, aren't we, Andre? With this kind of research, with this extended LSTM, xLSTM, you're like, hey, on a relatively small scale, we're able to get way better performance than a transformer on memory. But one of the beautiful things about the transformer is that it does scale so well.
(01:05:22):
So that architecture was designed specifically to be parallelizable
across many GPUs from the beginning.
And so yeah, trying to have our cake and eat it too: not use too much memory, be highly parallelizable, get the same kind of attention that we
can get with Transformers. But, you know, there's so many labs trying
to do this kind of thing, and it is so important in order to make AI be
(01:05:44):
something that is broadly accessible and not super expensive, and not
requiring more and more and more nuclear power plants to be set up, to
run AI centers. So, yeah, really cool.
Andrey (01:05:54):
And just one last thing I'll mention is they do say in the paper that conceptually, xLSTM is somewhat clustered with and related to things like RetNet and RWKV, as various groups have been exploring the idea of making RNNs work at scale.
(01:06:15):
And this is an ongoing topic of research, so we shouldn't make it seem like this is the one and only attempt at this. It's been going on for a couple of years and there have been some promising results already. But this is exciting, in part because it's coming from the pedigree of the original inventors of the LSTM. On to the Lightning Round.
(01:06:36):
And the first paper is StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation. They present a semantic motion predictor that in particular helps with consistent video generation. And they also have examples of kind of multi-image
(01:06:58):
generation. They kind of present the idea of a comic book where you have a single character that is consistent across the frames of a comic book. And this is one of the challenges with image generation: you know, if you say generate X, you will have a different outcome every time. And there have been many papers on how to make consistent outputs.
(01:07:22):
For, you know, creating comic books, creating illustrations, whatever.
And so this is presenting a new way to do that, and it has really nice-looking smooth videos and these comic book panels.
So again, you can go and check out the link for those example
(01:07:42):
outputs. I wish I had the time to actually edit videos to add all these example images of what's happening here, but it's a pretty cool bit of research.
Next up, Chain of Thought Empowers Transformers to Solve Inherently Serial Problems. This is more of a theory paper that
(01:08:03):
honestly would take me a while to understand, but the gist is that they show, kind of theoretically, something about the idea of chain of thought, which is, as we've mentioned many times, prompting your language model to list out its reasoning steps before providing an answer to a given query.
(01:08:25):
They show that this general approach, for a certain category of problems that inherently require serial computation, so multiple steps of thinking, enables transformers to solve those types of tasks. If the model were not doing chain of thought, if it were just doing an immediate output of whatever
(01:08:48):
answer you want, it just would not be able to get it right, whereas with chain of thought, for these types of problems, it can get it right. So for people who enjoy
reading dense theory papers, I'm sure this is an exciting one.
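For a concrete, if toy, illustration of the distinction, here is what direct prompting versus chain-of-thought prompting looks like in practice; ask_model is a hypothetical stand-in for whatever language model API you use:

    def ask_model(prompt: str) -> str:
        ...  # call your language model of choice here

    question = "A train leaves at 3:40 pm and the trip takes 2 hours 35 minutes. When does it arrive?"

    # Direct prompting: the model must produce the answer in one shot.
    direct_prompt = question + "\nAnswer with the arrival time only."

    # Chain-of-thought prompting: the model is asked to write out intermediate
    # steps before the final answer, which is what gives it the extra serial
    # computation the theory result is about.
    cot_prompt = (
        question
        + "\nThink step by step: write out each intermediate step, "
        + "then give the final answer."
    )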
Jon (01:09:04):
An analogy that I sometimes make and I don't know, there's
probably theoreticians out there who would say, there's all kinds of
mistakes I'm making in what I'm about to say, but I think of chain of
thought prompting as an improvement on the kind of fast thinking that humans do.
So there's a famous book by the economist Daniel Kahneman
(01:09:26):
who actually recently passed away.
And so his most famous book is called Thinking Fast and Slow.
And this is based on decades of experiments that he and Amos Tversky did. And so there are these two thinking systems: there's System 1, which is fast thinking, which is what we're using all the
time. I'm using it right now. Words are just coming out of my mouth.
(01:09:46):
And I'm kind of just hoping that the next word that comes out is useful to
some listeners out there. Whereas slow thinking, System 2, is where you're sitting with pencil and paper, and you're
making sure that, you know, you've got all of your assumptions, right as
you work through a mathematical problem or you're writing some computer
code. And in order to make this analogy
(01:10:09):
very easy to understand, say you learn how to drive a car. You start off with that slow System 2 thinking, where you're like, okay, foot on the brake, turning the key in the ignition, and now I'm, you know, going into reverse and pressing the accelerator. You're having to do all this very deliberate
(01:10:30):
thinking. Whereas once you have learned how to drive a car, you're listening to the Last Week in AI podcast as you're driving, and you're having a conversation with a friend and drinking your coffee all at the same time. And so your own brain becomes adept at shifting things from the slow processing to the fast processing. Chain of thought prompting, to me, is like making the most
(01:10:53):
of that fast thinking.
You're prompting the generative AI model, which is just
predicting next token like you are when you're thinking, when you're
speaking in real time or thinking in your mind, you know, in language, in
real time. And so chain of thought prompting is like making the most of
the fast kind of thinking that humans do.
(01:11:14):
What I'm excited about, in combination with this
kind of chain of thought prompting and generative AI, is approaches like AlphaGeometry as well as Q*, which are specifically designed to be much more like the deliberative thought that you have when you're doing math, when you're learning how to drive a car, when you're computer programming.
(01:11:36):
And so that kind of slow thinking in combination with fast thinking, including chain-of-thought fast thinking, could be a really interesting combination towards realizing AGI.
Andrey (01:11:48):
And just one more paper.
Also a bit more of a theory paper, though with a bit more of an applicable front, and one that generated a lot of discussion. It is KAN: Kolmogorov-Arnold Networks. And again we'll go a bit quick because it is sort of dense. The gist of it is they present a whole new paradigm
(01:12:12):
for how you build your neural nets. So typically the idea of neural nets is to have, at the most basic level, multi-layer perceptrons, where you have some weights on edges between neural network units, and the units do some function on a combination of their inputs.
(01:12:34):
And this paper presents a whole new alternative that basically turns that paradigm on its head, where you have learnable functions on the edges, and the nodes just sum up the outputs of those edge functions. And they go into a whole lot of detail on, kind of conceptually, why this has some advantages
(01:12:57):
in terms of interpretability and the potential expressivity of this sort of whole new paradigm for how you build your neural nets.
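As a toy sketch of that inversion, not the actual KAN implementation, here is a layer where each edge carries its own small learnable one-dimensional function and each node just sums the edge outputs; real KANs use splines for the edge functions, whereas this uses tiny MLPs purely for illustration:

    import torch

    class ToyKANLayer(torch.nn.Module):
        def __init__(self, in_dim, out_dim, hidden=8):
            super().__init__()
            # One small univariate function per (input, output) edge.
            self.edge_fns = torch.nn.ModuleList([
                torch.nn.Sequential(
                    torch.nn.Linear(1, hidden),
                    torch.nn.Tanh(),
                    torch.nn.Linear(hidden, 1),
                )
                for _ in range(in_dim * out_dim)
            ])
            self.in_dim, self.out_dim = in_dim, out_dim

        def forward(self, x):  # x has shape (batch, in_dim)
            outputs = []
            for j in range(self.out_dim):
                # Node j simply sums its incoming edge functions.
                total = sum(
                    self.edge_fns[j * self.in_dim + i](x[:, i:i + 1])
                    for i in range(self.in_dim)
                )
                outputs.append(total)
            return torch.cat(outputs, dim=-1)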
Again, a bit more theoretical in the sense that probably people will
not adopt this at scale, I suspect, but
interesting if you like, these sorts of very conceptual
(01:13:19):
papers. And on to the policy and safety section, with the first story being:
US lawmakers unveil bill to make it easier to restrict exports
of AI models.
We've been talking a lot about export controls for GPUs and hardware
to China in particular.
That has been happening in the US, and now a bipartisan
(01:13:42):
group of lawmakers unveiled a bill that would actually make it easier
for the Biden administration to impose export controls on AI
models, make it hard for software to be shared,
essentially.
And if approved, the measure would remove some roadblocks to
regulating export of open source AI.
(01:14:05):
Apparently there's already some precedent for that ability by the White House.
And there's some more details in the story.
Interestingly, apparently many in China have been building on top of Llama in particular as the starting point for making their language models.
(01:14:25):
So evidently that would be why the U.S. lawmakers are seeking to do this.
So, yeah, very interesting conceptually to think about actually having
export controls for open source models.
Jon (01:14:39):
It seems like one of those stories that Jeremy would be really great at
going into further analysis on, he really knows
what's going on on Capitol Hill.
The one thing that I would say here is that it's too bad that,
we live in a world where geopolitical conflict is real.
I mean, I'm not pretending that it doesn't exist, but it's a shame,
(01:15:01):
because it seems like we're so close to having so many of these emerging
technologies like AI, nuclear fusion, quantum computing, being
so close to realizing them and having them being able to bring abundance
to everyone on the planet.
But these kinds of things, which I totally get it, makes perfect sense
in the real world that we live in with geopolitical conflict, but
(01:15:24):
it's just a shame that we can't have everyone getting along and
everyone collaborating, on open source, on hardware,
instead of having to have these kind of competing groups in one region,
having to feel like they are needing to catch up.
And it's interesting because it seems like mostly like a futile exercise.
Anyway. So, you know, the U.S restricts
(01:15:46):
access to some hardware or to Llama architectures to, say, China, and China then invests a whole bunch of redundant capital in recapitulating the same
kinds of advances separately, and eventually seems like they'll catch up.
So it's like, I don't know, it seems like a lot of a lot of wasted
(01:16:07):
potential, a lot of wasted capital and, you know, slowing down
the, the capacity for, for everyone all across the planet to be,
you know, living a higher quality of life.
But yeah, I mean, I'll take my rose tinted goggles off.
Andrey (01:16:23):
Yeah. And I think it's worth noting, you know, there's been
a lot of framing of this AI
race between China and the US in particular over the years.
And it is not entirely a fair kind of way to pose things.
There's a lot of academic collaboration between these labs.
(01:16:43):
You know, there's a ton of research coming out of China that people in
the US read and build upon, and vice versa.
So this kind of notion that these are enemies
isn't the full picture, even as there is very much a real situation where US law is seeking to
(01:17:06):
basically combat the ability of China to make advancements.
And I think with this kind of law, it would be surprising if it can actually be effective at blocking the ability of open source models to be used in China.
But it does speak to making big moves, basically
continuing to add to pressure, as we've seen also with hardware export
(01:17:29):
controls. Next up, OpenAI's model spec outlines
some basic rules for AI.
So OpenAI has released the first draft of a proposed framework, which it calls the Model Spec, to propose basically how AI tools should respond. It's a bit like, you could say, an ethical framework, almost, where it proposes three general
(01:17:53):
principles: AI models should assist the developer and the user, they should benefit humanity, and, in the case of OpenAI, they should reflect well on OpenAI.
There are also specific rules here, such as following the chain of command, complying with laws, respecting creators' rights,
(01:18:14):
protecting privacy, and not generating not-safe-for-work
content. And it has been positioned, at least online from what I've seen, as trying to make it very clear what principles are guiding OpenAI's development, which was maybe a bit opaque before. And the way I've seen it positioned is: if
(01:18:37):
ChatGPT doesn't reply to you the way you expect, with this Model Spec they can say, okay, this was a mistake and the model should have behaved differently, or this is how it's supposed to behave according to our Model Spec.
And, yeah, it's very interesting, given the impact of OpenAI and just how influential they
(01:19:00):
are now, to see this released and to have more clarity on their internal development processes.
Jon (01:19:06):
And probably to some extent, at least things like reflecting
well on OpenAI, following along from the
trauma of last year.
Yeah, not surprising to see this thing in there. Does this remind you in any way, Andre, at least in some kind of vague analogy, is it reminiscent to you of Anthropic's constitution?
Andrey (01:19:27):
Very much so. I think the constitution is about making it very clear what you want your model to do, in a sense, versus this being maybe broader in scope, almost. It is kind of half technical, half PR, almost.
(01:19:48):
I think it probably is at least partially motivated by the big PR disasters at Google in recent months, where the model started doing, you know, largely criticized behavior of generating images of different races in historical contexts. So this Model Spec is kind of clarifying exactly
(01:20:10):
how they want their models to behave and why they might refuse to do certain things but do other things. On to the Lightning Round. First story: robot dogs armed with AI targeting rifles undergo U.S. Marine Special Ops evaluation.
So the robot dogs are literally the sort of things you may have seen
(01:20:32):
already from Boston Dynamics, with these ones developed by Ghost Robotics specifically. And they are literally equipped with a gun mounted on top, from defense tech company Onyx Industries. And this is kind of small-scale testing at this point. There are two of these things that
(01:20:55):
they are experimenting with.
This organization did clarify that weaponizing the dogs is just one of the many use cases being evaluated, and that it
apparently adheres to all Department of Defense policies concerning
autonomous weapons, presumably meaning that you do want to have
a human in the loop and not have it be fully autonomous.
(01:21:20):
But regardless, I think pretty notable in that we still don't
have fully autonomous systems.
We have some examples of drones being semi-autonomous, but we are still
not in an age where robots are being deployed in battlefields and
doing a lot of the work of kind of a traditional soldier.
And if these sorts of systems of dogs with guns,
(01:21:43):
you know, undergo further development within
several years, it is not kind of out of the question
that they will be a large factor in battlefields,
in warfare. And that is something that maybe we are forgetting
in this moment.
You know, there's a lot of concerns about disinformation,
(01:22:04):
misinformation. But, you know, it's just a matter of time until we have kind of scary robot soldiers.
Jon (01:22:13):
Yeah. I think 2023 was the year where there were supposedly
some cases in the Ukraine-Russia conflict of drones working behind enemy lines. So, you know, there's this warfare and counter-warfare
(01:22:34):
around drones.
Obviously that's, you know, the first step. So you have drones and you're
sending them over enemy lines, you're monitoring them, you're having
loitering munitions that attack them.
And so your opponent then says, okay, well, let's create all kinds of
jamming to prevent the remote control operator from operating
(01:22:55):
this drone effectively.
But then that same drone operator says, okay, well, if I lose
the ability to control my drone in enemy territory,
what should we do? And 2023 was supposedly the first year where, in the Ukraine-Russia conflict, which, you know, is the first place anywhere in the world that we're aware of,
(01:23:15):
drones started to be able to act on their own after losing connection to their controller.
So, you know the jamming is effective.
It prevents you from being able to see what your drone is doing.
But still, you know, the drone has some expense, it has some munitions on it, and it's behind enemy lines. And so it's maybe using open source tools. All these kinds of tools
(01:23:37):
that we're developing to try to create a better world, for the most part, Python code, machine vision libraries, PyTorch, would be being used to have machine vision systems work behind enemy lines.
Once you've lost control of that drone, you're hoping that that open source based machine vision system
(01:24:01):
is accurately detecting the enemy as opposed to just, you know, someone tending to their turnips or something.
And yeah, it's obviously a minefield.
And it seems inevitable that we're going
(01:24:22):
to see more of this, which is, yeah, a scary thing.
And it's not inconceivable that some years from now, most of the fighting could be being done by autonomous vehicles as opposed to by humans making decisions. Yeah, obviously lots of concerning things there.
Andrey (01:24:44):
Yes. And as we've covered in the past, there have been efforts in the UN to have some sort of policies with regard to autonomous weapons, but those have gone nowhere; they have been blocked by countries like the US. Now, definitely, these are still in early development phases.
So not likely to be out there within the
(01:25:05):
next year or two, but still, we haven't seen
too much news on this front. And this is an example that this is
actually happening. And another story on OpenAI.
They are releasing a deepfake detector
to disinformation researchers.
So this is a tool being released to a small group of disinformation
(01:25:28):
researchers, meant to detect DALL-E 3 outputs, to distinguish between real images and synthetic images generated by their tool. And OpenAI says that this will have 98.8 percent accuracy, although it will not be able to detect images produced by other generators like Midjourney and Stability.
(01:25:50):
So it's good news to be able to see which images are from it or not. But this whole area still seems to be kind of unsolved.
Jon (01:26:02):
Yeah, it's a relief to hear that, and I guess it's unsurprising. You have a lot of experience in machine vision, Andre, so you may be able to speak to
this as to how you end up kind of having signatures in the generated
images or generative video that allow it to be identified as
from a particular generator. And I'm grateful that we're able to do that
with images, video, maybe audio, because it's proved
(01:26:23):
to be, since the release of GPT-4, extremely difficult to identify
generated text.
But people have been, for all of history, able to generate fake
text. And so we're kind of prepared to take with a grain of salt
things that people say or things that people write, but images or video,
up until the last year or so, we have been able to,
(01:26:46):
you know, have the saying that seeing is believing.
Whereas in the last year that statement is basically no longer true.
And so it's great that we are able to detect this, and yeah, it's interesting that it's kind of fundamentally different because of these kinds of signatures that we can detect in it. I don't know if you have more kind of technical knowledge on that, Andre.
Andrey (01:27:06):
Yeah, there's been research coming out for a while now on the ability to embed invisible watermarks, basically some things that humans would not be able to perceive, even in the pixel space of the image, but that could be used to verify whether an image is coming from a specific model or not.
And I'm sure that this tool relies on that.
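As a toy illustration of the general idea, not what OpenAI's detector actually does, here is the simplest possible invisible watermark: hide a repeating bit pattern in the least significant bits of the pixels, where it is imperceptible to people but trivial for a detector that knows the pattern to check. Production schemes are far more sophisticated and robust to cropping, compression, and re-encoding:

    import numpy as np

    def embed(image: np.ndarray, key_bits: np.ndarray) -> np.ndarray:
        # Overwrite each pixel's least significant bit with the key pattern.
        flat = image.flatten()
        pattern = np.resize(key_bits, flat.shape)
        return ((flat & 0xFE) | pattern).astype(np.uint8).reshape(image.shape)

    def detect(image: np.ndarray, key_bits: np.ndarray) -> float:
        # Fraction of pixels whose LSB matches the key: near 1.0 if watermarked,
        # near 0.5 for an unrelated image.
        flat = image.flatten()
        pattern = np.resize(key_bits, flat.shape)
        return float(((flat & 1) == pattern).mean())

    rng = np.random.default_rng(0)
    key = rng.integers(0, 2, size=256)
    img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
    print(detect(embed(img, key), key))   # close to 1.0
    print(detect(img, key))               # close to 0.5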
(01:27:28):
And as we've been covering, there have been more and more announcements of things like Meta saying that it will add watermarks to its generations, partially based on some pressure from the executive branch of the U.S. government and some of the policies instituted there.
(01:27:49):
So definitely some progress being made.
But it's still very hard when you release a model that's open source, like Stable Diffusion, that is easy for bad actors to mess with and make it so you can't detect the images. So it's not really a fully solvable problem, but at least for things like DALL-E 3, you can
(01:28:13):
do these sorts of techniques. On to our section on synthetic media
and art. And the first story is: Audible's test of AI-voiced audiobooks tops 40,000 titles. So last year, Amazon announced that
self-published authors in the US who make their books available in the Kindle store would be able to use a new
(01:28:36):
tool in beta to release audiobooks on Audible with AI-generated narration.
And as per this news article, that has now resulted in 40,000 titles being released on Audible, and that, as you would expect, has led to mixed responses,
(01:29:01):
with some Audible users being unhappy that there is this onslaught of audiobooks, while independent and self-published authors, who basically would not be able to afford to hire a voice actor to do a professional narration, benefit from this and are happy that they can use the tool.
(01:29:22):
The article is pretty detailed.
It goes into some conversations with actual voice
actors, and it doesn't seem like jobs are being impacted quite yet.
But of course you can see the concern
that maybe it would impact that industry soon enough.
I wonder, Jon, I know you wrote a book, Deep Learning Illustrated,
(01:29:45):
and I don't know if it makes sense to do an audiobook, since you do have illustrations in that book. But I could also imagine, as you write things, perhaps a new book, whether you would consider doing this sort of thing, or, with your podcasting experience, just doing your own narration.
Jon (01:30:03):
Yeah. It's interesting.
I have thought about an audio format for a book like Deep Learning
Illustrated, and my guess is that it doesn't really work
because not only does Deep Learning Illustrated have lots of
illustrations, which yeah, okay.
You know, you could maybe even kind of narrate the images and try to make
them make sense. You're definitely stretching the kind
(01:30:26):
of visual emphasis of the kind of pedagogical approach that I took there. But even worse than the images is that, although I tried to minimize equations, there still are quite a few equations; there's got to be 100 or more equations in the book.
(01:30:47):
Those would be very difficult to kind of visualize
if you're just listening to somebody reading out equations.
But even worse than the equations is that the book is full of Python code.
Andrey (01:31:03):
So that wouldn't make sense.
Jon (01:31:05):
Yeah, I mean, that would be some really, really dry listening, where you're like, okay, import DataFrame. You know, it would be pretty difficult to listen to. And at least with the way that I did it,
(01:31:25):
the code is really integrated right in there with the text and with the visualizations.
And so it's hard to imagine a book like that working well in an audio only
format.
Now, related to this idea in general, one of my
(01:31:45):
best friends, his name is Zach Weinberg, and he was a guest on Super Data
Science in episode number 646.
I brought him on that, episode to kind of be
like a layperson, giving their experience of how ChatGPT
has impacted them, professionally.
And he said to me.
You're so lucky, in a way, to have written
(01:32:09):
a book before the generative AI era, because now,
anytime anyone writes a book, there's going to be some skepticism as to
how much you actually did yourself.
And that'll happen more and more and more and more.
And so I guess, yeah, I'm lucky to have squeezed one in there right at the end of the human writing era.
Andrey (01:32:30):
Next story. TikTok will automatically label AI generated content
created on platforms like DALL-E 3.
So yes, related to that detection story, although
in this case they will be using a technology called Content
Credentials that is being developed by the Coalition for
(01:32:50):
Content Provenance and Authenticity, which was co-founded by Microsoft
and Adobe. So in this case, these images will have some metadata.
You don't need to do analysis of the image to guess whether
it is AI generated or not.
Rather, these platforms will just include information saying that it was generated on the given platform.
(01:33:12):
And now TikTok will be building on top of that standard to make it clear whether images are AI generated or not.
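Conceptually, the difference from watermark detection is that the provenance travels as signed metadata attached to the file, so a platform only has to read and verify a manifest rather than analyze pixels. A minimal sketch of that flow, with made-up field names rather than the real C2PA manifest format:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ProvenanceManifest:
        generator: str        # e.g. "DALL-E 3"
        ai_generated: bool
        signature: bytes      # signed by the generating tool

    def should_label_as_ai(manifest: Optional[ProvenanceManifest]) -> bool:
        # A platform like TikTok would also verify the signature against the
        # issuer's certificate before trusting the claim; omitted here.
        return manifest is not None and manifest.ai_generated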
And speaking of that, a couple of maybe slightly more fun stories related to AI generated images. The first one is that a Katy Perry fan-made AI
(01:33:34):
image is so real, it fooled the world into thinking she was at the Met Gala. So, yet another example. We've covered some of these before, of things going viral that are in fact AI generated by people, which many people thought were real. And in this case it was fake images of Katy Perry dressed
(01:33:57):
in some very fancy dress at the Met Gala.
There are some fun outcomes of this particular one, where apparently
Katy Perry's mom even texted her and said, wow, you were
at the Met Gala. This is a gorgeous dress.
And Katy Perry then responded saying, that's AI generated, beware.
(01:34:19):
And so, yeah, this article goes into some details on how a fan of Katy Perry, who runs a fan Instagram, developed these images and how it actually took some work. They used a Microsoft tool, and they used a second tool to then post-process it, things like that.
But yeah.
Now basically don't believe what you see
(01:34:42):
is the lesson that we're getting.
Jon (01:34:45):
That seeing is not believing. I've got to say, obviously we can't, in this audio-only podcast, be showing you these dresses, but it does look spectacular and it does look pretty compelling. It would be very difficult; I don't think I would be skeptical if I saw those.
Andrey (01:35:06):
And I would have to do some very close analysis to notice some
of those details, as is the case in general.
Maybe if you look at the background carefully, you could detect some
hints of AI, right?
Jon (01:35:17):
Yeah. Actually, at least in one of the ones that we can see here,
there's like a whole slew of photographers that are like appropriately
dressed and they all seem to have five fingers on their hands.
Andrey (01:35:29):
That's right. As we can tell.
Jon (01:35:30):
Yeah, yeah.
Andrey (01:35:31):
And speaking of deepfakes, the next story is South Korean
woman falls for deepfake Elon Musk and loses $50,000 in a romance scam.
So it goes into some details on how this person was contacted on
Instagram by someone posing as, you know, Musk.
And of course, this person didn't immediately believe that story.
(01:35:56):
But there were a lot of interactions where, first, this supposed Musk told some stories, and eventually they even had a video call with a deepfake of Musk, convincing this person to then give up some money in the scam. And we've covered some stories like this in the
(01:36:16):
past. I keep saying this, but this is something you need to be aware of: scamming is going to be a big thing with AI. Be careful, people.
Jon (01:36:27):
Yeah, it's something that not only you as an individual need to be aware
of, but I, for one, have several times, and I hope that this message has really gotten through to them, told all of my loved ones that if you ever get a phone call from me asking for money or anything like that, call me back, you know, or make sure it really
(01:36:49):
is my number. Because, you know, you and me are in the same boat, Andre, where we have many orders of magnitude more samples of audio available publicly on the internet than needed to simulate our voices effectively. And it'd be very easy to scam with our voices. And, yeah, definitely scary
(01:37:09):
things can happen.
Andrey (01:37:11):
Yeah, all of us gotta watch out now.
And, related to this particular story, one last one that Jon was aware of; I didn't actually see this. The story is: why do young Russian women appear so eager to marry Chinese men? And it's a similar kind of story, where there are deepfakes of these Russian women who
(01:37:32):
apparently can speak Chinese fluently and then, similarly, essentially develop a romance scam to get Chinese men interested.
Jon (01:37:45):
Well, I think this is even more than a romance scam. I think it's supposedly a government thing. So this is an effort where it's not so much about targeting an individual and getting money from them, but it's using real,
(01:38:10):
Russian-looking people. So, for example, there's a real woman here, Olga Loiek, and I'm probably mispronouncing her surname.
She's a Ukrainian woman studying in America.
And her likeness was used in generative
AI tools, where she's speaking perfect Mandarin with the Kremlin
in the background, comically.
(01:38:31):
And she found dozens of accounts using her face, where, in these videos, she's glorifying China and, you know,
complaining that Russian men are drunk and lazy while praising Chinese
society and technology.
These women, with names like Natasha and Sophia, in
(01:38:54):
their fluent Mandarin, are saying things like, for a Chinese husband, we'd be delighted to cook and wash your clothes and bear your children.
And so it seems like it's a propaganda thing, more so than a targeted scam. But, yeah, nevertheless, there's something about this story.
(01:39:15):
It's, I don't know, like the confluence of so many different things: geopolitical issues, romance, generative AI.
Anyway, it makes maybe kind of a fun story to end today with.
Andrey (01:39:26):
I think so, yeah. And with that we are done.
And thank you, Jon, for co-hosting while Jeremy takes a well-deserved
break.
Jon (01:39:36):
Yeah, my great pleasure, Andre.
As I said at the outset, as a rabid listener of
this podcast, it's surreal to be able to come on and
yeah, always have a lot of fun, and I look forward to listening to more
episodes in the future.
Andrey (01:39:52):
And as always, we are very appreciative of our listeners and fans.
So thank you for listening.
As always, I've got to plug that we do have a newsletter at lastweekin.ai, and if you are a big fan, it would help if you give us a review, share it with your friends, that sort of thing. But more than anything, we do
(01:40:15):
hope you enjoy the podcast and keep listening.
(Outro music plays.)