Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
Alright, well, hello everybody.
(00:03):
So it's time for a brand new podcast episode of the, what do we call ourselves for the
Generative AI Meetup podcast?
But basically we just go and chat about AI and just random things.
So with your hosts, myself, Mark and Shashank.
(00:25):
So together we run a meetup in the Bay Area, some may call it Silicon Valley.
We've been running the meetup for about a year and a half now and it's a weekly meetup
and we meet a lot of really cool people.
And we talk about a lot of interesting topics and some of the people that we talk to and
(00:49):
some of the interesting topics that we learn about.
We try to share that with you here on the podcast.
So we wanted to bring our meetup from Silicon Valley to the world.
So wherever you are in the world, thank you for listening.
So there's a lot of news this week.
We didn't have an episode last week, so a lot of things to talk about this week.
(01:13):
So probably the first thing we want to talk about this week is Cerebras.
So I think it was a few months ago we interviewed Matt from Cerebras.
Matt, I don't want to mispronounce his last name, Matt Sue.
Matt Sue from Cerebras.
And it was a fantastic interview.
(01:35):
So like a two hour long interview, he goes like deep for like the first hour and talks
all about their chips.
But they just came out with a brand new inference solution.
So inference, like using the models, basically it's like how fast does the model respond?
(01:57):
It's basically instant.
I think it's like one of the fastest inference solutions that has ever been made.
It's incredibly impressive.
I've played around with it a little bit.
Amazing.
Yeah, Shashank, I don't know if you tried it yet.
I have not tried it yet.
I was a little sad because Matt did invite us to try out the beta before it was released.
(02:24):
But just a lot going on.
But I did check out their blog post and looking at the speed.
So for context, Cerebras is yet another chip company, but what makes them different is that
they focus on wafer scale computing.
I actually have a giant wafer sitting here next to me that I wanted to show Mark.
(02:49):
So most of the chips are made from like a large wafer that's about maybe like 10, 12 inches
or so.
Yeah, when we're talking about the wafer, I started to cut you off there, Shashank.
Like, the wafer is, yeah, it's about the size of a dinner plate.
It's flat.
Yeah.
It's perfectly flat with a lot of chips etched onto it with a really, really expensive laser
(03:16):
that this Dutch company or Danish company makes.
And the GPUs are cut out of this large circular wafer and you get maybe tens of chips depending
on the size of the chip.
But what Cerebras does is it uses the entire wafer as a single chip and maybe, like, cuts off
(03:40):
the ends of it to make it like a large square that is contained inside this big circular
disk.
And so they have the biggest chip on the planet.
It is maybe, I forget the exact numbers, but like five to ten times the size of, like, an Nvidia chip.
(04:00):
Don't quote me on this, but it is much bigger than the Nvidia chip.
And initially they were using this for training.
Their biggest customers were some of the Middle Eastern coalitions, groups of countries
that have banded together and raised a lot.
That's right.
So G42 has raised a lot of money and they have large supercomputer clusters, which
(04:26):
are made up of a bunch of Cerebras nodes, and they've historically been using it for training.
And the last time I spoke to Matt from Cerebras, who comes to our meetup pretty regularly,
He was talking about how they partnered with Qualcomm to help with inference.
So once whoever, G42 let's say, has trained a large model for their specific use case,
(04:49):
then they deploy it onto these edge devices, your mobile phones or IoT devices, whatever
it may be, that are running Qualcomm Snapdragon chips, which most Android phones run.
And that is optimized for inference.
But you know, behind the scenes, secretly they've been working on an inference solution
(05:09):
with the same large massive chip.
It is so freaking fast.
For, I mean, the previous fastest solution was Groq with the Q, which was also started
by former Googlers who worked on the TPU at Google, and then they decided they
wanted to spin off and do their own thing for a bit, and they came up with an LPU, a language
(05:34):
processing unit, which got a lot of press and it was really, really fast.
That blew my mind.
That was like 10 times faster than NVIDIA GPUs.
And this thing, I think it's like twice as fast as Groq.
So it's like blazing, blazing fast.
It's instant.
And I can't wait to try it on the biggest models.
(05:54):
Yeah, it's wild.
Like I tried it out and I typed in my question and like you just see, just like I type it
in.
As soon as I hit Enter, you just see a block of text response.
I mean, like, when you wait for ChatGPT, it's like, I mean, it takes like a few seconds
to get the response.
I think, like, when I use Claude, it takes a few seconds.
Like, oh, I'm thinking about it.
With this, it's just like, no, you get the response immediately.
(06:17):
And it really just makes me excited for like all of the use cases that you could use this
for.
Like, just having fast inference is so nice.
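For anyone who wants to try that speed themselves, here is a minimal sketch in Python. It assumes Cerebras exposes an OpenAI-compatible chat endpoint; the base URL, model name, and environment variable here are assumptions to adapt, not confirmed values.

import os
from openai import OpenAI  # pip install openai

# Assumed OpenAI-compatible endpoint and model id -- check Cerebras's own docs for the real values.
client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key=os.environ.get("CEREBRAS_API_KEY", "your-key-here"),
)

response = client.chat.completions.create(
    model="llama3.1-70b",  # assumed model identifier
    messages=[{"role": "user", "content": "Explain wafer-scale chips in one sentence."}],
)
print(response.choices[0].message.content)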
And not only is it fast, but it's also cheap too.
So I think for reference, I want to say Claude, if you pay for that, it's like, I want to
say it's like $15 or $20 for every million tokens that you use.
(06:42):
How much?
I think it's like $15 or $20.
Whoa.
For a million tokens.
Yeah.
Whereas this is, I think it's 10 cents.
No, it's 60 cents per million tokens.
Oh, this is 60 cents?
Yeah.
Okay.
Per million tokens.
Yeah.
I mean, so 60 cents versus, like, $20.
I mean, like, that's wild.
That's wild.
Yeah.
(07:03):
So I think that, like, you're going to see a lot of use cases for this in, like, agentic workflows.
So like, you know, for like agents.
Also for context, OpenAI's pricing for GPT-4o is $5 or $15.
They have two pricing models.
I'm not sure what the difference is.
(07:24):
It's like input versus output.
It may be input versus output.
Yeah.
So it's five or 15, depending on input and output.
Okay.
So yeah, inference is output.
Yeah.
So $15.
Oh, wow.
For a million tokens of output for GPT-4o.
This is 60 cents.
But to be fair, they are talking about Llama 3 70B.
(07:46):
Well, we don't know, like, exactly how to compare the two, like Llama 3 70B and GPT-4o, but they're
not that far apart, I think.
Let's see.
Maybe we can compare to GPT-3.5 Turbo.
Llama.
No, I think it's better than GPT-3.5 Turbo because this is, wait, is this Llama 3.1?
(08:09):
3.1 70B.
Yeah, I think 3.1.
Just for the sake of comparison, GPT-3.5 Turbo is $6 for a million tokens of output.
Wow.
So this is still 10 times cheaper than a worse model from OpenAI.
Yeah.
(08:29):
I mean, that's just silly.
I mean, order of magnitude.
Order of magnitude.
Order of magnitude.
Amazing.
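Putting the per-million-token output prices quoted in this conversation side by side, this is what the gap looks like. These are the figures as remembered on air, not official pricing, so check each provider's current pricing page.

# Prices as quoted in the conversation, in dollars per million output tokens.
quoted_prices = {
    "Claude (as quoted)": 15.00,
    "GPT-4o output (as quoted)": 15.00,
    "GPT-3.5 Turbo output (as quoted)": 6.00,
    "Cerebras Llama 3.1 70B (as quoted)": 0.60,
}
baseline = quoted_prices["Cerebras Llama 3.1 70B (as quoted)"]
for name, price in quoted_prices.items():
    print(f"{name}: ${price:.2f} per 1M tokens (~{price / baseline:.0f}x the Cerebras price)")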
I mean, whatever moat OpenAI had, it's lost.
No.
No reason OpenAI can't use Cerebras for their inference.
Good point.
This is more of a competitor to Nvidia, I would say.
So maybe I should, like, start buying some Cerebras.
Yeah.
(08:50):
Sure.
Yeah, let's drive up the secondary market shares for Cerebras and help out our friends at Cerebras.
Not investment advice, not at all.
I actually don't own any Cerebras.
Why don't I? Now that I think about it, I'd think about buying it.
Yeah, don't take our advice.
We're not like financial advisors or anything like that.
We don't know what we're talking about for this.
(09:13):
Yeah, just don't buy it.
I mean, like, buy if you want, but don't buy it because we told you to.
Do your due diligence.
Yes.
But thinking about how they achieved both this high throughput and low cost, it has to
be the wafer scale.
Because if you think about Nvidia chips, the GPUs, they have smaller capacity, so you
(09:35):
have to split up the task into separate batches or something and then have each different GPU
do some computation and then tabulate the results.
So you have a lot of bottlenecks, a lot of networking, a lot of memory caches to store
(09:55):
the intermediate results.
I would assume that this just does away with all of that.
Yes.
I think so.
But also, I remember Matt, during the podcast, he was mentioning that Nvidia needs to be kind
of general purpose, where apparently in LLMs, you get a lot of zeros.
(10:16):
So if you multiply any number by zero, you're going to get zero, right?
And there is apparently that optimization done on a lot of the, or done on the Cerebras
chips where, you know, if they see a zero, they just know they can skip some things and
the output will be zero.
And that will save them a lot of compute as well.
(10:36):
Whereas like Nvidia's GPUs, it's got to work for everything, right?
It's got to work for video games, got to work for, yeah, training, it's got to work for,
yeah, inference.
I mean, this is like a purpose built chip.
I assume it's purpose built specifically only for inference, which I think means you're
going to get a lot of performance gains, like, yeah, because you're on one big chip and
(10:56):
also because you built it only for inference.
It's like, you can't like probably play like Minecraft on this.
Yeah, I guess they have separate chips, one for training and one for inference then.
I assume so.
Yeah.
I mean, I don't know if that's public knowledge, but I would assume that that's probably
the case.
Yeah, I do remember Nvidia had some kind of a trade off.
(11:17):
They had some cores focused on training, some focused on inference, and they did some intermediate
approach that allowed for both.
Yeah.
And also, I think that like having this really fast and cheap inference is going to make
(11:38):
it open to real time use cases.
So like, you know, we don't really see it where there's like real time translation.
Like, you know, if you use like Google translate, it's like, it's a little bit slower than real
time.
But I would imagine that right now, if I could, you know, speak and then do my text to speech,
(11:59):
or the speech to text, right?
And then I could feed that to the LLM and output it into a new model.
I could probably, like, use this for real time translation.
And I think it could help with anything real time. For example, if I'm talking to, like,
a customer support agent on the phone, like, I don't want to
(12:23):
like, say, hello, representative, right?
And then you have to wait, like, you know, five seconds.
It's like, oh, thank you for calling.
With this, it can just, like, respond naturally.
And I think that is going to be like a really exciting use case where people are going to
be able to use something like Cerebras for real time LLM use cases, which we don't have
(12:49):
right now.
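The real-time translation pipeline described above is basically three stages chained together. Here is a rough sketch; transcribe() and speak() are hypothetical placeholder stubs for whatever speech-to-text and text-to-speech services you actually plug in, and the endpoint and model name are the same assumptions as in the earlier sketch.

from openai import OpenAI

def transcribe(audio_chunk: bytes) -> str:
    """Hypothetical speech-to-text stub -- swap in a real STT service."""
    raise NotImplementedError

def speak(text: str) -> None:
    """Hypothetical text-to-speech stub -- swap in a real TTS service."""
    raise NotImplementedError

def translate_turn(audio_chunk: bytes, client: OpenAI, target_language: str = "Spanish") -> str:
    source_text = transcribe(audio_chunk)            # speech -> text
    response = client.chat.completions.create(       # text -> translated text via a fast LLM
        model="llama3.1-70b",                        # assumed model identifier
        messages=[
            {"role": "system",
             "content": f"Translate the user's text into {target_language}. Reply with the translation only."},
            {"role": "user", "content": source_text},
        ],
    )
    translated = response.choices[0].message.content
    speak(translated)                                # translated text -> speech
    return translated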
Yeah.
Let's think about some real time use cases.
So customer service, live support, any live experiences makes sense where you want to get
feedback immediately.
So some of the Google IO announcements had this version of Gemini where you just take
your Pixel phone and you have the video camera open in, like, the Gemini app and you're just
(13:15):
showing things and it recognizes exactly what you're looking at.
You talk to it, you're asking questions and it's giving you feedback immediately, or even
OpenAI's GPT-4o voice demo with the natural human voice, where you're talking to it in
natural language and it responds and you're able to interrupt it in real time and it like
(13:38):
takes that feedback and responds quickly.
That would be so much more fluid and less robotic than what we have today where you have
to keep talking.
The second you stop, it takes that as an input and then starts responding and you can't
pause it and you have to wait for it to finish or you have to hold that mic button and keep
(13:59):
talking until you finish and then release and then wait for response.
It could be like a streaming bi-directional input output which would be really cool.
Yeah.
I mean, the thing is it doesn't need to be that fast.
It just needs to be, like, as fast as you can talk, or maybe in the future, it'll be as
fast as you can think.
(14:19):
Oh yeah.
Yeah, right.
It's like, once you connect this to, like, your brain, right?
You could output your thoughts to the LLM as quickly as possible, just like stream
of consciousness.
I feel like it's faster than that.
It's faster than you can think, yeah, maybe.
I think so, yeah.
Well, I mean, it's definitely faster than we can form words.
(14:42):
Yeah, but the thing is, is like, I tend to form words at the exact same speed that I think.
Maybe you can think faster than you speak.
I can definitely think faster than I speak.
But I can't.
I'm processing all this in real time.
Interesting.
Yeah.
(15:03):
Well, it would also allow for other modalities.
So text is really fast, but, like, vision, I would assume, would be much slower.
And then if you add vision and audio, maybe like other signals.
Yeah, I think so.
But you know, the thing is, the technology is there to do real time audio processing and
(15:32):
real time speech output.
Like sure.
That is there, right?
So I think that, like, it's not a huge technological leap to say, like, hey, I have this text, I'll
output it in real time.
Because it's like, you're going to have the text basically immediately.
Like, let's say it's like a few hundred milliseconds.
I think it's, what is it?
Like, under 200 milliseconds is, like, imperceptible.
(15:54):
Yeah.
There's like some sort of metric for that.
And under like a few hundred milliseconds, like the human wouldn't be able to tell.
So let's say, like, you're able to get the LLM inference in, let's say, like, a hundred
milliseconds.
Then you have another 200 milliseconds to start outputting responses and like the amount
of words that you can speak per minute isn't that fast, right?
(16:16):
So it's like, if I could have an LLM inference and I can get it, I don't know, let's just
throw out some numbers.
Let's say, like, a thousand tokens in a hundred milliseconds.
Well, I can only maybe speak like, I don't know, just a hundred words a minute, 60 words
a minute, something like that.
Well, now it's like, if I'm speaking a hundred words a minute, but I just generated a thousand
(16:37):
tokens in a hundred milliseconds, well, now that's, like, far faster than real time.
So then you can have, like, a real, lifelike conversation.
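Just to replay that back-of-the-envelope math with the hypothetical numbers thrown out here, none of which are measured benchmarks:

# Hypothetical figures from the conversation, not benchmarks.
perception_threshold_ms = 200   # a delay under this feels basically instant
llm_latency_ms = 100            # assumed time to generate the whole reply
tokens_generated = 1000         # assumed tokens produced in that time
speaking_rate_wpm = 100         # roughly how fast a person talks

generation_tokens_per_sec = tokens_generated / (llm_latency_ms / 1000)  # 10,000 tokens/s
speech_words_per_sec = speaking_rate_wpm / 60                           # ~1.7 words/s

print(f"Generation: ~{generation_tokens_per_sec:,.0f} tokens per second")
print(f"Speech:     ~{speech_words_per_sec:.1f} words per second")
print(f"Slack left before the pause is noticeable: {perception_threshold_ms - llm_latency_ms} ms")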
So, like, even something simple, like, you could probably replace all the drive-throughs
at these fast food restaurants with LLMs.
Like, you know, why do I need to talk to somebody when like the LLM could probably just like
(16:59):
understand what I'm saying.
I think it can.
Yeah.
Yeah.
So thinking about a couple of other use cases, we were talking about agents for a couple episodes.
That's definitely something that requires long outputs, multiple runs of the LLM, multiple
runs of multiple LLMs, which, you know, rely on each other's inputs and outputs, and being able
(17:25):
to run a complex agent workflow that maybe has, like, a hundred different steps execute and
still respond in, like, a quick timeframe.
That would be really cool.
That's really cool.
Yeah, really cool.
And also, this is like the worst it'll ever be.
It's just getting better.
Right.
(17:45):
So, you know, one thing that I've been playing around with is, so Ollama, I think it was a
week or two ago, they came out with Ollama tool support.
So basically it allows...
So I think we've talked about Ollama on the podcast a few times.
Ollama is a tool, not to be confused with Llama, which is the open source LLM from Facebook.
(18:09):
This is Ollama, which is a tool that helps you run open source LLMs.
So they came out with a new thing which allows you to use tools.
So basically you can tell the LLM, say, like, hey, LLM, you don't normally know what
(18:30):
time it is, right?
But now you do.
I'm going to tell you that you have access to a clock.
So if anybody asks you what time it is, like you can use the time in the response.
So then what I can do is I can say, like, hey, Ollama, or hey, like, Llama 3.1, what time
is it, and it'll tell me the time.
So I was playing around with that.
I was trying to make, like, a little tool to explore the tool creation on Ollama.
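Roughly the kind of thing being described, using the Ollama Python client's tool-calling support. The exact response shape can vary between library versions, and the clock tool here is just the toy example from the conversation.

import ollama                      # pip install ollama
from datetime import datetime

def get_current_time() -> str:
    return datetime.now().strftime("%H:%M")

clock_tool = {
    "type": "function",
    "function": {
        "name": "get_current_time",
        "description": "Return the current local time",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}

messages = [{"role": "user", "content": "What time is it?"}]
response = ollama.chat(model="llama3.1", messages=messages, tools=[clock_tool])

# If the model decided to call the clock tool, run it and feed the result back.
if response["message"].get("tool_calls"):
    messages.append(response["message"])
    for call in response["message"]["tool_calls"]:
        if call["function"]["name"] == "get_current_time":
            messages.append({"role": "tool", "content": get_current_time()})
    response = ollama.chat(model="llama3.1", messages=messages)

print(response["message"]["content"])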
(18:56):
And I found that sometimes there was a little bit of stochastic randomness.
So sometimes I would ask like, hey, what time is it and it would tell me the time.
I'd be like, what time is it?
It's 12:01.
What time is it?
It's 12:01.
And then I ask again, like, what time is it?
I have no idea.
I'm not sure what time it is.
(19:17):
I'm just a large language model.
I have no idea what time it is.
So it's like, I think that you would get fewer issues like that on these bigger and better
models.
Because this was a model that was running locally on my MacBook Air.
I don't remember which model I was using.
I think I was using the, I think it was the Phi model, which is like the 1.5 billion parameter
(19:37):
model from Microsoft.
Because that one runs, you know, fast on my MacBook Air.
But if I could have used Cerebras, and if I told Cerebras, hey, I have access to
a clock telling me what time it is, and I was using the Llama 70B,
it would probably get it right the majority of the time, I would assume.
Yeah.
Speaking of smaller models, I was reading this announcement by AI21.
(20:03):
It's this, I don't know.
It's like another Israeli AI NLP company.
And I think they released like this.
What was it?
I take that back.
Sorry, my bad.
It's AI2, which is a different company.
(20:24):
It's the Paul Allen Institute.
Wait, so there's an AI1?
You said AI21.
Oh, 21.
And there's an AI2.
And I got this mixed up.
But yeah.
So AI2, Paul Allen from Microsoft, who's their late founder.
They released a tiny, tiny model, like 1 billion parameter, mixture of experts.
(20:47):
So actually, they haven't mentioned how big the full model is.
But at any given point, it's a very sparse model that only activates 1 billion parameters
at a time.
And there's like a cost versus performance graph.
And this is the best performing at the lowest cost out of any other models.
(21:07):
So usually there's like a 45 degree line where as models get better and better, they cost
more and more.
So all the Llama models are, like, way on the right.
But this one is way on the left.
Very cheap, but very high performance.
So I could imagine, especially with Cerebras doing really well at sparse models, we could run
(21:34):
large mixture of experts models that have very, very low cost, really good performance,
and take advantage of all the good stuff that Cerebras has, just as a side note.
Yeah, that's a good point.
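A toy sketch of what "only activates 1 billion parameters at a time" means in a mixture-of-experts layer: a router scores all the experts, but each token only runs through the top few, so most of the parameters sit idle on any given step. This is purely illustrative and not AI2's actual architecture.

import numpy as np

rng = np.random.default_rng(0)
num_experts, d_model, top_k = 8, 16, 2

# Each "expert" is its own weight matrix; the router picks which ones a token uses.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(num_experts)]
router = rng.standard_normal((d_model, num_experts)) * 0.1

def moe_forward(x: np.ndarray) -> np.ndarray:
    scores = x @ router                       # score every expert for this token
    chosen = np.argsort(scores)[-top_k:]      # keep only the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()
    # Only the chosen experts actually run; the rest contribute no compute.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (16,)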
So I'm just trying to think, if it was a 1 billion parameter model, how much RAM and compute
would I need to run something like that?
(21:56):
Because I think there's a rule of thumb, isn't there?
Every billion parameters, it's like you double it.
What is that?
Like two gigabytes?
Something like that.
Yeah.
It sounds right.
Because my MacBook Air, I think it has eight gigabytes of RAM.
So this would only take up two gigabytes of RAM.
So that's pretty good.
I mean, that's less than like a Chrome tab or something.
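The rule of thumb mentioned here works out roughly like this, ignoring activation memory, KV cache, and other overhead; the 4-bit column assumes quantization.

# ~2 bytes per parameter at 16-bit precision; quantization shrinks that further.
def model_ram_gb(billions_of_params: float, bytes_per_param: float = 2.0) -> float:
    return billions_of_params * bytes_per_param  # 1e9 params * bytes, divided by 1e9 bytes per GB

for b in (1, 8, 70):
    print(f"{b}B params: ~{model_ram_gb(b):.0f} GB at 16-bit, ~{model_ram_gb(b, 0.5):.1f} GB at 4-bit")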
(22:17):
Yeah.
Mine has 16 gigabytes of RAM, and I've been able to run the 7, 8 billion parameter models
easily.
11, 13 billion, it can, but it struggles, and occasionally, like, my computer just dies.
So that's no good.
Yeah, so this, you could probably run it on like almost anything.
Yeah, phone.
Maybe your watch.
(22:38):
I guess, a watch.
Yeah, I don't know how much RAM is in, like, an iPhone or Apple Watch.
I would say a gigabyte or two?
Yeah, sure.
Probably, like, even some low end, like, low end tablets, even maybe, like, a Raspberry
Pi or something like that.
That's nuts.
Yeah.
So we've been talking about tiny, tiny models.
(22:59):
What about some of the big, big models, long context windows, Claude?
So Claude released a thing for enterprise where they support half a million tokens of context
window.
Half a million, that's pretty big.
500,000.
Yeah.
But that is smaller than Google's model, right?
(23:21):
Because I thought Google, their Gemini, was a million, right?
Yeah, that's right.
Yeah.
It's, it's, it's in the ballpark.
Yeah.
Yeah.
Not, not too far away.
No, not too far.
It's on the same, like, it's not an order of magnitude difference or anything like that.
It's like linearly like, yeah, double.
I mean, that's not like crazy.
(23:41):
It's pretty much. But wait, 500 million, or did you say 500,000?
500,000.
500,000.
That's a lot smaller than 100 million.
Oh.
Yeah.
We're getting big.
Yeah, really big.
What's 100 million?
Yeah.
So this company called Magic, Magic.dev.
They announced a model with a 100 million token context window.
(24:05):
So what, what is Magic.dev?
They are a company that is, from their website, working on frontier scale code models.
So they're trying to build some sort of coworker.
I think they're really trying to help with, like, code generation.
(24:27):
I don't know much about it.
They're a little bit secretive on the exact details that they give.
Because I don't actually know if you can use their model yet.
My understanding is that, like, you can't actually use it.
If somebody from Magic is listening and you can, well, then I'm sorry.
(24:48):
Reach out and let us use it.
We'd love to try it.
But I think, you know, the thing with, like, if you're going to try to do programming, like
oftentimes, like, a code base is, like, significantly larger than what you could fit in a context
window, right?
(25:08):
So, like, like, Android, the open source operating system.
I don't know how big it is, but I would imagine, I think that, like, if you try to install it
on your computer, it's like hundreds of gigabytes.
Yeah.
50 gigabytes by itself.
And I think it might even be, I think that's maybe like the base, but I think that, like,
in practice, you probably want something like a couple hundred gigabytes, right?
(25:33):
So it's like, if we're going to start trying to make like a change in Android, and that's
50 gigabytes, like at the minimum, I mean, you're going to need like a giant context window
for that, right?
If you want to fit that thing in.
So I think that, I mean, Android is a little bit of an extreme case.
I mean, most programs are significantly smaller than Android.
(25:54):
It says Android is made up of 12 to 15 million lines of code.
Yeah.
So, I don't know.
We could do some math.
Actually, how many tokens would be in a gigabyte?
See, because, like, I guess one token would be, like, one byte, I assume.
Right?
So roughly, is that right?
(26:16):
Is a token a byte?
No, probably more than a byte.
No, one byte would be like one character, right?
Let's think about lines of code versus tokens.
The paper on, or the blog post on magic.dev, assumes a line is about 10 tokens.
(26:36):
Okay.
Sure.
So, and you said Android was how many?
15 million.
So, we'd need like 1.5, no, we'd need 150 million tokens.
Okay.
So, they have 100 million, and we need 150 million.
We're almost there.
Almost there.
Almost there.
Wow.
Almost there to work on the biggest chunk of code that I can think of.
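The quick math from this exchange, written out:

# Figures quoted in the conversation: ~15 million lines of code in Android,
# and roughly 10 tokens per line per magic.dev's blog post.
lines_of_code = 15_000_000
tokens_per_line = 10
context_window = 100_000_000  # the claimed 100-million-token window

tokens_needed = lines_of_code * tokens_per_line
print(f"Tokens needed: {tokens_needed:,}")                        # 150,000,000
print(f"Fits in the 100M window: {tokens_needed <= context_window}")
print(f"Shortfall: {tokens_needed - context_window:,} tokens")    # 50,000,000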
(26:58):
I mean, there's, I don't know, maybe Microsoft.
Yeah, maybe like Windows.
Yeah.
But those are closed source, so we can't, we don't know exactly how big those are.
But I don't think there's too many code bases that are bigger than that.
Facebook, maybe Google search.
Yeah, could be.
But we don't know.
It's too secret.
Yeah.
(27:18):
So.
But yeah, I mean, we're getting to a point where almost anybody, by the way, I think,
would be able to just, like, put their entire code base into the LLM
and be like, hey, build me this feature.
Yeah.
So, looking into their blog posts, they seem to have taken some clever hashing principles
(27:38):
to store the context of the code in some kind of a sensible grouping.
Because you know, you don't want to store like the structure of the for loop in different
parts of the code and return that as a result.
Because like, maybe that doesn't really make sense.
(27:59):
Maybe you want to store the context around the for loop that does something more meaningful
or like as part of the project in the hash.
So if you're searching for stuff, then it returns a bigger chunk that has more information.
So it's almost like they're like running like some sort of notification algorithm on it
(28:23):
kind of.
So it's like, it seems very similar to the RAG approach, except they have some, they coined
it HashHop, where they hash different chunks, which link to other chunks semantically.
In the sense of whatever semantics code has.
(28:46):
So I don't know if I like 100% understand it.
But whatever it is, it's cool.
So magic, that's awesome.
Like 100 million.
That's a big number.
It's really cool.
Yeah, so we applaud you.
And then you said there was one other company that was making a really big context
window too.
Not specifically a big context window, but Nvidia has released a paper which promotes an
(29:15):
approach with RAG to rival the performance of models with large context windows.
So they tried to overcome some of the limitations of early LLMs using RAG, where you've got to get
the chunk size right, because RAG, retrieval augmented generation, looks through
(29:42):
a vector of chunks that you've made out of a given document.
But that's only useful if each chunk is of a certain size, large enough to hold meaningful
information.
Whereas if you break up a meaningful paragraph into too many chunks, then the meaning is kind
(30:04):
of lost across too many different chunks.
So when you retrieve specific chunks to answer a question, it may not contain like that
full information.
Yeah.
So for example, just to go off on this a little bit, if you were working with RAG, let's say,
I took the book Harry Potter, right?
And now Harry Potter, well, with 100 million context, you could just put the whole thing
(30:26):
in.
You can ask questions about it, right?
But for a lot of these models, they have a smaller context window than the entire length
of like a book, right?
So let's say I took Harry Potter and I decided I'm going to chunk it by page.
I'm going to say like, hey, LLM, like find the relevant page.
So maybe it might be able to tell me some factual information where, like, all the information
(30:47):
is on one page of Harry Potter, like, oh, like, Harry Potter, where does he go to school?
It's like, oh, he goes to school at Hogwarts.
Because that would be like one fact that you could probably find inside, like, a single
page, right?
But if there was something that was more, like, widespread, like some sort of story arc
(31:08):
that, you know, couldn't be fit into, like, one page, which I can't really think
of an example right now.
But if there was something like that, I don't know, like, if you asked how
did Harry progress over the first book between, like, the beginning and the end?
You wouldn't really be able to answer that question through RAG.
(31:29):
But if you were able to put the entire book into a context window, you might be able to actually
answer that question like, you know, kind of like character development over time.
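A toy sketch of the naive chunk-and-retrieve setup being described, using plain word overlap in place of real embeddings: a fact that lives on one page retrieves fine, but an arc spread across the whole book never fits inside any single retrieved chunk.

# Purely illustrative RAG-style retrieval: chunk by "page", score chunks against
# the question, and hand only the best chunk to the LLM.
def chunk_by_page(book_text: str, words_per_page: int = 300) -> list[str]:
    words = book_text.split()
    return [" ".join(words[i:i + words_per_page])
            for i in range(0, len(words), words_per_page)]

def overlap_score(chunk: str, question: str) -> int:
    question_words = set(question.lower().split())
    return sum(1 for w in chunk.lower().split() if w in question_words)

def retrieve(book_text: str, question: str, top_k: int = 1) -> list[str]:
    chunks = chunk_by_page(book_text)
    return sorted(chunks, key=lambda c: overlap_score(c, question), reverse=True)[:top_k]

# "Where does Harry go to school?"  -> likely answered by a single page.
# "How does Harry change over the whole book?" -> no single page contains the answer.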
Yeah, so this paper from Nvidia, it fixes the limitations of traditional RAG with something
that they call order-preserve RAG, where they keep the chunks in order relative to each other, which
(31:56):
supposedly helps preserve the context of these separate chunks.
So, yeah, cool research being done to extend the context through a lot of other hacks and
interesting strategies.
That's incredibly exciting.
And there was one other thing that you were mentioning that was really exciting around gaming.
(32:18):
Oh yeah.
So this was a really cool model.
I thought it was mind blowing.
So we have the whole game of Doom generated by a diffusion model that was trained on, like,
who knows, several hundred thousand hours of gameplay of actual Doom.
(32:41):
And it renders each frame dynamically through the diffusion model based on, like, the controller
input and the existing current frame.
Okay, so to make sure I understand this correctly, like, a diffusion model is a thing that we typically
use to generate, like, images, right?
Like Stable Diffusion, like, I put in an image prompt, it's like, I want a horse jumping over
(33:04):
like a fence and it would generate that image, right?
Yeah.
So it's not generating a single image, and it's not even generating a video either, because, like,
it is generating an image one frame at a time.
Yeah.
So I mean, like, I guess it's generating images at, like, maybe 60 images per second.
And, like, each of those 60 images per second is generated on the fly.
(33:26):
And also I would assume it's taking in some metadata, maybe, like, controller input, which
direction, like, my mouse is moving.
Yeah.
Like am I looking up down, left, right?
And it's generating like images based off of that.
So.
And randomly creates enemies that spawn and like send attacks.
It creates like messages when you approach a door that says, oh, this door requires a red
(33:51):
key or whatever.
And like it animates the movement of your hand with the gun in front of you as you're
walking and bobs up and down.
You can switch weapons, you can like punch, it'll do the punching action.
So it's really cool.
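Schematically, the loop being described looks something like this; denoise_next_frame(), read_controller(), and display() are hypothetical placeholder stubs, not the actual research code.

from collections import deque

def read_controller():
    """Hypothetical stub: return the player's current input (keys, mouse)."""
    raise NotImplementedError

def denoise_next_frame(model, past_frames, past_actions):
    """Hypothetical stub: one conditioned denoising pass producing the next frame."""
    raise NotImplementedError

def display(frame, fps):
    """Hypothetical stub: draw the frame at the target frame rate."""
    raise NotImplementedError

def run_neural_game(model, first_frame, history_len: int = 32, fps: int = 20):
    # The diffusion model stands in for the game engine: each new frame is
    # generated conditioned on recent frames plus the player's latest input.
    frames = deque([first_frame], maxlen=history_len)
    actions = deque(maxlen=history_len)
    while True:
        actions.append(read_controller())
        next_frame = denoise_next_frame(model, list(frames), list(actions))
        frames.append(next_frame)
        display(next_frame, fps=fps)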
This demo is a minute and a half long.
(34:12):
So I don't know how many frames per second.
This actually says 20 frames per second.
So.
Okay.
Like a thousand, two hundred, a thousand, five hundred frames, let's say.
And it's cohesive.
This whole minute and a half gameplay is sensible.
It doesn't have too many artifacts and they handed this over to a couple of gaming enthusiasts
(34:38):
and they couldn't identify it.
They couldn't identify which one was auto generated and which one was real significantly
better than like a random score.
Wow.
It's really close.
So what we're saying now is as opposed to LLM generated images, they're just going to
(34:59):
be generating like complete video games that I can play.
I mean, that's pretty cool.
But, not to take away from it, I think so.
Let me preface this.
I think this is a wonderful accomplishment and super cool.
The only thing is though, I wish it was a game that didn't already exist.
(35:21):
Sure.
Right?
Because right now, like, we don't know, like, I mean, the thing is, if I trained it on
Doom and then it generates, like, Doom.
It's like, you're just, like, generating what was already in your training set.
So it's like, I mean, like it's cool, but it's like, it's less impressive to me than like
if they generated some sort of game that never existed.
(35:43):
Right?
Like if they made, like, Doom, but, like, I don't know, it didn't look
like Doom, but it was played with, like, dinosaurs or something like that.
Like I'd be more impressed because it'd be like, hey, look, that's something that we haven't
seen before.
Like, or if you could train on like multiple video games and then it could be like, hey, it
looks like you like to play role playing games that are first person shooter that are located
(36:09):
in space, but everybody's pirates.
Like I want to see an LLM create something like that.
And then I could play like a game that nobody's ever played before.
Now that'd be, I think that's where it's going.
But, you know, just the fact that, like, it regurgitated a game that's already in its training set,
like, I mean, it's cool, but, like, I don't know.
I want to see it like generates something new.
(36:31):
I mean, I agree.
But I think this is a proof of concept that is paving the way for that future that you're
thinking of.
So I can think of a lot of potential possibilities with this.
So you build a single game, let's say, Doom, Doom 3D.
And then it works for one single platform, one version of one single platform.
(36:56):
With this, you feed it in a specific game and boom, you're able to generate images and
like, you know, a video that can be ported over to any platform.
I assume, you know, as with previous image models and, you know, diffusion models and even
(37:16):
before that, we had like style transfer, transfer learning.
I assume you can like, switch the textures and visuals to create like a different theme.
It could be, like, space aliens like you were talking about, or a western cowboy theme.
And then, you know, take that line of thinking one step further and change the character
(37:37):
and change the gameplay a little bit. Instead of a first person shooter,
it's, like, a racing car game or a third person game or a side scroller.
I'm sure a couple papers down the line will get to that future.
Yeah, I think so.
That's an exciting future.
I mean, it could be that in the future, we don't even have like game developers and maybe
(37:58):
like the game develops are working with the LMS, like try to prompt it because I don't
know, like, I'm not sure if it may be it will like given enough time, but I think it'll
be a while before LLMs are like making games that are actually like really good.
I mean, like Doom is a good game, but I mean, when did it come out?
(38:19):
It came out, like, 30 years ago.
Right.
I think it was, like, the early 90s or something like that.
But I mean, we're still playing, like, Flappy Bird or whatever the latest one hit wonder
game is.
It's low fidelity.
It doesn't have to be that complex.
It's running on your mobile phone with pixelated graphics and people still find value in
that.
Yeah, sure.
Right.
(38:40):
But, like, I want Red Dead Redemption.
Triple-A, high production value, that takes, like, years to develop and has a budget bigger than
some blockbuster movies.
Yeah.
Yeah, I want that.
So I think the people who are working at those game studios, their jobs are still safe.
Yeah, they're probably using a different set of tools to create textures more easily, create
(39:03):
a game dialogue maybe, create like 3D models and use all of these AI tools to aid that process.
Yeah, I mean, I think that's that makes sense.
And I think it'll be cool.
If you could have like maybe like AI models in game where you could have like a little bit
more randomness, but I think there still is something to be said about having everybody
(39:26):
play the same game.
Yeah.
Like even if like we could generate like a brand new game where everybody got something
different, I think there is something to be said about like a shared experience where
like, hey, like we all saw the same movie.
We all played the same game.
We could talk about it.
If everybody gets their own game, we're just, like, going to be further divided into
(39:48):
like our own little bubbles where we just like we just surround ourselves with things
that we like.
And I mean, we're kind of getting there a little bit now with like social media and
whatnot where everybody sees their own content.
Like, you know, my Twitter or X is going to be different than your Twitter
or X because, you know, I followed a different set of people.
(40:09):
I probably, you know, look longer at like certain images than like you would at certain
things.
And we're just going to see something different.
And, like, I don't know, I guess it's, like, a commentary on the world that, like, if that's
where this is going, we're each going to see our own video game and we're
going to lose some of that shared experience.
I'm with you there.
I think it's a very evocative thought to think about a future where we're separated from
(40:34):
like this collective consciousness, this pop culture that we all get to partake in.
And instead we're living in our own little silo, you know, like the loneliness epidemic
that has everyone in their own little box in this concrete jungle.
It's even more siloed.
And that's kind of a sad thought.
But you know, if it brings us joy and happiness, maybe that is the dystopian future that we're
(41:01):
headed to.
But I see your point.
I think this was an argument that a lot of people made when the vision models came out,
like Sora, we were like, oh, what if we're all sitting, turning on our own personal Netflix
channel and watching these amazing movies, TV shows or whatever.
(41:22):
But, you know, it's cool but we can't share it with anyone, like, who do we talk to about it,
and, like, no one else knows what we're going through or what we're watching.
And it's a little more disconnected from reality.
Yeah, very true.
But do you know what everybody could all gather around the water cooler and talk about?
(41:44):
Is this podcast right here?
So we're trying to make that shared experience and you know, all of you guys who are listening,
feel free, tell your friends, rate us on all your podcast players, Apple Podcasts.
You can leave up to five stars.
So you know, please do.
(42:05):
It'll help.
We're trying to, you know, juice those numbers.
So whatever you can do, share with your friends.
You know, please do so.
And if you happen to be in the Bay Area, which I know a lot of you listening are, please come
to the meetup.
We'd love to talk to you.
But anyways, yeah.
So and speaking of the meetup, it's starting soon.
(42:28):
So we've got to, we've got to run.
But yeah, thank you so much for listening to the podcast.
Thanks everyone.
Until next time.