Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
Welcome to the Generative AI Meetup Podcast where we dive deep into the essential issues
(00:06):
in and around Generative AI.
With your hosts, Mark and Shashank.
Together, we run a GenAI Meetup in Silicon Valley and have more than a decade of experience
in tech.
We've had the privilege of meeting the smartest minds and learning cutting edge ideas, and
now we're bringing those conversations straight to you.
(00:27):
All right, Mark.
We got some exciting news from OpenAI this week.
How do you feel about their new o1 Pro model?
Yeah, that is pretty interesting stuff.
So OpenAI is doing their 12 days of AI.
(00:48):
I think they're currently on day two.
It's been a few days since we've had a podcast last, but today, as we're recording,
it is Sunday December 8th.
And actually, this is a little bit interesting because this is our first time trying to do
this podcast recording remotely.
So Shashank right now is in India and I am in the Bay Area.
(01:12):
So if there's any graphical glitches, not graphical, but audio glitches, you'll know why.
It's because we're doing this remotely.
But we're going to do our best to not interrupt each other.
Yes, Shashank, to answer your question about ChatGPT Pro.
It's super interesting.
So OpenAI, they came up with this new thing, which is not $20 a month, but they decided
(01:38):
they were going to 10x the pricing and do a $200 a month plan, which will give you unlimited
access to all of OpenAI's best models and tools.
So it has unlimited advanced voice mode.
It has unlimited GPT-4o and o1-mini.
(02:01):
But the thing that people really care about is unlimited access to OpenAI's o1 Pro
model.
So basically, this is a model, which I don't think is that interesting of a foundation model,
but it's a model that will do some self-reflection and think harder about a given problem.
(02:27):
So the idea would be, if I said, "Hey, ChatGPT o1, I want you to cure cancer."
It'll actually think about it for a while.
It might do some research.
It might hit the internet, read the research papers, all these things.
And it'll actually come up with some sort of results.
Now, it's not going to necessarily cure cancer, but it's going to try.
(02:50):
And I think that is super cool because the problem with a lot of these original foundation
models is they will just make a response super quickly.
And oftentimes, the response will be good.
But what you oftentimes want is you want to make sure that these models will return a
(03:12):
correct response.
So I think that if there is some self-reflection about what you're doing, you may be able to
get a better response.
So even though it's expensive, I don't necessarily advise anybody who's listening to pay for
it.
Well, actually, I take that back.
If you are just maybe a scientist or researcher, an enthusiast, and you want to see what
(03:40):
OpenAI can do, yeah, spend the $200 and try it out.
I didn't want to spend the money, but if you want to spend the money, get back to us.
Let us know what you think.
Yeah, that makes sense.
Like, you know, the way I think about it is they started with GPT-1, 2, 3, 3.5, and
(04:03):
then to 4.
I think they kind of plateaued in the capability of the base model.
And since then, they've just been trying to tack on some agent behavior, some chain
of thought reasoning, some step-by-step thinking in the same base model to try to get more
(04:24):
out of it.
And the o1 model that's available to everyone today kind of is a taste of what this
pro model can do.
It's just that the pro model has no cap on how much it can think.
So like you said, cure cancer, solve world hunger, figure out, you know, which JavaScript
(04:50):
framework to use in 2024, like these complex challenging tasks, it can just throw a lot of
compute at it.
And yeah, I honestly don't know who this would be a good use case for, because the underlying
model is still plagued with all the challenges that we've seen LLMs have.
(05:14):
It hallucinates.
It doesn't have access to the internet.
It sometimes gaslights you.
Actually, I take that back.
The access to the internet is being fixed.
It can pull new information.
But it's still a limited intelligence underneath.
(05:36):
But I guess, you know, if it can prompt itself by breaking down the complex tasks into simple
enough chunks, maybe the pro model can solve meaningful things.
So what was the example that they showed?
I think they talked about figuring out a deadlock in some C++ code, right?
(06:00):
I'm not sure if you took a look at any of the examples that they had, but some of those
seemed pretty good.
Yeah, so I would imagine that if you're writing some C++ code.
So for those that don't know, a deadlock is a kind of an advanced concept in computer
science, but it's when you have two programs or threads running at the same time, and each
(06:26):
one ends up waiting on a resource the other is holding, so neither can make progress.
I don't know how best to describe it, but if you have single threaded computing, it's
pretty straightforward.
So single threaded means one thing runs immediately after the other, but oftentimes when
you have two or more things running at the exact same time, you have to get those two
(06:48):
things communicating with each other.
So you don't duplicate work.
So oftentimes having multiple threads run at the same time is really hard to debug, not
just for regular programmers, but in my mind, a lot of people have a hard time with multi-threading.
(07:12):
I hate multi-threading, and so I have a hard time doing multi-threaded programming, and oftentimes
when I am trying to fix a multi-threaded bug, it might take me hours, if I can even figure it
out, because sometimes the bug only happens when it's a full moon and every third time, because
(07:34):
it is so hard to sometimes debug.
So if we could have it where maybe you could put the source code into o1 Pro, have
it maybe run the code, try some examples, figure out that this deadlock only happens on the
(07:56):
full moon on Sunday that it could figure out the issue and then say, "Hey, why don't you
add a lock or a mutex or something here to fix the issue and then you're good."
Yeah, for me, I have a really hard time debugging these things so if that can make my life easier,
(08:20):
I'm all for it.
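To make the deadlock idea concrete, here's a minimal sketch (illustrative, not from the episode) of the classic two-lock deadlock, written in Python rather than C++ for brevity:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def worker_1():
    for _ in range(100_000):
        with lock_a:      # thread 1 takes lock A first...
            with lock_b:  # ...then waits for lock B
                pass

def worker_2():
    for _ in range(100_000):
        with lock_b:      # thread 2 takes lock B first...
            with lock_a:  # ...then waits for lock A
                pass

# If each thread grabs its first lock before the other releases,
# both wait forever: a deadlock. Whether it triggers depends on
# thread timing, which is exactly why it seems to happen only
# "on a full moon." The standard fix is to acquire locks in one
# global order (A before B in both workers) so the circular wait
# can never form.
t1 = threading.Thread(target=worker_1)
t2 = threading.Thread(target=worker_2)
t1.start(); t2.start()
t1.join(); t2.join()
print("finished without deadlocking, this time")
```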
So one thing I found interesting is in the announcement on OpenAI's website, they gave
10 grants to medical researchers at leading institutions in the US to try to test this out
and push the limits of this model to see what it's capable of.
(08:42):
And they're all PhDs working on cutting edge research in medicine for discovery of new genes,
for rare diseases to extract different knowledge about human conditions, research aging, dementia,
(09:04):
and cancer immunotherapy.
So literally solving cancer, so it's kind of exciting to see these LLMs be used for more
challenging tasks.
And I think I'm kind of hopeful, maybe not to solve it single-handedly, but to replace
(09:27):
maybe an intern that some of these professors would use, or a grad student or an undergrad
who would do some of the drudgery, to replace that and give some of these senior researchers
dozens of lowly agents that can do grunt work for them.
(09:51):
Yeah, for sure.
And I think the thing that's interesting about this is because it does some self-reflection,
it's going to make a lot fewer mistakes.
So one problem that I have is I was working on a project yesterday, one of the side projects
that I'm working on, and it involves machine learning.
(10:14):
And I was using Claude 3.5 Sonnet, and then I said, "Hey, this line doesn't make sense
to me because I'm using 3.5 Sonnet to go and write neural network code for me."
And so basically what my workflow is is I'll have it write the code.
(10:38):
And then before I actually copy and paste the code into my workspace, I'll go and review
the code and try to make sure I understand every single line.
And if I don't understand something or something seems weird to me, I will go and ask it.
I said, "Hey, why did you make this line?"
And then oftentimes it says, "Oh, you're right to question that line.
(11:00):
That line doesn't actually make sense."
I wrote that in error.
And if I was somebody who didn't know how to program, I would probably get stuck, right?
Because I would not know to question certain logic.
I would just take it and then if it had subtle bugs, it would take me forever to figure out
(11:23):
what's wrong.
But I think that if there can be some sort of self-reflection, maybe you could say, "Hey,
why don't you review your own code or try to run it?"
And then self-reflect on, "Did this line actually make sense?"
And then I would assume that if you do that, you'd probably be able to get significantly
(11:46):
higher quality code that just has less erroneous stuff.
Because the code that it wrote actually made sense in general, but for the particular use
case that I was doing, it didn't make sense.
But I needed to prompt it to say, "Hey, for this use case, it doesn't really make sense."
And it agreed once I said it and then it made some really good suggestions.
(12:10):
But I think that if it can do that self-reflection itself, that's going to help a lot.
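As a rough sketch of automating that review loop, assuming the OpenAI Python SDK (the model name, prompts, and helper function are placeholders, and this is not how o1 does its hidden reasoning):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_with_self_review(task: str, model: str = "gpt-4o") -> str:
    """Draft code, then ask the same model to critique and fix its draft."""
    # First pass: write the code.
    draft = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"Write code for: {task}"}],
    ).choices[0].message.content

    # Second pass: the same "why did you write this line?" push-back,
    # but automated instead of waiting for a human to ask.
    review = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": f"Write code for: {task}"},
            {"role": "assistant", "content": draft},
            {"role": "user", "content":
                "Review your code line by line. Point out anything that "
                "doesn't make sense for this use case, then return a "
                "corrected version."},
        ],
    ).choices[0].message.content
    return review
```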
Yeah, I agree.
And I think these models are getting better because maybe a couple months ago, if you said,
"Oh, hey, I don't think that's right.
I think pigs actually do fly, and that is reality."
(12:34):
Then the LLM would just kind of take it and run with it.
But I think they're developing a sense of conviction with what is factual, what may not be
factual.
I do think they have a couple other internal chain-of-thought reasoning loops that they abstract
(12:57):
away from us before they give us the output.
And that creates maybe more conviction, more, I guess, truth.
But looking at some of the benchmarks for this pro model, it doesn't seem like an
(13:18):
order of magnitude better than the o1 model.
Some of the benchmarks, for example, competition code, which is, I assume, a coding challenge, it
just jumps from 89% to 90%, a one percentage point difference.
(13:38):
But some other things, it jumps from 64 to 75, which is pretty meaningful.
What I would like to see, which I don't think is captured here, and it's kind of hard to create
a benchmark, is complex, open-ended reasoning tasks.
Like the ones that these medical researchers are trying to use.
(14:05):
If you can create a benchmark that asks multiple models to do some kind of business analysis
on multiple companies competing in this AI space and compare different kinds of chip
architectures or different kinds of models, that is a really complex task.
(14:27):
And I think the jump from o1 to o1 Pro will probably be multiple double-digit percentage
points.
Yeah, that makes sense.
So I would push back a little bit and say that going from the 89% to the 90% is a big deal.
(14:49):
So on one hand, I agree with you.
It's only one percentage point.
But I think that, if you think about it, I like to think about baseball sometimes.
Imagine you are a baseball player and you're batting 400.
You are one of the best batting players in the world.
(15:11):
You're getting paid millions of dollars.
And then imagine you're somebody who bats 300.
You're still probably a really good baseball player.
Actually, you're going to be one of the best baseball players in the world, also.
But you get one less hit per every 10 at-bats.
So if you get one additional hit, you're going to be paid maybe $100 million versus maybe
(15:37):
$2 million.
Now, $2 million is still a lot.
But you're one of the greats if you're batting 400 versus 300.
You're just still one of the best.
So I think people will pay a lot more for marginally better.
I think that going from 89% which is really, really good to 90%.
(16:02):
That's really, really, really good.
So I mean, sure, we won't be at 99% for some time, right?
But I would argue that even if you're at, let's say, 98%, if you go to 99%, you've cut
your errors in half.
I don't know how else to put it. You're just way better, right?
Because the people who are number one are often only slightly better than the people who are
(16:28):
number 100, right?
But the people, everybody remembers number one.
So it's kind of like this.
If you are the best, you get all the rewards almost because everybody wants the top dog.
Everybody wants JP Morgan to bank with because they know that that bank is too big
to fail.
(16:49):
But if some other local credit union does the same amount of stuff that JP Morgan does,
people don't want to use them because they're not JP Morgan, they're not number one, they don't
have $7 trillion under management, right?
People want to work with the people who are number one.
(17:14):
Anyway, that's kind of the point.
I don't know, but I think that we shouldn't dismiss these small gains once we get into
that 98th percentile.
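A quick back-of-the-envelope on why those last few points matter, using the benchmark numbers quoted above:

```python
# Fraction of remaining errors eliminated when a benchmark score improves.
def error_cut(old_pct: float, new_pct: float) -> float:
    return ((100 - old_pct) - (100 - new_pct)) / (100 - old_pct)

print(error_cut(89, 90))  # ~0.09: 89 -> 90 removes ~9% of the errors
print(error_cut(64, 75))  # ~0.31: 64 -> 75 removes ~31% of the errors
print(error_cut(98, 99))  # 0.50: 98 -> 99 halves the remaining errors
```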
I agree with you.
I can see the big dogs, the hedge funds, the multi-billion-dollar pharma companies throwing
(17:36):
as much money as they have on these models to try to get the most out of them.
But what do you think about the underlying model itself?
So I feel like some of the models have kind of saturated and every other company is building
a similarly capable model.
(17:57):
But this new layer of agent behavior is still pretty nascent.
This whole field, we're just kind of trying to figure out what does it mean to build an
intelligent agent.
I think as awesome as this o1 Pro model is, it's still relatively new.
(18:18):
It's a little baby.
I think it's going to rapidly improve in the coming months, the next year, 2025, is going
to be the year of agents.
And I'm excited for all the other companies to push this field forward and to see what kind
of agent behavior, Google, Amazon, Nvidia, not to mention all the Chinese companies, then
(18:42):
all the tiny research houses like Mistral and the open source world, Meta, comes out with.
That's going to be really exciting.
That's true.
So speaking of Google and agents, did you see this new Genie 2 model?
I briefly looked at it.
(19:05):
From my understanding, it is a multimodal foundation model that understands
more than just text.
They were kind of vague about what specifically it does, but it understands video, images, maybe
(19:30):
3D models, and from their website, it's supposed to help train the next generation of multimodal
models also.
So maybe create training data if I understand correctly.
Yeah, that's kind of my understanding.
So Genie 2 is a new model that's created by Google.
(19:56):
It has the grandiose title of a world model, a foundation world model.
So basically, my understanding is that it's kind of like a model that can interact with
the world.
Think video games.
Think something like Grand Theft Auto, GTA 5, GTA 6, if that ever comes out.
(20:24):
There aren't a lot of AI models that can interact with the world.
I mean, there's some, we have self-driving cars, but it's still a really hard problem for
AIs to play video games, to interact with the world.
So I met a guy who I think was working at Google, and he told me that he was
(20:45):
actually working on robotics at Google.
And he said that, yeah, you know, Mark, there is a big problem that we have and it's training
data.
So apparently they go and they pay somebody somewhere to go and put a shirt on a hanger
50,000 times.
And they have a bunch of sensors recording that.
(21:06):
And then they go and then just watch this person put the shirt on the hanger.
Just put the shirt on the hanger, put the shirt on the hanger.
You'd probably get some sort of carpal tunnel, or what is it, a repetitive
stress injury, just from putting the shirt on the hanger this many times.
But it's needed because if you want to have some sort of robot that can go and clean your
(21:27):
house, you need to have the training data.
So I think that this is Google's attempt at getting training data for their robots.
So there are not very many companies working on full-scale robotics.
Tesla is one of them, Figure AI is another one.
I think Boston Dynamics is another one.
(21:50):
I know Nvidia has a thing, their Project GR00T, which does something similar, but this
is another one.
This is a world model in which you can have an agent.
But when I say agent, I'm talking about, think that 3D robot.
Think your little Tesla Optimus bot playing GTA 5, right?
(22:13):
That's kind of what I imagine.
And then this is a world that it can interact with.
And then you can generate that training data and see how does your model, how does your
agent, how does your 3D robot interact with the world.
So on their website, they show a robot, kind of like a Tesla Optimus bot,
(22:36):
walking around in Egypt, or maybe it was walking around in some sort of
cityscape.
So I think this is a really cool thing that will eventually enable us to pass the coffee
test.
For those who don't know the coffee test, what's the coffee test?
(23:01):
Yeah.
So the coffee test, I believe, is a thing that was made by Steve Wozniak, where the idea
is you're able to have a robot which can go and just make you a cup of coffee.
You just put this robot in a house and it has to go and figure out how to make a cup of
(23:21):
coffee.
So if you think about it, that's actually one of the hardest problems as a benchmark.
But I like it because it is very, it's something that you can wrap your head around.
It takes two seconds to describe the problem, but to actually solve it is super hard.
Because you think about it.
Think about how hard it is to make a cup of coffee.
(23:43):
And if you use maybe an espresso machine, some sort of coffee maker, or a French press,
you might have to do such things as figure out, okay, where is the coffee cup?
Maybe you have to go and search through the cabinets, figure out, those are plates.
Okay, here are the cups.
Now you have to go figure out how to grab the cup, pick it up off the counter.
(24:04):
You don't want to break the cup.
Then you have to maybe find the coffee grounds.
That's even assuming you just bought instant coffee. But you have to figure out, okay, now I
found the cup.
Now where is the coffee?
I have to go put the coffee ground into the machine.
Oh no, the machine didn't have water.
Okay, let's go find where the water is.
(24:24):
Okay, now how do we get the water from the sink to the machine?
Now the coffee machine might be across the room.
So I think, I'll find some sort of pitcher and put the water into the pitcher.
Oh no, the water is cold.
I have to go boil the water, boil the water, put it into the machine, put the coffee
(24:44):
grounds in, turn the machine on, put the cup in front of the machine, and don't break the cup.
There's a lot of things that go on.
And then that also means that you have to maybe navigate around the kitchen, not hit the
dog or the children running around, not knock anything over.
It's a really hard problem, and I think this is one of the first steps toward
(25:08):
actually figuring out how to solve the coffee problem.
I agree.
And especially being here in India right now, seeing how different life is, honestly
even making a cup of coffee here to my parents' liking is a superhuman challenge.
(25:32):
And I can see a completely different being that is not accustomed to our specific nuances
with our specific flavor of coffee machine and grounds and everything.
It is a non-trivial task.
And going back to Genie 2, it seems like it is a sibling of Veo, the video
(26:02):
generation model.
Because it creates 3D game-like environments from scratch.
So the blog post that they have from last week, it shows a bunch of different
kinds of video game environments from robots running around to like a racing simulator to
(26:29):
like a horse jumping around to like a boat bobbing around in the water with fluid dynamics
and complete like robust 3D characters with like a samurai or a robot rotating.
And it's very reminiscent of like the Sora demos too, which kind of have like a memory
(26:56):
about what is going on in the scene, where there are these object permanence kind of
things built in: you look left, you look right, and the scene is very consistent.
So I'm so confused about what the use case is because they're targeting it to like people
(27:25):
who want training data to train robotics like you mentioned.
Also like prototyping for game development, and for deploying agents, whatever
that may be, like robotic agents in the real world.
But on the other hand, you have like Veo, which is Google's video generation model.
(27:49):
And we were talking about this a little bit before the podcast.
So you know, Sora has been hyped for a while.
And I remember because they released it on the same day as Google's announcement of like
Gemini's million token context window to you know, ostensibly like get some of the hype
(28:12):
away from Google and get people more excited about open AI.
But we still don't have Sora.
And on the other hand, we have like these two super cool models from like Google.
Well, the Genie 2 model, it seems like that's still a research project.
I don't think it's publicly available to consumers.
(28:32):
But Veo is publicly available, a full-on video generation model, high quality 1080p,
with a range of styles.
That is like wow.
I could not have imagined like a year or two ago we would have this right now.
Oh, for sure.
(28:53):
It's amazing.
And I have to say that I think that OpenAI is hype.
Right.
So they announced Sora.
But I think honestly, they sort of shot themselves in the foot a little bit because I think they
announced it before it was ready.
(29:16):
And now sure, maybe some researchers have used it, but I've never used it.
I've never met anybody who's ever used it.
It seems like maybe there are three people that have actually used the Sora model outside
of OpenAI.
And I haven't talked to any of them.
So as far as I'm concerned, Sora is vaporware until it's launched.
(29:37):
But now, since OpenAI kind of showed their hand, right, everybody has been
trying to catch up.
So there are now a few different video generation models that have launched. Google, they have
Veo, and actually, Amazon.
Yeah, Amazon.
(29:58):
Yes, they just came out with a video generation model that I actually was able to use.
And Amazon has been like sleeping giant in this AI race, but now they've caught up.
Amazon has completely caught up now.
So Amazon, I felt a little bit bad because so I worked for Amazon and I felt like we were
(30:21):
a bit behind.
But now I think that we are in the running with all the big dogs.
So last week, Amazon had its developer conference and they announced their new Nova models.
So the Nova model is comparable with everybody else.
(30:44):
It's on the same level as GPT-4o, Google's Gemini model, Anthropic's Claude.
It's better at some things.
Worse at other things.
You know, it's comparable.
It's there.
But it's really cool because it has some nice features.
One, it can do image generation.
That's sweet.
Two, it can do multi-modal input.
(31:09):
It supports a huge context window.
300,000 tokens.
So I know Google has their model that is what is it, the million token context window, but
it's not the top of the line one.
So here we have 300,000.
That's really big.
That's, I think, even bigger than Anthropic's.
(31:31):
What is Anthropic's? 100,000 or 200,000?
I think it's 200,000.
But yeah.
Yeah.
200K.
So this is 300.
So slightly better.
So that's big.
I like that.
200 languages.
And also they can do video generation and image generation.
In my opinion, the image generation is some of the best that I've seen.
(31:52):
I think it's comparable.
Like the flux models.
I was super impressed.
I was playing with it a little bit.
And I thought the image generation was really, really good.
So I generated an image of cats playing poker.
And it had a beautiful image of cats sitting at a poker table, chips everywhere.
(32:18):
Now if you did a little bit of pixel peeping, you could see, oh, maybe the cards aren't
perfect looking.
I think I saw that if you zoomed in on a card, you couldn't quite make out, oh, is
that a seven of diamonds or is that maybe a five.
It is a little bit hard to read.
But from a distance, it looked amazing.
(32:39):
The eyes looked great.
Even the hands looked pretty good in my opinion.
So I think these Nova models are really good.
And then what you can do is you can take that image that you generated from Nova.
And then you can go and then use that as a reference image to create the video.
And the video, it's not a long video, but it looks good.
(33:02):
And that's before Sora has even come out.
If you said that, okay, Amazon would take a year after Sora was announced to come up with
their video generation model.
That would seem about right.
So it's kind of a me-too product.
But I mean, it's not even a me-too product because Amazon now has a video generation model.
(33:23):
But OpenAI still doesn't have Sora.
So as far as I'm concerned, Amazon is crushing OpenAI in terms of video
foundation models right now.
And also Sora, or sorry, not Sora, but Nova from Amazon is way cheaper.
So Shashank and I were talking about this, running through some numbers.
(33:46):
And we found that OpenAI, or sorry, Amazon's Nova,
it only costs $3.20 for a million output tokens and only $0.80 for a million input tokens.
So what were we looking at?
I think it was $10 or something.
It was $10 or $15 for OpenAI's GPT-4o model.
(34:11):
So this is a third of the price.
Right?
Almost a quarter of the price compared to OpenAI.
(34:31):
Well, it also has video output and it's a third of the price.
Yeah, why wouldn't you use Amazon?
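For concreteness, a rough cost comparison using the Nova prices quoted above; the GPT-4o input price and the workload sizes are assumptions for illustration:

```python
# Hypothetical monthly workload: 50M input tokens, 10M output tokens.
INPUT_M, OUTPUT_M = 50, 10

nova = INPUT_M * 0.80 + OUTPUT_M * 3.20    # Nova prices quoted above
gpt4o = INPUT_M * 2.50 + OUTPUT_M * 10.00  # assumed GPT-4o list prices

print(f"Nova:   ${nova:,.2f}/month")   # $72.00
print(f"GPT-4o: ${gpt4o:,.2f}/month")  # $225.00, roughly 3x more
```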
Yeah. I mean, I'm sure that OpenAI has got all the money.
Because at this point, they're all neck and neck.
OpenAI, Google and Anthropic were the three popular ones.
(34:52):
Then we had Llama, which got added into the mix.
Then there was xAI, which also is catching up.
I don't think it's on par with the big dogs yet, but now we have Amazon too.
And I really think they're all just different.
It depends on what flavor of LLM you want, what kind of personality you want,
(35:15):
solving your specific task on your specific domain.
And also sure, pricing matters if you're dealing at really large scale.
And you want to make some trade-offs.
And you really got to just run these models on your specific problem set and
(35:35):
see which one is better suited.
So I think you're right.
Amazon has caught up to OpenAI with its base model.
OpenAI doesn't seem to have as much of a moat.
But we're still on, what was it?
Day two of their 12 days of AI announcements.
(35:55):
So who knows, man?
Day 12 might be a mind-blowing AGI announcement for all I know.
Probably not.
Yeah, sure.
But I think I, for one, am excited for
agentic behavior from all the other companies.
(36:16):
You know, you've seen how much better the o1 model is, not even the pro, just like the o1
model that we got a couple months ago, how much better that is compared to the regular
GPT-4o model.
And of the other companies, I'm especially looking forward to Claude.
Because they seem to have the most pricey and the best model, best for programming tasks,
(36:44):
anecdotally the best for reasoning tasks too.
It has a pretty big context window and if we can get like agentic behavior out of Anthropic,
I think they might be better than the o1 Pro model too.
But regardless, this is such a new field.
I think people are still trying to figure out what it means to be an agent and fast forward
(37:10):
like six months, a year down the line.
And we're just going to be getting unlimited intelligence that we can just run constantly
and point at any given task and have it run with it and come up with amazing insights while
we sleep.
(37:31):
Wow.
Yeah.
You know, Kahneman?
Yeah.
Oh, for sure.
It reminds me of that book.
What is it, Thinking, Fast and Slow? Where there's the System 1 thinking and the
System 2 thinking?
Yeah, that sounds right.
(37:52):
So I sort of think that we're going to have both, right?
So I think that for a lot of tasks, it's okay to do something fast, right?
Maybe if you're, I don't know, doing customer support, you're maybe trying to automate the
McDonald's drive-through, right?
(38:14):
So if somebody said, hey, give me a Big Mac, you don't need to go and think about that for
two hours, right?
That'd be a problem if you did.
Everybody would be annoyed.
You want to be able to say, okay, you want the Big Mac, you got the side of fries and then
that's it, right?
You maybe have a little agent that takes the customer input and puts that into the
(38:38):
ordering system, right?
But if I want to say, hey, help me debug this code, it's okay.
I can wait 10 minutes while it tries a bunch of stuff, right?
That's fine.
I mean, it would be nice if we could do it faster.
I can be patient, I can wait.
And I think depending on the task, we're going to want both, right?
(39:01):
We're going to want your super fast models.
You're going to want your Groq, your Cerebras, your SambaNova.
You want those models to just pump out raw tokens per second.
That's the thing that we want.
You want your 10,000 tokens per second.
(39:21):
But at the same time, you want to go think about it for a while.
And maybe under the hood, it's doing 10,000 tokens per second, but it's maybe doing a whole lot
of self-reflection.
So it might take a while, right?
And that's okay.
And I think it will be okay if it takes 30 minutes to two hours.
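A minimal sketch of that routing idea, with a hypothetical classifier and model names standing in for whatever fast and slow models you'd actually use:

```python
def route(query: str) -> str:
    """Send quick transactional requests to a fast model,
    open-ended reasoning to a slow 'thinking' model."""
    # Hypothetical heuristic; a real router might use a small
    # classifier model instead of keyword matching.
    hard_markers = ("debug", "prove", "analyze", "design", "why")
    if any(word in query.lower() for word in hard_markers):
        return "slow-reasoning-model"  # minutes, System 2
    return "fast-cheap-model"          # milliseconds, System 1

print(route("One Big Mac and a side of fries"))    # fast-cheap-model
print(route("Debug this deadlock in my C++ code")) # slow-reasoning-model
```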
Okay, as far as I know, I don't think OpenAI is using Cerebras.
(39:43):
I don't think any of the big companies are using Cerebras, except for maybe like the Middle
Eastern company G42, and some private clouds and private companies.
But once we have ridiculous tokens per second, and we're not that far off from 10,000 right
now.
(40:03):
So Cerebras is at like 2,100 tokens per second.
Not that far off.
Once we get to insane speeds, agentic behavior, this pro model is not going to take that
much longer than the regular GPT-4o model takes today.
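To put rough numbers on that, here's the latency math; the 2,100 and 10,000 figures come from this conversation, while the 100 tokens per second baseline and the chain length are assumptions:

```python
# Wall-clock time for a long hidden reasoning chain at various speeds.
REASONING_TOKENS = 100_000  # assumed internal chain-of-thought length

for name, tok_per_s in [("assumed baseline", 100),
                        ("Cerebras today", 2_100),
                        ("hypothetical", 10_000)]:
    minutes = REASONING_TOKENS / tok_per_s / 60
    print(f"{name:>16}: {minutes:5.1f} minutes")
# ~16.7 min at 100 tok/s, ~0.8 min at 2,100, ~0.2 min at 10,000
```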
What do you think that is going to mean for the rest of society, for some of the entry-level
(40:30):
jobs, some of the entry-level knowledge work, business analysis, the researchers, the grad
students, or even programmers, like entry-level programmers, six months to a year down the
line.
(40:52):
Yeah, so before I get into that, just to mention Cerebras. I mentioned it, but Cerebras is a
chip company.
They're actually based out of Sunnyvale. We have a really cool interview with Matt from
(41:16):
Cerebras, a great one, he's a regular at the meetup.
So if you're not familiar with Cerebras, take a look at that episode about
them, but Cerebras is really cool.
They make some of the world's fastest chips.
So they do really fast inference, and they make the world's largest chip you can possibly
(41:37):
make.
So there is a, well, anyways, that's that, we could get into it, but a super interesting company.
But then, Shashank, to answer your other question about what this means for the world.
Well, I think that it's going to do a couple things.
One, I think it's going to make people a lot more efficient, and I think that if you're
(42:00):
already an expert, it's going to make it so that you don't need to hire that new grad
student.
Or hire that junior developer.
You're going to be able to be a lot more efficient.
And I think that there is a litmus test that you can do on yourself.
So if you're listening to this and you're saying, hey, am I at risk?
(42:22):
What you should do is you should try and give the LLM some of the tasks that you do on
a daily basis.
If you say, hey, this is far surpassing what I normally do, then you may be at risk, right?
So if you are somebody who goes to stack overflow and copy and paste things, or you just go
(42:46):
to the chat to PT, you ask it to, I should say, let's pick the role of a junior developer
as an example.
So if you're the type of developer who, when you have a task, just goes and copies and pastes
things from the internet, or just goes in and blindly copies and pastes the things that you
(43:08):
generate from the LLM.
And you think that the LLM does a much better job than you could do, and you get stuck if
the LLM breaks and you don't know what's wrong.
Well, you're probably going to be replaced and you should be concerned.
You should be very concerned.
(43:29):
But if you are the type of programmer who goes and maybe uses this to accelerate what you're
doing, you go through and you understand that you push back against the LLM, I think you're
going to be fine for a while because these LLMs do make a lot of mistakes.
(43:50):
They do have hallucinations.
I think that while it's true, they're improving.
At some point, all our jobs are going to be replaced.
For a lot of companies, they can't accept hallucinations.
If you are a bank, it's not okay to have a line of code that isn't understood.
(44:17):
That's not okay.
If you're JP Morgan and you have trillions of dollars under management, you don't want
somebody who just copied and pasted some stuff from the chat GPT.
No, that's completely unacceptable.
They will pay more for somebody who actually knows what they're doing.
So I think that's the litmus test.
(44:39):
Depending on how much you use it to help you, it will help you determine if you are replaceable.
But if you're copying and pasting things, if you're a boilerplate developer, you should
be afraid.
And maybe start learning plumbing or some of the other trades.
Because if you don't know what you're doing, I promise you the trades are going to be safe
(45:01):
for a super long time.
It's like what we said, it's going to be really hard for a robot to make a cup of coffee.
But a lot of humans can make a cup of coffee fairly quickly without
much training.
I could spend five minutes and show somebody how to make a cup of coffee if they had never made a
cup of coffee before, and they'd probably be able to figure it out.
But you might be in school for a couple of years to be able to be an actually good plumber
(45:26):
or learn how to renovate a house or do electrical work.
Those trade jobs.
Those are going to be safe.
But I'm actually a lot more worried about the clerical work.
So I don't think I've told this story in the past, but I had a friend who worked, I won't say the
name of the insurance company, but he worked for an insurance company.
(45:50):
And this was actually before AI, but it was an insurance company and they did claims.
So he wrote with a team of people, he wrote some software for this insurance company.
And as soon as the software was done, it made it so that people could fill out their claims
(46:10):
much more efficiently. They didn't need to, they could do it all through the app and they didn't
need to talk to anyone.
And then once the software was complete, the insurance company fired 4,000 people.
So all the people that were doing that, if they were just going and manually just raw data
(46:32):
entry, they were all replaced by the app.
So I think that if you're just doing something that is you're following the process, you're
doing the repeatable thing, you're not actually thinking too hard, you're just blindly following
the instructions, you're not actually using your brain to think, you will be replaced
and it will be soon and it could be any day now.
(46:55):
So if that's you, you should be very, very afraid.
If that's not you, you're fine.
You're going to be fine for a while.
Yeah, I think so.
Maybe for now.
In the short term.
Yeah.
As in, labor could be replaced with organizations paying for compute.
That's replacing labor with capital.
They'll be paying to acquire chips and paying for the cost of electricity to run models
(47:21):
or get access to models, get whatever, pay the license fees for these models and then again
run them on these chips.
So the beneficiaries are going to be chip makers, infrastructure companies, cloud service
providers and the makers of the models maybe.
But honestly, even that, given how comparable all of the models are getting from all the
(47:47):
different FAANG companies and OpenAI, I don't think they're going to have like a significant
moat.
The open source community is catching up too.
It seems like the chip makers are the ones who are winning here and the stock market kind
(48:09):
of reflects that.
Yeah, I think that's true actually because I think that a lot of these companies like OpenAI
(48:32):
or like Anthropic, I think they are, sure, they're highly valued now.
But I've said it before and I'll say it again, the moat around these is going to zero, and I don't
know if a company like OpenAI will actually be able to profit from their models.
(48:55):
I think they're in a unique position right now where people are willing to pay $20.
There's even $200 a month for the Pro plan, and a lot of the same thing is true for Anthropic
or Mistral, Gemini, whatever.
A lot of people are willing to pay $20 a month right now.
(49:17):
But I'm not sure if people will continue to want to pay that much, especially because
Open Source, because a lot of these models are just becoming free, right?
The GPT-4o model, you can use it for free a lot of the time.
I think with a lot of providers allowing you to use a lot of these models
(49:41):
essentially for free, I'm not sure if people are going to want to pay for that top-of-the-line
base model.
Maybe some people will pay.
I'm sure in the future there always will be some people who are willing to pay for the
best.
But I think for a lot of other people, free is going to be enough, right?
I can get AI search results from my Google search and that is enough.
(50:09):
I think for a lot of people, even for Perplexity, the competitor to Google, I
think, yeah, they have Perplexity Pro, but a lot of people are okay with just the
base free one.
Yeah, for sure.
Yeah, for sure.
So these companies are going to need to figure out some other business model
(50:30):
other than just charging for it.
For example, Perplexity started showing ads, and then I think Google is going to show
ads. A lot of these companies are just going to start injecting ads into the LLM.
So maybe that's what OpenAI will do.
I agree.
They'll start showing ads, I don't want to say for certain, but something like that.
But I think that charging subscriptions alone is just not going to be sufficient.
(50:51):
Ads might still work there, better than just charging a few dollars a month.
I think the space is still so new.
This is like the birthplace of the internet or how it was back then.
I think this is going to rapidly change.
I think agents are going to take over.
I think robotics is still so new.
(51:11):
We're going to have physical embodied beings carrying out our will through some kind of
multi-modal agent.
We're going to have companies like Devin, the AI software engineer agent, that are worth hundreds
(51:33):
of millions of dollars as opposed to hiring a single engineer for a couple hundred grand.
It'll be the battle of agents.
People will have amazing agents in different verticals, not to mention physical embodied agents.
You did mention that trade skills are going to be safe for a while.
(51:57):
But I don't know man, exponentials sneak up on you.
We may not be too far away from a world where physical robots can be very dexterous and
take over any kind of task that a human can.
(52:17):
Exciting times.
I'm excited for the other 10 days of OpenAI.
I think the other announcement was fine-tuning.
Supervised fine-tuning, I think?
Sorry.
What was it?
Do you remember?
(52:39):
Yeah.
It was the reinforcement fine-tuning research program.
I don't know.
Fine-tuning.
You fine-tune the o1 models.
I'd have to check if it includes o1 Pro, but on your specific domain, you could build the equivalent of a Devin,
(53:00):
a super smart software engineer agent, but in your domain.
So like law, or medical, or accounting.
So it's going to be exciting, crazy times, and it's still so new and we have a lot to see.
Yeah.
(53:28):
For sure.
I agree.
You're right.
But I think that when it comes to physical things, there's a lot of things that prevent exponentials.
And I think that's why it's taken so long to get self driving cars.
(53:50):
So for example, we've seen demos of self driving cars for maybe the past 10 years or longer,
right?
And we still don't have self driving cars all over.
I mean, sure.
Actually, we fair.
I did take my first Waymo.
I was in LA last week.
It was great.
Took a nice 10-minute drive
(54:13):
through the city.
I was very impressed at how it handled a lot of difficult intersections.
But it's still in really limited areas.
So it's, I think right now it's in San Francisco, LA, Phoenix, and I think they just announced Miami.
So it's coming.
It's getting there.
(54:33):
But it's still not available everywhere yet.
I can't have a Waymo pick me up from the airport.
And even if it's possible, I think the laws and regulations won't allow it.
So in order to do a lot of these trade jobs, to get back to why I think that the exponentials
(54:54):
will take a while for the trades.
If you want to be a plumber and you don't want to be sued, there are some actual professional
certifications that you need in order to be a plumber.
And I think the same is true for an electrician.
And you can't just have any random Joe Schmo go and do plumbing work for you or electrical work.
(55:15):
I mean, you could, right?
But if you want to be certified, it helps to have all those certifications.
Now, I'm not an expert at exactly what all those certifications are, but I know that there
are licenses that you need to have in order for the trades.
And I think that is, again, another thing which is going to be a barrier to entry for
the robots.
(55:35):
So maybe at some point people will say, hey, it's okay.
We can just have the whatever plumber robot.
I think that will happen eventually, but the legal and regulatory framework needs to be
there.
And I think that if the human is even 10% worse than the robot, people will probably still
(56:03):
trust the human more than the robot.
And I think that the robot will need to be an order of magnitude better than the human in
order to actually replace the plumber or the electrician or even the taxi driver, I think.
Right now, I don't know how many Waymos, maybe none, have gotten into an accident.
(56:24):
Maybe there are some high-profile cases where they've gotten into an accident.
But if the human driver is decent, people still may not trust the Waymo.
I mean, I think I would trust the Waymo, I was very impressed with how good it was.
But I think it'll be a while.
I mean, it might take another 10 years plus for self-driving cars to completely replace
(56:45):
all the automobiles.
And it might take more than 10 years, maybe 20 years plus, for the robots
to replace the trades.
So I think that honestly, if you're in the trades, as you're getting started, if you're
in high school right now, and you're looking for something, be a plumber, be a plumber,
(57:05):
be an electrician.
I think your job is going to be safe for your entire career.
Now, I could be wrong.
Shashank could be correct on this one, the exponentials could just be there.
They might catch up.
But I don't think so.
I think that you might still be able to have a 30- or 40-year career in the trades.
And then sure, I see what you're saying.
People still may use the robots.
I guess we'll have to wait and see.
(57:26):
I think you'll be safe.
I think it'll be safe for a while.
I do think that people are getting used to AI, especially now with ChatGPT.
I think self-driving cars didn't have the mass adoption that they thought they would, unlike
ChatGPT, which is the fastest growing product in the world.
(57:48):
Now it has multiple competitors.
And I think people will adjust to this new reality of an artificial being, answering questions
for them and helping them throughout every step of the way.
And they might be okay with it being a little stupid, being a little incorrect, hallucinating,
(58:12):
some things.
Because this is the new normal.
And it is still useful.
Despite its shortcomings, it's still very useful.
But again, six months to a year down the line, who knows what we're going to see.
Yeah.
(58:33):
I agree with you.
But think about this, right?
If I ask a question for a research paper and ChatGPT gets it slightly wrong, it doesn't matter.
No one died.
But if I have a robot that's driving me around or doing my electrical work, and if it does
(58:57):
the electrical work and it messes up, and my house burns down, well, that's not okay,
right?
And I think that the tolerance, the margin for error, it's okay if you get it wrong when
you're writing an email summary, right?
Just kind of how, as we sort of mentioned before in the podcast, where I said, hey, the
(59:19):
difference between the 90th percentile and the 99th is a world of difference.
I think that in order for the robots to replace the drivers, to replace the plumbers, electricians,
all the trades, they're going to need to be a lot closer to the 99th percentile than the 80th
(59:40):
percentile.
And honestly, I think that getting that last 10% improvement is going to be where probably
80% of the work is going to be maybe even more.
And until we get to that point, I don't think that the robots are going to replace the jobs
yet.
Maybe it'll make it so that plumbers can work with robots, all that stuff.
(01:00:04):
But I think that for these high-stakes things, you need to have it where you're operating
at that 99th percentile, better than humans, superhuman, before people are able to trust it.
And for high stakes, there are certain domains where that last 10% is
really crucial.
The stakes are just too high, like with a doctor.
I don't want a chatbot to replace a doctor anytime soon.
(01:00:28):
Same thing with the electrician.
Same thing with the driver.
I need my driver to be focused, 100% accurate.
No hallucinations, please.
Yeah.
Yeah.
So anyways, yeah, but it is an exciting time and I'm ready for it.
(01:00:56):
But I think we've been talking for a while.
So we could probably end it now.
And yeah, I think this was a good one.
So thanks everybody for listening and we'll catch you in the next one.
[Music]