
January 9, 2024 41 mins

What can you learn from the scaling issues OpenAI experienced when ChatGPT went viral?

On this week's episode, guest host Ben Lloyd Pearson is joined by Evan Morikawa, Engineering Manager at OpenAI. Join us for a first-hand look at the engineering challenges that came with ChatGPT's viral success, and the difficulties of scaling in response to the platform's sudden popularity.

They also discuss misconceptions around generative AI, OpenAI’s reliance on GPUs to carry out their complex computations, the key role of APIs in their success, and some fascinating use cases they’ve seen implementing GPT-4.

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Evan Morikawa (00:00):
it took a lot of tweaking to kind

(00:01):
of find what the right utilization metrics were and then how to optimize those.
And all of those were really critical to getting more out of it.
But for us, everything was framed in terms of: every improvement that we have represents more users we could let onto the platform.
As opposed to saying, like, oh, it's driving down our margins or

(00:21):
making it faster, we kept cost and latency relatively fixed.
And the thing that got to move was we got to get more users onto the system.
What's the impact of AI-generated code? How can you measure it?
Join LinearB and ThoughtWorks' Global Head of AI Software Delivery as we explore the metrics that measure the impact gen AI has on

(00:44):
the software delivery process in this first-of-its-kind workshop.
We'll share new data insights from our gen AI impact report, case studies into how teams are successfully leveraging gen AI impact measurements, including adoption, benefits, and risk metrics, plus a live demo of how you can measure the impact of your gen AI initiative today.

(01:06):
Register at linearb.io/events and join us January 25th, January 30th, or on demand to start measuring the ROI of your gen AI initiative today.

Ben Lloyd Pearson (01:16):
Hey everyone.
Welcome back to Dev Interrupted.
I'm Ben Lloyd Pearson, Director of Developer Relations here at LinearB.
I'm pleased to have Evan Morikawa joining us today.
Welcome to the show, Evan.

Evan Morikawa (01:27):
Awesome.
Thank you.

Ben Lloyd Pearson (01:28):
Yeah, so full disclosure, Evan and I worked together in a past life at an API company.
Uh, you may not remember this, but you were actually a big part of why I decided to join that company.
What? I did not know that.
Yeah.
Wow.
Yeah.
I had seen your video on YouTube and, uh... Oh, wow.
Yeah.
Yeah.
So cool.
Yeah, we've got a little bit of a history, so it's really great to just have an opportunity to catch up with

(01:49):
you, talk about some of the new stuff that you're working on.
I know, you know, personally I was a little bit envious when I saw that you were going to OpenAI, 'cause I was like, that sounds so cool. And here we are talking about how cool your work is.
So, let's kick it off.
Yeah, I mean, OpenAI, I feel like, needs no introduction.
I mean, I feel like everyone is talking about it.

(02:09):
It's gone viral, at the center of the conversation around AI and LLMs.
The release of ChatGPT has kicked off a global phenomenon.
Uh, and I want to walk through that story, uh, in particular, you know, what it took for you to scale ChatGPT when you had that viral moment.
And especially, I know I've heard a little bit that there was a shortage of GPUs that also affected this.

(02:30):
I do want to dig into that, and, you know, I just wanna learn a little bit about how you've been flexible and nimble while this industry has just, like, rapidly shifted around you.
Yep.
So let's just kick it off.
Where does this story all start for you?
Like, how did you get to where you are right now?

Evan Morikawa (02:44):
Yeah, so I mean, as you mentioned, I was working at an API company before.
And at the time, that's all OpenAI had.
In fact, that was the light bulb moment, was learning, three years ago or so now, that OpenAI was starting this Applied team.
And Applied was all about bringing this crazy technology safely to market.

(03:06):
And at the time, that was just a single developer-facing API around the very first GPT-3 models.
Which is good, because up until then, like, it's still the case that I do not have a machine learning background.
OpenAI was much more research-lab focused.
But then here now, it's starting up this brand new small team doing APIs and products.
And I was like, ah, I've been doing APIs and products.

(03:28):
Maybe I can help contribute here.
I think there was also that feeling, too, that, you know, I feel like I had a reasonable sense of how computers worked by that point in my career, except for this.
This still is, certainly at the beginning was, this kind of magic black box. Like, okay, I'm gonna see what that's all about as well.
I also knew of, like, some of the founders through

(03:50):
just the broader network, and yeah, reached out, and that was the start of it there.
Uh, when I joined, Applied was very small.
There were only basically, like, half a dozen engineers working on all of the APIs and systems for that.
And that steadily grew as we were trying to work on iterating on these language models.

(04:11):
I think the next big, the really first big release or push of any kind was when we tweaked these to write code and worked with GitHub to launch GitHub Copilot.
So GitHub Copilot actually initially ran through our servers at launch, because it was very difficult to run or deploy these.

(04:31):
That was definitely the first time we had any experience running this at any sort of scale.
But still, all the way through, basically up until ChatGPT, and still to this day, we have this very core of an API business that powers all these other AI-powered applications that a lot of people are trying to build on now.
And then along came ChatGPT.

(04:54):
It's actually kind of an interesting story, because when ChatGPT launched, it wasn't necessarily sure whether or not it'd be a scary thing or a totally normal thing.
You know, on one hand, the model that was powering it, GPT-3.5, had been out for several years at that point

(05:15):
in various iterations.
People could already sign up for free through the developer playground.
And in fact, we really noticed a lot of people, like, playing with the models in that kind of way.
So, in some senses, there wasn't that much different about it, you know, maybe a new UI.
Also the model had been improved dramatically to make things up

(05:36):
less and be more conversational.
But you know, on the flip side, this was also the first time we would ever be offering anything without a waitlist.
This would be a free-to-use application out there.
And yeah, that definitely changes things as well.
Um, you know, actually on launch day, I think it launched on a Wednesday.
It was like November 30th.

(05:58):
And we, kind of by design, sort of thought this would be a low-key research preview.
Just a blog post, a tweet, and nothing else.
And actually on launch day, nothing crazy happened.
Like, some people came and used it, and it never passed number five on Hacker News.

(06:19):
We had all the capacity we needed, traffic tapered off.
We're like, great, quick little launch, let's move on.
You know, actually, at the time, we were preparing internally for the launch of GPT-4, which was coming up next, so this was actually a way for us to experiment with a lot of the recent fine-tuning that had gone into the older

(06:41):
models to, like, make them safer and more conversational.
It was the next day, or rather 4 a.m. the next morning, when our on-call starts to get paged because traffic is starting to really rise.
There was this graph, we were trying to figure out what was going on, and all the traffic was only coming from Japan.
And we were very confused, we actually thought we were

(07:02):
getting, like, DDoS'd or attacked or something like that.
But no, they had just woken up first. It was starting to virally spread through Twitter, and then, yeah, by the time the morning of the East and West Coast picked up, it was very clear that we were getting hammered here.
Unfortunately there, you know, normally you can just, like,

(07:23):
throw more servers at the problem, but yes, there is, like, a very finite supply of GPUs here, so there's really not much we can do about it.
We did have some contingency in place for this.
The idea being that we could throw up a, like, "we are at capacity" page, this kind of, like, bouncer model, if you will, right?

(07:43):
Like, oh, the club is full, when some people leave, we can let more people in.
Uh, that was actually explicitly done because we also did not want another waitlist.
Like, no one likes a waitlist, so we're like, oh, we can try it this way.
But unfortunately that "we are at capacity" page was up a lot for the first while, while we were trying to, like, scramble to

(08:05):
get more capacity online.
And we also just, like, had to fix the long tail of other stuff too.
GPU capacity was definitely a dominant concern, but we also had all the other scaling problems that kind of everybody else in engineering has too.
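
The "bouncer" model Evan describes is essentially admission control: cap the number of active users, turn the rest away to a capacity page, and let newcomers in as others leave. Below is a minimal, hypothetical sketch of that idea in Python; the capacity number and function names are invented for illustration and are not OpenAI's actual implementation.

```python
import threading

# Hypothetical capacity, purely illustrative of the "bouncer" idea.
MAX_ACTIVE_SESSIONS = 100_000
_slots = threading.Semaphore(MAX_ACTIVE_SESSIONS)

def try_admit_user() -> str:
    """Let a user in if the club isn't full; otherwise show the capacity page."""
    if _slots.acquire(blocking=False):
        return "chat_ui"                  # admitted
    return "we_are_at_capacity_page"      # turned away at the door

def end_session() -> None:
    """Called when a user leaves, freeing a slot for the next person in line."""
    _slots.release()
```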

Ben Lloyd Pearson (08:19):
Yeah.
Wow.
That's a fascinating story.
And trust me, I remember those, uh, capacity pages quite well. So yeah.
So I wanna talk a little bit, before we get into the GPU stuff and some of the scaling issues, about this black box that is AI, because that is really how it feels to a lot of people, and many of the productized versions of it.

(08:39):
That's really kind of how they perform, right?
Yeah.
And, you know, there's no shortage of content on the web that describes how generative AI and LLMs are going to do both wonderful and horrible things to all of us.
And, you know, as this initial hype wave wears off, I think we're really starting to see, like, concrete use cases that truly bring a lot of value to people.

(09:00):
And, you know, I'm thinking, you know, just from my personal experience, like my grammar checker software giving me more intelligent advice about my writing.
And it can't do everything, but there are still some things where it just saves me so much time.
So, you know, I think it would be really interesting to

(09:21):
hear from an engineer that's actually building this stuff.
Like, about the biggest misconceptions, the biggest misunderstandings that you've seen related to all of this.
Uh, and specifically, if there are, like, one or two things that you could clarify for the world about AI, like, what would those things be?

Evan Morikawa (09:37):
That's a good question.
Certainly one misconception is that it is definitely not, certainly not as it exists today, this completely omnipotent system here, right?
There are quirks about how this thing works.
It's actually quite helpful to remember somewhat how these things are trained.
They're trained by predicting the next word, for all words

(09:58):
and phrases on the internet.
And that, though, is in itself deceptive, because this is much more than a Mad Libs system or an autocomplete engine.
Because it has turned out that in order to be able to predict the next word, you kind of need to know a huge amount about society and structure and context and culture and things like that as well, too.

(10:19):
But at the same time, they're also very steerable based on the context that you give it ahead of time.
You know, when GPT-3 first came out, this was, in fact, the title of the paper is about few-shot prompting.
Few-shot here basically refers to the idea that you just give the model a handful, like three or four examples, of what you want it to do, and that kind of eggs it in the right

(10:40):
direction.
In some ways this is not too unfamiliar.
I think, you know, if you go on Google, old Google at least, if you type a question one way, you get Yahoo Answers, but if you type it a different way, you get, like, a scientific paper.
So kind of in the same way, you can steer it in the direction of things.
The one thing that's a little weird about this, though, is this has left some very seemingly black-box

(11:02):
types of prompt engineering in this right now.
Uh, there were some papers recently that have been getting a lot of press that simply say: if you ask in the prompt, literally, take a deep breath and think step by step, it does much better on a large category of tasks. That's something that feels... wow,

(11:26):
Like kind of wrong on one hand.
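
To make the few-shot idea above concrete, here is a minimal sketch using the OpenAI Python client: a handful of worked examples are placed in the conversation before the real input to steer the model toward the task. The model name, the classification task, and the example reviews are placeholders chosen for illustration, not a recommendation of any particular setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

few_shot_messages = [
    {"role": "system", "content": "Classify each review as positive or negative."},
    # Three worked examples nudge the model into the task (the "few shots").
    {"role": "user", "content": "Review: The battery died after two days."},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "Review: Setup took thirty seconds and it just works."},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Review: Support never answered my emails."},
    {"role": "assistant", "content": "negative"},
    # The actual input we want classified.
    {"role": "user", "content": "Review: Surprisingly sturdy for the price."},
]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=few_shot_messages,
)
print(response.choices[0].message.content)  # expected: "positive"
```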

Ben Lloyd Pearson (11:28):
It feels very humanistic though, doesn't it?

Evan Morikawa (11:30):
Well, that's actually a great point.
On the flip side, though, if you kind of assume that the models are going to approach kind of a human type of level of intelligence, it's worth asking yourself: if you threw a relatively competent human into what you're asking it to do, with as much context as you gave it, how would they perform?
Mm-Hmm.

(11:51):
And it's not unreasonable to think that these models kind of mimic that, because they are mimicking the human speech we've seen on the internet.
So yeah, in fact, I actually think some of the people who could be the best at prompt engineering are, like, teachers, engineering managers, tutors, people who are used to, like, asking the right questions and

(12:12):
setting up the right context for people to, like, help them arrive at the right conclusions.
Uh, and if you kind of think about it that way, you get, like, weirdly better results across there.

Ben Lloyd Pearson (12:24):
Yeah.
And I think you're actually partially answering my next question.
So, you know, in your opinion, like, what are the situations that are ideal for generative AI?
Yeah.
And alternatively, like, when would you steer someone away from it as a solution?

Evan Morikawa (12:36):
Yeah, yeah, yeah.
Some places I think it has absolutely been helpful, and we've really just started to tap into it: right now, certainly software engineering, coding, boilerplate-type things.
That was the first place that we internally really dogfooded any of this, was when we ourselves started to use Copilot.
As an educational tool, I think it is still, despite how much has been talked about, an underrated, undertapped ability here.

(13:00):
The idea of a personalized tutor everywhere you go.
Like, TAs. Think about university: the professor can talk at you all day, but it was the TAs where I really learned things, in those follow-up sessions, because you could ask follow-up questions.
And you can frame it in a way that makes sense to you.

(13:20):
That kind of iterative learning, I mean, this is where I personally use it the most.
Like, the thought of reading any paper without this thing on the side, or without being able to, like, just ask it for concrete examples of things.
Being able to rephrase and reword and ask follow-up questions.
That I think is going to be a huge deal.
There are some places, on the flip side of this, though,

(13:43):
uh, yeah, like, we have not in any way, shape, or form solved this, like, perfectly verifiable problem.
This should not be your end state for medical advice on a huge slew of topics right now.
It should not be the thing that you are trying to use to cite case law for your own trial, for example.

Ben Lloyd Pearson (14:03):
I think a lot of people
call this hallucination.

Evan Morikawa (14:05):
That's right.
Now, on the flip side, there's actually law, which I think is a really interesting area as well, too.
Especially some of the side effects of this, like the power of the embedding models that we have.
You know, it is very good at saying what patents are similar to this one, what cases are similar to this one, and in ways that are much more than just, do they

(14:26):
have similar keywords.
The fact that these models deeply semantically understand what's going on means they can help you find and search like that.
Like, yeah, those types of search abilities will get dramatically more powerful.
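
The semantic search Evan is describing is typically built on embeddings: each document and query is turned into a vector, and similarity is measured geometrically rather than by shared keywords. Here is a small, hypothetical sketch using the OpenAI embeddings endpoint and cosine similarity; the embedding model name and the sample "cases" are placeholders for illustration.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    """Turn a piece of text into an embedding vector."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

# Toy corpus standing in for a database of cases or patents.
cases = [
    "Dispute over a patent covering touchscreen gesture recognition.",
    "Breach of contract claim between a supplier and a retailer.",
    "Trademark infringement involving a similar logo on sports apparel.",
]
case_vectors = [embed(c) for c in cases]

query = "Which cases involve intellectual property around user interfaces?"
query_vector = embed(query)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank by semantic similarity rather than keyword overlap.
for text, vec in sorted(zip(cases, case_vectors),
                        key=lambda cv: cosine(query_vector, cv[1]),
                        reverse=True):
    print(f"{cosine(query_vector, vec):.3f}  {text}")
```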

Ben Lloyd Pearson (14:40):
Yeah, and I know personally, one area where I've really found a lot of benefit: I have to very quickly understand a lot of new technologies as part of my day, and work with things like, you know, I haven't done a lot of regex in the past, but I find myself doing a lot of it today.
And just getting that, like, intermediate understanding, like, immediately, without having to, like, parse through a bunch

(15:01):
of resources across the web.
Like, yep.
I mean, I can't even add up the number of hours that that's saved me.
Yep.
You know, so, yeah.
It's really great to do that.

Evan Morikawa (15:09):
Regexes are probably the best example of that. Yeah.
Yeah.

Ben Lloyd Pearson (15:12):
Yeah.
Yeah.
That's been wild.
It has actually kind of blown my mind how quickly, you know, 'cause I mean, regexes are, like, not complicated, but they can be time consuming if you don't work with them frequently.
Yep.
You know, so.
Yep, yep.
Yep.
So then, as an engineering manager, like, what expectations are you setting with your team in regards to the use of generative AI?
So, you know, it sounds like you were an early dogfooder of Copilot.

(15:33):
You know, as an organization, are you, like, systematically adopting tools like that, and, you know, what are the, like, changes that you've seen?

Evan Morikawa (15:42):
yeah.
So, getting this really...
We definitely want to more and more have this help us be productive.
I mean, we actually have a research team called the AI Scientist team, which is very much long-term about being able to make this work.
At the same time, though, there's a pretty wide gap

(16:03):
between what works just straight off the bat from the prompt in ChatGPT and, like, an actual tool you'll use day to day.
Like, yeah, some people did a quick plugin to, like, hack it into VS Code, but I mean, it still takes a lot of product work to put the whole experience together and make it work really nicely.

(16:24):
I think this kind of immediate generation of, like, developer productivity tools, it will take a fairly large investment to really put it naturally into a workflow.
That's actually why I'm very optimistic about the coexistence of both a tool like ChatGPT and the API ecosystem.
Like, ChatGPT, we think it can be useful in

(16:45):
lots of different places, in a much more kind of generic sense.
But there are also a huge number of industries where, like, being in flow matters a huge amount.
Developer tools, law, right?
Medical systems, like, all these other places, you'd also need and want a lot of these, like, integrated applications as well too.
But yeah, like, we absolutely believe this will make us and

(17:07):
everybody else substantially more productive, um, over time.
For sure.

Ben Lloyd Pearson (17:12):
Cool.
So let's pivot a little bit to talk about what I think is gonna be the most unique aspect of this discussion.
So, you know, in recent years, there's been something that has brought gamers, crypto enthusiasts, and LLM experts together.
And that is the frustration over this shortage of GPUs.
Right?
I tried to upgrade my PC a couple years ago. So,

(17:33):
you know, before we get into that shortage, I think it would be beneficial to just step back for a moment.
Yeah.
So can you describe, like, the technical role that GPUs play in the OpenAI tech stack, uh, and, you know, why are they so important and how do you all use them?

Evan Morikawa (17:46):
Yeah, absolutely.
So at the end of the day, when you ask ChatGPT a question, it's taking your text and it's doing a huge amount of matrix multiplication to kind of predict the next word, right?
That's what all these hundreds of billions of model weights are for.
And at the end of the day, we're basically doing one math operation, multiplying and adding, a lot, which, like, we're

(18:09):
talking, like, quadrillions and quadrillions of operations a second here to do this.
So GPUs are just many orders of magnitude faster at this.
For a sense of scale, the latest GPU that we'll be running this on, the NVIDIA H100 that everyone's been trying to get, uh, that can do

(18:31):
about 2 quadrillion floating-point operations per second.
Your laptop CPU can probably do on the order of a couple hundred billion.
So we're talking, like, thousands of times difference in speed here.
So yes, you can run these things on CPUs, but the performance difference we're talking about is 100x.

(18:53):
So, especially for models this size, it's, like, really important to do that.
The other thing that's significant is that the models are so large, they don't fit on just one GPU.
We need to put them on multiple different GPUs.
That's actually where things jump dramatically in complexity.
Much like the rest of the world knows that, yeah, your simple

(19:15):
single-threaded application makes sense, but once you run it massively concurrently on a globally distributed system, that's where the hard problems come from.
Similarly here, once you run these on multiple GPUs, things get a lot more difficult.
Now you really care about how fast your memory bandwidth is.
You really care about how fast your interconnect, your

(19:36):
like, network bandwidth is between GPUs, between boxes.
And it's gotten to the point now where every single one of these metrics can become a bottleneck at various points in the development cycle.
So we really care about all of them, and we really maximize them.
And anytime there's a new or faster interconnect, that usually almost directly translates to

(19:59):
improved speed for ChatGPT.
That directly translates: if you can make it run twice as fast, that's twice as many users that can access it on the same hardware.
Or that's twice as large of a model as you can run on the same hardware today.
So these really make a huge difference in what it's capable of going forward.
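
A rough way to see why this arithmetic dominates is the common rule of thumb that generating one token costs on the order of two floating-point operations per model parameter. The sketch below turns that into a back-of-the-envelope calculation; the model size, token rate, and hardware figures are assumptions for illustration, not OpenAI's numbers.

```python
# Back-of-the-envelope sketch of the "quadrillions of operations" point above.
# Assumes roughly 2 FLOPs per parameter per generated token (a common rule of
# thumb); every concrete number here is illustrative, not an OpenAI figure.

params = 175e9              # assumed model size, roughly GPT-3 scale
flops_per_token = 2 * params

tokens_per_second = 1_000   # assumed aggregate generation rate across users
required_flops = flops_per_token * tokens_per_second

h100_flops = 2e15           # ~2 quadrillion FLOP/s, the H100 figure quoted above
laptop_cpu_flops = 2e11     # ~a couple hundred billion FLOP/s, as quoted above

print(f"compute needed: {required_flops:.2e} FLOP/s")
print(f"H100s needed at ideal utilization: {required_flops / h100_flops:.1f}")
print(f"laptop CPUs needed at ideal utilization: {required_flops / laptop_cpu_flops:,.0f}")
```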

Ben Lloyd Pearson (20:19):
Nice. So I think that describes pretty well the impact that, you know, a shortage of GPUs would have on your company.
So, beyond a page that says, hey, we're really busy right now, come back later, what other strategies did you all take to adapt to the sudden influx and the lack of this hardware resource that you need?

Evan Morikawa (20:39):
We had to do a little bit of everything.
We were working very closely with Microsoft, who subsequently was working closely with NVIDIA on this, to build that capacity here, but also it was about making the most of the resources that we had.
So optimizations were hugely important here.

(21:01):
And this is the long tail of sort of your classic: put it through a profiler, find the parts that are slow, optimize those.
But for us, those optimizations are all across the stack.
It's from low-level CUDA kernel and compiler optimizations to sort of more business logic: how are we batching requests together? How are we maximally utilizing these, uh, these things?

(21:24):
It took a lot of exploration. Like, we started with a very basic GPU utilization metric, you know, from whatever the NVIDIA box spits out, and we found that it was actually misleading, because it could have been doing more math at the same time, or we were actually running out of memory instead.

(21:45):
So it took a lot of tweaking to kind of find what the right utilization metrics were and then how to optimize those.
And all of those were really critical to getting more out of it.
But for us, everything was framed in terms of: every improvement that we have represents more users we could let onto the platform.
As opposed to saying, like, oh, it's driving down our margins or

(22:07):
making it faster, we kept cost and latency relatively fixed.
And the thing that got to move was we got to get more users onto the system.
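
One of the optimizations mentioned above, batching requests together, is about keeping each GPU busy by serving many users' prompts in a single forward pass. Here is a toy, hypothetical sketch of that idea; the batch size, wait window, and run_model_batch function are invented for illustration and say nothing about how OpenAI actually implements it.

```python
import queue
import time

MAX_BATCH_SIZE = 32
MAX_WAIT_SECONDS = 0.02   # don't hold requests for more than ~20 ms

request_queue: "queue.Queue[str]" = queue.Queue()

def run_model_batch(prompts: list[str]) -> list[str]:
    """Stand-in for one batched forward pass on the GPU."""
    return [f"completion for: {p}" for p in prompts]

def batching_loop() -> None:
    while True:
        batch = [request_queue.get()]        # block until at least one request arrives
        deadline = time.monotonic() + MAX_WAIT_SECONDS
        while len(batch) < MAX_BATCH_SIZE:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(request_queue.get(timeout=remaining))
            except queue.Empty:
                break
        results = run_model_batch(batch)     # one GPU call serves many users at once
        # ...each result would then be routed back to the request that produced it...
```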

Ben Lloyd Pearson (22:17):
Gotcha, gotcha.
And, you know, I've always kind of felt that GPUs were chosen for LLMs mostly out of convenience, because it was the hardware available at the time that was most closely adapted to the needs of that community. Yeah.
So are you looking at other hardware options out there?
Like, are there things coming up in the market that you think have the potential to replace GPUs,

(22:39):
specifically for generative AI?

Evan Morikawa (22:41):
Yeah, well, so even though we all call them GPUs, like, these are not the graphics processing units of your desktop PCs anymore.
Uh, and especially with Google calling theirs TPUs,

(23:04):
this is why "AI accelerators" is sort of more the generic term now, but I still call them GPUs.
I mean, at this point they are hyper-specialized to do this exact one type of matrix operation.
Um, another concrete example here of how they've been optimized only for AI things is doing lower-precision math.

(23:27):
Most people, when they have a floating-point number, you get 32 bits to preserve it.
You can do the math with 16 bits, or with 8 bits, which means you can just, like, do more at the same time.
And now they have dedicated circuits to do that on the upcoming GPUs that are coming out.
So in a lot of ways they are super specialized.
The other thing, though, is the software stack.

(23:50):
NVIDIA has their CUDA stack, their software stack, the kind of compiler layer on top of there, and that has been hugely specialized to this.
It's currently very difficult for people to use AMD or Intel or the other manufacturers out there.
Actually, there is a product, OpenAI Triton, which is explicitly designed to try and better abstract that.

(24:12):
And that's potentially a huge deal, because the ability to use other hardware much more easily is definitely a big thing, uh, will be a big thing for this market.
But right now, yeah, NVIDIA has an incredible hold on this.
It's reflected in their share price right now.
Yeah.
Um, a lot of it is because they own the hardware,

(24:33):
the software stack, and a lot of the interconnects.
For example, we use InfiniBand, which is, like, an ultra-high-bandwidth interconnect.
The company that developed that, Mellanox, is also owned by NVIDIA.
So they really have the stack top to bottom here.
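
The lower-precision point above is easy to see in code: the same matrix multiply can be done with 32-bit or 16-bit floats, with 16-bit values taking half the memory and, on accelerators with dedicated circuits for them, running substantially faster. The sketch below uses NumPy only to make the idea concrete; the sizes are arbitrary and it is an illustration, not a benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)
a32 = rng.standard_normal((512, 512)).astype(np.float32)
b32 = rng.standard_normal((512, 512)).astype(np.float32)

# Same values, but each number stored in 16 bits instead of 32.
a16, b16 = a32.astype(np.float16), b32.astype(np.float16)

c32 = a32 @ b32
c16 = (a16 @ b16).astype(np.float32)

print("bytes per matrix, fp32:", a32.nbytes)        # 4 bytes per element
print("bytes per matrix, fp16:", a16.nbytes)        # 2 bytes per element
print("max difference:", np.abs(c32 - c16).max())   # close, but fewer bits of accuracy
```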

Ben Lloyd Pearson (24:49):
Yeah, they definitely got in early, 'cause I remember playing around with LLM tools, uh, years ago, and, like, NVIDIA was the only option in the market.
There really wasn't anything else to look at.
Yeah.
And I remember CUDA in particular, that was around the time that it was really taking off, and, yep, yep.

Evan Morikawa (25:04):
Yeah.
I actually should note, though, it's been very, very difficult for the chip manufacturers to even, like, get the right chips.
Another example: despite NVIDIA's dominance here, their upcoming chip, this H100, it's kind of widely known, well, within this specific subset of the industry here, that it

(25:26):
doesn't have enough memory bandwidth relative to how much compute they added into it.
So it's getting increasingly difficult to utilize this.
But the reason that happened is because they didn't know about how large the models were going to get.
It was very difficult to predict this on the scale of, like, chip development cycles.

(25:47):
So, right, ChatGPT is less than a year old.
And one year in semiconductor manufacturing design is nothing, so.
Yeah, it's very difficult for anybody to predict, uh, to do this.

Ben Lloyd Pearson (25:58):
So I wanna transition a little bit into talking about how you approached scaling, uh, the engineering function at OpenAI as this was going on.
So, you know, the rapid success is no secret at this point.
Your leadership has been very open about sharing the challenges of this rapid, sudden virality.
And, you know, it's one thing to deal with sustained growth

(26:18):
over a long period, but when you deal with, like, this sudden surge, it's an entirely different beast.
'Cause, I mean, not only do you have to deal with potentially much higher peaks, but you don't know how much of that is going to stick around for the long term, right?
So, you know, walk me through how that played out for your engineering organization.

Evan Morikawa (26:38):
Yeah.
So, for instance, staying nimble has been a huge piece of this.
One thing that has, like, very much helped, and by design, is doing everything we can to try and treat everything like a tiny early-stage startup.
This originally was true in terms of raw, like, headcount here.
Yes, there are a huge number of people that contributed

(26:58):
to the research and the model training, but at the end of the day, the kind of product, engineering, design, like, parts that are Applied, that was only several dozen people when ChatGPT launched.
So it was still, like, a much smaller group.
Even still, we intentionally set up ChatGPT as a more vertically integrated, sort of separate product team within Applied.

(27:23):
If you think of Applied and the API as this three-year-old startup, ChatGPT looks, feels, and acts like a ten-month-old startup.
And concretely, that was in the form of: we intentionally started on a separate repo, separate clusters, different controls, taking on a little bit of that kind of tech debt and duplication at the start to, like, really optimize for iteration there.

(27:47):
Whereas gradually the API started to optimize a little bit more for, like, stability and SLAs and stuff like that too.
Now this is of course changing. Like, ChatGPT also has, like, huge stability and SLA concerns.
We are kind of working to build out more broad platform teams as well.

(28:09):
Fast iterating was important.
The other kind of key piece about this was having the research teams deeply embedded here.
So, while I talk about Applied, because that's the group I'm in, in reality the ChatGPT effort heavily involved a huge chunk of researchers from various research teams.
They were the ones who were actually constantly tweaking and updating and fine-tuning

(28:31):
the models based on end-user feedback as well here too.
So keeping these as very vertically integrated teams, with product, engineering, design, and research together, was also super important.

Ben Lloyd Pearson (28:44):
Yeah, so what would you say is, like, the biggest improvement that has come out of this from an engineering perspective, from, you know, just dealing with all of this scale?

Evan Morikawa (28:53):
The biggest improvement that's come out is actually our ability to work together as, like, a single research and product group.
There was an early fear that the worst-case scenario for us would be the type of place where research trains a model, throws it over the wall, go productize it.

(29:13):
And it was, like, this one-way street.
And we spent a lot of effort making sure that that was not how we developed anything.
But, you know, that was all, like, abstract until you are actually in the trenches developing and, like, really working on a product here.
And of course the reality is, like, it's very messy to begin with.
You just kind of have to tweak it as it goes along.

(29:35):
But now that there's a much stronger focus around sort of these clear products we have, with this clear API product that we need to build, this clear consumer app that we're focusing on, I think, like, that has helped a lot to really integrate the research and product and engineering and design in kind of this one, uh,

Ben Lloyd Pearson (29:54):
push.
And I imagine that probably gives your engineers an opportunity to learn more about how this stuff is created, right?

Evan Morikawa (30:00):
Yes.
It has definitely been necessary.
So it has been the case, actually, that everybody in Applied and engineering did not need to have a machine learning background to do this.
I do not have a PhD in machine learning, but that's fine for now.
Uh, certainly the interest and the willingness to pick up and learn a lot of things along the way is important, but...

(30:21):
Yeah, most of our problems are product problems.
They're engineering problems.
They're distributed systems problems.
They're kind of classic like that. But at the same time, it has been really important for everybody to at least get a reasonable understanding of how all these models fit together.
Because a lot of the engineering considerations are deeply tied to the way these are structured, the

(30:45):
hardware that we're using, the way things are deployed.
Those all definitely matter.

Ben Lloyd Pearson (30:49):
Yeah, this might be a tough question to answer, but if you could go back in time to Evan a year ago and tell him, hey, your product is going to go viral someday, are there any changes you would have made in your approach to respond, or to anticipate that?

Evan Morikawa (31:05):
You know, maybe not, because I'm inherently a bit of a skeptic when it comes to things. Like, I would not have believed my thing would go viral.
Also because I believe in not prematurely optimizing the system.
We have this deeply iterative model baked in here.

(31:31):
So I actually would have been afraid that we would have spent this huge amount of time making sure the infrastructure was load tested up the wazoo, making sure the product was perfect before it even got there.

Ben Lloyd Pearson (31:44):
So you're saying just swing for the fences and deal with what happens after the fact.

Evan Morikawa (31:49):
Now, I should note, though, that iterating on a product quickly, especially here, it's very important that that does not compromise the kind of safety mission that we have to begin with as well.

(32:10):
Like, safety, not being happy enough with our safety systems, with the red-teaming results that we're getting, that is the primary thing that will delay launches.
That is the non-negotiable before we can ship something.
Yeah.
So that is the place that we can, should, and would, like, spend even more time iterating on.

(32:31):
But here again, a lot of the philosophy around this is that we kind of see the safety systems and the red-teaming as coming in layers.
Like, we have a very active network of experienced red teamers who will go in and test stuff.
We have very controlled rollouts through various stages to try and, like, catch everything.

(32:51):
But it's still the case that there's no way to catch all of it until you actually get it out into the world.

Ben Lloyd Pearson (32:57):
Absolutely

Evan Morikawa (33:01):
not, yeah.
Having a lot of people thinking and working and doing this is a big part of what it takes to make these things move forward.

Ben Lloyd Pearson (33:14):
So, I've got one more I want to talk about, and it kind of brings us back full circle to how this conversation started, and that's APIs.
So, you know, we mentioned that the part of generative AI that really excites me is seeing it pop up in all the tools that I use every day.
You know, at LinearB, you know, we've trained ChatGPT to use multiple aspects of our platform.

(33:35):
So we can do things like ask it questions, and it can write configuration files for us that are highly specialized.
And, uh, you know, to me, those are, like, the real fascinating use cases.
But, you know, obviously for that to happen, there's got to be a really strong API for developers to build functionality on top of.
And, you know, I think since we both worked at API companies, we understand that, you know, it really comes down to making

(33:58):
sure the API is performant, and that it's built for real-world use cases rather than theoretical situations.
What role, generally speaking, do you think APIs, or the APIs for your products, are gonna play in the success of OpenAI and ChatGPT?

Evan Morikawa (34:12):
Yeah.
There is absolutely no way, despite how I think we have a very talented team, there is no way our one team can out-innovate the vast sum of all of the creativity and startup energy and, like, company focus that is out there making this work right now.
As I mentioned earlier, like, the ability to have this, to have AI deeply integrated

(34:36):
with everything you're doing, somewhat seamlessly and transparently, that's where a lot of the real power is.
The API is also one of the best places for us to sort of try out new ideas first.

Ben Lloyd Pearson (34:55):
That's actually a great point, because
I wanted to ask, like what kindof feedback are you getting
from developers in the field?

Evan Morikawa (34:59):
That's one of the most important pieces about it.
Consumer apps, it's very difficult to get feedback on unless you start doing really aggregate stuff.
But yeah, if you sit down and you, like, talk with the developers building on the API, we really learn where the actual problems are in the system.
The gap between a cool demo and something

(35:20):
that's useful is massive.
And that is true here.
That is true in every industry.
It is especially true here. Like, the more hype there is, the further that gap becomes, and that gap can only be closed when you're, like, really working with companies and developers, figuring out the hard way and learning all the quirks of it.
It is definitely the case that the API ecosystem has discovered far more quirks of the models than our

(35:43):
own research teams have.

Ben Lloyd Pearson (35:45):
So what are some of the engineering challenges that have been unique to the API versus the more general-purpose tools?

Evan Morikawa (35:51):
We have lots of similar challenges to other API companies, you know, like database connection limits, networking shenanigans.
The GPU constraints, though, I would say those are quite different.
We've needed to jump immediately to tons of clusters all over the world.
That was mostly done because we were chasing

(36:12):
GPUs wherever we could.
So we suddenly found ourselves multi-cluster, multi-region.
On the flip side, though, we've also spent a fairly large amount of effort making it such that there really aren't that many special, unique things about our deployment stack.
We've been using stock Azure Kubernetes Service,

(36:34):
we use, like, Datadog, we use a lot of just, like, off-the-shelf tools.
I think that's actually been really important. That's really helped our development team stay small; it's meant that new hires come in and kind of know what they're doing.
The more we can do to try and treat things as just another service that takes text and spits it out the other side, you know, a Kube service is a Kube service.

(36:55):
People know how to deal with that.
At least it's a known unknown.
I'm trying to, like, minimize the unknown unknowns here.
But, yeah, at the same time, the scaling characteristics of this are very strange.
I was mentioning that when, yeah, talking about how these, like, utilization metrics were hard to figure out initially there too.
And the scaling challenges of this, I think, are also going

(37:17):
to get pretty, pretty nuts.
Like, the ambitions for the scale of capacity that we need to ramp up to, it's both usage going up, the models getting bigger, the models having all these different types of modalities for them. Yeah, that's all just getting started right now.

(37:38):
Yeah, like, we found ourselves suddenly, when DALL-E came out, we're like, oh, we're suddenly not dealing with text anymore.
Now it's everything that is image processing and image generation, and now we're in audio as well, with speech in and speech out.
Um, so, so yeah, it's become a lot more complex.

Ben Lloyd Pearson (37:57):
Awesome.
That's all very fascinating.
I'm really happy we got to learn about all this from inside the organization.
And I have a couple of just quick questions, because they're things that I think our audience is really going to be interested in.
The first is, what is the most interesting or useful adaptation you've seen so far of one of your products?

Evan Morikawa (38:17):
So, I think this is still emerging, but, like, there's a lot of work of people trying to build longer-form agents right now on the system.
That's been a huge focus of a lot of startup activity.
As I mentioned, a lot of the, like, law applications, I think, have a huge potential

(38:38):
to make that industry feel very different.

Ben Lloyd Pearson (38:41):
I would love to have a lawyer in my pocket, just saying.

Evan Morikawa (38:43):
The other one, of course, is the education side of things.
We have a real effort to figure out how to use these tools.
To me it is kind of analogous to, I don't know, math classes had to do something different when the calculator showed up everywhere.
But figuring that out, and sort of, like, getting to a

(39:07):
world where people start to use these as tools and they can, like, help them be just dramatically more impactful at things, I think that's a huge deal.
I actually really liked Spotify's feature they launched recently, which was using text-to-speech.
So they release these podcasts in other languages, with the voice of the

(39:29):
original podcasters.
So yes, you can, like, listen to Lex Fridman in Spanish, but it is clearly his voice.
And if you think about how much work or money it used to take to dub something, or to hire somebody to do that, like, that's insane.
The other partnership I think is really cool is Be My Eyes.
They've been using GPT-4 with vision capabilities to

(39:51):
help visually impaired people.
Take a photo of your closet. Like, what should I be wearing today?
Yeah, that's nuts.
You're gonna have a device in your pocket that can deeply, meaningfully, and semantically describe what you're looking at.
And if you're a visually impaired person, that's, yeah, obviously that's a huge deal.
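
For a sense of how an application like that talks to a vision-capable model, here is a minimal, hypothetical sketch using the OpenAI chat completions API with an image input. The model name, image URL, and prompt are placeholders for illustration; this is not a description of Be My Eyes' actual integration.

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # placeholder vision-capable model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe what is in this photo of my closet and suggest an outfit."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/closet.jpg"}},  # placeholder image
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```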

Ben Lloyd Pearson (40:13):
Awesome.
Uh, well, it has been really great learning, uh, the inside perspective from you on OpenAI and all the engineering work that you're doing over there.
Uh, if people wanna learn more about you or the work that you're doing, where's the best place to send them?

Evan Morikawa (40:26):
Yeah, so I am @E0M on Twitter, and actually, I point people, including a lot of our new hires, to OpenAI's blog, which is a mix of both the product and research releases.
Kind of that whole arc gives a pretty good view of the state of what we're doing, but also the state of the industry too.

Ben Lloyd Pearson (40:47):
Awesome.
And I know personally, I've been silently watching your LinkedIn too and seeing a lot of your posts about all the cool stuff that's happening.
So, so yeah.
Well, it was really great having you here today.
I'm glad we got a chance to catch up and talk about what you're doing.
So thanks for showing up.

Evan Morikawa (41:01):
Likewise.
Thank you.