Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Hannah Clayton-Langton (00:04):
Hello
world and welcome to the Tech
Overflow podcast.
I'm Hannah Clayton-Langton, and after several years as a non-engineer working in tech companies, I decided that I wanted to understand more about what was going on under the hood.
Hugh Williams (00:15):
Yeah, and my name is Hugh Williams. I'm a former Google vice president, also a vice president at eBay, and a pretty senior engineer at Microsoft. And my job here in the podcast is to help demystify tech for the smart listeners out there, and I guess also for you.
Hannah Clayton-Langton (00:27):
Exactly.
So we're the podcast that explains technical concepts to smart people.
Hugh Williams (00:31):
So we're here in
uh person, finally, again,
Hannah.
Hannah Clayton-Langton (00:34):
Yeah, so Hugh has made it to London. He's looking pretty good for the jet lag. And we're recording an in-person episode today, which is something we love to do but don't get to do enough.
Hugh Williams (00:42):
Yeah, and we're
gonna get a chance again on uh
Friday as well.
So two in one week.
It's gonna be pretty exciting.
Hannah Clayton-Langton (00:47):
It's awesome, and I'm especially excited because I found out today is Ada Lovelace Day. So those who listened to our first episode on computer engineering will know that Ada Lovelace was, or is often known as, the first computer programmer, despite being around in the 1800s. And today is a day celebrating and commemorating all of the
(01:09):
women in STEM and all of their achievements, and all the amazing things I'm sure they will bring to the industry in future.
Hugh Williams (01:15):
Fabulous.
And we're going to dig into a deep computer science topic today. So I think it's uh quite appropriate given it's Ada Lovelace Day. So large language models, part two of our uh AI series, Hannah.
Hannah Clayton-Langton (01:26):
Large
language models, AI.
I think before we go into the detail of LLMs, as I'll be referring to them, let's just remember what we took away from our last episode.
Hugh Williams (01:37):
So we spoke a lot about the field of artificial intelligence being around since the 1950s, very, very broad field, but I think we spent most of the episode last time talking about machine learning.
Hannah Clayton-Langton (01:45):
Yeah,
and one thing that I think is
super important to lay out is that when we talk about AI, most people are actually talking about large language models or LLMs, aka ChatGPT, which is one of the consumer products. There's a few others like Claude out there. I took away last week that artificial intelligence is much broader than that. Plenty of applications that will be in use in all of the
(02:08):
technology that we use day-to-day. And when we're talking about ChatGPT or adjacent products, that's when we're really just talking about LLMs.
Hugh Williams (02:16):
Yeah, exactly.
And LLMs, Hannah, as you know, are part of machine learning. So we've got artificial intelligence, that big broad field, machine learning's part of that. Large language models are a type of machine learning.
So that's our topic for today.
Hannah Clayton-Langton (02:29):
Awesome.
And another point that I find particularly neat here when we talk about AI and LLMs is that LLMs are the first time that AI isn't like hidden away under layers of computing. And it's basically exposed with a user interface, of course, directly to consumers and folks to use as they see fit, which is a bit of a revolution. We talked about like the iPhone being sort of an adjacent
(02:51):
revolution in our last episode.
Hugh Williams (02:53):
Yeah, I think you
know the smartphone is really
uh the first time everybody had a computer in their pocket. And I think these large language models are a bit the same, as the first time consumers have had access to AI, because, yeah, you're right, Hannah. You know, AI really, before the large language models, was something that large corporations used to achieve tasks. So whether it's you know doing fraud detection on your credit card, whether it's ranking at Google, uh recommending the
(03:15):
clothes you should buy, whatever it is, sending you emails. Um, those things were AI systems built by large corporations to do one specific task. And this is the first time that consumers have had access to AI in their pocket. And also it's a very different kind of AI, right? Because it's a generalized AI. So this is something that can carry out lots of tasks, probably even tasks that the people who built it didn't
(03:36):
design it for.
Hannah Clayton-Langton (03:37):
And that
generalized point is
interesting because, and I think it's a good segue into some of the technical detail, because most folks are using ChatGPT or equivalents as sort of a replacement for Google. But one thing I took away from some of our prep for this episode was that the way that this technology works is fundamentally different to the way Google or another search engine works.
(03:57):
So maybe we start with that as our segue into the technical stuff.
Hugh Williams (04:00):
Yeah, spot on.
I think that's a really good point, Hannah, because if you're using something like Google, what it's doing is it's processing the query that you give it, and then it's retrieving documents that might be great answers for your query. So you're actually getting back documents that have already been created and organized in an index. So it's a little like Google's taking you around the library and showing you the books. These LLMs are completely different.
(04:20):
So the answers that you're getting to the questions that you're asking are synthesized text that's been created on the fly by the LLM. So this isn't text that exists, it's much more like having a conversation with a smart analyst or a smart associate or a smart intern, and that quote unquote person that you're talking to, if you like, is actually creating the text in response to your query. So that's a very, very different experience to Google.
Hannah Clayton-Langton (04:40):
And
something else I've picked up as
sort of a key differentiator in our previous episode and some of our prep work is the scale of effort that's required to sort of create one of these things. Which again, as a lay person just downloading the cool new app that everyone's talking about, if that's ChatGPT, I don't think you necessarily are naturally giving like credit to
(05:01):
what it takes to build one of these things. Can you sort of scale that for the listeners?
Hugh Williams (05:06):
Yeah, look, I
mean, we talked about sort of,
you know, quote unquote old-school AI last time. I mean, we were really talking about AI models that are trained using a few computers perhaps in a few hours or a couple of days, and they probably take millions of examples to come up with the model that detects credit card fraud or ranks in the search engine. The scale of LLMs is just so incredibly, incredibly
(05:29):
different. I mean, I'll use the word token a little bit later on, but you can just think words for now. These LLMs are trained on hundreds of billions or maybe even trillions of words to be able to generate the text that they generate. I've heard estimates that say that OpenAI's latest GPT-4, so the ChatGPT that you're using today, was trained on about 13
(05:53):
trillion tokens, about nine trillion words, which is a lot of zeros. It's a bit like reading every single article, book, and post on the internet many times over. I did a little bit of a back of the envelope calculation, actually, Hannah, as I was coming in. I think it's a little bit like reading one book every second for about 15,000 years.
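For anyone who wants to check this kind of back-of-the-envelope figure themselves, here is a minimal sketch in Python. The token and word counts are the estimates quoted above; the words-per-book and reading-speed figures are assumptions, so the result is only an order-of-magnitude illustration rather than the exact number mentioned in the conversation.

    # Rough sanity check of the training-data scale discussed above.
    # Token and word counts are the quoted estimates; book length and
    # reading speed are assumptions for illustration only.
    tokens = 13e12             # ~13 trillion training tokens (quoted estimate)
    words = 9e12               # ~9 trillion words (quoted estimate)
    words_per_book = 90_000    # assumed length of a typical book
    reading_speed = 250        # assumed words per minute for a human reader

    books = words / words_per_book
    minutes = words / reading_speed
    years = minutes / (60 * 24 * 365)

    print(f"Words per token: {words / tokens:.2f}")        # ~0.69, close to the usual 0.75 rule of thumb
    print(f"Equivalent books: {books:,.0f}")               # ~100 million books
    print(f"Continuous reading time: {years:,.0f} years")  # tens of thousands of years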
Hannah Clayton-Langton (06:15):
Okay, so
it's trained on quite a lot of
information.
Hugh Williams (06:18):
Yeah, and that
training takes a very, very long
time, weeks or months, and costs an enormous amount of money. So to create one of these models that we're using all day long, probably, you know, 50 to 100 million dollars gets spent on infrastructure and electricity just to create this model that can generate this text.
(06:38):
So this is a vastly different scale to the AI that we were talking about last time.
Hannah Clayton-Langton (06:43):
Okay, so
huge amount of cost involved,
huge amount of effort, huge amount of humans involved, right? In terms of building some of this.
Hugh Williams (06:52):
Yeah, absolutely.
And you wouldn't want to get it wrong, right? Like if you're gonna invest that amount of money, you'd want to be pretty sure that you're gonna get the outcome that you want to get through the training process, because this is as much as making the next James Bond movie, uh, investing in one of these new models.
Hannah Clayton-Langton (07:06):
While
we're on the topic of the
training process, is that deep learning, if I remember correctly from the last episode? That's really what we mean when we talk about deep learning.
Hugh Williams (07:15):
Yeah, that's really the heart, if you like, of large language models. It's about feeding in trillions of words or tokens, and then it's about discovering patterns within that text. And we call those parameters. Uh, and these systems have billions of parameters. So these systems, they're called neural networks. When the neural network has lots and lots of layers, we
(07:37):
refer to it as being quite deep, and that's where this idea of uh deep learning comes from. So it's been around a while, but really what it's about is taking this vast amount of data, so these trillions of words or tokens, and learning in a very sophisticated way the patterns that occur within that data and representing those patterns in a
(07:57):
model.
Hannah Clayton-Langton (07:58):
Okay, so
one follow-up question and then
one observation.
Is all of that data that you mentioned going in, is that when we talk about unstructured input?
Hugh Williams (08:07):
Yeah, yeah, I
think that's a good way to think about it. Hannah, if we think about the AI we talked about in our last episode, I'd say it's a lot more structured. So you remember talking about things like free shipping and shipping cost at eBay as being one possible parameter that could go into our model. With these large language models, we're really just giving it unstructured text. So we're giving it all of the text that we could possibly
(08:29):
find. So all of the World Wide Web, all the books we can find, the source code, whatever it is that we can find, we're giving that all to it, and we're asking the deep learning system to actually discover the patterns all by itself. So the data's quite unstructured, and the system goes about discovering the patterns within the data. So we're not telling it what the parameters are, we're letting it discover those parameters itself.
Hannah Clayton-Langton (08:48):
Well,
and this was my observation. I'd be interested to know your engineering take on it. So I've taken away that the deep learning like consumes a huge amount of information and then it basically figures out on its own what the patterns are.
Hugh Williams (09:02):
That's exactly
right.
Hannah Clayton-Langton (09:03):
And I've
heard a few engineers, I think
yourself included, describe it as a black box and be like super excited about the magic or the black box, which I agree is really exciting, but I find it kind of counterintuitive, because every engineer I've ever met like won't sleep until they understand exactly how everything works in a super intricate way. And then when it comes to LLMs, they're sort of just like, oh, this is amazing.
(09:23):
We have no idea what's going on, which like I find to be very un-engineery.
Hugh Williams (09:28):
Yeah.
And if I go back to my time at Microsoft, we used to have all these diagnostic tools for uh for Bing. So if we had a query and we're wondering why the results that we were seeing showed up, we could actually go and look at this tool, and this tool would sort of diagnose for us what were the likely signals that were causing a particular result to show up. So often we'd get an executive who would email us and say, I ran my query, which was my name, very common.
(09:48):
We've all searched for our name, let's be honest. And uh, you know, this bogus result turned up as the first result, what's going on? We used to have these diagnostic tools that we could use to understand what our ranker was doing, and then we'd explain back, oh, it's really, really sensible, and here's what actually happened. But today, with these large language models, that's pretty much impossible to do. In fact, somebody said to me the other day, how does it
(10:09):
summarize a document? And I said, Well, nobody knows, really. So when you say, please shorten this document or summarize this document or turn this document into bullet points, it's just seen enough examples of that in the vast amount of text that it's seen that it's able to carry out that task, right? So it's seen examples of a long document shortened to a shorter
(10:32):
document, it's seen an example of uh an essay turned into PowerPoint slides, whatever it is, it's seen enough examples of that in the trillions of words that it's seen that it's able to do that. So you give it a simple instruction like summarize or shorten, and it can take the following content and know what to do with it. So it's a little bit like having an intern, right? Like if you sent the intern an instruction, you said,
(10:53):
look, can you please summarize this document for me? After they've done that a couple of times and you've given them a little bit of feedback, you can give them a third document and they'll do a pretty good job, right? And so that's exactly what's going on with this large language model, is it's just able to do it, and nobody's really able to explain exactly how or why.
Hannah Clayton-Langton (11:09):
I think
there's an important distinction
which maybe we'll talk about later, which is it understands patterns, but it doesn't necessarily apply comprehension.
Yeah, exactly.
We'll talk about some of the fallibilities of it maybe further down in the episode, but it's just super good at identifying patterns, and you feed it enough source and training data, you get something that feels like you're talking
(11:30):
to a human, but actually it's just good pattern identification fed with a bunch of data, so that it can sort of preempt the answer that you might want.
Hugh Williams (11:36):
Yeah, exactly.
I mean, I think in tech circles we say like it's a really advanced stochastic parrot, which is another way of saying it's a statistical parrot. You know, it's like the old uh monkeys with typewriters. You know, if you have enough monkeys with typewriters and they hit the keys, they eventually end up with the works of Shakespeare. I mean, these are highly tuned statistical monkeys that are capable of pressing the right keys at the right time and churning out output that seems to make sense, but it's just
(11:59):
generating data based off the patterns that it's seen.
Hannah Clayton-Langton (12:02):
And how
are we making sure that the
monkey that types out Shakespeare is the one that we're listening to? Like, how are we feeding back to this model to check that it's identifying the right patterns?
Hugh Williams (12:14):
Yeah, great
question.
So we'll come back and talk about transformers in a second, because that's an important piece of technology that sits within the field of deep learning that's made all of this possible. So we'll have to come back and talk about that, Hannah. But once this training process is done, so we've done our deep learning with this transformer technology, what actually happens inside these large companies is that the system is trained one more time, and it's trained using human feedback.
(12:36):
And we spoke a little bit about that in our last episode, but let's imagine you're working in one of these large companies, you're at OpenAI or Anthropic or Microsoft, wherever you are, and uh you finished this training process, took weeks, months, cost an enormous amount of money. What you'll then do is you'll have a series of questions that you'll ask the model, and the model will churn out multiple
(12:58):
answers to those queries. You'll give those answers to human judges, and you'll ask the human judges lots of different questions. You know, one question might be rank these from the best answer to the worst answer. Another question might be, you know, identify any safety issues that you see in any of the answers. You might say, which style do you prefer? So you can ask humans all sorts of questions about the output,
(13:18):
and then you can take and collate all of that output that comes from the humans and use that to adjust the model if you like. So the model gets a little bit of feedback about what are the preferred things that it should be doing, and that adjusts the weights within the model, and the model becomes more polite, more friendly, safer, all those kinds of things, and starts to produce answers that humans like a lot more.
(13:39):
So you can't just deploy these models, you actually have to, you know, train them with humans a little bit afterwards.
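To make the ranking idea a little more concrete, here is a minimal sketch of how one judge's ranking could be turned into the preference data this kind of feedback training consumes. The field names, the scoring, and the structure are illustrative assumptions, not how any particular lab actually stores it.

    # Illustrative sketch: turning one human judge's ranking into preference pairs.
    # The field names and the expansion scheme are assumptions for illustration only.
    prompt = "Summarise this paragraph in one sentence."
    answers = ["Answer A ...", "Answer B ...", "Answer C ..."]

    # The judge ranks the model's answers from best to worst (indices into `answers`).
    judge_ranking = [1, 0, 2]   # the judge preferred B, then A, then C

    # Preference-based training typically consumes (chosen, rejected) pairs,
    # so expand the ranking into every implied pairwise comparison.
    preference_pairs = []
    for better in range(len(judge_ranking)):
        for worse in range(better + 1, len(judge_ranking)):
            preference_pairs.append({
                "prompt": prompt,
                "chosen": answers[judge_ranking[better]],
                "rejected": answers[judge_ranking[worse]],
            })

    # Pairs like these are what nudge the model's weights toward answers
    # that humans rank higher.
    print(len(preference_pairs), "preference pairs from one ranked question")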
Hannah Clayton-Langton (13:45):
Okay,
and that essentially becomes
another pattern recognition that feeds the smarts inside of it.
Hugh Williams (13:51):
Yeah, exactly.
So these systems without that piece of human feedback at the end are really just giant pattern generating machines. So they'll just generate text. But with this additional step of providing human feedback, they learn how to, you know, adapt to human style, human preferences, they get more chatty, they get safer, they get more reliable in answering the questions. So there's an enormous human effort that goes on at the end
(14:12):
to kind of adjust the weights within the model to get them to do the things that they do today. So that's really what happens in the consumer product, if you like. So when you're using something like ChatGPT, you're not just using the original model, you're using something that's been adjusted and put through scenarios that make it a lot more human-like.
Hannah Clayton-Langton (14:28):
That is
not something that I was
expecting to form part of this whole process. So I think that's a really cool insight. And I imagine that that will be news to a lot of the listeners as well, even those who are using ChatGPT quite a lot.
Hugh Williams (14:40):
So the other
thing I'd say too, while we're
on that topic, is there's a lot of uh what we call heuristics on top of these systems, which is basically like human-written rules that stop the system or cause the system to behave in a certain way, right? So you can't today successfully ask these systems to do something illegal. So you can't say, hey, teach me how to make a bomb.
(15:00):
Um you can't ask them, you know, for self-harm information, you know, things that are illegal within your jurisdiction, whatever it is, it'll just say, sorry, I can't do that. And most of that is done with handwritten rules. So there's handwritten rules looking for certain keywords and certain patterns, and when that happens, the question that you're asking is intercepted and you get a standard canned answer back. So there's an enormous number of rules sitting on top of these
(15:22):
systems, in addition to the human feedback that teaches it to be more human-like. So there's a lot of sort of human effort that goes into getting one of these systems from sort of being in its wild state, if you like, into being a consumer product that we can put in your hands.
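As a rough illustration of what a rule like that could look like, here is a minimal sketch of a keyword-based guardrail that intercepts a prompt before it ever reaches the model. Real systems are far more sophisticated; the keyword list, function names, and canned reply are invented for the example.

    # Minimal sketch of a handwritten guardrail rule (illustrative only).
    # The keywords and the canned response are made up for this example.
    BLOCKED_KEYWORDS = ["make a bomb", "how to self-harm"]
    CANNED_REFUSAL = "Sorry, I can't help with that."

    def intercept_or_forward(user_prompt: str) -> str:
        """Return a canned refusal if the prompt trips a rule, otherwise forward it."""
        lowered = user_prompt.lower()
        if any(keyword in lowered for keyword in BLOCKED_KEYWORDS):
            return CANNED_REFUSAL          # intercepted: the LLM is never called
        return call_llm(user_prompt)       # hypothetical call to the underlying model

    def call_llm(prompt: str) -> str:
        # Stand-in for the real model call in this sketch.
        return f"(model answer to: {prompt})"

    print(intercept_or_forward("Teach me how to make a bomb"))   # canned refusal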
Hannah Clayton-Langton (15:35):
Okay,
that is super interesting.
And I feel like I jumped us ahead. So we were talking about training the model, which happens using deep learning. Uh, and then we talked about Transformers, which I know from my prep is what the T in ChatGPT stands for. Do you want to walk us through that now?
Hugh Williams (15:52):
So there was a
landmark paper that came out of
Google in 2017, which is called Attention is All You Need. And I'm sure some of our listeners will have heard of the paper. It's one of the most famous papers in modern computer science. And it's really a landmark paper because it made this idea of deep learning architecturally something that could be used to
(16:13):
process trillions of words and identify billions of parameters and build things like ChatGPT, Claude, Gemini, and so on. So deep learning itself was a very important field, but it was really used for things like natural language processing and image generation. So it was sort of sitting in a corner of computer science doing really interesting things, but it was never able to be scaled
(16:34):
to web scale to build a product like this. But this transformer idea blew that up and made it possible so that we can build the products that we use today.
Hannah Clayton-Langton (16:43):
Yeah,
okay.
So just to play back so far, transformers are one of the key aspects that underpin the application of deep learning to the scale that is required for an LLM.
Hugh Williams (16:54):
Yeah, you've got
it.
Hannah Clayton-Langton (16:55):
Okay,
and the deep learning existed in
other, possibly smaller applications, sort of deep inside smart bits of technology, but it had never been scaled to the point of costing as much as a blockbuster movie to train.
Hugh Williams (17:08):
Yeah, exactly.
And so for people like me, you know, I was working on search at eBay and whatever else. Deep learning was just an intellectual curiosity that was sitting in research, really, you know, was doing things like identifying objects in images using lots of processing, but it was not an idea that was capable of being used at the scale that we were working. But with this transformer breakthrough, that all massively changed.
Hannah Clayton-Langton (17:29):
And
before we get into the detail of
transformers, that's a bit of a pattern that I've seen when we talk about different technical concepts, is a lot of them start as like intellectual theories that people get excited about, and then slowly the thinking develops, and then you might find use cases that uh interface with like modern life. Is that fair?
Hugh Williams (17:45):
Yeah, I think that's absolutely right. You know, suddenly computers can be miniaturized enough that you can build a smartphone and put it in your pocket. You know, at some point computers were just some abstract concept that sat in giant rooms in defense installations and whatever else. So you need these kinds of breakthroughs for these kinds of things to happen for sure.
Hannah Clayton-Langton (18:00):
Okay,
now tell me more about
transformers.
Hugh Williams (18:03):
Okay, I'll do my
best, Hannah.
A couple of things about transformers. So the first thing is that a transformer can understand the relationships between the tokens or words that it is processing in a way that's pretty cool. So I'll make up a really simple sentence, you know, the cat ran around the room chasing the mouse, and then she sat on the
(18:24):
mat. If you just had regular deep learning, there's a couple of problems with regular deep learning. So the first is it can only process a word at a time through our sentence. That's annoying for us computer scientists. We want to do things in parallel, so we want to be able to throw lots of computers at the problem and process all the words at once. We don't want to be going left to right, that takes forever. So deep learning had that problem.
(18:45):
Transformers completely changed that and allowed many computers to be thrown at the problem, and each computer could process a word. And so suddenly we could process all the words in that sentence at the same time, not just one word at a time throughout the sentence. But the really big breakthrough in Transformers, besides this sort of speeding up aspect, was that the
(19:06):
transformer could consider the relationships between the words. So in old deep learning, by the time we got to um she sat on the mat, we'd have forgotten who she was. But with transformers, we're able to understand which words in the sentence influence which other words. And so by the time we get to she, we can say, oh, she means the cat. And so it's like having a highlighter if you like. You can go back through all of the text, vast amount of text,
(19:28):
and you can highlight all of the words that are related to a current concept that you're trying to process. And then that allows you, when you're actually generating the text later on, to generate more plausible text because you understand the context of each word. Those words don't just exist in isolation.
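For anyone who wants to see the "highlighter" idea in a few lines of code, here is a minimal sketch of the attention calculation at the heart of a transformer, using made-up vectors for the words in Hugh's cat sentence. The word list, vector sizes, and random projections are toy assumptions; a real model uses learned vectors with thousands of dimensions and many attention heads.

    import numpy as np

    # Toy sketch of scaled dot-product attention over a few words.
    # The word list and the random vectors are invented for illustration.
    words = ["the", "cat", "chased", "the", "mouse", "she", "sat"]
    d = 8                                   # tiny embedding size for the sketch
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(len(words), d))

    # In a real transformer, queries, keys, and values come from learned
    # projection matrices; random ones are used here just to show the mechanics.
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    Q, K, V = embeddings @ Wq, embeddings @ Wk, embeddings @ Wv

    scores = Q @ K.T / np.sqrt(d)           # how strongly each word attends to each other word
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax: each row sums to 1
    attended = weights @ V                  # each word's new vector mixes in the words it attends to

    # The row for "she" shows how much attention it pays to every other word;
    # in a trained model, "cat" would typically get a large share of that weight.
    print(list(zip(words, np.round(weights[words.index("she")], 2))))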
Hannah Clayton-Langton (19:45):
And what
does that mean in real terms
for a ChatGPT user or an LLM user? Is that what makes it sound human and credible?
Hugh Williams (19:54):
Yeah, so it's
improved enormously the accuracy
of the text generation because it's able to understand context much better. So it's not just like autocomplete that we're used to on our phone or maybe in Google or when we're using Microsoft Word. So it's not just generating the next most probable word given the last word that you've typed. It's able to use all of the context of the essay that you're
(20:15):
writing to generate the next word. And so the chances of it generating a really plausible word go up enormously because it's able to understand all of the relationships between the words.
Hannah Clayton-Langton (20:24):
And for
the avoidance of doubt, this is
not because the computer can understand anything, it's because it's using an enormous amount of information that's gone before to recognize patterns and therefore have a better guess at what word you'd want to come next.
Hugh Williams (20:39):
Yeah, that's it.
I mean, what these systems are doing is they're trying to generate text. And given the set of words it's generated so far, it's trying to generate the next most probable word. And if it's got enough context and an understanding of the relationships between the words, it can do a much better job of generating the next word. So it's a very, very advanced autocomplete, if you like, that understands all of the context of hundreds,
(21:00):
thousands, perhaps tens of thousands of words when it generates the next word, rather than a simple autocomplete, which is really just looking at the last word or two.
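To make the "advanced autocomplete" framing concrete, here is a minimal sketch of next-word prediction. The vocabulary and the probabilities are invented; the only point is that the model scores every candidate next word given the context and then picks or samples from that distribution.

    import random

    # Toy sketch of next-word prediction (all numbers are made up).
    # A real LLM derives these probabilities from billions of learned parameters
    # and the entire preceding context, not just a short phrase.
    context = "the cat sat on the"
    next_word_probs = {"mat": 0.55, "sofa": 0.2, "roof": 0.15, "moon": 0.1}

    # Greedy choice: always take the most probable word.
    greedy = max(next_word_probs, key=next_word_probs.get)

    # Sampling: pick a word in proportion to its probability, which is why the
    # same prompt can produce different answers from run to run.
    sampled = random.choices(list(next_word_probs), weights=list(next_word_probs.values()))[0]

    print(f"{context} ... greedy -> {greedy}, sampled -> {sampled}")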
Hannah Clayton-Langton (21:07):
Okay,
two questions then.
Is the magic of an LLM basically the power and smarts of the deep learning combined with the user relevance of the transformer's output? So like it takes something that's super smart at patterns
(21:27):
and it combines it with something that can speak to people in a way that feels relevant. Is that sort of deep learning plus transformer equals LLM?
Hugh Williams (21:37):
I think that's
right.
Hannah, the other thing I'd throw in there is there's a vast amount of data that went into training this. So it's really big data, plus, as you say, deep learning, plus this transformer technology, and then what we talked about earlier on, which is the heuristics and rules and the feedback that teaches the model how to be a little bit more human-like. So there's really a few components that go into it, but
(21:57):
all of that put together gives you the consumer products like ChatGPT and Claude that are in our pockets today.
Hannah Clayton-Langton (22:02):
And
what's kind of reassuring for me
there is that it feels like the cinch or the thing that really enables these models is actually human input. So like the heuristics and the human feedback, which I didn't know of, but it makes me feel a little bit less like computers are fully going to take over the world.
Hugh Williams (22:21):
Yeah, absolutely.
You know, the human in the loop piece is really, really important because we have to teach it how to be more human-like, not just generate text based on all the text that it's seen. But we should also worry about the bias that comes with that. So we're not just getting the bias that comes from the text that goes into it, we're now also getting the bias from the humans who are giving it feedback and teaching it to be more human.
Hannah Clayton-Langton (22:41):
Okay,
and potentially basic question,
but the humans that are giving it feedback, are they like the software engineers that are building it, or are they a random, or maybe not random, a conscious cross-section of people like basically off the street who you ask to review the
models' outputs?
Hugh Williams (22:59):
Yeah, absolutely
the latter, Hannah.
I can't imagine the software engineers wanting to do the former, you know, if they're like.
Hannah Clayton-Langton (23:03):
Well,
there would be some biases
definitely in there if they were.
Hugh Williams (23:06):
They'd be like,
can we just get this over and
done with?
I want to get back to writing code. But no, no, it's uh it's a hired workforce, typically, you know, um paid per task or per hour. And the trick here is you've got to write down the task really, really carefully that you want them to do. You've got to train those people so they can perform the task, and then you've got to provide tools so that they can actually provide their input, collate that and take that back
(23:26):
to the system. And managing the quality of these people, you know, the workforce itself, paying it, you know, it's a big, big project in itself. But this is happening at an industrial scale, huh?
Hannah Clayton-Langton (23:36):
Huge
operation.
And then the people who write the heuristics, are they like
the product managers?
Hugh Williams (23:41):
I would say, um, I'm guessing a little bit, but there'll certainly be product managers who are thinking about the why and what of the heuristics. But uh the heuristics are probably written, you know, in a language by you know people who are trained in writing heuristics. Because you know, you want to have a safety team looking at safety issues who understand the safety issues, and they're probably trained to use these tools to write the heuristics
(24:02):
and maintain the heuristics and stay on top of the kinds of issues that are coming up as laws move, as people try and hack these systems, as new models come out with new issues. So you'd probably have a specialist safety team that's writing heuristics that are related to safety, for example.
Hannah Clayton-Langton (24:16):
That
sounds like a super interesting
job.
Okay, and the T in ChatGPT stands for Transformer, is that right?
Hugh Williams (24:22):
That's right.
Hannah Clayton-Langton (24:22):
Okay,
what about the G and the P?
Hugh Williams (24:24):
Um so G is
generative, which basically
means it generates text.
Um and the P is pre-trained, which basically means that a training process happens. So GPT, though, it's worth mentioning, is a trademarked proprietary term for OpenAI, but you can just think of that as meaning large language models. So the only folks who are using GPT as a term are the folks at OpenAI. Everybody else just says large language model.
Hannah Clayton-Langton (24:45):
And all
large language models are
generative predictive transformers.
Hugh Williams (24:50):
Yeah, yeah.
I think uh OpenAI would say that. Generative pre-trained transformers.
Hannah Clayton-Langton (24:54):
What did
I say?
Predictive.
Generative pre-trained transformers.
Hugh Williams (24:57):
Yeah, it's a
common mistake, actually.
People say predictive.
Hannah Clayton-Langton (24:59):
So an LLM is like the generic version of that, and then like ChatGPT, I presume, will purport to have loads of really cool proprietary smarts that make those unique, just as a car is generic and then like Mercedes has a brand.
Hugh Williams (25:13):
Yeah, that's
right.
Hannah Clayton-Langton (25:13):
Okay, so
we've covered the training
behind the model, and we've covered how it's creating like human-like output. But what about as a user when I'm asking it to like plan my
holiday to Florence?
What's going on there?
Hugh Williams (25:26):
Holiday to
Florence sounds good.
Hannah Clayton-Langton (25:28):
Yeah, I
should book one.
I'd better have one booked.
Hugh Williams (25:30):
You probably need
a holiday.
Hannah Clayton-Langton (25:31):
Yeah.
Hugh Williams (25:31):
Yeah, yeah, yeah.
One thing I did want to mention along the way is sort of how this training happens using this transformer technology. So we should talk about maybe GPUs and data centers for a second. So it costs 50 to 100 million dollars to do this training process. One of the big costs in there is the hardware itself that's used in the training. So I'm assuming you've heard of GPUs.
Hannah Clayton-Langton (25:52):
So GPUs
are what NVIDIA produces, is
that right?
Hugh Williams (25:56):
That's right.
Hannah Clayton-Langton (25:56):
Okay.
Yeah.
So tell me why they're so valuable, because I hear a lot about NVIDIA, but I am not super aware of what's so special.
Hugh Williams (26:03):
I feel like
NVIDIA is one of the companies
that's really hit the jackpot.
It's sort of owned the gold mining technology when suddenly gold became popular or something. But uh the GPUs are graphical processing units, or graphics processing units. And a graphics processing unit, you know, traditionally was a card that you put into a high-end personal computer under your desk when you had an application where you needed
(26:24):
high-powered graphics. So you're a gamer, you want to play games, and you know, you need high-speed graphics uh on your desktop computer, you'd put a GPU card in there, a graphics card in there. Also, people, architects, those kinds of people used to use GPUs. Yeah, CAD and those kinds of things.
Hannah Clayton-Langton (26:38):
So, just so I can check my understanding, is that because there's like a good amount of computing power that's required to create high-quality images versus like text or something?
Hugh Williams (26:48):
Yeah, graphics is
a very unique thing where
there's lots of things done in parallel, lots of things done at the same time, and it requires very specific kinds of maths to do things like rotate shapes, have shapes move in front of each other, those kinds of things. Graphics requires, you know, vector arithmetic. So that's really doing things that help shapes move in front of another shape, spin shapes in real time.
(27:08):
So things that, you know, it's fast sort of matrix maths, and that was you know really, really important in graphics. It turns out, though, that maths is incredibly important in the transformer technology.
Hannah Clayton-Langton (27:20):
So they use the same type of math?
Hugh Williams (27:21):
Yep.
Okay.
Yep.
So to do the transformer computation at scale, you want parallelisation, and you actually have to parallelize this vector maths. And it turns out GPUs are perfect for that. So these very expensive graphics cards turn out to be the gold mining shovels, if you like, for training large language models.
Hannah Clayton-Langton (27:39):
Okay,
and just a bit of a throwback to
our product episode.
Slack, like the program that people use for communications at work, didn't that also come out of a gaming company?
Hugh Williams (27:49):
Yeah, yeah.
I think uh the legendary story is that uh those folks were a gaming company, and then they built a tool on the side to allow themselves to communicate better within their company, and then uh the gaming side of the business didn't go so well, they realized that the chat tool was uh pretty cool, and the rest is history, I suppose.
Hannah Clayton-Langton (28:06):
So
there's some pretty valuable
accidental offshoots coming out of gaming companies.
Hugh Williams (28:09):
Yeah, absolutely,
absolutely.
And you know, gaming is um pretty hard maths, it's sort of the essence of it, and GPUs are really the tool that's used for that hard maths, and it turns out that that maths is also pretty useful in large language models.
Hannah Clayton-Langton (28:20):
So you
have a bunch of GPUs which cost
like 40 grand or something.
Hugh Williams (28:24):
Yeah, of that
order.
Hannah Clayton-Langton (28:25):
In USD.
Hugh Williams (28:26):
Yeah, absolutely.
So, you know, 20 to 40,000 bucks per GPU card, and uh, you know, these data centers are absolutely full of them for this training process.
Hannah Clayton-Langton (28:34):
Just for
the training process.
Hugh Williams (28:36):
Yeah, they're used in the evaluation as well. So when you want to buy your holiday to Florence or whatever it is that you're asking about, or help me cook a recipe or whatever those kinds of things, the GPUs are used in that as well, but you need vastly more infrastructure for the training than you do for the inference, which is what we'd call the question-asking piece of this.
Hannah Clayton-Langton (28:53):
Okay,
because there is a whole topic
of conversation around like the compute power required by LLMs and like the environmental cost of it and the financial cost of it, but the main consumption of that compute happens in the training phase.
Hugh Williams (29:07):
Correct.
Hannah Clayton-Langton (29:07):
Okay.
Hugh Williams (29:08):
So training takes
weeks or months, probably costs
50 to 100 million dollars to do. When you type a question, like, you know, help me plan my itinerary to Florence, that probably costs a very small fraction of a cent.
Hannah Clayton-Langton (29:21):
Okay,
because sometimes I feel guilty
asking ChatGPT stuff, but it sounds like that's not where I should be feeling guilty.
Hugh Williams (29:27):
No, though I was seeing a little bit of uh maths coming into the episode about how much, you know, saying hello and thank you costs when people do that all the time. Because people feel like they have to be polite to this.
Hannah Clayton-Langton (29:36):
I had
heard this and it generates like
a massive environmental like impact, people just not wanting
to be rude to a computer.
Hugh Williams (29:42):
Well, it doesn't
actually now, but I'll do the
back of the envelope maths and then we can talk about why it doesn't now. But I kind of figured out that if there was a hundred million people a day saying please and thank you, that would work out to be a few million dollars a year in compute cost for any of these companies. So it's not, you know, it's not trivial. But what I've also figured out is that the companies are now intercepting those queries in some way and uh just
(30:03):
generating pre-canned responses. And they can do that in one of two ways, or maybe even three. So one is you could just do it in the app. And we talked about apps in one of our episodes. You could just say, you know, Hannah typed in thank you, and you could just respond with the app and say no problem at all, or give a thumbs up sign without ever actually sending it to an LLM.
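Hugh's figure is easy to sanity-check. Here is a minimal back-of-the-envelope sketch; the per-request cost is an assumption (a small fraction of a cent, as mentioned above), so the result is only meant to show the order of magnitude.

    # Back-of-the-envelope: what "please" and "thank you" messages might cost per year.
    # The per-request cost is an assumed figure; only the order of magnitude matters.
    polite_messages_per_day = 100_000_000   # a hundred million, as in the episode
    cost_per_request = 0.0001               # assumed: a hundredth of a cent per trivial request

    cost_per_year = polite_messages_per_day * cost_per_request * 365
    print(f"${cost_per_year:,.0f} per year")   # roughly $3.7 million with these assumptions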
Hannah Clayton-Langton (30:20):
That
sounds sensible.
Hugh Williams (30:21):
Yeah.
Second thing you could do is you could just do that in the cloud. So you could do that same thing as soon as the request arrives, you could inspect it for some common words and just turn it around and send it straight back. The third thing you could do is you could have a different model that's a cheaper model, if you like. Okay. Um, that's capable of doing really trivial tasks, and you only send the hard tasks to the larger model. So there's lots of different ways of intercepting those kinds
(30:42):
of trivial tasks.
Hannah Clayton-Langton (30:43):
But
regardless, you're not like
calling out to the huge model to come up with a response to something that's basically like an afterthought and inconsequential.
Hugh Williams (30:52):
Yeah, exactly.
We should make it a little bit like uh Marvin the Paranoid Android, for those people who've uh watched Hitchhiker's Guide to the Galaxy.
Hannah Clayton-Langton (30:57):
Well,
that's over my head, but I was
gonna ask you, is that an example of heuristics then or not?
Hugh Williams (31:01):
Yeah, it's a
perfect example of a heuristic.
Hannah Clayton-Langton (31:03):
Okay,
there we go.
Yeah, yeah, love it.
Okay.
Hugh Williams (31:05):
Hey, this series
is working.
Hannah Clayton-Langton (31:07):
So um
that makes LLMs sound pretty awesome, which we know they are, but there's also a bunch of stuff that they're not so good at.
Hugh Williams (31:14):
Yeah, so they
hallucinate, I guess, is
probably their biggest problem.
You know, hallucinate is, I guess, a term that's got into the public consciousness. You know, what it really means is it with great confidence makes up things that aren't true.
Hannah Clayton-Langton (31:26):
I know
some people that do that.
Yeah.
Yeah, yeah, yeah.
Humans that do that.
But yeah, so is that because it's found a pattern, or it thinks it's found a pattern, and it's generating an answer that's based on pattern recognition but not comprehension of what it's saying?
Hugh Williams (31:42):
Yeah, exactly.
So it has no ability to go and fact-check the things that it's producing. It's just producing things that are highly probable given the data that it's seen. So when it confidently says that Winston Churchill invented the internet, you know, it's just doing that because that seemed like a plausible pattern that should be generated.
Hannah Clayton-Langton (31:58):
And
could you get an LLM to
fact-check itself, or does that get a bit meta?
Hugh Williams (32:04):
I think that's
getting a bit meta, but I could see that happening in the future. You know, we're probably not too far away from having some other technology fact-check the LLM. But one common thing that uh our listeners can try if you've got access to multiple LLMs is to take the output of one LLM and give it to another LLM and ask it to fact-check it. So that's a way of using the LLMs to keep an eye on the LLMs,
(32:25):
and often you can uh get rid of the falseness that comes out of the LLMs by doing that.
Hannah Clayton-Langton (32:29):
And they
do all say, like somewhere in
the user interface, you must check the output of these models. Like it's not necessarily fact.
Hugh Williams (32:40):
Yeah, absolutely.
In fact, Deloitte is one of the big consulting companies that I'm sure all of our listeners have heard of, are in a lot of trouble in Australia right now, because uh for $440,000 they authored a report, and an Australian academic read the report and uh figured out that large slabs of it were generated with AI, including a bunch of citations that were made up. So there were citations in the back that didn't actually exist
(33:03):
that just seemed like plausible-sounding uh citations, and same with some of the footnotes that were created. So uh it's all over the news. Uh certainly a big story in Australia right now. But certainly, you know, if you're using these tools in a professional environment or even a non-professional environment, you should take great care to make sure that the output is actually true if that's important to you.
Hannah Clayton-Langton (33:20):
That is
super interesting, and uh, this
is more of a tease for a future episode. I was talking to some engineers at work about how they use LLMs to write code and whether or not it saved them time. And they were like, at this point, it saves us time writing code, but it generates more work reviewing code. I guess that's maybe why people often liken it to like an
(33:41):
intern or like a grad, because they can do a bunch of the work for you, but you can't take it as read and you've got to spend some time checking that it's correct and good and factual.
Hugh Williams (33:52):
Yeah, that's
right.
I've been coding with uh Claude Code quite a lot at home. I've certainly learned to pause every hour or so and really holistically look through the code and try and understand, you know, what I've created and where it can be cleaned up, and, you know, carry out a lot of manual intervention. So I'm certainly saving time in writing the code, but I think your engineers are right. There's more time now in, you know, inspecting the output,
(34:13):
which I suppose is a little bit like generating text, writing an essay, an email, whatever else it is, it's saving you an enormous amount of typing, but you've still got to go and review it pretty carefully. And you're probably spending more time reviewing it than you would review your own text.
Hannah Clayton-Langton (34:25):
Okay, so
they're not good, or they have this sort of propensity to hallucinate.
Yep.
What else are they limited in, LLMs?
Hugh Williams (34:32):
They're not great
at math, which I guess is not surprising.
Hannah Clayton-Langton (34:35):
See, to
me as a non-engineer, that is
surprising, because I thought computers were really good at math. So tell me why LLMs aren't good at math.
Hugh Williams (34:42):
So maybe let's go
back to why computers are good
at math.
So if we go back to our first episode, which I hope most of our listeners have had the chance to listen to, you know, we're talking about programming and really writing deterministic kinds of steps and logically breaking things down. And so if we're writing code like that in Python or whatever programming language we're choosing, you know, that's when the computer is going to be great at maths, because we can write down sort of logical steps and ask it to do particular
(35:04):
mathematical things.
Hannah Clayton-Langton (35:05):
Ah, I
see.
And then if an LLM is recognizing patterns, it's taking an inference, and that inference could be wrong.
Hugh Williams (35:13):
Yeah, absolutely.
So when you say to the LLM, multiply these two numbers together, it's not really multiplying those two numbers together, it's producing the text that's most probable given you've given it those two numbers with a multiplication sign in the middle. So it can be off, wildly off sometimes. But what's happening now, um, and OpenAI have built this into ChatGPT, is it's got a maths mode and it kind of can now
(35:36):
detect when you're trying to do things like that. So if you try and multiply two large numbers together, their consumer product will say, hey, I think this user's trying to multiply numbers together, and it'll actually go and run some Python. So instead of using the LLM, it'll actually go off and execute a maths module, and that maths module will do the maths for you and give you back the answer. So it's not actually the LLM doing it, they're just intercepting the query and sending it to a different
(35:56):
module.
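Here is a minimal sketch of that kind of routing: a check that spots an arithmetic-looking query and computes the answer in ordinary Python instead of asking the model. The regular expression, function names, and fallback are invented for illustration; real products use far more careful detection.

    import re

    # Illustrative sketch of routing arithmetic away from the LLM.
    # The pattern and function names are made up for this example.
    MULTIPLY_PATTERN = re.compile(r"^\s*(\d+)\s*[x*]\s*(\d+)\s*$")

    def answer(query: str) -> str:
        match = MULTIPLY_PATTERN.match(query)
        if match:
            a, b = int(match.group(1)), int(match.group(2))
            return str(a * b)              # exact arithmetic, no LLM involved
        return ask_llm(query)              # everything else goes to the model

    def ask_llm(query: str) -> str:
        # Stand-in for the real model call in this sketch.
        return f"(model answer to: {query})"

    print(answer("123456 * 789012"))       # 97408265472, computed deterministically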
Hannah Clayton-Langton (35:57):
Okay,
and I've also noticed that it's
not always great at image generation, or at least I use a few different LLMs, and some of them are better than others, but
like why is that?
Hugh Williams (36:06):
So image
generation is a slightly
different problem, but it uses the same technology. So the conversational LLMs that we're talking to, that's one big model that's trained off tokens or words, and we've spoken a lot about this in the episode. There's also a separate model that's built for image generation. Same kinds of principles. You feed in enough images, it treats those as pixels, tries
(36:28):
and finds patterns in them, and when you ask it to generate an image, it's able to take the patterns from all of the images that it's seen and generate a plausible image. But it's, you know, technically a separate system to the system that generates the text. And it's a field that's evolving really quickly. I'm sure that most of our users have tried it with ChatGPT. They probably tried it a year ago, and it would generate all sorts of weird and wonderful stuff.
(36:48):
It certainly couldn't generate text very well. Today, it's a lot better, it's able to generate text a lot more accurately. So it's definitely a field that they're investing in, and it's, you know, getting better and better uh every week.
Hannah Clayton-Langton (36:59):
Well,
our logo, was it ChatGPT you used?
Hugh Williams (37:02):
It was, Hannah. Yeah, yeah. I'm not an artist who could produce a logo, so I took uh your headshot photo, uh, my headshot photo, and told it to produce a logo uh in the style of Tintin.
Hannah Clayton-Langton (37:14):
Tintin.
I can see it now.
I never knew that.
I never knew that.
Okay, and actually, here's a little peek into how this podcast works. We leverage LLMs a lot for this podcast.
Hugh Williams (37:25):
Oh, we certainly
do.
Hannah Clayton-Langton (37:26):
And it's
kind of meta because I was like
reviewing something that an LLM had made for us based on like some pre-work we'd done, and it was saying don't always trust the outputs of LLMs because they're not always right. And I was thinking, oh my god, the LLM is telling me.
Hugh Williams (37:39):
So you don't have
to trust the LLM for the LLM
episode.
Yeah, yeah, yeah.
Hannah Clayton-Langton (37:43):
All
right, so that's probably a good
place to stop on LLMs.
Hugh Williams (37:46):
Yeah, I reckon
we've covered a lot today,
Hannah.
It was a really, really fun chat. I'm glad we got a chance to do this one in person.
Hannah Clayton-Langton (37:52):
Yeah,
this would have been hard at
like 7 a.m.
and 7 p.m.
virtually.
So definitely a good one to cover in person.
And I actually feel like I did Ada Lovelace proud.
Hugh Williams (38:02):
Absolutely.
I think we uh certainly got a long way into a pretty tough topic, but I hope very much that our listeners got something out of it and that they can go and explain LLMs to uh friends and family.
Hannah Clayton-Langton (38:13):
Yeah,
exactly.
Like super technical, but so relevant that I think it's worth going on the journey to understand it in a little bit more detail. I'm sure there's plenty more we could be going after in that space.
Hugh Williams (38:22):
There's a ton
more we could do.
We should do an episode on uh AI and coding. We could talk about ethics, you know, the morals of AI, what's going to happen next, personalised AI. There's a ton we could do maybe later in series one, or even in series two if we get to a series two.
Hannah Clayton-Langton (38:35):
Yeah,
okay, let's do that.
So, listeners, if you like what you've heard today, you can like and subscribe, and of course, leave a review wherever you get your podcasts.
Hugh Williams (38:44):
Yeah, that would
be great.
And then if you want to learn more about the show, you can head to techoverflowpodcast.com. And I'm busy posting on LinkedIn, Hannah. I think I might be beating you on X and Instagram.
Hannah Clayton-Langton (38:58):
But we
are on there too.
I need to think about how I can leverage an LLM to do some of that work for me.
Hugh Williams (39:02):
Yeah, absolutely.
We'll build an agent to do that. That's another topic we can talk about: agents. Agentic AI, yeah, we definitely need to do that again.
Hannah Clayton-Langton (39:08):
Okay,
definitely a third episode
coming.
But for today, this has been the Tech Overflow podcast. I'm Hannah.
Hugh Williams (39:14):
And I'm Hugh.
Hannah Clayton-Langton (39:14):
Thanks
for listening, and we'll see you
next time.
Yeah, bye.
Bye.