Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
When I had entered into my graduate program,
(00:03):
I was told that it was a waste of time
to do like computer science and programming,
that I would never be respected if all I did
was like build tools or build libraries
or write code to solve science problems.
(00:24):
How did the best machine learning practitioners
get involved in the field?
What challenges have they faced?
What has helped them flourish?
Let's ask them.
Welcome to Learning from Machine Learning.
I'm your host, Seth Levine.
Welcome to Learning from Machine Learning.
On this episode, we have a very special guest,
(00:46):
Paige Bailey, the lead product manager
for generative models at Google DeepMind.
And at such an exciting time, this week was Google I/O,
where they got to announce new advancements for Google Bard,
talk about Gemini, and how they're incorporating
many aspects of generative models
(01:06):
into all of their products, really.
Paige, welcome to the show.
Awesome.
Thank you for having me.
It's such a pleasure to have you.
You have such an interesting background.
You want to just give a little bit of background,
introduce yourself, how you got interested in computers
and machine learning?
Sure, that sounds good.
(01:27):
So my background, I did geophysics and applied math
when I was in school.
That was kind of the focus of my career.
I think when I was younger, I wanted
to be like some sort of lady Carl Sagan,
like focused on planetary science
and sort of being able to explain
these complicated technical topics to the world.
(01:51):
I got into computers very early.
I grew up in quite a small town.
And my family kind of rescued an Apple II
from being thrown away.
And that was how I first learned how to program.
And then really, I think this time in particular
(02:12):
is kind of what I've been waiting my entire career for.
I've been doing machine learning for a bit over a decade.
And previously, you would have to go through all of this pain
to get the data, to build a model,
to try to choose an algorithm, and then
to do the really hard work of trying to get
those models into production.
(02:33):
And even then, those models were just kind of single task
models.
And today, we have these highly capable, very general purpose
models that are doing an overwhelming number of things.
And really, we keep discovering new ways
that they're useful based on the way that people are sort
(02:53):
of testing them out and stress testing their boundaries.
So it's been really exhilarating, honestly.
I would never have imagined that this
would happen back when I started doing machine learning in 2009,
2010.
So back in 2009, 2010, what were the types of machine learning
(03:14):
problems that you were working on?
Oh, lord.
So back then, I was fortunate enough to do,
I got some NSF kind of grants to do planetary sciences research
both at the Laboratory for Atmospheric and Space Physics
in Boulder, as well as at Southwest Research Institute.
(03:36):
And most of the machine learning there
was just kind of fancy statistics, right?
So you were doing linear regression,
logistic regression.
You might be doing decision trees
or support vector machines.
But really, it was just taking this very tabular data
and attempting to make sense out of it.
(03:58):
And it was useful, very, very useful,
and capable of driving many interesting scientific advances,
but nothing like what we have today.
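For a concrete picture of that "fancy statistics" era, here is a minimal, hypothetical sketch of the kind of tabular workflow being described, using scikit-learn on synthetic data; it is illustrative only, not code from the episode.

```python
# Minimal sketch of the "fancy statistics" era: classical models on tabular data.
# Hypothetical example on synthetic data, not from the episode.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                   # tabular features (e.g., instrument readings)
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)   # synthetic label

for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("decision tree", DecisionTreeClassifier(max_depth=4)),
    ("support vector machine", SVC(kernel="rbf")),
]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```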
And also using sometimes very, I would say, very niche tools.
(04:20):
So there was, in the space sciences,
there's something called IDL.
There's also, I remember, many, many sleepless nights
attempting to wrangle MATLAB into doing
what I needed it to do, whereas Python always
seemed to make much more sense.
So just, again, like a sort of an explosion
(04:45):
and a revolution in terms of the kinds of models
that we can build now and then also the tools that are
available to help build those models.
Yeah.
I love MATLAB, by the way.
Oh, wow.
Yeah.
A lot of my undergrad work, I use MATLAB.
And one of my first big machine learning projects,
I inherited the project.
And a lot of it was in MATLAB.
(05:05):
So I started out doing a lot of work there.
I think it's great.
People don't talk about it enough.
Python has kind of eaten it up like a lot of other things.
But I love MATLAB.
Yeah.
One of my professors always used to say
that it doesn't really matter what tool you're using.
It doesn't matter if it's Python, a spreadsheet, MATLAB,
(05:27):
SPSS, or R, or whatever it happens to be.
The important thing is that you're
asking an interesting question.
And you're being thoughtful about analyzing data.
And the tool is just kind of something
that gets you to the answer.
It shouldn't be the answer itself.
Yeah.
Yeah, definitely.
(05:47):
That's for sure.
Yeah, I remember I did some very interesting work with SPSS
as well.
But yeah, it's not the tool, really.
It's understanding the problem, understanding that,
do you even have the right data to start
approaching this problem?
So yeah, so I guess at face value,
(06:10):
people might not understand.
So you're in geophysics.
How did you become the lead product for generative models
at Google DeepMind?
But I mean, it makes sense.
There's so much dealing with so much data.
And I guess a lot of the lessons that you
learn with how to handle all of that kind of data apply.
(06:32):
But if you want to speak to that.
Absolutely.
So one of the cool things about geophysics
is that geophysicists have been using GPUs for a long time.
And they've also been kind of the flavor of Earth scientists
that are more likely to be delighted by computers
(06:57):
as opposed to running away from computers to the mountains.
So seismic data is very massive, as well as well log data,
kind of the interpretations of subsurface data.
And all of this Earth sciences data
(07:20):
really needs heavy kind of computational horsepower
in order to analyze it.
So even before deep learning was a thing,
Earth scientists were already building models
and sort of attempting to analyze these patterns
in the subsurface and in seismic using GPUs.
And that's something that I don't know if many people know.
(07:43):
But I actually had started experimenting with CUDA,
not just for deep learning, but because of Earth sciences
problems and quantitative hydrogeology,
and also understanding fluid dynamics problems.
Yeah, that's fascinating.
(08:04):
I guess having that experience with CUDA
and dealing with that type of data,
understanding how to process it set you up probably pretty
well for your role with TensorFlow.
I know that you played a pivotal role
in the development of that.
(08:25):
Can you speak to some of your work with TensorFlow?
I don't know if it was a pivotal role
in the development of TensorFlow,
but I certainly was delighted.
So TensorFlow was, I've been experimenting
with kind of open source machine learning tools for a long time.
I think I mentioned before, Python always
seemed to make more sense than MATLAB.
(08:49):
And there's kind of a really nice Python data science
ecosystem, SciPy, NumPy, Matplotlib, Scikit-learn,
like all of those great things.
But then TensorFlow came out and was open sourced in late 2015.
And it was kind of revolutionary in a few ways.
(09:11):
It had some documentation.
It had some examples that you could use.
I think all of those deep dream images
went viral over the internet.
But it was also in Python.
And even though it was a really kind of janky, weird sort
of Python that you had to construct graphs to use it,
it was still something that felt a little bit more approachable
(09:34):
than maybe some of the other deep learning frameworks that
were implemented in C++ or Lua or whatever they might have been.
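For context, the graph-then-session style being described looked roughly like this; a hedged sketch using the historical TensorFlow 1.x API (accessed through tf.compat.v1 in modern TensorFlow), illustrative only.

```python
# Rough sketch of the TensorFlow 1.x graph-then-session style being described.
# Uses the historical tf.compat.v1 API; illustrative only.
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# 1) Build a static graph first...
x = tf.compat.v1.placeholder(tf.float32, shape=[None, 3], name="x")
w = tf.Variable(tf.zeros([3, 1]), name="w")
y = tf.matmul(x, w)

# 2) ...then run it inside a session, feeding in data.
with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    out = sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]})
    print(out)  # graph execution, not ordinary eager Python
```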
So I got very excited about it, started
learning how to use it, trying it for my projects.
My first deep learning project was
(09:56):
applied to the Earth sciences, so understanding
how to categorize different shapes of atolls and reefs
and just being overwhelmed at how something would have taken
a poor grad student six months to do.
And then suddenly, this was able to blast through it
(10:16):
in just five minutes with a really modest amount
of training data.
But I started contributing.
Eventually, the TensorFlow team kind of took notice.
And I got to join them and to work at Google Brain.
And in addition to getting to work with the TensorFlow team,
(10:40):
the JAX team, which is also an open source numerical library
for doing deep learning in addition to many other things,
especially in the sciences, I got to work with them quite closely.
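As a tiny, hedged illustration of what JAX offers (NumPy-style code plus transformations like grad and jit); this example is mine, not from the episode.

```python
# Minimal JAX sketch: NumPy-style code plus automatic differentiation and JIT compilation.
# Illustrative example only.
import jax
import jax.numpy as jnp

def loss(w, x, y):
    pred = jnp.dot(x, w)
    return jnp.mean((pred - y) ** 2)

grad_loss = jax.jit(jax.grad(loss))  # compile the gradient function

w = jnp.zeros(3)
x = jnp.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
y = jnp.array([1.0, 2.0])

for _ in range(100):
    w = w - 0.01 * grad_loss(w, x, y)  # plain gradient-descent step
print(w)
```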
And then all of our machine learning frameworks
teams at Alphabet.
So like I said, it's very exhilarating.
And it's been really interesting to see
(11:03):
how the space has evolved over the course of the last many years.
Yeah.
So fast forwarding to today and some
of your more recent projects with all the things that
are happening with these large language models,
can you speak to the work that you've
(11:24):
been doing with applying large language models
to different software development tasks?
Absolutely.
So I guess this is also a nice segue.
So I boomeranged back to Alphabet.
During the pandemic, I spent a little bit over a year
at Microsoft, specifically GitHub,
(11:45):
helping with introducing machine learning features into VS Code,
GPUs in Codespaces, and then also, of course, Copilot.
And so I think the potential for generative models
in the software development space is huge.
(12:05):
Historically, there were single-task models for things
like docstring generation, single-task models
for code generation or for code completion,
single-task models for build repair
or for helping resolve or identify errors.
And now we're seeing models do all of these things
and even more with just a singular model.
(12:29):
One of the coolest things about the announcements
that we had at I/O this past week
is that they're all using kind of our latest large language
model, PaLM 2. The technical paper is out.
I encourage everyone to read it.
We also have a website at g.co/PaLM2.
But this model, based on kind of the way
(12:53):
that it was trained and the input data,
it's capable of doing a broad variety of software development
tasks.
And it's supporting code generation, code explanation,
error explanation, error fixing across so many Alphabet
products.
So both our tools within Google Cloud
(13:16):
(the product is called Duet), as well as
all of those features within Google Colab, which is,
if folks aren't familiar, it's kind of a data science
notebook environment that's ready to use
and that you can kind of have handy in your Google Drive
instance, as well as Android Studio, as well as code
(13:37):
features, and Bard.
But it's all powered by the same model.
And it's been really energizing to see
how people have been testing it out and using it.
And then also the features that we have coming down the pipes,
so things like self-healing code, and then also
(13:57):
things like tool use.
So being able to access a Python interpreter with code.
Yeah, so I'm familiar with a lot of those tools.
I think I was an early adopter for Google Colab.
Yeah, I've loved it for so long, the ability
to have free access to GPUs.
(14:18):
In the past, you got access for a bit longer.
Now it's a little bit less, but that's OK.
Just being able to, any practitioner getting access
to a GPU is just like, yeah, it just
changes your iteration speed.
And you can kind of work so much faster.
For the other things, I don't know why,
(14:39):
but I was more of a late adopter for Copilot.
I got it within the last month, it's embarrassing to say.
But that's when I finally got around to trying it.
I don't know, I was hesitant.
I just thought, oh, maybe it's going
to introduce bugs or something like that,
or maybe it was my pride or something.
(15:00):
I wanted to just continue to be coding on my own.
But I started to use ChatGPT, I started
to use Bard to help me with certain things.
I mean, it's not like I wasn't finding myself
on Stack Overflow like every other programmer looking
at things.
But yeah, it's incredible.
(15:21):
I think that this paradigm shift that's taking place,
I mean, with machine learning in general,
but for generating code, it's like what it used to be
is you used to write a function, and then you
would struggle to write your documentation
or your comments and things.
Now with this new technology to do code generation,
(15:43):
you're writing what you want the code to do,
and then it's making these suggestions
for what the code should be.
And multiple lines also, which is really cool.
And it's not just trying to autocomplete,
it's very smart.
At least it appears to be very smart.
One of the questions that I have is, so yeah,
(16:07):
there's a lot of code out there, just like there's
a lot of information out there.
GitHub is filled with so many libraries,
but not all of it is battle tested.
Not all of it is peer reviewed.
Not all code goes through code review, so it might not all
(16:29):
be the highest quality.
Or it's from, I mean, the way that things are moving
these days, it's just from a year ago or two years ago,
and it's using a different version of a library that
has a different dependency or whatever the actual specific is.
But I guess, how do you mitigate those sorts of problems?
(16:50):
I know this is like a loaded question,
but how do you mitigate those sorts of problems
where you might be training on data that's
either not the highest quality or out of date?
Yeah, that's a great question.
So we have a collection of fine tuned versions
of our large models for internal use only.
(17:15):
So kind of supporting the software engineers
within Google who are building the software that
powers all of the Alphabet products.
And these fine tuned models are fine tuned on google3 internal
source code, which is many, many tokens of super high quality
peer reviewed data that is the equivalent of an L5
(17:40):
SWE or more.
And that moves the needle quite a bit
in terms of making sure that the recommendations that we're
giving, the explanations that we're giving are pretty solid.
For the GitHub code, I can certainly attest.
There's a lot of very low quality code on GitHub.
(18:01):
A lot of it doesn't run.
A lot of it is using dated APIs, or maybe
has this very lax process of somebody just
pushing a commit to the repo without going
through any evaluation from a peer.
And so I think that if you are building these kinds of models
(18:21):
externally, there's a lot of work
that needs to go into kind of carefully curating and cleaning
the GitHub data sets in order to make it solid for use.
And if people are curious about this topic,
I strongly, strongly recommend taking a look.
There's a recent paper from the Hugging Face and ServiceNow
team called StarCoder.
(18:43):
And they go through, kind of with their BigCode data set,
like all of the things that they needed
to do in order to get a higher quality data set to train.
And it includes things like deduplication.
I think they also were considering preferentially
weighting code that is newer or code that
comes from repos that follow software engineering best
(19:06):
practices as opposed to code that might be from like a Python
101 student.
And all of those kind of careful bits of attention
to the pre-training data set move the needle significantly
in terms of model performance down the line.
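To make those curation steps concrete, here is a simplified, hypothetical sketch of exact deduplication plus recency-based sampling weights; real pipelines such as BigCode's use near-deduplication and many more quality signals than this.

```python
# Simplified sketch of two curation ideas mentioned above:
# exact deduplication and up-weighting newer files when sampling.
# Real pipelines (e.g. BigCode/StarCoder) use near-dedup (MinHash) and many more filters.
import hashlib
import random

files = [
    {"path": "a.py", "text": "print('hello')", "year": 2018},
    {"path": "b.py", "text": "print('hello')", "year": 2023},  # exact duplicate text
    {"path": "c.py", "text": "def add(a, b):\n    return a + b\n", "year": 2022},
]

# 1) Exact deduplication by content hash.
seen, unique = set(), []
for f in files:
    digest = hashlib.sha256(f["text"].encode()).hexdigest()
    if digest not in seen:
        seen.add(digest)
        unique.append(f)

# 2) Preferentially weight newer code when sampling the training mixture.
weights = [max(f["year"] - 2015, 1) for f in unique]
sample = random.choices(unique, weights=weights, k=2)
print([f["path"] for f in sample])
```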
But those things, there are also other tricks
(19:28):
that you can do like retrieval techniques
or being able to do these kind of self-healing operations
where you recursively apply the model to the output code
to see if it would actually run and then like fixing anything
that might be wrong or spotting any security vulnerabilities
if there are any in the output code.
(19:49):
But it's certainly like an ongoing field
of research in order to understand
what the best options might be.
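A hedged sketch of what such a self-healing loop might look like; generate_code is a hypothetical stand-in for whatever model call you use, and the interpreter invocation may need adjusting for your environment.

```python
# Sketch of a self-healing loop: run the generated code, and if it fails,
# feed the error back to the model and ask for a fix.
# `generate_code(prompt)` is a hypothetical stand-in for your model call.
import subprocess
import tempfile

def run_snippet(code: str) -> tuple[bool, str]:
    """Execute a snippet in a subprocess and return (ok, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    # Adjust "python" to your interpreter if needed.
    result = subprocess.run(["python", path], capture_output=True, text=True, timeout=30)
    return result.returncode == 0, result.stderr

def self_heal(task: str, generate_code, max_rounds: int = 3) -> str:
    code = generate_code(task)
    for _ in range(max_rounds):
        ok, err = run_snippet(code)
        if ok:
            return code
        # Recursively apply the model to its own output plus the error message.
        code = generate_code(f"{task}\n\nThis code failed:\n{code}\n\nError:\n{err}\nFix it.")
    return code
```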
Right.
So yeah, so I guess designing, I don't know what to call it.
I guess systems that help software engineers
(20:09):
or machine learning practitioners sort of do their job,
generate code, test code, help you write documentation,
do all these things.
I guess other than dealing with maybe outdated code
are the things that we were just talking about.
Are there any other challenges that you
face when trying to design those systems?
(20:31):
Oh, absolutely.
And I'm sure I'm going to list a few, but there are many more.
And I think people are discovering even more every day.
So one is likelihood of reciting code.
(20:51):
There's something that we've implemented for Bard
where if you generate code and a portion of it
is verbatim identical to something that
might be within a GitHub repo, we point you
towards the GitHub repo and then also tell you
what license it was under.
So whether it was Apache 2 or MIT,
which are very permissive, versus GPL,
which is not permissive at all and is something
(21:13):
that you would certainly not
want to use for your business.
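As a rough, toy illustration of the recitation-checking idea (not Bard's actual implementation), a verbatim-match lookup against an indexed corpus might look like this; the repos and licenses below are made up.

```python
# Toy illustration of recitation checking: flag generated lines that appear
# verbatim in an indexed corpus, along with the repo and license they came from.
# Not the production approach; real systems index at a much larger scale.

corpus_index = {
    "def quicksort(arr):": ("github.com/example/algos", "MIT"),
    "eventlet.monkey_patch()": ("github.com/example/server", "GPL-3.0"),
}

def check_recitation(generated: str):
    hits = []
    for line in generated.splitlines():
        key = line.strip()
        if key in corpus_index:
            repo, license_name = corpus_index[key]
            hits.append((key, repo, license_name))
    return hits

sample = "def quicksort(arr):\n    pass\n"
for line, repo, license_name in check_recitation(sample):
    print(f"Matched '{line}' from {repo} (license: {license_name})")
```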
I think there are also questions about security vulnerabilities
and performance.
So if you generate code, ideally you
would want it to be efficient.
You wouldn't want it to be something
that would take 10 or 100 times longer, more compute
(21:36):
in order to execute.
And you would also hopefully want
the code that you generate to be consistent in terms
of syntax and conventions with the code
in your existing code bases.
So all of these things are considerations
that folks have to think about when implementing tools
(21:58):
for their own users.
And it's something that we certainly
think very deeply about when orchestrating ML applied
to software systems internally at Alphabet.
Right.
So for Bard, this week you guys announced
(22:19):
that the underlying model, I guess,
was upgraded from PaLM 1 to PaLM 2.
What is it you think about PaLM 2 that makes it
like a better base model?
Is it more data?
Is it whatever you can speak to about it?
(22:40):
That's a great question.
So the model that we upgraded from,
it wasn't actually PaLM v1.
And PaLM v1 was a very, very large model
that there's a paper about it so folks can go
read if they're curious.
But it was not ever used in production for Alphabet,
(23:04):
I don't believe.
But the first model for Bard was a version of our LaMDA model.
The model that we have upgraded to
is a version of PaLM 2, which was announced on Wednesday.
And some of the capabilities of it
are that it's much, much better at code, at math, at reasoning.
(23:27):
It's also much better at multilingual tasks.
So PaLM 2 was trained on over 100 spoken languages,
dozens and dozens of computer programming languages.
And as a result of that really robust and very diverse
pre-training data mixture, it can
do things like translate from one language to another.
(23:49):
It can explain idioms and riddles.
Even in different languages, it can
translate from one programming language to another.
It can tell you if code might be vulnerable
or if it needs performance fixes.
It can generate code.
It can explain code.
(24:12):
It can write mathematical proofs.
It's just lots and lots of different things.
We're discovering new uses for it every day.
And one of the other most compelling features
about this PaLM 2 family of models
is that it comes in a broad variety of sizes.
So everything from our smallest version, which
(24:34):
can fit directly on a mobile device,
to more modest sized versions that,
despite being an order of magnitude
or more smaller than the largest version,
still preserve all of the capabilities,
just faster, more efficiently, and more cheaply.
(25:00):
So it's definitely, from a business perspective,
the PaLM 2 family makes a lot of sense.
Yeah, that makes sense.
Just to dig into the idea, because this
is something that you see across a lot of large language
models, you see Llama, which has maybe four different sizes,
and obviously PaLM as well.
(25:24):
So the reason for the different sizes,
the different number of parameters,
is it just to deal with the different trade-offs
of where you're running the model, what your trade-off is
for latency and performance?
Is there anything else that goes into it?
That's a great synopsis.
So the smaller versions of the models
(25:46):
make it easier to serve in a broad variety of locations.
It also makes them much, much quicker at inference.
And then the kinds of capabilities
that we've been seeing from these smaller models that
were open sourced is you can have very, very modest sized
models.
And as long as you fine tune or instruction
(26:07):
tune on very high quality data sets,
you're capable of getting the exact same performance
that you would from a much larger model,
despite it being faster, cheaper, easier to deploy.
And I personally, I think that's the most interesting field
of research right now is trying to take
these highly capable models and make them more accessible
(26:33):
for people all over the world, even if the only device
that they have to work with is a mobile device.
Yeah, that's something that I'm also really interested in.
Training like teacher-student models
and distilling information from these large language models
to get them to be stored into smaller systems.
(26:55):
Because yeah, it really depends on your use case.
Sometimes there's just something really nice
about being able to run it on your laptop.
And a lot, I mean, almost all of these models,
it's just like it's not even possible.
You have to access it through an API
or you have to access it through the interface
that the company offers it.
(27:15):
But yeah, I think that there's something really nice.
Like even, I mean, I'm doing some of my work.
Like I'm trying to get things that are like 400 megs down
to 40 megs, just because it gives you the ability
to like run maybe five or 10 times the amount of things
in a much faster iteration time.
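A minimal sketch of the teacher-student distillation idea mentioned here: train a small student to match the temperature-softened output distribution of a larger teacher. The numbers are made up, and the loss shown is just the core term of a fuller training setup.

```python
# Minimal sketch of knowledge distillation: train a small "student" to match the
# softened output distribution of a large "teacher". Illustrative only.
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, averaged over the batch."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p + 1e-9) - np.log(q + 1e-9))) / len(p))

teacher = np.array([[4.0, 1.0, 0.2], [0.1, 3.5, 0.4]])  # large-model logits
student = np.array([[2.0, 1.2, 0.5], [0.3, 2.0, 0.8]])  # small-model logits
print(distillation_loss(student, teacher))  # drive this term down during student training
```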
(27:38):
So there's been some like dropping of some ideas
about the new release, the new model
that I believe it's DeepMind with Gemini.
What can you tell us about Gemini?
So many of the people from the PaLM 2 team
are on the Gemini project, including myself.
(28:00):
And we're still actively training the model.
It's intended to be Alphabet's most compelling model,
which kind of tracks with, every time we build a model,
we hope that it is a superset of the capabilities
of the models that preceded it.
(28:21):
But Gemini is very special in that it was built
from kind of the ground up to be multimodal.
So we're already seeing multimodal features
in the first versions of the model that we've trained.
And we're anticipating that it will be in production
very quickly, or at least the smaller versions of it
(28:42):
will be in production quite quickly.
But I guess the only thing that I can say specifically
is stay tuned.
We're very excited.
It's the first model that Google DeepMind
has trained kind of jointly together.
And it should be particularly compelling
(29:03):
for not just the text and the code use cases,
but also the multimodal use cases.
Yeah, that's very exciting.
I'm looking forward to it.
In regards to multimodal,
what multimodal use cases are you most excited for?
Yeah, so multimodal, I love this idea
(29:26):
that you can have audio, video, images, text, code,
as inputs, including many of them
kind of being interspersed together.
And then kind of define what your output should be,
either by, you might get some text as output
(29:47):
from your original model,
and then you stack a diffusion model on top
such that you can generate an image back out,
or you can generate audio or video.
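As a hedged sketch of that stacking idea using Hugging Face libraries: the specific model names below are assumptions and stand-ins, and this says nothing about how Gemini or any Google system is built.

```python
# Rough sketch of "stacking a diffusion model on top" of a text model:
# generate a description with a language model, then render it as an image.
# Library calls are from Hugging Face transformers/diffusers; the model names
# are placeholders and may need to be swapped for ones you have access to.
from transformers import pipeline
from diffusers import DiffusionPipeline

text_gen = pipeline("text-generation", model="gpt2")  # placeholder text model
prompt = text_gen("Describe a calm mountain lake at sunrise:",
                  max_new_tokens=40)[0]["generated_text"]

image_gen = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
image = image_gen(prompt).images[0]   # text in, image out
image.save("generated.png")
```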
But some of the use cases that I'm most excited about
for multimodal models is that you can imagine,
say you're taking a physics course,
(30:07):
and you just can't nail a concept.
Like you just, for whatever reason,
like angular velocity just isn't clicking.
And you could easily just ask, like, hey,
I really, what I would love to have is,
like a video that explains angular velocity to me.
(30:30):
And I really want it to just have like cats.
And then I also want you to have like a quiz
to check my comprehension every like one minute of the video.
I don't want this video to be longer than four minutes.
And then I also want to have like an outline at the end
that explains what the concepts were in the video.
(30:53):
Also with images.
And that's something that a multimodal model
would be able to support.
So just being able to do something like that is huge.
And it also brings about this world of,
super, super tailored custom materials for folks.
(31:16):
Like you could imagine each kid gets
a new bedtime story every night.
That's complete with, you know, a new story,
a new adventure and new images.
It's like the, I'm not sure if you've ever read
The Diamond Age by Neal Stephenson,
but it's like having the primer just kind of like
available for every person.
(31:38):
And that's, it's really energizing.
That's awesome.
Yeah, I think that that's such a cool use case
because, yeah, everyone's unique.
Everyone learns in their own, in their own ways.
People respond to different mediums, you know, differently.
So being able to tailor the learning approach,
(31:59):
that, yeah, that could change,
that can really change the way that we learn,
the way that we interact,
the way that we interact with the different, you know,
material that we're trying to learn.
That's really cool.
Yeah, I saw a tweet the other day
from someone with a learning disability,
(32:21):
they didn't share what their learning disability was,
but they mentioned that interacting
with these generative models has sort of, you know,
made it that their learning disability
isn't hindering their life as much anymore.
You know, they're able to, you know,
(32:43):
instead of having to attempt to digest all of this content,
even though it's not architected in a way that is,
that's optimal for their learning style,
they're capable of working with generative models
to consume the information in a different way.
And it clicks, and it's the first time in their life
(33:03):
that they had said that it felt like that.
And that's huge, right?
Like being able to, you know,
unlock the joy of learning for people
who had previously been frustrated.
That's opening up the world to so many more creators
and so many more potential engineers
and folks that, you know, can contribute.
(33:26):
Yeah, absolutely.
That's so rewarding.
It's so nice to hear, you know,
especially now with all of the people that are sort of
bringing up all of these negative
potential future use cases for AI,
but to know that there are all of these use cases
where it can help level the playing field
you know, in some ways and open up opportunities for people.
(33:51):
You know, things are changing, things are changing rapidly.
And machine learning does have this tendency
to perpetuate a lot of the things from the past,
but it's nice to know that there are times
when it can actually increase accessibility.
So that's a nice use case.
(34:12):
One of my favorite stories
from the Copilot days was this.
Usually on GitHub,
when somebody files an issue, it's like not a fun thing.
You know, it's like, man, this is broken,
or like, I'm confused or whatever.
But there was one person
whenever we had first released Copilot
(34:32):
who wrote and said, you know, like, I have tremors,
like, and they've gotten so bad recently
that, you know, I was forced to, you know, stop doing my job
and this person had been a software engineer.
And they said, you know, you've introduced this kind of like
(34:53):
speech to code, basically,
generation features within the IDE.
And, you know, it's made it
so that I can actually build software again.
And I never thought that I would be able to do that.
And that, you know,
that's the kind of thing that makes you delighted
(35:14):
to come to work every day, I think,
is, you know, the potential of making it
so that people can do the things that they love,
even though they might be physically limited in some way.
Right.
Yeah, knowing that you're having a positive impact
on that person, I mean, who knows?
There's probably many other people that are also, you know,
(35:34):
getting that positive impact.
So yeah, that it's definitely delightful.
It's rewarding to hear, you know,
working in this field.
Thanks for geeking out with me,
talking about large language models and all of that.
But I wanna switch and zoom out
into just the machine learning field in general.
(36:00):
So what do you believe is an important question
that remains unanswered in machine learning?
There's so many, right?
And I think one of the questions that I'm most interested in
is how do kind of the pre-training data mixtures
(36:22):
for large language models,
how do they impact performance on downstream tasks?
And this is an unsolved problem.
There's this notion of like,
oh, well, I wanna do code stuff.
So like, perhaps I should have more code
or I wanna do multilingual stuff.
So perhaps I should have more multilingual data,
(36:44):
but nobody knows how much,
nobody knows how much data quality impacts
that performance.
Nobody knows like how the data should be structured
or formatted or if it should be included
in a broad variety of ways.
Like if you want to predict edits,
perhaps you should have like code diffs, right?
(37:06):
Like instead of just the source code itself.
So all of this experimentation
ends up being pretty expensive, right?
Like, and people end up taking like really expensive risks.
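As a toy sketch of what a pre-training data mixture amounts to in practice: sampling training examples from several sources according to chosen weights. The weights below are invented; picking them well is exactly the open question being described.

```python
# Toy sketch of a pre-training data mixture: sample training examples from
# several sources according to chosen mixture weights. The weights here are
# made up; choosing them well is the hard, expensive part.
import random

mixture = {
    "english_web": 0.45,
    "multilingual_web": 0.30,
    "source_code": 0.20,
    "code_diffs": 0.05,   # e.g. if you want the model to predict edits
}

def sample_source(mixture, rng=random):
    sources, weights = zip(*mixture.items())
    return rng.choices(sources, weights=weights, k=1)[0]

counts = {name: 0 for name in mixture}
for _ in range(10_000):
    counts[sample_source(mixture)] += 1
print(counts)  # roughly proportional to the mixture weights
```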
So for PaLM 2 as an example,
the input pre-training data mixture
(37:28):
included an awful lot of multilingual tokens,
which meant that the number of English tokens
was much lower.
And there was a risk from the team that like,
perhaps we'll train this model,
this like super expensive,
like millions-of-dollars model,
and then we'll get something
(37:48):
and it won't be as good at English tasks
as it is on all of these other things.
Right.
And that was a risk that the team took.
And it ended up paying off because the performance
ended up being better actually
across all of the task ranges.
But that could have been something
(38:11):
that turned out very differently.
And I think for people who are interested
in the large language model space
and the deep learning space,
understanding how the data choices that you make,
and then also the quality of the data that you use
impacts how your model performs
(38:33):
is really, really compelling and important.
And then also how does that,
like the pre-training phase
and the attention that you pay in pre-training,
how is that compared to like downstream fine tuning
and instruction tuning?
(38:53):
Because we're increasingly seeing
that the fine tuning, instruction tuning, and RLHF portions
are much more impactful and move the needle
much, much more than everything else.
Right.
What's your intuition on that?
Why do you think it has such a big effect?
So I think it's because it helps the model focus,
(39:16):
which is not a technical way to describe it at all,
but I think it's intuitively maybe a little bit easier
to understand, right?
Like the model, you initially train it on a lot of data,
like the entire internet, right?
Or a lot of data.
And so it's capable of doing a broad variety of things.
(39:38):
Whenever you instruction tune it or fine tune it
or RLHF it, you help it kind of focus
on the kinds of questions that you're really interested
in answering or the kind of format
that you would really, really like to see.
Like say, your model is giving relatively short outputs
and you want them to be a little bit more long form.
(40:01):
You can kind of tune that with RLHF
such that your model is kind of rewarded
whenever it outputs longer context things
as opposed to shorter things.
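A toy sketch of that reward-shaping idea: during RLHF, score candidate responses so that longer, more complete answers earn slightly higher reward. A real reward model is learned from human preferences; this illustrative function just mixes a base quality score with a length bonus.

```python
# Toy sketch of reward shaping for length: combine a base quality score with a
# bonus for approaching a target length. Illustrative only; a real RLHF reward
# model is learned from human preference data.

def length_shaped_reward(response: str, base_score: float, target_words: int = 150) -> float:
    """Blend a base quality score with a bonus that saturates at the target length."""
    n_words = len(response.split())
    length_bonus = min(n_words / target_words, 1.0)
    return 0.8 * base_score + 0.2 * length_bonus

short_answer = "The sky is blue."
long_answer = ("The sky appears blue because shorter wavelengths of sunlight scatter "
               "more strongly off molecules in the atmosphere, an effect known as "
               "Rayleigh scattering, which dominates what reaches our eyes during the day.")

print(length_shaped_reward(short_answer, base_score=0.9))
print(length_shaped_reward(long_answer, base_score=0.9))  # higher, due to the length bonus
```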
Right, so you can kind of tune it
to how you want those outputs to be.
(40:23):
Yeah.
One of the things that you mentioned before,
it made me think about like, well, two things.
I guess how there are these different waves
that have happened, I view in machine learning.
Like I think there was a time a couple of years ago
where it was like, you need to have a fine tune model
on this task, right?
(40:43):
And now with all of the advances in generative models,
it's like, oh, maybe there's one model
that can kind of answer all of these questions.
But there still is this question about,
can we use that generative model to then point you
in the right direction of the best tool
(41:03):
to use to solve your problem?
I guess, what's your take on that
and how have you viewed that sort of transition
between having a generative model
that can kind of solve many problems
versus fine tune models that are for specific tasks?
(41:24):
Yeah.
So we found just anecdotally,
so initially for some of our large models
that perhaps weren't pre-trained on as much source code,
we did have to have fine tuned versions
(41:45):
of those models using source code.
And then for future iterations of the larger models,
we included all of the tokens that we had used
during the fine tuning process
within the pre-training data mixture.
And the resulting model exceeded the performance
of the fine tuned model based on those kinds of inclusions
(42:09):
and then also using a tokenizer
that was a little bit friendlier for code.
So practically we've seen that the generative models
provided you keep adding in high quality pre-training data,
they can sort of absorb the tasks
(42:30):
from the fine tuned models.
But I also will say that for businesses,
whatever solves your problem,
like that's the thing that you should use.
And Lord knows the world is like running on Excel spreadsheets
for much of the finance sector.
(42:53):
And the government is running on COBOL.
And like the cost of migrating off of either of those things
is just overwhelming.
And I don't wanna be the one causing
like the financial downturn of like
telling the finance community
that they shouldn't be using Excel spreadsheets
or basic or whatever it is.
So I would say that whatever modeling approach meets your needs
(43:21):
is what you should use.
And certainly experiment with generative models,
see if it makes sense.
And then also see if it makes financial sense.
Cause if you can get by with a smaller model
that perhaps is open source and easier to maintain,
then why should you be paying for API calls?
(43:44):
But that's personal opinion.
Like new technology is always going to be very cool
and push boundaries.
But at the end of the day, what matters is your business,
your users and their problems.
Yeah, absolutely.
I agree 100%.
Identifying the problem, the underlying need,
(44:06):
knowing that there's many approaches.
I view that generative models are one tool in your toolset.
You don't necessarily need to use it for everything.
It has a lot of amazing use cases, but yeah.
(44:27):
Certain business practices, like you were mentioning, are,
I'll say, stuck in their certain ways.
And we're moving towards this new future,
but everyone's not ready for the leap over the canyon.
I don't know.
(44:47):
There's a place, there's a space
and we have to get people over.
And some people will take smaller steps than others.
Yeah, and there's also like some appetite for risk
that I think you have to have
if you're preferring generative models,
just given that we're still exploring factuality
(45:10):
and being able to verify
that models are returning correct responses.
Cause a lot of the time they don't.
Like there are always examples towards like counterfactuals.
But the one thing that I saw on Twitter earlier this week,
cause all of these generative models things
(45:32):
seem to be percolating to Twitter.
But someone instead of like consulting
with a financial analyst, they were like, I have some money
and I wanna invest it in some stocks,
like recommend some stocks that I should buy.
And they got ChatGPT to do that.
And then they purchased the stock.
They kind of mapped it out over time
(45:54):
as to what would have happened
if they followed the portfolio recommendation
versus the buy some stock recommendation.
And the ChatGPT-recommended stock purchases
or the generative model recommended stock purchases
actually ended up doing quite, quite well
(46:15):
compared to the kind of investment portfolio recommendations.
But all I could think was just like,
I don't want my 401k to be at the mercy
of a generative model, at least not now.
Like it's, and it might be that perhaps
these investment strategists are recommending certain things
(46:36):
because they know that it's more stable
than perhaps these stocks that might be in the near term,
high performers, but long-term,
like much more variable in terms of return.
So it's like choose your own adventure
and be very careful for betting all of your money
(47:01):
on generative models.
Yeah, I mean, yeah, in terms of financial decisions,
I think, yeah, everyone's sort of like,
how much risk are you willing to take?
And then there should be like another step,
like, are you willing to take the risk
of using a generative model for your financial portfolio?
Maybe some people will say yes, who knows?
(47:22):
Okay, so switching into like
learning from machine learning standpoint
and just sort of general advice for people in the industry.
Well, I'll start with this one.
Who are some people in the machine learning field
that influence you?
(47:43):
So there have been so many,
I think, and I'm fortunate to work
with many of them every day.
So I've been coming from kind of the open source tools,
developer tool space.
I've really loved working with the JAX team.
(48:06):
So JAX is for folks who might not know,
it's an open source library for doing,
kind of building deep learning models,
but also doing kind of scientific experimentation
and all of the models that we build at Alphabet
are built using JAX.
(48:26):
Matt Johnson, Peter Hawkins, James Bradbury,
like they are all, Skye Wanderman-Milne,
they're all very, very close collaborators,
Roy Frostig as well, Yash Katariya,
and they've been delightful to work with and to learn from.
I've also really, really loved working with Jeff Dean,
(48:54):
as well as the collaborators at DeepMind.
So as part of this Gemini effort,
I've gotten to work more closely with Oriol Vinyals,
who's kind of driving much of the efforts.
And then also, of course, the learning for code folks,
(49:14):
who care very deeply about machine learning
as applied to software systems
and think very carefully and thoughtfully about it.
And if you're on Twitter,
I would also highly recommend following
the Hugging Face team,
because they have a lot of passion
around open source machine learning and deep learning
(49:36):
and sharing their knowledge,
as well as Andrej Karpathy,
who is a dear, sweet human just generally,
but also cares a lot about education and advocacy
and making sure that these concepts
are understandable to everyday humans.
So there are a lot of people
(50:01):
to admire in this community.
I feel very fortunate.
Yeah, absolutely. Yeah, there's so many great people
in the field and people are pretty generous
with their time also.
And there's this sense of wanting to help other people out.
And it's a really nice community
(50:22):
to be in the machine learning community.
On your website, which is nicely named
DynamicWebPaige, right?
Is that?
Oh, so my website,
I have a tendency to purchase URLs that are puns.
(50:43):
And I think probably most of us in this industry
are like URL hoarders,
but the one that might have,
I think the URL might have even expired,
but Paige views is the one.
Okay, so under a section that I saw
(51:04):
on one of your websites, it was under Paige views.
You had some really interesting tenets.
I'm gonna read them if you don't mind.
Yep. Yep.
So bring data to opinion fights,
relentlessly ask questions,
communication is everything, especially in open source,
(51:24):
choose growth opportunities,
nurture and build communities.
If you don't have documentation,
you don't have an MVP,
give without expecting a return
and believe in people, not acronyms.
I love these.
I love them so much.
Thank you.
Are there any that you,
is there anything that you would add to any of them
(51:45):
or anything that you wanna talk about with any of those?
So those, I think I put down on paper
about half a decade ago
and they still seem like pretty good life rules.
I think now, especially the believe in people,
(52:06):
not acronyms, one is going to become even more compelling
and important.
So COVID kind of turned everything on its head, right?
Like I personally, like it helped prioritize things for me.
It was part of the reason why I left Alphabet.
Like I wanted to move back to Texas
(52:26):
to take care of my parents.
They're like 70 plus, 80 plus.
And I think a lot of other people also,
it helped them prioritize
what they should be spending their time on.
So we saw a lot of kids choose not to go to college
(52:49):
and to perhaps enter into the workforce
or to perhaps like start building software
or to build their own businesses, right?
And these kids are incredible.
Like they're doing such amazing things.
Like the most killer use cases for generative models
are all these kids that are like entrepreneurs
and somehow like 16 years old, right?
(53:11):
And so I feel like, and then also
in the deep learning community,
we're increasingly seeing that folks
aren't really publishing papers anymore, right?
Like whereas previously there was this academic ivory tower
of like, oh, you do some research, you produce a paper,
you present to the conference,
(53:32):
and then you add it to like
this extensive scholarly pedigree.
Whereas now it's just kinda like, well, I did some work.
I produced something that people can test out.
I will perhaps write a blog post, but for the most part,
it's just like something cool that's out in the world
that people can like see and touch and experience.
So I think that if you don't have a college degree,
(53:57):
if you are debating whether or not college is right for you,
if you're concerned that like perhaps not having a degree
will limit you in some way,
whether it's an undergrad degree or a graduate degree,
or if you have like a journalism degree and you're like,
but can I do this work?
You can, there are no rules in this space.
(54:17):
Like, you can do anything that you would like to.
And the important part is just kind of focus
on building something great that delights you
and that delights other people.
And that's all that really matters.
That's really good advice for people
starting out in the field.
(54:40):
A little other, another variation of that,
when you were just kind of getting your feet under you,
or I mean, I don't know if that is even the case,
but what advice would you give yourself
when you were starting your career in data science?
So I would say, I would just kind of reinforce,
(55:04):
do what you think is right and what interests you,
and then give yourself permission
to do that and don't feel guilty
or like you aren't meeting the expectations
of your peers or your academic advisors.
(55:28):
I was told when I had entered into my graduate program,
I was told that it was a waste of time
to do like computer science and programming,
that I would never be respected
if all I did was like build tools or build libraries
or write code to solve science problems.
(55:50):
And I was also told that machine learning
would never have a place in the earth sciences,
which was in hindsight, kind of silly.
But at the time I was just like,
am I making the worst career choice of my entire life?
(56:12):
And I had a lot of internal emotional angst about it
back then, but now it's definitely the right choice.
If you feel like something is important,
sometimes you can see the future,
(56:32):
even if other people are standing,
facing a different direction.
That's great.
Yeah, it's so interesting, right?
When you're in the moment, things can seem so chaotic
and there could be so much uncertainty.
And then, I stop myself,
(56:54):
like the things that you're saying,
they're like looking at things now,
it's like some of that's like laughable.
Like, you look now, I mean, hindsight's 20/20, of course,
but it's just, I mean,
machine learning has become so pervasive
and using all of these sorts of techniques and technologies.
But it's amazing how, yeah, in the moment,
(57:17):
it can be so uncertain, but then when you look back,
it can become so clear.
And that was just like eight or nine years ago, by the way,
that people were thinking that machine learning
wouldn't have a place in the earth sciences,
wouldn't have a place in the physical sciences.
(57:37):
All of this is super new.
And so I think that given the trajectory
and kind of the exponential increase
and how these things are progressing,
it's really tricky to predict the future.
And there's, I forget who said it,
but like, and it's kind of cliche, I guess now,
(57:58):
but the best way to predict the future
is to be the one building it.
Like, it is very clear that the future
is going to be built using the generative technologies
and assistive technologies
and productivity enhancing technologies.
So if you're doing that,
you're probably gonna be heading in
towards the right direction.
(58:19):
Absolutely.
So speaking of how fast things are changing
in the last decade, in the last five years,
even in the last year or week,
how do you stay up on all of the newest techniques?
I mean, I know in a way you're part of it,
(58:41):
you're building the future,
but there's so many people that are releasing stuff.
Do you have any techniques
to sort of stay up on top of everything?
It's a really tricky problem.
I would, and again, this is gonna sound silly.
I would be on Twitter
(59:04):
just because the machine learning community
seems to hang out on Twitter
and they talk about things that they find interesting.
It's like getting to sneakily listen in
to all of the hallway conversations
in all of these AI research labs.
And increasingly, we had mentioned it earlier,
(59:24):
but people aren't writing papers anymore.
And so the best insights that you can give
into kind of pedagogy and kind of how these models
were built is by hearing what people
are currently focused on.
Like, is it long context?
(59:45):
Is it this new tokenizer that seems interesting?
Is it some sort of algorithmic advance
that people are really paying attention to?
Is it the data mixtures?
Is it some other model architecture?
But the kind of honing in on where the conversation
seems to be happening is really, really important.
(01:00:09):
And right now, if you're just looking
or attempting to look at every deep learning paper
that's posted on arXiv, your life's gonna be insane.
Like there was a website a while back,
I think it was created by Andrej Karpathy,
the Arxiv Sanity website, and it would post the paper
and then have like a really nice kind of images with PDFs.
(01:00:31):
But in the last three or four years,
it's gotten overwhelming to keep up even with that.
So I would strongly encourage you,
pick people that work at the AI research labs
that you care about, Anthropic, Google DeepMind, OpenAI,
and perhaps some of the others,
(01:00:53):
like Hugging Face that I had mentioned,
and follow them, see what they're talking about,
see who else they're following.
And then also don't be afraid to ask questions
if something isn't clear, because a lot of time,
if you ask a question to these people on Twitter,
they will respond, and that's huge, right?
Like being able to have access to some of the folks
(01:01:17):
that are building the future is massive.
Yeah, absolutely.
One thing that I didn't get to touch on,
so yeah, obviously machine learning, AI, generative models
has really made it into the mainstream,
and it's created this like frenzied hype.
I'm wondering from your perspective,
(01:01:42):
how do you view the gap between the hype
and the reality of AI?
Yeah, and this industry in particular,
there are all these hype cycles can be overwhelming.
I personally cannot wait for the AI hype cycle to be over
so that we can all have a little bit more peace
(01:02:03):
to do the work.
I miss when AI was not cool,
because, like, NeurIPS was a lot more fun then, too.
Like all of the academic conferences were much more chill.
Now it feels like there's more VCs than researchers.
But I do think whenever you're trying to distinguish
(01:02:25):
between who is like an AI influencer versus
who is someone who does this for their day job,
like definitely look and see where they work,
look and see what they've built.
And if it's somebody that decided to get excited about AI,
just about the time that web three went downhill,
(01:02:48):
because like somehow a lot of these web three influencers
have turned into like generative AI influencers,
and I don't know how that happened.
But just kind of like look at the backlog,
see what the person has accomplished
and what they've been interested in.
And if they haven't been doing AI for longer
(01:03:08):
than the last year or two,
then I would take what they say with a grain of salt.
Yeah, absolutely.
I think it's really good advice.
That's why it's really nice talking to somebody like you,
who has experience at Microsoft and GitHub and Google
(01:03:29):
and Alphabet and DeepMind and everything like that.
The last juicy meaty question,
what has a career in machine learning taught you about life?
That is a wonderful question.
I think what it has taught me the most
is what to appreciate about being human.
(01:03:52):
Like the more you see what these models can do,
the more you can understand what they're doing.
And it's very exciting and it's very cool.
And it's like discovering new capabilities every single day,
new ways in which I can automate myself
(01:04:14):
and sort of remove the tedious parts of the day
like creating meeting transcripts
or emails or drafting a paragraph for a doc or something.
But also seeing the ways in which it falls short.
Like it's not going to be able to give you a hug.
(01:04:42):
It's not going to sort of understand, at least not yet.
It's not going to understand that you might have a bad day
and that it should ask you, how are you doing?
It's not going to do those things
(01:05:03):
that are uniquely nice experiences
of interacting with humans in real life every day.
And I think that's important.
And I think one of the nicest possible outcomes
for generative AI becoming ingrained
in every person's life is that,
(01:05:25):
it helps us appreciate more what it is to be a human
and to interact with humans.
That's really nice.
Yeah, that's beautiful.
So by exploring the capabilities of machine learning and AI,
you can really appreciate the human connection.
Yeah.
(01:05:45):
So for folks that want to learn more about you
or the work that you're doing,
what would be a good resource for them?
Cool.
So I strongly recommend taking a look at both the Google AI
and the DeepMind websites,
though they will probably be merging at some point
in the near future.
(01:06:06):
I'm also chronically available on Twitter.
So twitter.com/DynamicWebPaige,
and I'm DynamicWebPaige
pretty much everywhere else on the internet.
And then I also strongly, strongly encourage
everyone to just kind of get involved.
(01:06:26):
There are certainly ways to begin learning more
about the AI community without necessarily
having to have a career in it.
And then if you do want to have a career in it,
I think it's going to become increasingly possible
as businesses seem to be adopting AI
at a much quicker rate these days.
(01:06:50):
So don't be afraid to pitch to your boss
if you have an idea and you want to create a prototype.
Absolutely.
Paige, it has been such a pleasure.
I really appreciate your time.
Thank you for letting me pick your brain for this time.
Thank you so much.
No, thank you.
This has been delightful.
I am so glad to have gotten the chance to chat
(01:07:14):
and thank you for the awesome questions.
Really appreciate it.
Thanks so much.
Cool, cool.
Thank you for listening to this episode
of Learning from Machine Learning
with the remarkable Paige Bailey,
the lead product manager for generative models
at Google DeepMind.
(01:07:35):
Her work is pushing the boundaries of innovation
with Bard and the soon to be released Gemini.
Don't miss out on the valuable resources in the show notes.
Please leave a review, share it with your friends
and let's create a community of continuous learning.
Until next time, keep on learning.