Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:15):
Pushkin. It feels like searching the web is a problem
that's been solved. You know, it's ridiculously easy for me
to say, find out when Alexander Hamilton was shot (eighteen
oh four), or whether they are making Sing three (not yet,
but Matthew McConaughey has expressed interest). And yet, and maybe
(00:38):
this is not surprising. The people who spend their lives
working on search do not think search is solved. This
is partly because the people at the frontier of search
don't just want to search the web. They want to
answer every question that might cross your mind, even questions
you can't put into words. I'm Jacob Goldstein, and this
(01:06):
is What's Your Problem, the show where entrepreneurs and
engineers talk about how they're going to change the world
once they solve a few problems. My guest today is
Cathy Edwards, vice president and GM of Search at Google.
Cathy's problem is this, how do you teach computers to
tell people what they want to know, even if they
don't know how to ask. Later in the conversation we
(01:30):
get to the frontier of what Cathy and Google are
working on now, but we started with the problem they
have largely solved in the six years Cathy has been
at Google, the jump from search results based on keywords
to search results based on natural language, the way people
talk in everyday life. So one of the problems that
(01:51):
we were working on around six years ago is this
problem of natural language queries. So, if you're old enough
to remember the early days of search on the Internet,
there was this idea of keyword-ese, right, that you had
to sort of take this idea you had in your
mind of what you needed to know and
(02:11):
figure out what were the exact right keywords to enter
into the search engine to get your results back right.
I mean, as an example, very early on, you know,
I remember being taught how to query back in, you know,
nineteen ninety nine, and being told never use the word
'and' and never use the word 'the,' because
(02:32):
the word 'and' or the word 'the' is in almost
every document on the Internet. And the way it worked
back then is you did this word matching, right, and
so if you had a word that was in your
query and there was that same word in the document,
then that document would be returned and potentially scored, right.
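(To make that matching-and-scoring idea concrete, here's a toy sketch in Python. The two-document corpus and the scoring are invented for illustration, and the inverse-document-frequency weighting shown, which the conversation turns to next, is a textbook stand-in for whatever early engines actually did; it's not Google's algorithm.)

```python
# Toy sketch of early keyword matching, and why weighting matters.
# Illustrative only, not Google's actual scoring.
import math
from collections import Counter

docs = [
    "genetics and the study of heredity and genes",       # actually about genetics
    "cooking and baking and recipes and kitchen and tips", # stuffed with "and"
]

def idf(word):
    """Rare words score high; words in every document score zero."""
    df = sum(word in doc.split() for doc in docs)
    return math.log(len(docs) / df) if df else 0.0

def score(query, doc, weighted=True):
    counts = Counter(doc.split())
    return sum(counts[w] * (idf(w) if weighted else 1.0)
               for w in query.split())

query = "genetics and"
for doc in docs:
    print(f"raw={score(query, doc, weighted=False):.1f}  "
          f"weighted={score(query, doc):.2f}  | {doc[:30]}")
# Unweighted, the "and"-stuffed page wins; with IDF weighting,
# the genetics page does.
```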
And that was very helpful if it was a word
(02:54):
like genetics, right, which was highly specific and wasn't in
a heap of documents on the internet. But the word
'and'? Not very specific. And you know, in the very
early days of the Internet, these words weren't even weighted particularly, right?
The word 'and' counted for as much as the word genetics,
and so a document might have a ton of
(03:14):
uses of the word 'and' and one use of the
word genetics, and it would score really highly, even though
it wasn't particularly about genetics, folks. Now, by the time you
get to Google, that part is solved, right? Google, by
that point, years ago, is weighting genetics more
heavily than it's weighting 'the.' But what part,
six years ago, was not solved? That's solved now, or
(03:36):
solved-ish now. But we were still seeing people do
these very keyword oriented queries. So they weren't saying things
like what wine pairs best with chicken? Or if they were,
they were doing those queries and getting not the best results,
because not only is there a question of word matching
and how much each word counts for, there's also the
(03:59):
question of, like, does the word 'what' appear at all? Right?
Like, the answers to that question are actually just documents
that say the best wine to pair with chicken
is, you know, chardonnay, right? They didn't include
the question. And so we
sort of saw these like SEO documents that would spring
(04:23):
up that would have the questions kind of baked in
in an attempt to match. But those documents weren't necessarily
the best answers. And so this is when we started
to go just that next level deeper in our language
understanding with these AI models, these language models that really
can start to map out in a concept space, things
(04:46):
like this sort of translation between how you might ask
a query and then what that might look like in
the document. So, to take the example you gave of
what wine pairs best with chicken, even as late as
six years ago when you got to Google, you're saying
Google wasn't great at delivering the best results to a
query like that because it was written as speech, not
(05:09):
written as a series of keywords. So six years ago,
I would have been better off typing chicken wine pairing.
I would have got better results if I did that,
you're saying, because that's kind of the way. That's the
way Google had mapped the web. It was like a
series of important words and what sites are reliable and
it just, the technology wasn't there to actually try
(05:29):
and understand the way people ask questions in real life. Absolutely,
And it was this idea of bringing AI into search
and having these like large scale language models. That first
one was called BERT. We now use one called MUM,
which is, we'll get to MUM. But let's talk about, let's
(05:50):
try talking about BERT for a second. So how do
you get from search results that are fundamentally keyword based
to search results that are fundamentally you know, answering questions
that are posed in a more natural way, like how
do you make that leap? So the fundamental insight is
(06:11):
you go from looking at these words as tokens that
get matched against each other to suddenly you look at
all the words in all the documents on the Internet
and you create what's called an embedding space, which is
essentially you can think of it as a map of
the concepts that these documents know about. And suddenly, by
(06:32):
being able to say, okay, you can take a query,
map that into this concept embedding space. You'd take these documents,
map them into the same concept embedding space, and you can start
to actually match together not these words, but what people
actually mean: what they actually mean when they ask these questions,
(06:53):
and what they actually mean when they write these web pages on the Internet.
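(A hedged sketch of that query-and-document embedding idea, using the open sentence-transformers library and a small BERT-style encoder as a stand-in; Google's production models and embedding spaces are not public, and the wine-pairing documents are invented for illustration.)

```python
# Sketch of the "embedding space" idea: map queries and documents
# into the same vector space and match by meaning, not shared words.
# The open sentence-transformers library stands in for Google's
# production models, which aren't public.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small BERT-style encoder

query = "what wine pairs best with chicken"
docs = [
    "Chardonnay is a classic pairing for roast chicken.",
    "Chicken coops should be cleaned weekly.",
]

q_emb = model.encode(query, convert_to_tensor=True)
d_embs = model.encode(docs, convert_to_tensor=True)

# Cosine similarity in the shared concept space; the pairing page
# should win even though it never contains the word "what".
scores = util.cos_sim(q_emb, d_embs)
for doc, s in zip(docs, scores[0]):
    print(f"{float(s):.3f}  {doc}")
```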
That seems, I mean, A, it seems super hard, right? And B, as I'm parsing that,
I'm tempted to use a lot of anthropomorphic language, right,
I'm tempted to say, like, you have to go from
the computer just sort of having a list of words
and kind of weights around those words to a computer
understanding what people mean. Like, am I right to say that?
(07:16):
Or is that just my like layperson intuition getting in
the way of what's going on? I mean, the first
thing I'll say is I think we're very far away
from the computer having any sort of sentience and truly understanding.
But I think it is true. It is fair to
say that there is a level of deeper understanding that
(07:37):
you're not just looking at these words as, you know,
bits in a computer, but you're actually starting to model
in a way that a human might, a brain might
model what the concepts are. And I do think that's
a first step of getting closer to this sort of
natural human understanding. So is there a way to talk
(07:59):
about how that works? It's pattern matching, effectively, right?
And it just so happens that if you magnify pattern
matching on a very large scale, that can be a
pretty compelling understanding. And so that's the sort of big idea,
the theory of how it works. I'm sure in actually
building the thing, in building BERT, which was this big
(08:23):
model that did work, it wasn't that easy, right, I mean,
is there a story version of how
you built it? So I think there were two hard
points along the journey. The first hard point was just
these models were being built at a scale that was
(08:44):
unprecedented in the amount of information. You know, traditional neural networks
would run on thousands, maybe millions of training examples. Suddenly
you're trying to model all the words on the Internet
and this scale, firstly, is what gets you
the amount of training to actually get the concept model
(09:04):
to be compelling. But frankly, the computers just couldn't process it.
So you're building this model and saying, okay, now
to learn what you need to learn, read literally every
word on the Internet. Is that right? Yes, and not
read it just once, because every layer of the neural net needs
to read it and reprocess it. Right. So you're reading
(09:25):
every word, you know, a massive number of times. And
at the time we didn't really have the compute power.
You just needed more and more computers, essentially more and more chips,
more and more engines to just process and process and process.
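(A back-of-envelope illustration of that compute wall, using a commonly cited rule of thumb that transformer training costs roughly six times parameters times training tokens in floating-point operations, with BERT-large's published scale plugged in. Rough approximations, not Google's internal figures.)

```python
# Rule-of-thumb training cost: ~6 * parameters * tokens FLOPs.
# BERT-large's published figures: ~340M parameters, ~3.3B-word
# corpus, ~40 epochs. All numbers approximate.
params = 340e6
tokens_per_epoch = 3.3e9
epochs = 40

train_flops = 6 * params * tokens_per_epoch * epochs
print(f"~{train_flops:.1e} FLOPs")  # roughly 2.7e+20

# On a chip sustaining ~1e10 FLOPs/s, that's centuries of compute,
# which is why specialized accelerators working in concert mattered.
seconds = train_flops / 1e10
print(f"~{seconds / 3.15e7:.0f} years on one such chip")
```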
So our research team had developed these chips,
(09:47):
these processors that were really optimized for doing this sort
of deep learning work. And it was these chips,
and the way that we could sort of put all
the chips together to work in concert to solve
this problem, that really unlocked the amount of processing power
needed to even build these models in the first place.
So the binding constraint wasn't like the theory or the
(10:09):
ideas of it, like you knew how to do it,
you just didn't have enough horsepower to actually make
it happen. Well, we knew that we could do it,
we didn't know if it would be any good, right?
And you couldn't even try, right? Yeah. Right.
And so then we tried it and we found out, actually,
this thing is pretty compelling. It can understand things that
(10:32):
our models previously had never understood. You know. But I
will say, and this gets to the second
hard part, once we had these large scale language models,
we didn't quite know how to put them into search ranking.
This was not something that had been done before. So
(10:52):
we have in search this incredibly rigorous methodology for testing
any given change to our algorithm, and it's based
in statistics: it statistically samples queries, and we look
at the before and the after, and there's a scoring
system to say, is it better or not? And I
remember looking at the early experiments from this BERT integration
(11:17):
into our search engine, and the queries that it was
impacting were just queries that, honestly, before, we would have said,
we don't know how we can solve this query. And
suddenly the model was just able to figure out these
sort of unspoken concepts that our previous technology
(11:41):
would not have even been able to come close to.
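(A sketch of that before-and-after testing idea: sample queries, score the old and new results, and check the difference statistically. The rater scores here are made up and the simple sign test is a textbook stand-in, not Google's actual methodology or scoring system.)

```python
# Sketch of before/after evaluation: sample queries, score results
# from the old and new rankers (made-up rater scores), and test
# whether the new system is reliably better. Hypothetical data and
# a simple sign test, not Google's methodology.
from math import comb

# (old_score, new_score) per sampled query, e.g. from human raters
ratings = [(2, 3), (3, 3), (1, 3), (2, 2), (3, 2), (1, 2), (2, 3), (0, 2)]

wins = sum(new > old for old, new in ratings)
losses = sum(new < old for old, new in ratings)
n = wins + losses  # ties are discarded in a sign test

# Two-sided sign test: probability of a split this lopsided by chance
k = max(wins, losses)
p = min(1.0, sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n * 2)
print(f"wins={wins} losses={losses} p={p:.3f}")
```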
Like give me an example, like what kind of thing?
So here's a really great example. This is directly from
one of the very first BERT evaluations that we
did internally, and the query is can you get medicine
(12:01):
for someone? Pharmacy? Right? And so what's interesting about this
question is the user's looking for something very specific. They're
like, maybe my partner is sick. Can I go
and pick up their prescription at the pharmacy for them?
Or do they have to go and get it? Right?
It's also kind of a janky query, where it's half in natural
(12:22):
language can you get medicine for someone? And half in
like keyword-ese; they're just typing pharmacy at the end, right?
It's a weird query. Exactly, yeah. And so previously we didn't
know how to parse out this intent, right, this idea.
You know, we could tell that it was about getting
a prescription from a pharmacy, but this notion of 'for
(12:46):
someone' was a concept that was just slightly too complex. Oh,
I didn't even understand it until now. What they mean
is can I pick up someone else's prescription? That's what
they're actually asking. But it's very poorly worded, frankly,
and therefore hard to figure out. Exactly right. And so previously,
(13:06):
before BERT, we would return these wonderful web pages saying
this is how you get a prescription filled, which you
can imagine if you're this user doing this query, you're like, yeah,
I already know how to get a prescription filled.
Thanks. What I need is it filled for somebody
else. Exactly. And with BERT we were able to understand,
(13:29):
pick up this idea of the force someone and put
the appropriate weight on it, that it was the sort
of, you know, discriminating thing in the query, that
was the key thing that the query turned on. And
then we were able to show this web page
that talked about can I have a friend or family
member pick up a prescription for me? And that was
(13:51):
the sort of like aha moment where we could all
just sit around and be like, Wow, this is a
new level of understanding that we haven't got to previously.
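(To make the pharmacy example concrete, here's a sketch using an open BERT-style cross-encoder, trained on the public MS MARCO dataset, as a stand-in for Google's ranking models, which are not public. The two candidate pages are invented for illustration.)

```python
# The pharmacy example, made concrete with an open BERT-style
# cross-encoder as a stand-in for Google's ranking models.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "can you get medicine for someone pharmacy"
pages = [
    "How to get a prescription filled at your local pharmacy.",
    "Can a friend or family member pick up a prescription for me?",
]

# The model reads the query and page together, so "for someone"
# can attend to "friend or family member ... for me".
scores = model.predict([(query, page) for page in pages])
for page, s in zip(pages, scores):
    print(f"{s:.2f}  {page}")
```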
So with BERT, Google got to the point where it
was very very good at dealing with words in a deep,
complex way. But words make up less and less of
(14:12):
the Internet. Pictures and videos are a whole other story
that's coming up in a minute. Now back to the show.
So you have got to this place now where you
have, you Google have, made the leap from keyword-based
(14:32):
searches to intention-based search, as in, what do people mean? Right?
Which is this big interesting leap? And so I'm interested
in kind of the next leap, like what's the next
big hard problem you're trying to solve? What's really interesting
to me is this idea of how many questions you
don't ask because you don't even know the words, right,
(14:54):
Like this is a bit of a sad story. But
I have at my house this oak tree, and the
oak tree I think is dead, and it's very sad
for me because it's a very beautiful oak tree. And what's
interesting is, you know, I looked at the oak tree,
I'm like, wow, those leaves are kind of brown, Like,
that's not it doesn't seem right to me. I wonder
(15:17):
if something's wrong with the oak tree, right? But
I can't necessarily, right now, really articulate to a
computer this fundamental question of, is this oak tree dead?
And if not, what can I do to save it? Right?
So what I do is I go and type in
some queries, I say, you know, oak tree dead? How
do I know if my oak tree is dead? You know?
(15:39):
And I get back results. But those results aren't,
they're not taking in any context of this
particular tree and what the leaves look like. And
so this idea of how can you start to ask
these questions using all of the information around you, using
your camera to actually capture this particular oak tree, using
(16:03):
your location to know, you know, what are the native
oaks in this area? And what's the current incidence of
sudden oak death syndrome, which is a thing that I
have recently learned exists. Okay, so I get why this
is a hard thing to search in a text box, right,
And so the thing that's interesting to me is how
(16:24):
can we facilitate asking those types of questions where it's
a mix of here's something that you're looking at, Here's
something that you're saying with your words that adds to
the picture. You know, here's a lemon tree that's got
some black spots on it? What's wrong with it? Like?
How can you help me understand what I should do
(16:46):
about this? You know, these sorts of questions, I think,
right now we have to do a tremendous amount
of work to try and translate these questions into text
that we would issue to a search engine. And yeah,
we do that. Yeah, yeah, normal people. Yes, we're all
doing it. And when you think about it, you know, sometimes
(17:08):
it's very easy, right, but sometimes you're like really having
to work hard to come up with a query that
will actually get you the answers that you need. And
I think that's really the next frontier for us is
how do we on the query side help users just
naturally intuitively express whatever information need they have. And then
(17:30):
how do we understand the whole universe of information, not
just the web pages, but all the images and video
and audio out there, and take that next level of
like concept understanding to match those together so that we
can get users even more precise answers that really help them. Great.
(17:53):
So that's the like vast dream slash big problem. Can
we reduce it a little bit so we can talk
in sort of practical terms about what you're working on.
I mean, I know there's this new AI model that
integrates images, like you can, you know, whatever, take a
picture with the camera on your phone and put in text.
(18:14):
So like, well, you have this new model, and it,
like the old one, has this warm, fuzzy acronym. Right,
it's called MUM, which stands for hold on, I gotta
look at my notes, the multitask unified model. So like,
tell me about MUM. So MOM is our next level
model that you know Bert was about language. MOM is
(18:36):
about all these different modalities of information coming together, particular
images and language. I mean, is that if really images
and language and we've got some limited applications of it
in search today. So for example, you can take the
photo of somebody's handbag and say you want
(18:58):
to shop it, and that will work today. And that
is like we were not able to do this previously,
and that in and of itself is a big breakthrough.
But there's still so much headroom, right? Like, there's still
so much ability there. You know, I would classify our current ability
to process words in this multimodal context as, you know,
(19:22):
kind of like back in the early days of
the internet. You can say 'near me' to find where
you can buy it near you. You can say 'buy,'
but you can't necessarily, like, ask an incredibly complicated question
about a picture, right? Like, so we're kind of back
to keywords in this new pictures-plus-words universe.
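(MUM itself is not public, but here's a sketch of the general image-plus-words idea using an open CLIP model, which embeds images and text into one shared space. The photo path, the text refinement, and the naive averaging fusion are all assumptions for illustration, not how MUM works.)

```python
# Sketch of the image-plus-words idea with an open CLIP model,
# which embeds images and text into one shared space. The photo
# path "handbag.jpg" is hypothetical.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

img_emb = model.encode(Image.open("handbag.jpg"))  # the photo you took
txt_emb = model.encode("the same bag but in brown leather")

# Naive fusion: average the two embeddings to combine "this thing"
# with the words that refine it. Real systems learn this fusion.
query_emb = (img_emb + txt_emb) / 2

products = [
    "Brown leather shoulder bag with gold buckle",
    "Black nylon laptop backpack",
]
prod_embs = model.encode(products)

for name, s in zip(products, util.cos_sim(query_emb, prod_embs)[0]):
    print(f"{float(s):.3f}  {name}")
```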
Let me ask a dumb question: why can't you just
(19:45):
take all of your brilliant intent AI and copy and
paste it to fit with the image AI. So a
couple of things. The first is that anytime we develop
sort of this new technology, we also need to see
how users start using it, right? And so I think
(20:07):
it's also fair to say that, you know, we have
a ton of people using this, but
there hasn't been time for that new technology
to really be accepted by the world. And then we
have this vast set of queries that we're doing poorly on, right.
So that's the other thing you should know about Google.
We spend a lot of time looking at the queries
(20:29):
where we're failing. That's one of the other reasons we
have a deep appreciation of how search is an unsolved
problem, because we're just constantly looking at queries where the
user's clearly not getting what they're looking for. And we'll
use that as a seed to figure out
how to make things better. So do I understand you
that the fundamental thing you need now is just lots
(20:50):
of people to use this thing so you can see
the weird ways people search and the things they sort
of do that are hard to understand. That's certainly one
of the things we need. I mean, fundamentally,
search works in service of our users, right, and so
understanding the failures is critical to how we get better.
(21:13):
I think there are also just things that we know
that we need to do on the AI and the
model side that we'll continue working through, right, the ability
to really bring together more of the two step process
of how do you conceptually understand the words, conceptually understand
the image, and then bring those two things together and
(21:35):
have that be a bit deeper on both sides rather
than just the combination of the two, and those sorts of things.
But yeah, I mean, people coming in and using it,
and maybe having a bad time, will then make it better.
It seems like there have been two main threads of
AI research. One is basically language and the other is
(21:57):
basically vision and images. I mean, is it right
to think of what you're trying to do as the
synthesis of those two sort of main AI traditions. Yeah,
I think so. I think it is clearly the case
that, just like, uh, you know, with BERT, we took
(22:20):
all these words and we got down to concepts. Right,
It is clearly the case that human beings understand the
world through concepts, and they do that visually, and they
do that with language, and ultimately the concepts are the same. Right,
So being able to say, okay, here's
a concept, and we can attach to that what that
concept looks like, or the visual representation of that concept,
(22:43):
as much as it has one, and the words surrounding
that concept. That's when we can really unlock this true
natural way of understanding the world that we think is
going to enable people to ask all those questions that
they have that they're not asking right now. Are there
(23:03):
applications that go beyond search that come to mind if
you figure this out? Yeah, I mean I think that
search has this connotation of kind of find what's out there.
I think there's something, you know, we're thinking about what
(23:26):
this looks like in the generative space. So for example,
if you're looking for, you know, I bake birthday cakes
for my kids, and sometimes what my
kids want in a birthday cake just actually doesn't exist
on the internet, right? Or there's like only one or two.
(23:47):
So then I have to come up with it myself.
And like, what if AI could help us generate a
sample image just based on these concepts that I could
then use for inspiration. I think that's a pretty interesting concept.
There's obviously a lot of things that we need to
be very thoughtful about in this space as we do it,
(24:09):
But I think this idea of extending search past the
notion of connecting you with the information that's out there
to actually synthesizing new information for you is pretty interesting
and something we're talking about a lot. You know. One
of the things that has become clear to me talking
with you is that, clearly, I think too narrowly about search. Right.
(24:32):
I have this very kind of twenty years ago idea
of like searching text on the web, and the web
has become much less text based in that time. Right,
the web includes Instagram, the web includes TikTok, and those
are places where, weirdly to me, lots of people go
(24:54):
to search like people go on TikTok to find whatever,
where to go out to eat, which would never occur
to me. So I mean, is that part of the
sort of motivation on some level, for you to figure out, oh, right, text,
that's not enough, we've got to figure out how
to search in video, and what does that even mean?
I think we're really driven by what our users are
(25:15):
telling us, and we just have really robust mechanisms for
understanding what our users are doing. And it's pretty clear
that people around the world find image and video content
to be pretty compelling, right? I mean, that's sort of
a very obvious statement, but you know, the Internet in
(25:38):
the early days, it was bandwidth constrained. It was
technology constrained. It had to be words because that's what
the technology enabled, not necessarily because that's what human beings
most enjoy in terms of an information consumption experience. And
so we really are driven by what we're seeing in
the user trends, and we're really driven by just this
(26:00):
mission of how do we keep helping people get the
best answers to their questions that we can give them.
In a minute, the Lightning round, including what you learn
about the Internet when you spend six years working at
Google Search. Now, let's get back to the show. I
(26:27):
want to do a lightning round. We close usually with
a lightning round on this show, just a bunch of
relatively quick questions. So in this instance, I googled best
Lightning Round questions, and right at the top of the
search results page I didn't even have to click through,
is this bulleted list. I'm just going to give you
a few from there. Sounds good. Favorite day of the week,
(26:50):
Oh, Monday, because I get to go to work and
not deal with my kids all day. Who I love
very dearly. Good. Favorite city in the US besides the one
you live in. Just reading here. New York City. Thank you.
(27:12):
Would you rather be able to speak every language in
the world or be able to talk to animals? Speak
every language in the world. I'm shocked, to be honest,
Although I get that, like, Google might actually figure
that out. Does it not seem like you can already
get a translator for every language? Talking to animals would
be like a revolution in human understanding of the natural world,
(27:35):
I guess. But I do not speak any languages, really,
and I constantly feel bad about it. So maybe that
was just a very personal feeling of weakness. So okay,
now we're pivoting out of the Google lightning
round questions into my own bespoke lightning round questions. What's
(27:57):
your favorite kind of cake to bake? Oh? Well, so
I really make these, like quite elaborate cakes for my
children because I want them to be able to grow
up and say, wow, I remember you making great cakes
for us, mostly. So I recently made, one of
(28:20):
my kids plays Minecraft a lot, and there is a
character called a slime, which is a sort of jelly
blob that kind of jumps on top of you and
kills you if you don't fight it off. And so
I made a slime cake with a cake embedded in
the jelly. So, big idea one here: what do you
think you understand about the Internet that most people don't understand? Oh?
(28:43):
I like this one. I think most people don't understand
how much it changes every day. And you know, so
we have this astonishing stat that even I didn't believe
when I heard it, which is that fifteen percent of
the queries Google sees every day we have never seen before.
And that happens every day, that there's fifteen percent that's
just completely new. And the same happens on the internet side.
(29:06):
Every day we index a ton of new content we've
never seen before about ideas that are completely new to
humanity at that time, right, And you know, we have
to be able to continually understand that and keep up.
And I think that people sort of have this idea
that there's a fixed amount of information out there, but
(29:27):
actually human beings are astonishingly productive and are constantly coming
up with new ideas. If everything goes well, what problem
will you be trying to solve in five years? I
will still be working on making Google Search better for
all our users. I think we will I think we
will be working on this for the next hundred years.
(29:48):
Is there a narrower answer, like this particular problem you're
working on now of integrating image and words, basically, like,
you think, obviously it won't be completely solved, but
you think that'll basically work. And if so, is there
a next thing? I think the problem of video
(30:10):
I think will continue to be hard because there's just
such a large amount of information in a given video.
The other problem that I'm really interested in is helping
people parse information with helpful context. So, like, you
know, we've unleashed all of the world's information on people,
(30:32):
how do you actually help them sift through that
and make good decisions, whether it's choosing a reliable merchant
to buy from or finding reliable medical information? How do
you help people make those decisions for themselves and be
literate with their information choices. What's one piece of advice
you'd give to someone trying to solve a hard problem.
(30:55):
I would say, find a really great group of people
to help solve it with you, because generally trying to
solve hard things by yourself ends up being an act of frustration.
Cathy Edwards is vice president and GM of Search
at Google. Today's show was edited by Robert Smith, produced
(31:18):
by Edith Rousselot, and engineered by Amanda Kay Wang. I'm
Jacob Goldstein, and we'll be back next week with another
episode of What's Your Problem.