
May 13, 2019 52 mins

Before Google, there were half a dozen search engines to choose from. What happened to them? How did Google pass them by to become a multibillion dollar company?




Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:04):
Welcome to TechStuff, a production of iHeartRadio's
How Stuff Works. Hey there, and welcome to TechStuff.
I'm your host, Jonathan Strickland. I'm an executive producer with
How Stuff Works and iHeartRadio, and I love
all things tech, and today I thought I'd talk a
bit about Internet search engines and how Google was able

(00:26):
to sort of take the lead amongst a pack of competitors,
most of which came out well before Google did. Now
these days, lots of people use Google as a word
for web searching in general, even though the company does
way more than web search, and there's still plenty of
competitors that are still active that are out there. I'm
sure Microsoft would rather we all talk about binging the

(00:49):
heck out of things, but that doesn't happen. I think
we're now at the point where people will talk about Googling,
even if they're using a different search engine. So how
did that happen? How did we get to that point? Well,
to explain how we got there, it's a good idea
to walk down memory lane. I mean, you know, I
love to do this. Every episode begins with a history
lesson and to really look at how the idea of

(01:09):
search engines developed and what things were like in the
early days of the public Internet and the Web. Now, first,
the idea of search engines predates both of those concepts
by quite some time, and it rose out of necessity.
It kind of evolved out of older methods of indexing.
So a predecessor to search engines are the various library

(01:31):
classification systems. Three big ones are the Dewey Decimal System,
the Library of Congress system, and the Superintendent of Documents
system. The first two of those designate books with
call numbers according to subject matter, so you divide the
books up based upon whatever subject they cover. This can

(01:51):
get a little complicated. It is, and no pun intended, subjective.
You have to determine where the book best fits
in the grand taxonomy of subjects. Meanwhile, the Superintendent
of Documents system is totally different. It doesn't divide things
up by subject. It divides up books by the issuing
agency responsible for the publication of the work. So they

(02:15):
just divide them up by where the book came from, not
what the book covers. Whatever the system, the purpose is
the same. It's to make it possible for someone to
track down a specific work in an enormous collection of works,
or to figure out where to place a new work
within an existing collection. By classifying each work and then

(02:36):
designating the physical location for that piece, people can find stuff. Otherwise,
you just have an enormous pile of books with no
organizational system at all, and finding anything would take ages. Now,
someday I'll have to do an episode about these systems
in more detail, to talk about how they were developed
and how they've evolved over time, because it's actually a
pretty interesting story. But we're gonna jump forward a bit,

(02:58):
not quite up to the computer age, however. Rather,
we're gonna jump forward to the nineteen forties. That's when
a forward-thinking fellow named Vannevar Bush wrote an article
for The Atlantic Monthly. The piece had the title "As
We May Think," and it contained some fairly prescient ideas.
Bush recognized that as we increased our knowledge,

(03:20):
we were beginning to specialize in certain fields out of necessity.
You couldn't just be a general knowledge master. Eventually
we were developing our knowledge in different areas
so far that you had to specialize. You couldn't
be an expert in everything. To get a really
deep understanding of a particular field, such as physics or chemistry,

(03:42):
we might dedicate all our resources to that pursuit as
an individual. Meanwhile, there are other people who are exploring
different subjects, like pure mathematics or cosmology or something like that. Now, this,
Bush argued, presented a new challenge. How do we create
a usable record of our discovery, one that's easily navigable
and remains relevant over time. While an older library classification

(04:06):
system might encompass several categories, it couldn't get as granular
as our knowledge was growing to be. For example, the
Library of Congress classification system has twenty one categories that
you can use to group books together. But as our
research and discoveries homed in on ever more precise slices
of those categories, the system becomes less relevant, because

(04:29):
you've got minor categories within those major categories,
so it gets harder to classify things. Bush said
we needed to have a record that could be continuously
extended and easy to consult. But he went even further
out than that. He said, to make it a really

(04:50):
useful record, we need to structure it to respond to
our queries in a way similar to how the human
mind works. Bush argued that we think through association. We
associate ideas with each other, sometimes in pretty unusual ways,
ways that might seem intuitive to us, even though on
the surface of it, there doesn't seem to

(05:11):
be any relation between those ideas. And you may have
experienced this where you're thinking about one thing and you
just start to think about a different thing that doesn't
seem to be related, and then you're able to relate
the two. This is really human ingenuity. It's where innovation
really takes off. Well, Bush said it would probably be impossible
for us to create an artificial system that could replicate

(05:33):
that tendency, but we could at the very least design
something that acknowledges that human trait so it works better
for us. So if we did that, if we searched
a record for a particular type of information,
we might also see the opportunity to search for tangential
data that is relevant to our needs. A good system
would be able to anticipate that and serve up the

(05:54):
information for us. So Bush proposed a hypothetical system called
Memex, M-E-M-E-X, and that would use
associative factors to organize information in a virtually limitless storage space. Again,
this is hypothetical. It would be a system that one
could reference and send a retrieval command to get the
most relevant information related to whatever it was you were

(06:16):
asking for in your query. Essentially, he was talking about a
conceptual model that the Internet attempts to realize. Now skip
ahead to the nineteen sixties. Then you've got a computer
scientist named Gerry Salton. Salton taught at Cornell University,
and he developed an indexing strategy using a vector space model.

(06:37):
Now this gets a bit mind bendy for people who
haven't worked with vector space models, but follow me here. Now,
start with an imaginary virtual space kind of analogous to
the physical space we live in in our day to
day lives. Now, in our reality, we can perceive three dimensions,
and we experience a fourth one, that of time, but

(06:57):
we cannot directly perceive any more than that ourselves, So
most of the time we associate the physical world with
three physical dimensions. Now, the information retrieval method that Salton
set up, he defined the number of dimensions within his
virtual space by the number of terms in a retrieval request.
So if your request included five terms, the vector space

(07:20):
model would have five dimensions. Documents within the model would
virtually appear as vectors within the space according to which
of the search terms were present within those documents and
how frequently they were present within the documents. The
queries and the documents are both vectors of the term counts.
And just in case you're as rusty on your physics

(07:42):
terms as I am, a vector is a quantity that
has a magnitude and a direction. So your terms have vectors,
your documents have vectors, and the goal is to identify
the documents that are most similar to the initial query
in an effort to retrieve the most relevant results, while
leaving out anything that doesn't meet the criterion or doesn't
meet a predetermined threshold of relevance. So you might say,

(08:05):
I need to have x percentage match for the retrieval
to actually come through, and anything that doesn't meet that
threshold gets discarded. It's not served to me,
and that saves you time when you start sorting through
the results to see if any of those actually represent
the information you were actually looking for. Now, suffice it
to say, this model really looks for the presence of

(08:27):
specific terms, but not necessarily their use within the document,
their context. So you could end up retrieving a document
that technically contains all the terms you used in the search,
but it has no real relevance to your actual needs.
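To make the vector space idea concrete, here's a rough Python sketch of term-count vectors, cosine similarity, and a relevance cutoff. The documents, threshold value, and scoring details here are illustrative stand-ins, not Salton's exact implementation:

```python
from collections import Counter
from math import sqrt

def term_vector(text, vocabulary):
    # Count how often each query term appears in the text.
    # One dimension per query term, as in the vector space model.
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def cosine_similarity(a, b):
    # Similarity based on the angle between two vectors:
    # 1.0 means they point the same way, 0.0 means no overlap.
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank(query, documents, threshold=0.1):
    # Score every document against the query and discard anything
    # below the predetermined relevance threshold.
    vocab = query.lower().split()
    qvec = term_vector(query, vocab)
    scored = []
    for name, text in documents.items():
        score = cosine_similarity(qvec, term_vector(text, vocab))
        if score >= threshold:
            scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

docs = {
    "a": "the physics of vectors and vector spaces",
    "b": "cooking pasta with garlic and olive oil",
    "c": "vector models for document retrieval in physics",
}
results = rank("physics vector", docs)  # document "b" falls below the threshold
```

Note the limitation just described: the scoring only sees which terms appear and how often, not how they're used, so a page that happens to contain the right words can still score well without being relevant.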
So that is a limitation of this model, but still
it was a pretty good starting point, so Salton's work

(08:50):
was incredibly important. Another big thinker who helped shape the
course of what would become the Internet and the Web
is a guy named Ted Nelson, who in the nineteen
sixties proposed an idea he called Xanadu. And I'm
not talking about the cheesy movie starring Olivia Newton-John
about roller-skating Greek muses, but as a side note,
I really love that movie. Now, Nelson's Xanadu was

(09:12):
a hypothetical computer-based writing system that would have a
means to link different documents within a global repository. So
essentially he was talking about hypertext links, which would allow
users to navigate from document to document to relate documents together,
so that you could have a collection of documents about
the same sort of subject matter and make it

(09:35):
very easy to reference different research. It would also allow
document creators to add their work to a growing collection
of documents about similar subjects. Now, while the Web would
incorporate many of Nelson's ideas, he has stated that the
Web falls far short of what Xanadu was meant
to do. Still, those links would become very important for
the Web. Heck, I mean, you could argue the links

(09:56):
are what make it a web in the first place.
The World Wide Web is a series of documents published
on servers that have connective tissue between them. That's the
web that you navigate. So links would be crucial in
Google's eventual success, as we'll see. Now, in the nineteen seventies,
the agency that would become DARPA, which at the time

(10:18):
was just ARPA, funded the development of the ARPANET,
which would be the predecessor to the Internet. Computer scientists
worked on the rules that machines would have to follow
in order to communicate with one another over a network.
This was a non trivial problem at the time because
computers were dependent upon proprietary systems that were not compatible

(10:38):
with computers from other manufacturers. So, in other words, they
were talking in different languages. So you have to find
a common means of communication between these different machines. Solving
those problems laid the foundation for the Internet that was
to follow. Now skipping ahead to the late nineteen eighties,
this is still before the Web was a thing, but
college students Alan Emtage and Bill Heelan recognized the need

(11:03):
for a tool to search file databases. Effectively, they were
part of a project at the McGill University School of
Computer Science to develop that kind of a tool. It
would become known as Archie, and it was meant to
search archives of files. The original version was pretty primitive.
It would essentially just send an automated request to a

(11:24):
File Transfer Protocol server, and it would just say, hey,
give me a list of all the files that are
stored on your server. That's it, just give me a
laundry list of all the files that are on there.
And it was once a month it would send this request,
and so really it was just a list of the
documents that were available on that FTP server, not anything more,

(11:46):
you know, sophisticated than that. But it would grow to
become a query search tool, allowing users to look for
files containing specific terms in them or with specific titles.
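As a rough illustration of that kind of archive search, here's a Python sketch of an Archie-style filename index. The server names and file lists below are made up, and the real system harvested listings over FTP (e.g., something like `ftplib`'s `FTP(host).nlst()` in modern Python) rather than from hard-coded data:

```python
# Hard-coded stand-ins for the monthly FTP listings the real bot fetched,
# so the indexing and search logic here is self-contained.

def build_index(listings):
    # Map each filename to the set of servers that reported it.
    index = {}
    for server, files in listings.items():
        for name in files:
            index.setdefault(name, set()).add(server)
    return index

def search(index, term):
    # Find filenames containing the query term, and where they live.
    term = term.lower()
    return {name: sorted(servers)
            for name, servers in index.items()
            if term in name.lower()}

listings = {
    "ftp.example.edu": ["gnu-tar.Z", "readme.txt"],   # hypothetical servers
    "ftp.sample.org": ["gnu-tar.Z", "xv-3.10.tar"],   # and file lists
}
index = build_index(listings)
hits = search(index, "tar")
```

Note that this only searches file names, not file contents, which matches how limited those early tools were.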
Other schools would develop similar search tools in the following years,
naming them after characters from Archie Comics, like Veronica and
Jughead. Now this is despite the fact that Emtage

(12:07):
said he intended no association with Archie Comics at all.
He chose the name Archie because it's "archive" but without
the V. But sometimes memes just take hold, even if
they're based off a misunderstanding. Also, both Veronica and Jughead
search for files in the Gopher index system, a predecessor
and alternative to the World Wide Web. I did an episode

(12:29):
about Gopher a couple of years ago, I think, so
you can search the archives if you want to hear
about that. Now, this leads us to when Tim Berners-Lee
built and published the world's first web page. Berners-Lee
had done some work with hypertext documents at CERN
as a contractor in the early eighties. The goal then
was to help researchers share information between each other as

(12:51):
they were smashing particles against each other really really hard.
By nineteen eighty-nine, Berners-Lee was thinking about pairing the hypertext capabilities
with the Internet to allow for an interconnected series of
documents hosted on networked Internet servers, and thus the World
Wide Web was born. It wouldn't take long for others
to jump on the idea, and that meant it wouldn't
be long before people needed a tool to search the

(13:13):
growing collection of documents on the Internet. And that kind
of sets me up for the next section, which I
will tackle in just a moment, after we take this
quick break. So in the earliest days of the Web,
when calling it a web might have been a little generous,

(13:36):
CERN maintained a list of web servers that hosted web pages.
This was all part of the World Wide Web Virtual Library,
or VLib, sometimes written WWW VLib. This was the
first index of web content and it relied upon real
life human beings to build out the index. As more
web pages were published, volunteers gave their time to build

(13:58):
out the index. So this is not automated. People were actually
doing this by hand, adding the names and the
addresses of these different sites to this index. Next, we
have Matthew Gray's World Wide Web Wanderer. Now, this was a
bot or an autonomous program on a network that can
interact in some significant way with the information on the network.

(14:22):
And we deal with bots all the time. Sometimes it's
in the background and humans don't really notice, and sometimes,
like chatbots, it's very much in front of us.
The bot that Matthew Gray created would navigate the World
Wide Web to keep count of how many active servers
there were on the network. It was essentially measuring the
growth of the web over time by counting up these servers.

(14:43):
As more servers came online, we learned that the World
Wide Web was growing. Gray upgraded the bot to actually
capture the URLs of web pages, because earlier
it was just counting stuff. It wasn't actually making note
of anything in particular, and so it got a little
more sophisticated. Gray built out a database of these captured
URLs, called the Wandex. The bot would

(15:05):
ping servers multiple times each day, and it actually became
a problem that it was pinging so frequently. And a ping
is just a quick message that essentially says, hey, are
you there, and then it's waiting for a response of yeah,
I'm here. It's all good. But it was doing this
so many times each day that it was actually starting
to create lag on the Internet. Of course, this is

(15:26):
in the very, very early days, so whoopsie-daisy there.
Now, toward the end of nineteen ninety-three, some early web search
tools were starting to make their way to the general public, though,
keep in mind that in the very early days of
the World Wide Web, the general public accessing web pages was
really just a tiny fraction of the overall population. It's

(15:46):
like college students, some early adopters, some folks with various
government agencies, and a few companies, but not a whole lot.
It was largely a mysterious thing. You know, this is
when people were just starting to hear the terms
World Wide Web and Information Superhighway, because the Internet had
been around for a while, but most people didn't have

(16:07):
any regular way to access it. So these tools could
help you find stuff, but they weren't super sophisticated. There
was the World Wide Web Worm, which would pull together lists
of titles and URLs for web pages.
There was JumpStation, which would pull down information about
web pages' titles and header sections, so sort of like

(16:30):
the title of the web page and a brief description
of what the web page was supposed to be. But
both of those tools were very simple, and they would
present results in the order that they were found by
the tools, so there was no ranking of the search results.
It was all a first-come, first-served
kind of approach. So it might be that your results

(16:53):
all had whatever your query was in it, but the
most relevant ones could be buried much further down the
list because they didn't rank in any way. Then there
was the RBSE spider, which actually attempted to rank
results by relevance. But all three of these were limited
in what they could do, and often you needed to
know what you were looking for exactly in order to

(17:13):
get a hit. In other words, you couldn't just do
a string of words. You certainly couldn't write in natural
language what your query was, so you might have to
put in the actual title of a page in order
to get the response back. So you would have to
know what the page's title is. But you
obviously don't know what the URL is, or

(17:35):
else you would just navigate to the page directly. You'd
just type the address in your browser's address bar
and go there. So it was kind of limited in
its utility. If you were to do anything outside of
the actual title of a page, you might not find
any hits, even if such pages actually existed out there. Also,

(17:55):
in nineteen ninety-three, some Stanford undergraduates decided to take the work
they had been doing on a project called Architext and develop
a web-crawling search tool based off of that work.
Architext was all about using statistical analysis of word relationships
in an effort to kind of build a basic understanding
of what the subject matter was and that would then

(18:18):
be able to help you create more relevant search results
on queries. So you run a search request and this
tool would statistically analyze various indexed pages in its database
and return the results that appeared to be the most relevant.
It was an interesting approach. It was definitely one that

(18:40):
was needed, because it wasn't just listing the sites
chronologically based on how they were obtained. But it would
take about two years for this project to actually turn
into something that the group could unveil, and when
they did, they called it Excite, and they held

(19:02):
a commercial release for the product in nineteen ninety-five. But in
between the founding and the release of Excite, we hit
a banner year for early search engines. Nineteen ninety-four was
the year that WebCrawler, Lycos, Infoseek, and Yahoo

(19:23):
all got their start. Now, with the case of Yahoo,
the company was not relying on bots to crawl through
web servers to index all the pages that the bots
came across. Instead, Yahoo initially was relying on actual human
beings to curate an index, so they were actually going
to web pages deciding whether or not those web pages

(19:44):
were good enough to be listed on Yahoo on the
various subjects that Yahoo was covering, and then they would
be grouped together if they passed muster. Now, there are
pros and cons to that approach. One of the pros
is that, because it is human curated, there's a much
better possibility that the web pages on Yahoo's lists
were good ones with good content. But the downside was

(20:07):
that as the web grew and began growing at an
even faster rate, it really limited Yahoo's usefulness. It would
only be later that Yahoo would branch out into
web search in general, and even then it relied very
heavily on third parties for the actual search tools. They
didn't really dive into developing their own. They were more
about making deals with other search engines to power their search.

(20:31):
In fact, that happened on and off throughout Yahoo's entire existence.
But let's get back to web Crawler, Lycos and Infoseek. Now,
of those three, WebCrawler was the first to provide full
text search of web pages, so not just headers and titles.
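Full-text search like this is typically backed by an inverted index, which maps every word to the pages containing it so a query doesn't have to scan every document. Here's a minimal generic sketch of that technique; the URLs and page contents are invented, and this isn't WebCrawler's actual implementation:

```python
def build_inverted_index(pages):
    # Map every word to the set of pages whose full text contains it.
    index = {}
    for url, text in pages.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(url)
    return index

def full_text_search(index, query):
    # A page matches only if it contains every word in the query,
    # anywhere in its body text, not just in the title or headers.
    sets = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*sets) if sets else set()

pages = {
    "example.com/a": "a short history of web search engines",
    "example.com/b": "recipes for sourdough bread",
}
index = build_inverted_index(pages)
```

A real crawler also has to fetch and parse the pages, of course; this only shows the index side of the job.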
You could search terms, and if they appeared in the
web page at all, then, in theory, WebCrawler would be

(20:52):
able to bring that back, as long as it was
in WebCrawler's index. It was the work of
a University of Washington student named Brian Pinkerton, and Pinkerton's
WebCrawler built out this big index of pages. Pinkerton
started rather modestly. He first released a list of
the top twenty five websites on the web on March fifteenth,

(21:16):
and the following month he announced that WebCrawler's
index included four thousand websites, and by June of ninety four,
he made the index searchable for everyone. So again, this
is just a slice of all the websites that were
out there, but it was a decent enough slice to
start off with, and the endeavor proved to be successful.

(21:37):
Pinkerton received financial investments from a couple of big companies,
and within a year he had managed to support the
service through advertising revenue, a model that other search engines
would follow, so he was able to actually make money
by serving up advertising on his search engine pages. By June
of nineteen ninety-five, AOL had become interested in WebCrawler and would
purchase the company. AOL would later sell

(22:00):
the company a little less than two years later to
Excite, that company that I had mentioned earlier in this episode.
I'll get back to them and to web Crawler a
bit later, but I will say that WebCrawler was
my search engine of choice when I first started using
the web in the mid nineteen nineties. I was actually
pretty slow to move over to that crazy Google thing
that we're gonna get to later in this episode. Lycos meanwhile,

(22:21):
started off as a project at Carnegie Mellon University. Michael
Mauldin headed up the project, and the name came from
wolf spiders, which have the scientific name Lycosidae. When
Lycos became a company, Bob Davis took the helm to
turn it into a revenue-generating business that got its
cash from advertising, like WebCrawler, and it also was

(22:43):
a success, and by the end of nineteen ninety-six, the Lycos
index was the largest web search index available on the web.
It held more than sixty million documents in it. The
service grew tremendously, as did the company, and the full
story of Lycos is one I'll have to cover
in another episode, because it gets pretty bonkers. But for

(23:04):
this episode, it's just important to note that it was
another early search service that grew and became diversified and
tried to do lots of other stuff. Steve Kirsch
was the guy behind Infoseek. That one originally
launched as a pay-for-use service, so its
original revenue model wasn't advertising. It was, you would pay

(23:27):
to use it. Now that only lasted about half a year,
a little more than half a year, before Kirsch
dropped the fee and it became free to use, and
by February the service became known as Infoseek Search.
Also, Netscape and Infoseek negotiated a deal in which
Infoseek would become the default search engine in Netscape's

(23:49):
web browser, so that really helped Infoseek's penetration quite
a bit in those days. Now, one thing Infoseek incorporated
in its service after a couple of years is the
option to use boolean operators. Now, these are a collection
of simple words that can help you narrow down searches.
The words include AND, OR, and NOT. So with an

(24:11):
AND operator, you are narrowing your focus. So if you
search for the terms Superman and movies, the results you
get should be relevant to both of those terms. You
should only get results that include information about Superman and movies.
If you're looking for a specific Superman movie, hopefully those would

(24:34):
be right in that list. Some of them should have
the information you're looking for, though maybe you still
have to do some digging to find them, because you're
going to get all the web pages that have both
Superman and movies inside of them. Now you could make
it more specific. You could say Superman and movies and
Christopher Reeve. That would end up narrowing the results

(24:57):
further, to look for any pages that have all three of
those terms inside of them. The Boolean operator OR does
the opposite. It broadens your search. Maybe you want to
search for Batman or Superman, then you should get results
that have either or both Superman or Batman in them.
So you would get all the Superman results, all the

(25:19):
Batman results. You'd probably also get all the Superman and
Batman results, so you're increasing the number that you receive.
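Here's a toy Python sketch of how AND, OR, and NOT narrow or broaden a result set. The page titles and texts are made up for illustration:

```python
def terms(page_text):
    # Treat a page as the set of words it contains.
    return set(page_text.lower().split())

def boolean_search(pages, required=(), any_of=(), excluded=()):
    # AND narrows (all required terms must appear), OR broadens
    # (at least one of any_of must appear), NOT excludes.
    results = []
    for title, text in pages.items():
        words = terms(text)
        if any(term in words for term in excluded):       # NOT
            continue
        if not all(term in words for term in required):   # AND
            continue
        if any_of and not any(term in words for term in any_of):  # OR
            continue
        results.append(title)
    return results

pages = {
    "reeve-bio": "christopher reeve starred in superman movies",
    "batman-page": "batman movies and comic books",
    "comics-page": "comic books history without capes",
}
superman_movies = boolean_search(pages, required=["superman", "movies"])
either_hero = boolean_search(pages, any_of=["batman", "superman"])
no_superman = boolean_search(pages, required=["comic"], excluded=["superman"])
```

As with the early engines, this only checks which words appear on a page, not what the page is actually about.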
The NOT Boolean operator helps you eliminate options from a search.
So if you searched comic books NOT Superman, you should
get results about comic books that don't mention or include
Superman in the web pages, so it should be discussions

(25:43):
about comic books, but they're not Superman comic books, or
at least Superman's name isn't appearing in the web page. Now,
Boolean search is still a great tool to help you
get the results you want, but as time has gone on,
search has become much more sophisticated, so it's not really
as necessary to become familiar with Boolean search. It's good
to know how to use it, but it's not key,

(26:05):
because search has not only grown more sophisticated but has grown
more intrusive. A lot of searches today rely on information
that various browsers and web pages are gathering about you,
so they're using your past behavior as a predictive tool
to help serve up relevant results. But that's a topic

(26:26):
for a different podcast episode. I'll do another podcast episode
about this at some point. Now, Infoseek had a search
tool that allowed users to include different modifiers on search
results to narrow down the returned sites, which was becoming
important because the web was growing enormously in the mid
to late nineties and would only continue to do so.

(26:48):
The Walt Disney Company took notice of Infoseek and would
purchase more than forty percent of the company, effectively incorporating the business
into the media empire ruled by the hand of the Mouse.
Infoseek at that point had made several acquisitions of its own,
including sites like ESPN dot com and ABC News dot com,
which then became part of Disney's media empire, and Infoseek

(27:11):
would get rolled into Disney's Go dot com network of
services and sites, and effectively, eventually, after several years, it
would disappear into that network of sites. Infoseek would begin
to offer up manually curated search results along with automated ones. Again,
this was an effort to return the most relevant results.

(27:31):
You'll see if you look at the history of search
engines that a lot of them kind of experimented with
this human curated approach, because that was a real issue:
you would use these search engines and you
would get a ton of results and only a few
of them ever seemed to be even remotely connected to
what you wanted. So putting humans in charge of that

(27:52):
made it a little easier to deliver relevant results, because
humans understand context. They understand when a site is actually
about something versus when a site just mentions something off hand,
but it's not really about that thing. You even saw
this relatively recently. I remember, shortly after I started

(28:15):
How Stuff Works, how the service Mahalo was kind of struggling,
but it was also a human curated search engine. And
we're talking like two thousand seven, when I was looking
into that. My friend Veronica Belmont used to work
for that company, so it was still something that people
were trying even as late as the late two thousands era,

(28:37):
and by late two thousands, I mean the first decade
of the two thousands. Anyway, Infoseek tried that out.
And also, one of the engineers from Infoseek, Li Yanhong,
relocated to China and became a co founder of a
different search engine company called Baidu, B-A-I-D-U.
Baidu is a company that has become truly enormous, with

(28:59):
assets approaching a value of three hundred billion dollars. That's
actually more than what Google's parent company, Alphabet, has at
its disposal. So you could argue Baidu won the
search wars, but then Baidu is not widely known
in the West. It's a very huge company over in Asia,

(29:21):
but not as well known here. Back to our
search engine history. Excite, the company I talked about earlier,
finally debuted, and it did well. In fact, it did
so well that it would end up purchasing WebCrawler
in nineteen ninety-seven. But by nineteen ninety-nine, its numbers were starting to decline
thanks to You Know Who, the one that rhymes with Shmoogle, and

(29:45):
it merged with a company called at home dot com,
the at symbol home dot com. It was a deal
that was worth nearly seven billion dollars, but that deal
did not ultimately work out. The merged company would file
for bankruptcy in two thousand one, one of the many
victims of the dot com bubble bursting. That was

(30:07):
at least one of the big contributing factors to that.
The company also just had a lot of debt, even
heading into two thousand and two thousand one, so that was
kind of the nail in the coffin. Now, InfoSpace, which
once upon a time owned what would become Stuff Media,
so technically I was an InfoSpace employee for a short while,

(30:29):
purchased Excite's assets and domain names, and so WebCrawler
and Excite all became wrapped up with InfoSpace's offerings,
and, yeah, technically, it's still part
of that. You can still use some of that, although

(30:50):
it's a much different tool than what it used
to be. Also, in nineteen ninety-five, AltaVista emerged from the Western
Research Laboratory at the Digital Equipment Corporation, or
DEC. AltaVista allowed for natural language queries, meaning
you could type in a query similar to how you
would ask a person to look for something for you.
You didn't have to focus on asking in a way

(31:12):
that would only make sense to a machine. This is
that barrier of entry we often see with technology, where
we have to adjust our behavior so that whatever technology
we're working with understands, quote unquote, what we want from it.
AltaVista was trying to reverse that, to make
the technology attempt to understand what we want, rather than

(31:35):
making us work so that the technology can understand us.
The researchers who designed it had to do a full-scale
web crawl in August of nineteen ninety-five, indexing ten million pages
in that web crawl, and this was compelling enough to launch
as a spinoff company. By nineteen ninety-six, AltaVista was powering search
results for Yahoo. So, like I mentioned earlier, where Yahoo

(31:58):
would use other companies to run their web search,
AltaVista was one of those. But also, at that time,
Compaq would acquire DEC, which in turn owned AltaVista,
and Compaq turned AltaVista into more of a portal service
than a true search engine, which
put it more in direct competition with Yahoo, and AltaVista's

(32:20):
numbers went into decline, possibly because of that shift to
a portal service rather than as a more straightforward search tool.
Now we're not quite done covering all the major players
in the space before Google came on board. I'm going
to cover a couple more right after we take this
quick break. Okay, So in addition to the services I've

(32:48):
already mentioned, there were a couple more. There was
Inktomi, a project that was headed by Eric
Brewer and Paul Gauthier. The two of them had
been working on a parallel processing computing
project for DARPA when they came up with this approach
to search, and rather than launching a dedicated search tool

(33:10):
of their own, they said, oh, well, we'll offer
our technology to power other people's search engines. So essentially,
you put up the front end and we'll power the
back end. And one of those was run by a
company called HotWired, and they introduced a search tool
called HotBot. Inktomi worked largely as sort of

(33:33):
a business to business entity, growing far beyond a search
engine company. But the dot com crash of two thousand
one also hit Inktomi really hard, and a couple of
years later it was swept up by Yahoo. So you
see a lot of these companies end
up kind of getting gobbled up by each other. Now,
the last of our pre Google search engines that I'm
going to talk about is ask Jeeves. Later on, it

(33:56):
was just known as Ask. It launched in nineteen
ninety seven, having been developed by David Warthen and Garrett Gruener,
and like some of the other services I've mentioned in
this episode, it would present curated lists that were created
by sort of an editorial board, along with some paid listings.
So if you were a company that wanted your website to

(34:17):
be listed alongside quote unquote legitimate search returns, you
could pony up the cash to have your website put on
that list. That still happens today on search engines. It happens
today on Google, where you'll see the first couple of
results tend to be ones that say, you know, "Ad"

(34:39):
at the end of them. Google has to label them
as ads, not as just natural search results based on
your query. Though sometimes those ads actually are the things
you're looking for, so it's not always a bad thing,
but it is good to just pay attention. So eventually
Ask would develop its own search engine technology that would
automate things. They stopped relying exclusively on people curating lists,

(35:03):
and Ask would go on to acquire Excite. So you see,
Excite swallowed up WebCrawler, and Ask later
took on Excite. There's a lot of shuffling
with these companies. And then came Google, which had started
as a research project at Stanford. Larry Page and Sergey
Brin had developed the tool and they were running it

(35:24):
out of a garage for a little while. They had
built a search tool they originally called BackRub, and their
goal was to create a search engine that could index
the web and then present the most relevant results to
any query. But how would you do that? Well, the
actual answer, if we're being totally transparent, is kind of
like Coke's secret formula, and that we know in general

(35:46):
what has to go into it, but we don't know
the specifics that would allow us to replicate the results precisely.
The algorithm that Google uses is peculiar to Google, and
they also change it a lot. They tweak it, so
even if we did learn how it used to work,
it doesn't work that way anymore. So Brin and Page would

(36:07):
refer to this process as PageRank. And here's how
it worked from a theoretical standpoint. First, you index
the web. You need to get a
complete look at all the websites that
are available out there on the web, an inventory, if
you will, of all the web. To do this, you

(36:29):
send out bots to index all the pages that are
on the web that you can find. You
can actually designate in the HTML of a web page,
with a robots meta tag, that bots should
ignore the page and not index it. If you
do that, the page won't show up on any search
results page, because the bot will see that message and

(36:51):
just move on. This is useful if you want
a page that only people who know about it can
navigate to, and you don't want folks just stumbling
on it through search. So that is an option. Now,
for all of the pages that are discoverable, the bots
will crawl through, they follow all the links, and they try
to index the web and get as good a
snapshot of what the World Wide Web is as

(37:13):
is possible. Now, these bots aren't just looking for the
location of the web pages, like what server those web
pages are stored on, or even just getting a full
understanding of what the text is inside those pages, so
that when you do a search query and put
your search terms in, they can return the pages that
have those search terms. They're also looking for links, both

(37:36):
going into the page and coming from the page to
go elsewhere, and the links will become a really important
part of PageRank. So here's the basic idea. Brin
and Page figured out that if a web page about
a given subject is really good, other pages tend to
link to it. They do so because they recognize the quality,

(37:57):
and that helps boost the page's position in search results.
So let's use an example to kind of understand this.
Let's say you are one of these early web developers
in the late nineteen nineties, and you're also a big
music fan, so you decide to create a blog that's
completely focused on the music industry, and you cover the

(38:18):
news in the industry. You post reviews of albums that
you've listened to. Maybe you even do some interviews with
people who are in the industry. And as you write
this blog, other people take notice. Some of them also
have a web presence and cover the industry, and they
really dig your stuff, so they link to your page.
They say, there's a really cool music industry blog. It's

(38:38):
being run by this person over here. Follow this link
to go check it out. Google's bots would register that;
they would see that those links were out there pointing
to your page. And the more sites that link back
to your blog, the higher your blog would rank in
search results. So if someone searched music industry news or
something along those lines, there's a chance that your blog

(38:59):
would pop up fairly high in the results. Now, how high
would depend on something other than just how many
pages are linking to you. That's one factor that matters
a lot, the number of sites linking to your page.
But the other one is how trustworthy those linking sites were.
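This two-factor idea, how many pages link to you and how highly ranked those linking pages are themselves, can be sketched as a small iterative calculation. Here's a toy version in Python with made-up page names; it illustrates the general PageRank-style technique, not Google's actual, proprietary algorithm:

```python
# A toy PageRank-style calculation. The pages and the link graph below
# are invented for illustration; the real algorithm is proprietary and
# far more involved.

def page_rank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    # Start every page with an equal share of rank.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            for target in targets:
                # Each page splits its rank among the pages it links to,
                # so a link from a well-linked page is worth more.
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

# Miniature version of the scenarios discussed here: "blog" gets one
# link from a label site (itself linked by a news site) and one from an
# orphaned link farm that nothing points at.
graph = {
    "news": ["label", "blog"],
    "label": ["blog"],
    "farm": ["blog"],
}
ranks = page_rank(graph)
```

On this toy graph, the blog ends up ranked highest, and the link from the well-connected label site contributes more to that than the link from the orphaned farm page, which is the behavior the two scenarios here describe.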
So let's consider two scenarios. In our first scenario, you've

(39:22):
got your music blog and you've got a lot of
sites that are linking to your page, but they're all
small time sites, like personal sites run by
people who are interested in music, but they don't really
have any presence in the industry and no one's really
linking to their page, so they're not ranked super high
in Google's estimation. Some of them might be even worse

(39:44):
than that. Some of them might be link farms. Link farms.
You don't really see them that much these days, but
in the nineties they were everywhere. They only existed to
link to other pages, and it was in an effort
to boost those other pages rankings in search. So if
you navigated to one, let's say you do a search
for a term and you click on the link, you

(40:07):
end up looking at a bunch of completely disconnected titles
and URLs, and that's it. There's no
other content on the page. It's just a listing of
links to different sites with no rhyme or reason to them.
Those would also be rated very low in trustworthiness according
to Google's algorithm, because obviously the only reason they exist

(40:30):
is to try and game the system, to try and say, well,
let's just add a lot more links to this page
and that will boost its relevance. So if that
were the case, if most of the links going to
your page were either from small potatoes websites or
from link farms, your page's rank wouldn't be boosted

(40:51):
very high. It might be higher than it would be
if there were no links going to your page at all,
but it's not a huge help. Now let's consider scenario two.
Let's say your blog only has a few sites linking
to it, a couple of dozen maybe, but those sites
are doozies. Maybe they include record labels that are in
the music industry. Maybe it's other outlets that cover music news.

(41:14):
Maybe it includes some news websites that use your blog
as a source for stories. Now those sites have a
much higher level of trustworthiness for Google, or,
in Google's estimation, I should say, and those
links matter more. So maybe in scenario one you have
a thousand tiny sites linking to you, and in scenario two
you just have a couple of dozen of the really

(41:36):
big sites linking to you. PageRank would favor scenario
two over scenario one, reasoning that if your blog is
good enough to get the attention and support of those
trusted entities, it must be a really good resource, and
so your site would get boosted in search results. Now
that helped address a troublesome trend with search. I mentioned

(41:56):
link farms. That was one problem. So any search engine
that looked at backlinking could be fooled through
link farms that were just there to boost that number.
In the nineties, it wasn't unusual to encounter that. I
can't tell you how many times it happened to me
when I was doing a search for, you know, a
fairly obscure type of topic, and I would come

(42:19):
across a link farm linking to all sorts of stuff,
most of which was totally not relevant to what
I wanted. Those were really frustrating, and so
that was one thing that people would do to try
and game the system. But another was an equally annoying tactic.

(42:39):
People wanted folks to come to their web pages really badly.
In the old, old days, there were even
web page counters, little widgets that looked like
digit counters and would tell you how many people had
been to that website, and it became kind of a
badge of honor among early web developers if that number
was particularly high, because it showed that a lot of

(42:59):
people were visiting your site, and it was kind of
a prestige thing. It also could mean money, because
if you were using web advertising to support your
website and that number was getting really high,
then you had more page views, and more page
views would mean more cash from the advertisers. So there
was an actual, you know, financial reason to try and
get more people to come to your web page, and
(43:21):
get more people to come to your web page, and
not everybody played fair and square. Sometimes web developers would
include an incredibly long list of popular search terms on
the web page. Usually it would be at the very bottom
of the page in tiny font, and so that's

(43:42):
the only place where your search terms would show up.
The rest of the web page would be about something
entirely different. You'd do a search for the
terms you were looking for, and it
turns out they were just sitting in this list of
seemingly random search terms, really the most popular search
terms that people could come across. Developers were just

(44:02):
dumping them all at the bottom of their web pages,
and that way their web page would pop up in
all these sorts of searches, and people would end up
going to their web page without knowing that it wasn't
really about what they were hoping for. That was really
frustrating for a lot of people, including myself, because you know,
you're obviously searching for something because you want to
get that content, but then you end up going to

(44:25):
a web page that's not about that at all. It's
not a good experience, so it was a terrible way
to have people come to your web page. However, if
your goal was just to get those views so that
you could get that ad money, people were willing to
do it. Maybe it was a successful strategy for
people who were running an online store, but I

(44:46):
can't imagine it would have worked too well. I mean,
if I'm looking for information about quantum mechanics and I
end up being dumped in some store that's selling baseball
caps that have nothing to do with anything, I'm probably
just gonna be mad. But anyway, that was one of
the other approaches people were taking, trying to include
this text. Sometimes they would even hide it. They would

(45:07):
have a big section of the web page where the
font was the same color as the background, so
you couldn't see it when you were reading through the
web page, but it could be read by bots as
they crawled through all this material. So, search engine
developers got into kind of a seesaw battle with web

(45:31):
developers to try and get around these tricks. One of
the things they started to do, and Google was one of
them, was to focus on the text in the actual body
of the document itself and ignore information that might
be in the headers or footers, which was typically where
people were putting these laundry lists of popular search terms.
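That crawler-side fix can be sketched with Python's standard html.parser: index only the text that appears inside the page's body, and skip anything stuffed into the head. The sample page and class name here are made up for illustration; Google's actual parsing was certainly more sophisticated than this:

```python
from html.parser import HTMLParser

class BodyTextExtractor(HTMLParser):
    """Collects only text that appears inside <body>,
    ignoring keyword lists stuffed into <head>."""
    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        # Only keep non-empty text encountered while inside <body>.
        if self.in_body and data.strip():
            self.chunks.append(data.strip())

# A hypothetical keyword-stuffed page: the head is full of popular
# search terms that have nothing to do with the actual content.
page = """<html>
<head><title>My Store</title>
<meta name="keywords" content="quantum mechanics, free music, news">
</head>
<body><h1>Baseball caps for sale</h1></body>
</html>"""

parser = BodyTextExtractor()
parser.feed(page)
indexed_text = " ".join(parser.chunks)
```

Feeding the parser a page like this leaves only the genuine body text to be indexed; the stuffed keywords in the head never make it into the index.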

(45:52):
So Google got around that by saying, okay, we're
no longer worried about the text that's in the head
or the footer. We're just concentrating on what's in the
body of the page. And Google's approach really improved relevance.
The search results were just better than most of the competitors'.
You know, you were more likely to come across
the stuff that represented what you wanted,

(46:14):
and so Google was able to tap into advertising revenue
because they were able to really give people what they wanted.
Advertisers wanted to be included with that, and Google began
listing ad-supported results with the top returns for queries.
So it meant that alongside the stuff people
most wanted to see, you would get ads served right

(46:36):
with that. It was a very attractive proposition, and it
positioned the company well enough to survive the dot com
bubble burst of two thousand and two thousand one. While
many of its competitors either merged with other companies, as
I mentioned, or completely went under, Google remained
around and then was able to seriously grow in

(46:56):
the two thousands. There were a couple of discussions
with other companies early on, including Excite, that could
have led to Google getting acquired, but none of that
came to fruition, and Google remained its own company and
continued to build on its success. And Google would evolve
its algorithm, trying to crack the nut of deciphering the
meaning of text inside web pages. So not just here

(47:20):
are the web pages that include the terms that you
searched for, but here are the ones that include them in
the way that you meant, improving it so
that it could recognize natural language and not just, you know,
lists of search terms. Pairing that with the PageRank
kind of approach would give Google the information it needed

(47:41):
to really rank its results, and it necessitated the search engine
optimization strategies that became a whole new industry. Ranking
well in search was a really good way to get
serious Internet traffic to a site. People made entire careers
out of figuring out the best way to rank well
in search, which honestly mostly involves creating a compelling and

(48:03):
relevant web page or website that makes people want to
link to it. It's easier said than done, but
that was the best way to rank well within Google's search. Occasionally,
Google would tweak things so that your site, if it
was particularly good, would just rise to the top, because
Google might weight your site more heavily

(48:24):
than other sites. But it also led to companies
learning the hard lesson that depending upon search traffic is
a risky thing to do. Every time Google changes its
search algorithm, it affects search rankings. So you might be
doing really well for years, and then suddenly you see
a massive drop-off in visitor numbers because Google changed

(48:47):
its algorithm and your page no longer ranks as well
in search results as it used to. So in a
future episode, I plan on getting some SEO
experts on the show and having them talk about the
challenges of developing a good strategy to rank well in
search, and what other strategies people might consider if they
want to promote traffic to their sites and services. You know,

(49:10):
it's tricky stuff because, again, it might work great
today and then tomorrow it might not work at all.
So there's a real strong push among web developers
to try and find alternatives to search engine traffic being
your main way of getting people onto your website. Also,

(49:32):
if people are just searching for content and then popping
over to your site and they read one page
that is relevant to whatever their search engine query was,
they're not likely to stick around unless they go down
sort of the Wikipedia rabbit hole. They're more likely to bounce.
And this was a problem we saw at the How

(49:52):
Stuff Works website all the time: we could
get great search engine traffic. People were looking for specific
answers to questions, and we had articles that answered those questions,
so people would come and read those articles. Now, what
would be ideal for us is that people say, this
is a great site, I want to read more articles.
Let's just see what's here. But the reality was most

(50:12):
people would come in, read whatever they wanted, and then leave.
They wouldn't stick around to read other stuff, and
it was a real challenge. One of the things that
we always tried to do was figure out how to
create a site that was a destination all of its own.
That you're not going there because a search engine told
you to. You're going there because you love the site

(50:33):
and you want to read more of the stuff on there.
That was always our goal. It was always very, very
challenging because there's a ton of websites out there, and
there's a ton of really great content, so making sure
that yours can stand up to everybody else's is a
heck of a challenge. It's a hard thing to do.
I think the site does a great job of it,

(50:53):
but it was one of those things that we were
always striving toward. In the end, Google won out because
it hadn't grown too large before the bubble burst, so
it hadn't spread its assets out too thin and it wasn't
in incredible amounts of debt, so it was able to
weather that storm, and then it was able to build
on its success. And it had developed a search engine

(51:14):
tool that people felt returned the best results and they
put a ton of trust in it. Ultimately, Google would
become this enormous company that would be able to gather
huge amounts of data from its users and put that
to use as well, and that made it a very
valuable resource for advertisers, and that's kind of how Google
won the search engine war. Now, we'll talk about other

(51:38):
stuff related to search engines in the future, but our next
episode is going to be about something totally different.
And I'm just doing a few one off episodes because
after doing that arc of seven episodes about the media
and its relationship to us and technology, I felt
like we kind of needed to do some one offs.

(51:58):
So the next one's gonna be another entertainment related podcast,
but it will be another one off. If you guys
have suggestions for future topics I should tackle, why not
send me an email? The address is tech stuff at how
stuff works dot com. Or hop on over to our
website, that's tech stuff podcast dot com. You will find
the archive of all of our shows. There, you'll find

(52:18):
links to our social media sites. You'll find a link
to our online store, where every purchase you make goes
to help the show and we greatly appreciate it and
I will talk to you again really soon. TechStuff
is a production of iHeartRadio's How Stuff Works.
For more podcasts from iHeartRadio, visit the

(52:40):
iHeartRadio app, Apple Podcasts, or wherever you listen to
your favorite shows.
