How Does the Wayback Machine Work?

Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:01):
Welcome to BrainStuff, a production of iHeartRadio. Hey,
BrainStuff, Lauren Vogelbaum here. If a tree falls
in a forest, does it really make a sound? And if
a website changes overnight, did its previous homepage ever really
exist in the first place? Because so much of our
world is increasingly digital and ephemeral, it's not just a

(00:23):
philosophical question, it's also a simple matter of history. That's
why the way Back Machine, which features step shots of
websites as they age and change, is such a fascinating
glimpse into the dusty corners of the web. The way
Back Machine is a massive digital archive meant to preserve
web pages that would otherwise be permanently lost to time.
Without this horde of data, every time a page was

(00:45):
updated or deleted, it would simply vanish, as if it
had never been there. Mark Graham, the director of the
Wayback Machine, noted in an Entrepreneur article that the average
life expectancy of a web page is about a hundred days.
There are a multitude of reasons why these web pages disappear.
A site's creators move on to other projects, web hosting
companies go bankrupt, or maybe the pages are moved or replaced

(01:09):
with new data and content. One place you may have
seen the Wayback Machine's work: more than eleven million
web pages referenced in Wikipedia articles have gone bad over
the years. In other words, they now return a four
oh four, or page not found, error. Because they've been archived
in the Wayback Machine, technicians there were able to
edit those Wikipedia pages so the references now point to

(01:31):
archived versions of those defunct URLs. The
Wayback Machine is the brainchild of Brewster Kahle and
Bruce Gilliat, who also founded the Internet Archive, which is
a digital library of websites, books, audio and video recordings,
and software. Both projects are San Francisco-based nonprofits. Kahle
and Gilliat also created Alexa Internet, which analyzes web traffic
and Gilliatt also created Alexa Internet, which analyzes web traffic

(01:52):
patterns and was sold to Amazon. Project director Graham said
via email that Kahle and Gilliat had started to
archive web pages and, in two thousand one, launched
the Wayback Machine to support discovery and playback of
those archived web resources. And yes, the name was inspired
by the nineteen sixties cartoon series The Rocky and Bullwinkle Show.

(02:15):
In the cartoon, the Wayback, spelled W-A-B-A-C,
Machine was a plot device used to transport the characters Mr.
Peabody and Sherman back in time to visit important events
in human history. In a world where there are more
than one point seven billion websites, with the number climbing
dramatically by the day, how can anyone possibly hope to

(02:35):
catalog so many web pages? The Wayback Machine uses
what are called crawlers, a type of software that automatically
moves through the web, taking snapshots of billions of sites
as it goes. Some of the process is automated, but
many of the requests are generated manually by a network
of librarians who prioritize certain types of sites that they
think are important to preserve for posterity and for future generations.
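For the technically curious, the basic crawling idea can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the Internet Archive's actual crawler: the `crawl` function, the `fetch` callable passed into it, and the tiny in-memory site are all assumptions made so the sketch runs without touching the network. It starts from seed pages, saves a timestamped copy of every page it can fetch, and queues up any links it finds on those pages.

```python
from collections import deque
from datetime import datetime, timezone
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags on one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(
    seeds: List[str],
    fetch: Callable[[str], Optional[str]],
    max_pages: int = 100,
) -> Dict[str, Tuple[str, str]]:
    """Breadth-first crawl from the seed URLs (hypothetical sketch).

    `fetch` returns a page's HTML, or None if the page is gone (a 404, say).
    Returns {url: (timestamp, html)}, one snapshot per page reached.
    """
    snapshots: Dict[str, Tuple[str, str]] = {}
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(snapshots) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # dead page: nothing to save
        # 14-digit UTC timestamp (YYYYMMDDhhmmss) for this snapshot
        stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
        snapshots[url] = (stamp, html)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before queueing
            link, _ = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return snapshots


# Hypothetical two-page site; "/gone" is linked but missing (like a 404).
fake_site = {
    "https://example.com/": '<a href="/about">About</a> <a href="/gone">Old</a>',
    "https://example.com/about": '<a href="/">Home</a>',
}
pages = crawl(["https://example.com/"], fake_site.get)
print(sorted(pages))  # ['https://example.com/', 'https://example.com/about']
```

A real archive crawler would likely need much more, such as rate limiting and the kind of human prioritization described here; this sketch deliberately leaves all of that out.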

(03:00):
The crawlers don't capture every iteration of a site. The frequency
of snapshots depends on the site's importance. Very significant sites
might be recorded every few hours. Others might be logged
weeks or months apart. Most aren't logged at all, so
don't worry: that embarrassing fan website you made in high
school is probably long gone by now. The Wayback

(03:20):
Machine aims to capture snapshots of important content, say, the
breaking news headlines published by major media companies. Furthermore, it
doesn't necessarily recreate the entire site, and it doesn't preserve
the data exactly the way you'd experience it in
your browser. It may only capture a few images from
a few pages and not preserve linked content that lives on

(03:41):
other sites outside of the domain. But on a more
practical level, you've probably had the experience of clicking on
a link on a web page and getting a four
oh four, or page not found, notice, and now you're
wondering what was on the page originally. That's where the
Wayback Machine can help. To use the Wayback Machine,
go to archive dot org slash web, type the URL

(04:04):
of the site you want to investigate into the
Browse History search bar, and in the results you'll see a
chronological bar graph that shows how many times the site was
crawled and saved in a given year. Click the year,
and below, you'll see a twelve-month calendar with various
dates highlighted. Blue highlights mean the site was saved properly;
red means it was not. Click one of the highlighted

(04:24):
dates and the site's snapshots will appear. Click on
one of those snapshots, and just like that, you've traveled
back in time to that older version of the site.
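The same lookup can also be done from code. The Python sketch below assumes two things worth checking against the archive's own documentation before relying on them: the public availability endpoint at archive.org/wayback/available, which answers with JSON describing the closest capture, and the web.archive.org/web/ followed by timestamp and URL address pattern that snapshot links use. The helper names (`snapshot_url`, `availability_query`, `closest_snapshot`, `lookup`) are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def snapshot_url(page_url: str, timestamp: str) -> str:
    """Build a Wayback-style link for page_url at a YYYYMMDDhhmmss timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


def availability_query(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build a request URL for the (assumed) availability endpoint."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # ask for the capture closest to this time
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the closest capture's URL from a response, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Network call: ask the archive for the capture nearest to timestamp."""
    with urlopen(availability_query(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))


# Usage (needs network access): print(lookup("example.com", "2001"))
```

Keeping the JSON parsing separate from the network call means everything except `lookup` can be tried offline; `lookup` is the only function that actually contacts the archive.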
If you want to make sure that a particular site
is recorded in the archive, you can do so manually.
Use the Save Page Now option to save a specific
page once, but realize that doing so only saves that
one page, not an entire website, and it doesn't guarantee

(04:47):
that the site will be crawled in the future. And
if content owners want their material excluded from the Wayback Machine,
they can submit a request by sending an email to
info at archive dot org. Graham says that the most
amazing thing about the Wayback Machine is that it
exists at all, and how much of the public web
it's able to preserve, given that it has such a
small budget and team, though they do use volunteers as well,

(05:11):
he said, with more support, we can do an even
better job of backing up more of the public web.
Funding for the Internet Archive and the Wayback Machine
comes from a combination of earned income from our subscription-based
web archiving service, Archive-It dot org, major donors
and foundations, as well as contributions from more than a
hundred thousand individual donors. We love being able to give

(05:31):
away our services and don't run ads on our web pages.
He's sure that the Wayback Machine will become even
more important in the future. Quote: As the nature of
how people communicate and share information evolves, so too will we
need to build technologies, processes, and partnerships to continue
to do the best job we can to preserve as
much of this public information as possible. All in support

(05:53):
of the Wayback Machine's mission to help make the
web more useful and reliable, and in particular, to help
support journalists, activists, academics, historians, researchers, and the general public.
Today's episode was written by Nathan Chandler and produced by
Tyler Klang. BrainStuff is a production of iHeartRadio's

(06:14):
HowStuffWorks. For more on this and lots of
other well-archived topics, visit our home planet, HowStuffWorks
dot com. And for more podcasts from iHeartRadio,
visit the iHeartRadio app, Apple Podcasts,
or wherever you listen to your favorite shows.
