Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
Brought to you by Toyota. Let's go places. Welcome to Forward Thinking. Hey there, everyone, and welcome to Forward Thinking, the podcast that looks at the future and says, what's the story, Morning Glory? What's the word, hummingbird? I'm Jonathan Strickland, I'm
(00:21):
Lauren, and I'm Joe McCormick, and we have jazz hands, and today... you know, it totally does not translate to radio. But anyway, we're doing... Yeah, and we were. Actually I wasn't, but you two were. Honesty in podcasting. That's true. So
today we wanted to talk about the concept of big data, or big data, which is not a gigantic character
(00:44):
from Star Trek: The Next Generation. Unfortunately. No, that would have been both interesting and terrifying. But that is not the case. Actually, I guess you could still argue that big data is both interesting and, to some people, terrifying. Yeah. So what is big data? I've got a couple of official definitions, and then, Joe, I think you have your own definition. So let me go through
(01:06):
these quote-unquote official definitions. These are from IBM, and IBM is one of those companies that has a lot invested in big data in general and big data management. So in a paper called Demystifying Big Data, here's one of the definitions, which is: big data is a phenomenon defined by the rapid acceleration and the expanding volume
(01:29):
of high-velocity, complex, and diverse types of data. Big data is often defined along three dimensions: volume, velocity, and variety. And then the other definition is: big data is a term that describes large volumes of high-velocity, complex, and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management, and analysis of the information. So, Joe, what's your definition? That was a lot of words. Yeah,
(01:51):
it was. Uh, well, it just seems to me, and I'm no expert here, but what's the difference between just a lot of data and then big data? That's a good question, because we've had a lot of data before, but suddenly there's sort of this new paradigm where we have to think about, oh, it's big data. It's a separate thing. It's not just a matter of degree.
(02:13):
And I think it's the point at which our intuition kicks in and tells us something weird is going on, right? There's so much here to handle that I can't even imagine handling it. Yeah, it's when you suddenly realize, oh, hold onto your butts. The phrase was actually entered into the Oxford English Dictionary just this quarter, like this month.
(02:35):
As of June, they entered it into their quarterly online update, and there's a definition that is very much like the one that Jonathan read off, but slightly more succinct. Um, well, it's the OED. They also include several examples of its
(02:58):
use, the original going back to when a social historian by the name of C. Tilly referenced it, in terms of meaning being obscured when you are in the presence of an increasingly complex system of information. Right, so it has something to do with it being hard for us to grasp intuitively. Right. Well, yeah, it's kind
(03:19):
of like, if you want to boil it down to a cliché, it's the whole not being able to see the forest for the trees. Like, you're able to see the stuff that's immediately around you, but when you're trying to get a bigger picture, your perspective is blocked by the fact that there's just so much there, so you can't get a good grasp on the big picture. So exactly how many trees are we talking about here? Well,
(03:40):
let's boil that down. Let's boil that down to talking about how we measure data in the computer world.
And for anyone who has any background in computers at all,
this is probably going to seem super basic to you.
But it's important to have the building blocks there for
us to understand the enormity of big data. Right, if
you go look on the internet, it seems like a
lot of times people confuse terms like data and facts
(04:04):
and information, like they just use them interchangeably. Data and information are, you know, often used as synonyms. But when we're talking about data in terms of computation, we're talking about bytes, right. So a byte is eight bits, and one byte can represent one character. So when I'm talking about a character, I'm talking about, like,
(04:24):
a letter or a number or a symbol. I'm not talking about Jean Valjean from Les Misérables. That's a totally different kind of character. So eight bits can be one character, and it takes about ten bytes, eighty bits total, to make up about, you know, one word or so. So that's your basic unit of data.
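As a quick sketch of that bits-to-bytes-to-characters relationship (the sample word here is just for illustration), Python makes the mapping easy to see:

```python
# One byte is eight bits, and in a plain-text encoding like ASCII,
# one byte holds exactly one character (a letter, number, or symbol).
word = "data"
as_bytes = word.encode("ascii")   # each character becomes one byte
print(len(as_bytes))              # -> 4 bytes for a four-letter word
print(len(as_bytes) * 8)          # -> 32 bits
```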
(04:45):
So if we look at the kilobyte, we say it's a thousand bytes. Technically it's one thousand twenty-four bytes, and in counting, yeah, this gets a little complex as we go higher up, so I will be rounding. Yeah, because otherwise we'll be spending
(05:10):
the entire podcast just listening to me read out incredibly long numbers. I'll give an example of that when I get to it, but I'll skip over most of them. Anyway, so a kilobyte is roughly a thousand bytes. So if you were to type out a page of text, that would be about two kilobytes of information.
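A small Python loop makes the "technically" numbers behind these units visible; each step up the ladder is a factor of 2**10 = 1024, which is why none of them are round:

```python
# Each storage unit is 2**10 = 1024 times the one before it, which is
# why a kilobyte is technically 1,024 bytes rather than an even 1,000.
names = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]
for power, name in enumerate(names, start=1):
    print(f"1 {name} = {1024 ** power:,} bytes")
# The megabyte line, for example, prints 1,048,576 bytes.
```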
Now that's if you're just typing text, not like images
or anything else. But that would be about two kilobytes. Now,
(05:31):
if you had a low-res photo, that's probably around a hundred kilobytes, maybe fewer, depending upon the resolution. I mean, you know, some things, if you save them for the web, can be between like thirty and a hundred kilobytes or so. The next step up is the megabyte. That's technically one million, forty-eight thousand,
(05:51):
five hundred seventy-six bytes, but we'll usually just say one million. And that's the last I'm going to do of the specific numbers. So a high-res photo could be at least two megabytes. Five megabytes is enough to hold the complete works of Shakespeare. So now, wait a minute, would that be plain text or formatted documents? It would be essentially plain text. Yeah, plain text would be about
(06:14):
five megabytes of Shakespeare. Yeah, now if you want... there's all these bears pursuing you in The Winter's Tale, so you've got to fill that out. Uh, the... a CD-ROM, do you guys remember those? They still exist. I have one in my computer in front of us, an optical drive. They were round,
(06:36):
you know, along one side, but if you looked at them in profile, they were flat. That's what you used to play Myst, right? Yes, it was, in fact. But a CD-ROM could hold between about six hundred fifty and nine hundred megabytes. And we're talking about the twelve-centimeter discs, not the eight-centimeter discs, because they had mini discs as well. Those weren't as popular here in
(06:57):
the United States, but in Asia they were very popular.
Then you have the gigabyte; that ends up being the billion mark of bytes. Again, I'm just simplifying here. About one gigabyte can hold a broadcast-quality movie. By the way, I remember when I had a computer that had, I think, a three gigabyte hard drive, and that was so huge. Yeah.
(07:20):
I remember when I got a two hundred fifty-six megabyte hard drive and I thought that there's no way anyone could fill up that much space. And look at us now. A twenty gigabyte drive could hold high-fidelity recordings of the entire works of Beethoven. A fifty gigabyte hard drive is equivalent to a floor of books in a typical library. The next step up is the terabyte,
(07:42):
which is one trillion bytes. Two terabytes would be equivalent to an academic research library, and ten terabytes would be equivalent to all of the printed collection at the United States Library of Congress, just the printed materials. It's amazing how much cheaper storage has become over the years. What does it cost to go buy, like, a one terabyte external hard drive? It all depends
(08:06):
on where you go, but you're talking around a hundred to a couple hundred dollars at most for most places. And the thing is that, you know, the technology has improved over time and the manufacturing processes have improved over time, which has brought the price down over time. But we're not done yet. So that was one trillion bytes; let's go up another one of the
(08:26):
levels. Let's say you want to quadru... Well, the numbers we're going to be talking about are way bigger than... well, they don't know, we haven't said it yet. So next is one quadrillion bytes. That's a petabyte, and two petabytes would be all US academic research libraries combined. Two hundred petabytes
(08:46):
would be all printed material everywhere. Uh. If you
went up to one quintillion bytes, and I don't know how many zeros that is, it's a lot, that's an exabyte. And five exabytes would be enough to contain all the words ever spoken by human beings. If you were to
(09:07):
break down all the words we've ever spoken into bytes, it would fit within five exabytes. Did we say whose estimates these are? Oh, well, these are actually estimates that are out there all over the place. You know, IBM actually cites these as their benchmarks as well. Yeah,
(09:28):
and then there are the next two levels, if you want to go up. So we've done quintillions, so sextillion would be a zettabyte, and I guess septillion would be a yottabyte. So anyway... or yotta, yotta, not Yoda, not Yoda, yotta. So anyway, those are
(09:50):
the scales. At the exabyte, we get to the point where it's all the words we've ever spoken, right. So that gives you the idea of the basics of bytes and how much information is equivalent to, you know, various real-world examples. So let's talk about the amount of information
that we create on a daily basis. So for example,
(10:11):
according to IBM, we create about twelve terabytes of tweets in one day. That's twelve trillion bytes of messages at a hundred forty characters or fewer, in tweets alone. And Twitter isn't even close to being the most popular social network. No, it is not the most popular at all. And
(10:32):
yet that's not even rich text. That's just plain text; a short messaging service is really what it is. So remember, two terabytes is equivalent to a single academic research library. So you've got the equivalent of six academic research libraries, just in tweets alone. Now,
I'm not saying that you're going to be able to
(10:52):
research your next thesis only using Twitter. It's not necessarily useful data that's out there, but that's how much there is once you put it together. Lots of people are tracking trends and keywords and, you know, all kinds of things. There are people using Twitter to predict the stock market and earthquakes; they're
(11:15):
using Twitter for all sorts of stuff. Yeah, consumer behavior. Not that we're endorsing those ideas, by the way. But
going back to the kind of data that we actually
are producing on a daily basis. Back in two thousand twelve,
when Facebook still had just under one billion registered users,
they were collecting... well, according to Facebook, they did an earnings
(11:36):
call where they said they were collecting about five hundred
terabytes per day from users. So that's all the stuff
that everyone is doing on Facebook, whether they are posting
a status update or liking a page or sharing a link.
All of that was folded into this number. But five
hundred terabytes every day. So then you've got YouTube. You guys,
(12:00):
of course, have heard the famous stat. Now it's one hundred hours of video uploaded every minute. It changed between the time we wrote that video and the time we shot it, really between the time we shot it and the time it published, exactly. Yeah, it went from seventy-two.
(12:21):
Technically it had been growing all that time, but Google gave the official announcement. Yeah, so one hundred hours of YouTube footage is uploaded every minute. That means that in a single day you get about a hundred and forty-four thousand hours of video added to YouTube.
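That hundred-and-forty-four-thousand figure is easy to check: one hundred hours a minute, times the minutes in a day:

```python
hours_per_minute = 100                 # the announced upload rate
hours_per_day = hours_per_minute * 60 * 24
print(hours_per_day)                   # -> 144000 hours of new video per day

# Watching just one day's uploads back to back would itself take decades:
print(round(hours_per_day / 24 / 365, 1))  # -> 16.4 years of footage
```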
That's sick. It's a little stomach-churning, actually. Um.
(12:44):
And so if you were to look at all the data that we are creating, and this is beyond social media, we're talking about all the information being created not just by human beings, but by things like sensors that are connected to computer systems and are sending that data in. So things like weather sensors or traffic sensors, the cameras we've talked about in the past. All of this
(13:05):
stuff combined generates about two point five quintillion bytes of data, also known as two point five exabytes, every single day. Well,
that makes me curious, does more data come from human
entry or from other machines? Right now, humans are actually
generating most of that data. Eighty percent of it, in fact,
(13:28):
is coming from unstructured information, which includes things like email, video, blogs, social media, call center conversations. All of this goes into that data. That's going to change, probably. If you
remember our Internet of Things episode, we talked a lot
about how we're going to be living in this world
where our environments are going to be constantly collecting... Yeah,
(13:50):
more and more devices are collecting information about us, and yeah. Well, it'll get to a point where we'll start to see that number fluctuate quite a bit. We'll see that percentage drop for human-produced data versus, you know, automated sensors. So two point five
quintillion bytes of data produced every day, that means every
(14:11):
two days we produce as much information as all the
words we've ever spoken. So that's kind of incredible. And also, according to IBM, in the last two years we have produced the vast majority of all the world's data, meaning that everything prior to those two years represents just a small fraction of all the data ever.
(14:33):
It's a pretty crazily curved graph. So, you know, some years ago at this point,
Eric Schmidt of Google said that we were creating as much info every two days as we had from the dawn of humankind up through two thousand three. Yeah, so at one point we created one point eight zettabytes of information globally in that year, and that
(14:57):
amount is expected to double every year. So that means one point eight zettabytes. If you want to know, like, okay, well, what does that mean to me? How can I conceptualize this amount of information? That's equivalent to two hundred billion two-hour HD movies. And if you
(15:19):
wanted to watch those two hundred billion HD movies, just sit down and have a marathon, it would take you forty-seven million years to do it, no bathroom breaks. Yeah,
those numbers don't even make any sense to me at that point. It's just, you know... And that's the key, right,
(15:41):
that's why we call it big data, because they are such huge numbers that when you think about it, you're like, what can I even do with all this information?
How can I make use of it? But that's exactly the thing, right. We're getting to a point where it's not so much that you're going to do something with it, but the machines are going to do something, right. And there are some very creative processes that people have come
(16:04):
up with that break this down into more manageable problems that machines can handle. Well, I shouldn't say that you're not going to use it, for sure. I mean, this kind of thing is probably really useful to people who are involved in, say, marketing. Well, it's useful for marketing, but it's also... and we'll do, yeah, we'll do a full episode on some of the applications of using data. Yeah,
(16:27):
that will be our next recorded episode. But yeah, here's an example that all of us in this room could use with big data. Now, we would not be actually doing the analyzing ourselves, but we would benefit from the actual work, and that would be traffic, real-time traffic
(16:47):
reports. So if you are using some sort of GPS
that allows you to get incoming traffic information that's being
gathered by various means, and there are different ways of
doing it depending upon what system you're using, then you
are essentially benefiting from the analysis of big
(17:07):
data, because it's taking all this information about the actual environment around you, gathered from multiple sources, and helping you route the most efficient way. Right, it does dynamic routing, which is really a cool thing and an obvious benefit. But that's just one application. So how do you end
(17:27):
up navigating this much information? Like, what's the magic key to it? And there are actually a couple of different ways. I don't want to throw my co-hosts under the bus, because I specifically looked this stuff up, and when I ask the question, they're like, I, uh... it's a lot. Throw a dart? I don't know. Um, you use computers. That's good, Joe. Supercomputers?
(17:53):
Perhaps you could use supercomputers. You could also use grid computing, which is where you end up using a lot of computers to work on a single problem. So you guys
have probably heard the term parallel processing. Parallel processing is an idea where you are able to take certain kinds of computer problems and divide up the problem into different sections, maybe even subsections of data,
(18:17):
and then assign each of those parts to a different processor. And this could happen within a computer, or you could be talking about spreading it out across lots of different computers. So if you have a computer that has multiple cores, for example a multi-core processor, each core could be taking part of this problem and working on it separately.
And really what you have is you essentially
(18:39):
have one unit acting as the director, and the director's job is to take the problem, divide it up into manageable chunks, and then parcel that out to all the other elements of the system, whether it's other computers or other processors or other cores. Their job is to work on their particular part of the problem
(18:59):
and then send the results back to the master. The master then takes all the results and has a collective result as the final product, and that's where you get the answer that you're looking for. This approach is often called a MapReduce framework. You're mapping out the problem and farming it out,
(19:20):
and then you reduce the problem that way: when all the answers are sent back to you, you reduce all those answers into one answer.
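As a toy sketch of that director-and-workers pattern (the word-count task, the four-chunk split, and the sample lines here are all invented for illustration), Python's standard multiprocessing module can play every part:

```python
from collections import Counter
from multiprocessing import Pool

def count_words(lines):
    """Map step: one worker counts the words in its own chunk."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

if __name__ == "__main__":
    lines = ["big data is big", "data about data"] * 1000
    # The "director" divides the problem into four manageable chunks...
    chunks = [lines[i::4] for i in range(4)]
    with Pool(4) as pool:
        partials = pool.map(count_words, chunks)  # map phase, in parallel
    # ...then the reduce step folds the partial answers into one answer.
    total = sum(partials, Counter())
    print(total.most_common(2))   # -> [('data', 3000), ('big', 2000)]
```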
So that's the whole process for taking a generally huge problem and making it manageable. Now, the key to all of this, and a lot of people and companies that specialize in big data will tell everyone this, is that
(19:42):
you can't just do anything with this information. What you have to do is decide what it is that you want to do with the information, what specific thing we are looking for, right, and then you build a system that lets you derive that from all the data you have. So it's not like you just look
at it. Yeah, you can't just look at this big
(20:02):
ball of zeros and ones and then just magically draw out the information you need, or just sit there and look at it and say, you know, what can we learn from this? You look for a specific kind of pattern. Like, if you're looking for a needle in a haystack, you're looking for something shiny, for example. And, you know, if you just say, well,
(20:23):
I'm looking for something kind of pointy and short, then that's not enough. What is interesting is that when you get information on this scale, this huge amount of information, you can actually start to recognize patterns that otherwise would have been completely missed. Yeah, you would never have been able to... Again,
it's the forest-for-the-trees thing. You would never have been able to see the forest because you were right there in the middle of all those trees.
(20:45):
It's the same sort of thing; you'd be able to see these big patterns that happen. And that's where especially things like marketing end up being a big deal, because you can see things like tendencies for customers to behave in a certain way, and if you want them to behave in a particular way, you can start to focus on things that kind of guide them in that direction.
But I've even seen bizarre representations, a thing online that
(21:07):
was this strange cornucopia shape, and it was just labeled, like, the geometry of big data. Yeah, it's a little... again, when we're talking about something so
huge that it's difficult for us to get a mental
grasp on it, trying to find a representation of that
that makes sense to us is something of an uphill
(21:28):
battle. You know, a lot of people have tried, but it's really difficult to take this and make it understandable in a way that doesn't just blow out the scale immediately, where, you know, you have the manageable amount of data and then the spike just goes all the way up through the top of the graph and you can't see the top. It's just: that's really big. There are other
(21:52):
ways to handle all that data. There's also the approach of doing just real-time analytics and streaming of data.
In this case, this would be kind of like the traffic example I gave earlier. So with traffic, you have all these sensors gathering data, and then you have that analysis and streaming of the data happening immediately, and then you get the results. In this case, you don't
(22:14):
have to worry so much about storing lots of data, because it doesn't really matter if there was a slow spot on the highway two hours ago. What matters is what's going on right now. So you don't have to worry about building these huge data centers to store all that information. You just have to build a system that's large enough to handle incoming information and give outgoing information.
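A minimal sketch of that streaming idea in Python (the window size and speed readings are made up): process each reading as it arrives and keep only a short window, so old data is dropped instead of stored:

```python
from collections import deque

class RollingSpeed:
    """Track recent traffic speeds; old readings fall out automatically."""
    def __init__(self, window=3):
        self.readings = deque(maxlen=window)  # only the newest N survive

    def add(self, speed_mph):
        self.readings.append(speed_mph)

    def average(self):
        return sum(self.readings) / len(self.readings)

monitor = RollingSpeed(window=3)
for speed in [62, 60, 35, 12, 10]:    # the road is slowing down
    monitor.add(speed)
print(round(monitor.average(), 1))    # -> 19.0, from only the 3 newest
```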
(22:38):
So you have to have good input-output capacity; that's what's important in those types of systems. Now, in the other types of systems, where you are collecting and analyzing enormous amounts of information, you have to have a place for that information to live, and that's where storage comes into play. And that's where we see these enormous data centers, buildings that are specifically made to house
(22:59):
data servers. So if you were to walk into one of these places, it essentially would look like a huge warehouse or maybe even like an airplane hangar. I mean,
these buildings can be enormous and they're filled with shelves
of servers. They usually have massive HVAC systems...
(23:21):
ideally they do, because of course, the warmer things get, the more poorly technology can perform, to an extent. You don't want to supercool everything, because then that can have its own problems, but you do want it to maintain a decent operable temperature. So you might even have a water cooling system and not just air cooling. They have to have sort of a distributed redundancy too,
(23:44):
because if one machine dies, and with that many machines, you know, every so often several machines are going to die, you can't lose something, right. So for example, Google,
which is a great example, because Google has lots of data centers, and famously, Google uses fairly inexpensive servers, in the grand scheme of things.
(24:08):
They're not buying the top-of-the-line, fresh-off-the-manufacturing-plant servers. They want things that are plentiful and easy to replace. Yeah, they're going for efficiency, not high power. Yeah, they don't need each server to be able to handle the workload of three other servers. They want things that are going to be reliable, and if one does break down, it's easy to switch
(24:30):
it out with something else. But in their system, they do have lots of redundancy. And it's this idea that, you know, stuff breaks, machines go down, power goes out.
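One way to picture that protection, sketched here with invented server names and an arbitrary three-copy rule: hash each chunk of data to pick several distinct machines, so losing any one machine loses nothing:

```python
import hashlib

SERVERS = [f"server-{letter}" for letter in "ABCDEFGH"]

def replicas(chunk_key, copies=3):
    """Pick `copies` distinct servers for a chunk, spread out by hashing."""
    digest = hashlib.sha256(chunk_key.encode()).hexdigest()
    start = int(digest, 16) % len(SERVERS)
    # Place copies on consecutive servers starting from the hashed slot.
    return [SERVERS[(start + i) % len(SERVERS)] for i in range(copies)]

placement = replicas("photos/chunk-0042")
print(placement)   # three different servers hold this chunk
# If any one of those servers dies, two live copies remain elsewhere.
```

The hashing makes placement deterministic, so any machine can recompute where a chunk lives without asking a central directory.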
With that many, it's just statistically guaranteed. Exactly, you know it's going to happen. So the way you protect against it is that you have extra machines
(24:50):
involved there, so that some of them have a little bit of information from the others. Like, if you have servers A through Z, server D might have a little bit of information from server A, and then maybe even server J has a little bit from server A. So the idea is that you've spread it out, so that if any one server goes down, you still have access to that information, so that there's no interruption in service. Now, there
(25:13):
are cases of servers that have gone down where that
was the only really source of that information and that
has been a spectacular failure. You mean my data is
not safe. Well, I'm not going to say that, Joe.
That's not Let's not spread fear, uncertainty, in doubt. That's
not what this podcast is about. Now, It's not that.
(25:34):
It's just that there have been times in the past
where it became clear like this is the way to go,
The redundancy way is the way to go, and I
would I would say that, you know, I can't imagine
any operation of the size that would involve big data
would not also have redundancy plans there. And then of
course that does mean that you have to have even
more machines than what you would require at minimum, and
(25:57):
that requirement is constantly going up. But as we're generating
more and more data every day. So then the question
becomes data management. You know, how long do you keep that information? At what point do you, you know, ever wipe a drive so that you can fill it up again? It all depends on what your service is and what the purpose of
(26:18):
it is. But yeah, I mean, it's kind of crazy how much infrastructure needs to exist just so that these zeros and ones have a home. Well, so if we extrapolate that outward, that
leads me to what seems like kind of a weird philosophical question, almost. Uh, is there a limit to the
(26:41):
kind of data we can process? And ultimately, if you
say no, there's no real physical limit, there's no necessary limit.
Is it possible to represent the entire universe, all of
reality as data? Or is there something about the universe
that can't ever be reduced to information? I am going
(27:01):
to tackle your question in multiple parts. Part the first: is there a limit? I hesitate to ever say that there is a limit, in the sense that there's always more innovation that allows us to do bigger and better things. But I will say there is a limit to the amount of energy that is in the universe, right. Yeah, yeah,
(27:24):
I was just thinking about, like, well, there are these physical constants, there's the speed of light and stuff like that, but do we think those will ever represent a barrier? Well,
I don't think we've found any kind of physical law that states that once you get to this amount of data, there's no amount of parsing that you can do to make it useful. I don't think... well, certainly we haven't
(27:47):
encountered that yet. I don't think that's possible, simply because as we get more and more data, we're also building more and more powerful machines that can handle larger amounts of data. And if we're able to break down those problems into smaller bits anyway, then really the limiting factor we're looking at here is energy, not computing power. So although at the current moment,
(28:12):
some computer scientists are concerned about the amount of data that we're crunching; they're saying that the amount of data that we're creating is fast outstripping Moore's law in terms of how fast processors are getting. Sure, there will be bottlenecks. There will obviously be bottlenecks. But if you're taking a truly philosophical, idealistic approach, you're essentially saying that eventually you could create
(28:35):
more data than you could possibly process, only because you
don't have enough there's not enough energy in the universe
to run all the processors you would need to handle
that much data. There's that, or, like, I had this crazy, absurd idea. I know this is silly, but, like, you've got so much data that the server farm is so big that the pieces of information are too far
(28:55):
apart to communicate with each other efficiently. Like, physically. I can see what you're saying. So you're saying, like, if we had whole planets of servers... yeah, I can see what you're saying. So let's say that we're filling up
space with servers. This again is a very philosophical kind
(29:18):
of, you know, thought experiment approach. But let's say, fifty years out, we've filled up space with computer servers, and you were just packing space with the servers so that you could process more and more data. You could get to... all right, enough of the comedy. I'm trying to make a point here.
(29:39):
You could get to a point, theoretically, where you've got a server that literally is light-years away; it could be physically next to billions of other servers, but it's light-years away from where you are. Yeah, then
you're talking about being limited by the speed of communication. It's again not really the processing limitation, it's
(30:01):
the speed of light that's limiting you. But you'd be able to do it, it would just take time. Um,
your other question: could all of the universe, all reality itself, be broken down into data? I don't know, because first of all, we're only able to observe part of the universe, and we don't know how much of the rest of it there is. But assuming that we could, that
(30:25):
would mean... all right, let's assume that it is possible to break down all of reality into zeros and ones. Sure, sure, that in twenty, fifty years, we have figured out what dark matter is and we have observed all of it. Right, we've got it down. We've got our fingers on the pulse of the universe, and we know what makes it tick, and then we can actually create a simulation of that, because we know that we can break down the universe itself
(30:47):
into, you know, data, and make that transition. That would then raise the argument of, well, if we could
do that, then theoretically we could create a simulation of
our universe on a smaller scale digitally and then be
able to run interesting numbers on you know, what would
have happened if we had tweaked this protein in this protozoa,
Or what would have happened if there had been more
(31:09):
antimatter particles rather than matter particles, or what would have
happened if... yeah, I mean that age-old question: what if someone had gone back and killed Hitler? And yeah, to the point where you could actually, you know, theoretically create life, virtual life. Which then raises the question: wait, if that is possible, if all of that is possible, for us to be able to make this, to break down the universe into a simulation and make it ourselves
(31:31):
and be able to watch it, really, we will. That means, yes,
we will one day do that because we're people and
we're curious and we want to do that, which means
which means Yeah, that that means that that the highest
likelihood is that we are in fact living in a
computer simulation right now, because if it's possible, then we
will do it, and if we will do it, then
(31:52):
we probably already have done it, and that we, the
people who are living in this reality right now, are
in fact living through a computer simulation, which could be
a computer simulation, which could be a computer simulation. Yeah,
that's what I started thinking about when we really
got into the absurdities here. It's this kind of snake
eating its own tail sort of thing. Like, well, imagine
(32:12):
you could represent the entire universe somehow as data. That
universe representation would have to include all of the simulations
and data within the universe. Um. Well, if you were
creating a simulation of the universe, you could be selective
in what you were including and what you weren't.
But if you were trying to build an actual one, it's
essentially like making a map to scale at a
(32:35):
one to one scale. Yeah, Like, here's my map of
Atlanta at one to one scale. It's the size of Atlanta.
Not very useful. Um, but anyway, this is
a philosophical argument that's been made before, about whether or
not we're in a computer simulation, which really was more
about the idea that we probably will never get there,
in the sense that, you know, it wasn't
(32:56):
that we definitely are living in a computer simulation, but
rather that humankind would very likely end its own existence
before reaching a point where we were capable of doing
such a thing. And you're talking about the point where
you're actually harnessing the power of stars themselves in order
to generate the computing power you need. Well, that's pretty cool,
and I'm not so much skeptical about that as I
(33:17):
am about housing consciousness inside, you know, a computer processor. Right. Well,
and again, this is kind of getting way
off track, but there's the idea that if
you were able to create a simulation of a human brain,
there's no way of predicting whether or not it would
develop its own consciousness. Yeah, we don't know, because we
(33:38):
haven't been able to build a human brain in real
time, uh, at a one to one scale. You know,
we've built very small models that could run in a
very slow amount of time. But, well, again,
I mean, much like the universe, we
really don't know what's going on inside the human brain.
There's so much of it that we don't understand. Simulating
the brain? That is big data. Yeah, so anyway, uh,
(34:02):
you know, I guess that kind of wraps up
this whole overview of what big data is and why
it matters. And I know I keep saying big data and
switching to big data, but that's what I do every day.
But yeah, this is big business, is what it
really boils down to, because companies are trying to
harness all this information to make it meaningful in some way.
(34:22):
If it weren't possible to do that, then we'd
probably see a lot of these services die off pretty
quickly, because there just wouldn't be the financial support
there to have them continue. They make money by serving advertising,
Right, right. There's the advertising, and then there's, you know,
there's some companies that are not revenue supported, but they
(34:43):
are supported by investors, and a lot of these
investors are saying, look, I know that right now there's
no direct way that this service is making money, but
the data it generates is intrinsically valuable, and as soon
as we figure out a way of leveraging that data,
we make all of our investments back. So I mean,
you know, it's a money game: step
(35:04):
three, profits. Exactly. All right. Well, that wraps up
this conversation about big data and underpants gnomes. If you
guys have any suggestions for future episodes of forward Thinking,
I recommend you get in touch with us. Send us
an email. That's FW Thinking at discovery dot com, or
go to f W thinking dot com. Check out our blogs,
check out our podcasts, check out the social media, check
(35:27):
out all of the links we have there. We've got
some really cool content that we want to share with
you guys, and we want you to be part of
the conversation. So come on and join us and we
will talk to you again really soon. For more on
this topic and the future of technology, visit forward thinking
dot com, brought to you by Toyota. Let's go places.