Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Lynne Malcolm (00:05):
This podcast is made on the lands of the Wurundjeri people, the Woi Wurrung and the Boon Wurrung. We'd like to acknowledge and pay respects to their elders, past, present and emerging.
From the Melbourne School of Psychological Sciences at the University
of Melbourne. This is PsychTalks.
Hello, I'm Lynne Malcolm here again with PsychTalks, our series which explores
today's big issues from the perspective of psychology research. This
(00:30):
episode is all about big data. Increasingly, it feels like
the information about us on the Web is being used
against us, from insidious advertising to cyber attacks. But could this data about our lives also be harnessed for the greater good, to better understand the inner workings of the
(00:50):
human mind?
To help get a better picture of what big data means for us today, we're starting off in the world of the video-sharing social media platform TikTok.
I will not be lectured about sexism and misogyny by this man. I make feminist comedic content.
A lot of it can be
(01:11):
lip syncing or things off TV shows that I think
can apply to political situations.
This is TikToker Abbey Hansen, who goes by the handle "Minorfauna". Abbey first gained popularity on the app with
(01:32):
her video lip syncing former Australian Prime Minister Julia Gillard's
iconic misogyny speech. But just as her video went viral,
she and other TikTokers faced having it all disappear. The government announced that there were growing concerns around TikTok users' data,
and they sort of were making it very clear that
(01:54):
TikTok was owned by this company, ByteDance, which was a Chinese-owned company.
And they sort of made out as though
TikTok was this Chinese data mining tool that was disguised as
a social media platform. And so because of that, they
really wanted to ban it.
This was back in mid-2020, and Abbey found herself
(02:15):
caught up in the ensuing media storm. That video about
Julia Gillard's misogyny speech was circulating around on different social
media platforms and in the media and stuff. So different
media were reaching out to me for comments and stuff
like that.
And do you have privacy concerns about TikTok? I mean,
(02:35):
not really. I have read the terms and conditions of
TikTok's privacy stuff and it's very similar to Instagram and Facebook. And then TikTok Australia also explained that user
data is not stored in China. It's actually stored
in Singapore and the U.S. And they had no intention
of sharing this data with the Chinese government, even if
(02:57):
it was requested that they share it. So that sort of put me at ease in terms of privacy concerns at TikTok.
The Australian government ended up backing down
on its proposed TikTok ban. Yet the debate about what
harm TikTok and other online platforms and services pose
(03:17):
to our data security and our privacy still rages on.
Big data means a lot of different things to different people,
I think. But I guess more recently, what it's really
come to mean is big personal data. So your clicks,
your GPS locations, your purchase histories. This is privacy and
data expert Professor Simon Dennis from the Melbourne School of
(03:39):
Psychological Sciences.
Companies have become more adept at gaining insights from that data, and that happens as more of it gets accumulated: the more there is, the more insight you can draw. But it also comes because the algorithms that the companies are using are becoming more sophisticated.
This is particularly true in text analysis, where so-called
(04:03):
neural nets, or deep learning networks, have become more and more capable of understanding what the text really means.
Simon Dennis (04:10):
Very often, you see these services represented as, you're the product rather than the customer, sort of thing.
Lynne Malcolm (04:18):
We're also hearing from data security consultant Troy Hunt. Troy
says that businesses are increasingly turning to data aggregators. These
middlemen
are capable of analysing personal data across multiple sources to
build up a picture of who a person is and
what their habits and preferences are.
Troy Hunt (04:39):
Let's take the person's email address, and then we'll go
to one of these data aggregators and we can take
that email address, and we can then turn that into where they live, their demographic, their age group. Crikey! Look at how far you can target with Facebook ads, like, what do you like to eat for breakfast? Let's target the people who eat muesli rather than toast.
But then, if we go a little bit into the
(04:59):
shadier end, there have been numerous services out there that
have simply taken data breaches and sold the data. I
myself have been found in multiple data breaches from aggregators
who simply collect this information.
Lynne Malcolm (05:12):
According to Troy, the fact that there is so much data out there makes us vulnerable to exploitation.
Troy Hunt (05:19):
I think maybe one way of looking at this is
that
because of the significant size of the data and the
amount of personal information in there, it does give nefarious
parties a whole range of different options in terms of
how they can target people from identity theft through to
account takeover through to just the simple frustration of endless
amounts of spam
(05:40):
that, inevitably, very often is sourced from data breaches.
Lynne Malcolm (05:43):
One of the most notorious data breaches in recent memory
is the hacking of Ashley Madison. The attackers followed through
with a threat to publish the data of people who
signed up to the controversial dating service.
Troy Hunt (05:58):
Their modus operandi, in fact their strapline, is "Life is short. Have an affair,"
and they rose to fame. I mean, the data breach was 2015, but they rose to fame years before that because they were so brazen about it: look, for many people, they just need to have an affair, this is a perfectly normal thing, and we built a service to do it.
So they got a lot of press just based on
(06:20):
that brazenness. And then we got through to July 2015, and someone claimed to have breached them, and they provided proof. They provided a couple of records, which were verified, and they said, look,
either Ashley Madison shuts down or we dump all the data. And they made good on that threat. So Ashley Madison
(06:40):
didn't shut down, the data got dumped. It was around about 35 million records with very, very personal information. And when you think, if we sort of put the salaciousness of it aside, what information would you actually need to collect from someone in order for them to hook up with other people? Well, you would need
(07:00):
things like: what are your religious views, your ethnicity, your age, maybe your socioeconomic group.
So you start getting all of these pieces of very
personal information, which you need to provide the service. But
then you have an incident like this and they get leaked.
And then, of course, the twist of the knife, if you like, with Ashley Madison was that it wasn't
(07:22):
just all this personal information, but there was the stigma
of you being there to have an affair.
Lynne Malcolm (07:31):
Big data has huge potential for positive consequences. Simon believes
it's possible to flip the script on big data. He
heads up the Complex Human Data Hub at the University
of Melbourne's School of Psychological Sciences.
Simon Dennis (07:46):
We're kind of on the nerdy side of psychology. We're
building the math models and the computational models.
Lynne Malcolm (07:51):
But of course, those are the things that you need if you're going to deal with this big data and be able to relate it to human behaviour.
Projects at the Complex Human Data Hub try to harness the power of big data to get a more accurate picture of what makes a person tick.
Simon Dennis (08:06):
What we're all about in the Hub is understanding: how do people make decisions? How do we characterise individuals? How do we characterise how those individuals operate within their groups? And big data, there is no doubt, has a huge part to play in better understanding how those processes work.
Lynne Malcolm (08:23):
Simon thinks big data could be particularly revolutionary in the field of mental health.
Our capacity to intervene in mental health, even experts in the field will agree, is pretty limited, actually. So we have a number of techniques that we employ, but the efficacy of those techniques is not great, and
(08:46):
we really haven't made huge progress over the last 50
years or so in being able to address those issues.
Simon Dennis (08:53):
This is one area where big data may play a significant role, because if you think about how we go about doing this at the moment: we have these sessions that a clinician will do with people. They'll take some neuropsych tests at one point in time, and we use that to get our picture of what that person is all about, right, and design an intervention for them.
(09:14):
And, yes, there are multiple problems with that. One, that's just one point in time, and we don't know what happened for all of the rest of the time. And secondly, it might be shocking, but sometimes people actually don't tell you the entire truth.
The hope is that using big data we will be able to get a much finer-grained understanding of that person
(09:34):
in the same way that Facebook or Google is getting a very fine-grained understanding of you in order to figure out what to try to sell you. If we were able to utilise the data to get that kind of level of precision, I think that would allow us to design interventions that are much more appropriate for that given individual.
Lynne Malcolm (09:53):
One example of how this model could be applied is
in the treatment of bipolar disorder.
Simon Dennis (10:00):
As you may be aware, bipolar patients go through ups
and downs,
and sometimes when they're in their downs, it can be
quite problematic and even tragic in some circumstances. And so
you'd like to be able to predict when that's going
to happen so that health care professionals and family can
be alerted to the fact that they need to be
(10:21):
paying attention
so that they can provide support to those individuals.
Lynne Malcolm (10:26):
Simon and his colleagues ran a study that tracked bipolar individuals using accelerometer data, so data about their movements, and GPS data.
Simon Dennis (10:37):
The reason for that is to create these kinds of predictive models, to be able to say, okay, in two weeks' time we think that this person is at risk, and so be careful. And in fact, it looks like that might be possible.
There's still a lot of work to be done, but
it does seem like it is possible to predict when
these adverse events are going to occur.
(10:57):
The other main reason that you might want to do it in the bipolar case is that bipolar patients are often taking different kinds of drugs to control their condition. And the dosages of those drugs are often set so that they are sufficient for when they are in their worst state.
(11:18):
But of course, they're not always in their worst state.
And so that means that a lot of the time
the dosage is actually more than what they strictly require
at that particular time. And that's problematic because the drugs have long-term side effects. And so what
we'd like to be able to do is avoid those
(11:39):
side effects. And
so if we could adjust dosage based on the big
data that's coming in about an individual, then we'd be
able to decrease the total amount of drug that an
individual receives and therefore ameliorate the side effects.
Simon also
wants to open up the benefits of using big data
(11:59):
to other researchers in science and medicine.
Lynne Malcolm (12:02):
To do this, he leveraged a system he had co-developed for his own research, called Unforgettable.
Simon Dennis (12:09):
Unforgettable really is, I guess, the commercial arm of the
same enterprise. So what we're hoping to do there is
just make it easier for researchers to be able to
use big data. So at the moment,
it's dominated by the big tech companies Facebook and Google,
and so forth. Because they're the ones that have the infrastructure.
They're the ones that have the resources to be able
(12:31):
to access and analyse that data.
And for an individual researcher in a university or other research institution, that's just not available, right? If they have to set up their own infrastructure for that on a case-by-case basis, that's just prohibitively time-consuming and expensive.
So Simon's team offers an alternative: a pool of data
(12:54):
collected via Unforgettable.
Unforgettable has an app, and the app is available for Android users and collects GPS, accelerometer, and some obfuscated audio data.
But that's only one of many ways, over 600 ways, that we collect data using the Unforgettable system. So
(13:16):
we're able to connect to services like Facebook and Twitter and so forth. We're able to connect to devices like your scales or things like that, your washing machine, your car and so forth. And we're also able to take data after the fact. So if
(13:38):
you've collected, say, data on Twitter over a long period
of time,
then we can get a user to download that data,
and we can enter that into the system.
So there are lots and lots of ways in which
the data comes into the system. But the user always
decides what data they are actually supplying, and they can
(13:58):
always turn the stream on or off, depending on what
they want.
Lynne Malcolm (14:03):
The idea is that individuals hooked up to Unforgettable can
then opt to share their data with researchers.
Simon Dennis (14:11):
To give an example from some of my own work: my area of interest is human memory. And so one of the things that's always been a challenge for us in memory research is that
we're kind of stuck in the laboratory so we can
tell you a lot about
how people remember lists of words. But not everyone's all
that enthralled by how people remember lists of words
(14:33):
and the kinds of issues that people might be more interested in are things like: how often can people remember where they were at any given time? That might be important, for instance, if a person is giving an alibi and you want to have an understanding of how likely they are to misremember that. Or, for instance, for COVID tracking, it's
(14:55):
obviously important to know how often people are just going
to simply misremember.
So we ran a study like that, using the GPS coordinates. And then we asked people after a week where they'd been at different times and asked them to select.
And they were right about 66% of the time, which,
of course, means they were wrong about 33% of the time.
(15:17):
So that allowed us to kind of get a sense, right? So if you think of it in practical terms, that means, you know, when you're going into a COVID interview, for instance, then about a third of the time you're going to be wrong when you answer queries about where you were. And so that's important to know, so you get a sense of what the reliability is that you're
(15:38):
dealing with.
Lynne Malcolm (15:38):
Crucially, this model aims to empower and remunerate the individual.
Simon Dennis (15:44):
One of the key aspects of this, I guess, which
we're hoping will make it different from some of the
current approaches is that it is licensing the data. So
the data continues to be owned by the participants, and
they're free to license it to many different researchers, and
they get compensated for it. So the fact that they're
making choices on a case by case basis means that
(16:04):
hopefully they're making informed choices.
Troy Hunt (16:07):
Yeah,
there's this term that's being used a lot, which is data breach fatigue. People are just like, crap, another one. Oh, it's just the Internet, it's normal now, you know, and they get on with life.
Lynne Malcolm (16:20):
We're back with data consultant Troy Hunt. For over a decade,
Troy has run haveibeenpwned.com, a
website that lets the public check if their email has
been compromised.
His site processes thousands of search requests each day. Yet
Troy believes that the onslaught of data breaches in recent
(16:41):
times hasn't been much of a deterrent.
Troy Hunt (16:44):
For the most part, there is no immediate impact on most individuals in a data breach. Now, that's not to say that makes it okay, but I think the experience shows us that unless there is some immediate impact on individuals, they do become quite nonchalant.
Now, the other thing I think that's really interesting
(17:04):
socially about all this is that we now have a
new generation of adults who have never known a time
where you didn't share information very liberally, a time without social media, without sharing frankly inane volumes of information online.
And I think that there is now a fundamentally different
(17:25):
social tolerance to the extent to which we share information.
Lynne Malcolm (17:28):
Professor Simon Dennis.
Simon Dennis (17:31):
I'm not sure I'd entirely agree with Troy about the younger generation, because I interact with them, because they're participants on our platform, and there's certainly a more sophisticated understanding of what you're giving away.
Lynne Malcolm (17:45):
But Simon also recognises that creating a market for selling
personal data has plenty of challenges.
For one thing, putting a price on that data.
Simon Dennis (17:55):
Because most people are just giving up their data for free these days, it's not even in people's consciousness, right? So GPS might be worth one thing, accelerometer another thing, an image another thing, my genetic code something entirely different. And so there's no kind of market, one that includes the individual, for that data. And so we don't have any
(18:17):
of those kind of market signals that you might expect
to have for other kinds of commodities.
We were asking researchers to set what they thought was
an appropriate price.
They didn't have any idea. And so they were basically
coming back to us saying, Well, what should we put here?
So as a consequence of that, we've actually come up with our own set of prices, and we take
(18:38):
into account how much data there is, what kind of data it is, how much effort was involved for you to collect it, and so on.
Lynne Malcolm (18:45):
And how much are participants, on average, being paid?
Simon Dennis (18:48):
Perhaps if I give you just kind of the range.
So we typically in my memory experiments collect data for, say,
somewhere between two weeks and three months, and sometimes that
would be just passive data. But sometimes we're asking people
to fill out
surveys and sometimes micro surveys where it'll be like eight
times a day that they're doing this. So obviously it
(19:10):
varies dramatically, right, because if you're asking somebody to fill out a survey eight times a day, there's a lot of labour involved in that. But a typical participant would get somewhere between, say, $25 and $100.
Now, are they the right figures? I really don't know, you know. I guess we'll find out over time.
Lynne Malcolm (19:29):
Another challenge is getting researchers on board with the fact
that they don't have free rein over the data.
Simon Dennis (19:36):
This is a bit of a difficult thing for researchers to accept, because in the past they've just got the data and done whatever they liked with it, right? And now we're trying to get them to accept that, no, in fact, that data doesn't belong to you, it belongs to the participants, and you can't just give it to whoever you like.
But what I say to them is that when you
(19:57):
get into this big data, it's not just a matter of collecting the data and it'll all be good. It really matters that the participant is being diligent about the way they collect the data.
And so if they have a stake in collecting and
curating a complete set of data, that really helps the
data quality, which is a big issue in big data.
(20:18):
So I think there are positives to be had in
terms of how that works.
Lynne Malcolm (20:25):
Troy believes there are clear things online services can do
right now
to improve data security for individuals, like collecting as little data as possible and storing it for the shortest period possible.
Troy Hunt (20:39):
I am gobsmacked by the number of times people go and subscribe to a service, and the service asks for information which is just entirely useless.
Lynne Malcolm (20:47):
He points to the example of the feline-focused website catforum.com.
Troy Hunt (20:54):
There are questions on catforum.com along the lines of, is it safe to eat out of the same bowl as my cat? It is literally a forum to talk about random cat-related crap.
This is a real thing. There is a catforum.com. You go there and you sign up, and it will ask you for your date of birth. And I sort of look at this and go, well, why do you need a date of birth to comment on cats? It seems like a very irrelevant piece of information,
(21:17):
and I kid you not, every time I sort of lament this on Twitter, people come back and say, well, you need to have date of birth because of COPPA in the US, the Children's Online Privacy Protection Act: we need to know that you're 13 or older.
And I'll go, okay, well, why don't you just ask them, are you 13 or older? Because if you just asked if they were 13 and didn't have their birthdate, now you don't have a piece of
(21:39):
information that you can lose.
And people, every single time, come back and say, well, you can't do that because people could lie. I just go, oh my God!
Lynne Malcolm (21:50):
As the potential and reach of big data grows, Simon
thinks regulation can make markets safer and fairer.
Simon Dennis (21:58):
What we're going to see, right, over the next decade or so, is governments becoming much more involved in this process, because it's really the Wild West and there needs to be a much tighter regulatory environment as people come to understand their data. Because that's the big problem at the moment: they just don't really even understand what's there, right? So
(22:21):
certainly our hope and our vision with Unforgettable is that by allowing them to monetise it, that then gives them a real reason to be actually involved in the process. And I think it won't be everybody, of course,
but there will be larger groups of people who are
really starting to grapple with some of these issues about
what should the data be worth and so forth. So
(22:42):
I'm hopeful that we will see a more informed
populace in time.
Lynne Malcolm (22:48):
And that's the episode. I'd like to thank our guests for today, Professor Simon Dennis and Troy Hunt.
PsychTalks is supported by the University of Melbourne's School of
Psychological Sciences. Our producer was Carly Godden, and our assistant
producers were Amy Bugeja and Mairead Murray.
Arch Cuthbertson was our sound engineer, and music was composed
(23:11):
by Chris Falk. Catch more episodes of PsychTalks with me, Lynne Malcolm, by subscribing to our show wherever you get your podcasts.