
August 28, 2019 40 mins

Companies like Google, Amazon, Microsoft and Facebook have all admitted to using humans to listen to user audio for various reasons. But how far does this go? Is your tech actually eavesdropping on you?

Learn more about your ad-choices at https://www.iheartpodcastnetwork.com

See omnystudio.com/listener for privacy information.


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:04):
Welcome to Tech Stuff, a production of iHeartRadio's
How Stuff Works. Hey there, and welcome to Tech Stuff.
I'm your host, Jonathan Strickland. I'm an executive producer with
iHeartRadio and How Stuff Works, and I love
all things tech. And I'm sitting in the audience of
a local theater, a live stage theater, not long ago. I'm

(00:28):
waiting for the show to start, and there's a song
that's playing over the sound system, and I'm really kind
of digging the song, but I totally don't recognize it.
And I glanced down at my phone and I see
that on the phone below the time on the locked
phone screen, it says that the song is Danger! High

(00:48):
Voltage by Electric Six. Now this is obviously a hypothetical
example because I would recognize that song anywhere, but you
get the point. Anyway, I'm thinking, that's so cool. My
phone knows what songs are playing around me. That's so neat.
I didn't even have to tell it to do anything. And
then a couple of hours later, as I think back
on this moment, uncertainty and dread start to seep in.

(01:11):
Wait a minute, if my phone can identify a song
that's playing around me, that means my phone is actually
listening to stuff. It wouldn't be able to tell me
the song title otherwise. It has to be able to
pick up the audio. I didn't activate any app. I
didn't turn on Shazam or ask my phone or anything.
My phone did it by itself. So my phone is

(01:33):
detecting the sounds around it even when it's not in
an active mode. Now, on a similar note, I'm sure
we all have had these personal assistant experiences out there.
Whether we use one ourselves, we've been around when someone
else uses them, things like Google Assistant or Alexa or
Siri or Cortana. There are more of them out there. You

(01:56):
can activate these assistants with a specific word or phrase,
and then you speak to them to carry out some
sort of task or to get you some sort of
information or something along those lines. We've got a Google
Home device in our house, so we might use it
to get a quick rundown on the weather report. We
might ask it to play a track off an album
by the jazz fusion band Weather Report. But wait, that

(02:19):
means that device is listening too. We didn't have to
take any physical action. We didn't have to push a
button to make it work. We just spoke the keyword
or a key phrase, and off it goes. And then
we get into stuff that seems super creepy. And I'm
sure most of you have had some sort of experience
like this. Say you're chatting with friends, maybe you're at

(02:40):
a restaurant or you're just hanging out, and you're talking
about this new snack food you just heard about, and
this is just one part of a conversation that rambles
all over the place. But then you talk a little
bit about the snack food for a couple of minutes.
You're like, you've heard about it, you wanted to try it,
you haven't tried it yet. Later on, you pop on

(03:01):
over to Facebook, and as you're scrolling through your feed,
there it is. There's an ad for the very same
snack food you mentioned to your friends just a little
earlier that day. You've never purchased the snack as far
as you remember, you haven't even searched for it on
the web, and there's the ad. So is Facebook listening
in on your conversation in an effort to serve up

(03:22):
a laser-focused targeted ad? On this episode, we're gonna
take a look at the technology that allows our devices
to listen in on us, and we'll explore the studies
about whether or not anything hinky is going on and
try to separate fact from FUD, F-U-D, that's fear,
uncertainty and doubt. And we'll also chat about some recent

(03:44):
news stories about how big companies have been handing over
audio messages to third party human contractors and what that
means in terms of privacy and ethics. Now, first, let's
address a big reason why devices aren't constantly recording or
broadcasting all the sounds within an environment that's reachable by microphone.

(04:06):
It's because that's truly enormous. Like, that's a huge amount
of data. So let's just take Facebook as an example.
There are more than two billion people using Facebook every month.
At least one and a half billion people pop on
Facebook every single day. Now that's not necessarily the same
one and a half billion people every day, but every

(04:27):
day one point five billion people check Facebook, and out
of that number, nearly one billion of them are accessing
Facebook on mobile devices. So, just from a data management standpoint,
there's no way any company, even one as large as Facebook,
could be actively monitoring, recording, or even analyzing all that

(04:49):
audio that would be coming in from a billion mobile handsets.
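To get a rough sense of that scale, here's a back-of-envelope sketch. The bitrate and user count below are illustrative assumptions, not figures from Facebook:

```python
# Back-of-envelope: how much audio would a billion always-on phones produce?
# Assumptions (illustrative only): ~1 billion mobile users, audio compressed
# to roughly 12 kB per second, about the rate of a decent voice codec.
users = 1_000_000_000
bytes_per_second = 12_000          # ~12 kB/s of compressed voice audio
seconds_per_day = 24 * 60 * 60

daily_bytes = users * bytes_per_second * seconds_per_day
daily_petabytes = daily_bytes / 1e15

print(f"{daily_petabytes:,.0f} PB of audio per day")  # on the order of 1,000 PB
```

Even with generous compression, that's roughly a thousand petabytes of raw audio arriving every single day, before any of it has been stored, transcribed, or analyzed.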
We are in the age of big data, but we
still have our limits. Plus, you'd have to figure that,
you know, of that large amount of data, most
of it wouldn't be useful to Facebook. Now, don't get
me wrong. At the end of the day, you and
I are the products being bought and sold on Facebook

(05:14):
and Google and other providers out there. We're potential customers
for all of the advertisers that use those companies like
Facebook as a platform. So it benefits the advertisers and
Facebook and sometimes even us as customers to match the
right ads to the right people. So there's definitely an

(05:34):
incentive to learn as much about users as possible to
leverage their interests and potentially convert them into paying customers
to an advertiser. Now, this is the very basic foundation
of Facebook's business model. So if Facebook could do this
from a technical standpoint, and if the company could get
away with it from a public perception standpoint, I think

(05:58):
there's little doubt that Facebook would do it. But honestly,
it's just way too much information to process and to
boil down into actionable plans. We talk about a lot
of stuff in our day, you know, and some of
it we may not really be interested in. We're just
talking about something, So it wouldn't do Facebook any good
to serve up ads for stuff that we weren't actually

(06:20):
really interested in, So it has to pick and choose
its moments. Facebook has denied using phone microphones in this way.
In a June second, two thousand sixteen blog post on
the Facebook newsroom site, a company representative wrote this, and
here's a quote. Facebook does not use your phone's microphone
to inform ads or to change what you see in

(06:42):
news feed. Some recent articles have suggested that we must
be listening to people's conversations in order to show them
relevant ads. This is not true. We show ads based
on people's interests and other profile information, not what you're
talking about out loud. We only access your microphone if
you have given our app permission, and if you are

(07:02):
actively using a specific feature that requires audio. This might
include recording a video or using an optional feature we
introduced two years ago to include music or other audio
in your status updates. End quote. Now, it's understandable that
people would be a bit skeptical regarding Facebook's claims of innocence.
In this regard. The company has had several high profile

(07:25):
scandals and issues with privacy and security. Zuckerberg himself once
famously declared that privacy is dead. Also, he simultaneously does
his best to preserve his own privacy. But that's commentary
for another episode. So I don't blame people for thinking
that Facebook might actually be listening in on conversations because
the company has already proven it hasn't been the best

(07:49):
steward of user privacy in the past. But that doesn't
mean the company has actually been spying on people. It
doesn't have to, at least not in that way. And
this is where we get into some troubling territory because
it's where we start to learn how services like Google
and Facebook and others can glean information about us, whether

(08:10):
we have consciously shared that information or not, and it
helps explain how these companies can advertise to us so effectively.
One way Facebook does this is with an innovation called
Facebook Pixel. Now, this is a piece of code that
Facebook's clients, advertisers really, can put on their own websites.

(08:32):
So it's the type of code you would insert into
the website for a business. So let's say you own
a specialty niche marketing shop. We'll say you sell figurines
based off of iconic horror movie monsters and characters, and
you're going to advertise on Facebook. The pixel code is
one way Facebook can optimize that experience. The code pulls

(08:52):
information off of user behavior on your website and sends
it to Facebook. If people click over to your site
because of an ad on Facebook, pixel will register it.
This helps you see how effective or ineffective your ads
are on the site. It also can target your ads
to people on Facebook who would be most likely to

(09:13):
click on those ads. It might analyze the traits common
to people who are interacting with your ads, and then
extrapolate that to target people who have similar traits and
behaviors but they haven't yet seen your advertisements. Facebook, meanwhile,
can also use that data to serve up ads from
other companies to users based on similar findings, and it

(09:33):
can track other stuff too. Let's say you click over
to an article on a blog or news site that
incorporates Facebook pixel in the site's code. Facebook can see
how long you were on that article, which in turn
indicates your interest and investment level in that topic. Then
Facebook can serve up ads related to the contents of
that article to you. In the end, it's all about

(09:54):
analyzing user behavior to get the biggest return on investment,
and it doesn't require using the microphone to do it.
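The core move here, matching ads to people by comparing behavior rather than eavesdropping, can be sketched as a toy model. The users, interests, and scoring rule below are all hypothetical, just to show the shape of the logic:

```python
# Toy model of interest-based ad targeting: score users by how much their
# recorded behavior (pages liked, articles lingered on, purchases) overlaps
# with the traits of people who already clicked an ad. No microphone needed.
ad_buyers_traits = {"horror_movies", "collectibles", "snack_foods"}

users = {
    "alice": {"horror_movies", "gardening", "snack_foods"},
    "bob": {"cycling", "jazz"},
    "carol": {"collectibles", "horror_movies", "board_games"},
}

def targeting_score(traits, buyer_traits):
    """Fraction of the buyer profile this user's behavior overlaps with."""
    return len(traits & buyer_traits) / len(buyer_traits)

# Rank users by similarity to people who already bought from the advertiser.
ranked = sorted(users, key=lambda u: targeting_score(users[u], ad_buyers_traits),
                reverse=True)
print(ranked)  # ['alice', 'carol', 'bob']
```

Alice and Carol never mentioned the product out loud; they simply behave like people who buy it, which is all the targeting needs.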
They can just look at who you are, where you've been,
both in real life if it's tracking your location and
on the Internet if it's tracking your browsing, and
who your friends are. And all of this information combined

(10:16):
gives Facebook a ton of data about what kind of
ads to target towards you. Now, on top of that,
Facebook can purchase information from data brokers to supplement its
own gargantuan database. There are companies that manage
stuff like loyalty programs, which also track what you buy.
They have to for the loyalty programs to work, and

(10:36):
those purchases are linked to you as a person. They know, Oh,
Jonathan goes to Starbucks all the time and he always
gets those Nitro cold brews, So let's put an ad
that targets him based on that information. Now, that data
isn't just being used to help you get the best
deal on whatever it happens to be. That information is valuable.

(10:56):
So companies that manage these loyalty programs can and do
buy and sell that data. You know, our spending
habits are part of this sort of encyclopedia entry about
our interests, priorities, and behaviors. Now, none of this needs
to use a microphone to spy on us. So in
the case of seeing that snack food pop up on

(11:17):
the Facebook feed, it could simply be that you exhibit
behaviors similar to ones that people who have bought that
snack food tend to have. As well. You've liked the
same sort of pages. You may even have a lot
of friends who have already bought this stuff. You may
live in a region where it has recently been introduced.
These are the kinds of points of data that Facebook
might use in order to serve that ad up to

(11:39):
you that have nothing to do with your microphone. So
you got the ad not because you talked about the
snack food, but because Facebook has sussed out you're the
type of person who would like that snack food because
spoiler alert, You're not as special as you think you are,
and I'm not as special as I think I am.
Now you could argue, and I would agree with you

(12:00):
on this, that what Facebook is doing is at least
as creepy as listening in on a microphone, perhaps even
more so. Facebook has filed patents that focus on technology
that's meant to predict where you're going to go next
based on your history of location data. So, in other words,
Facebook is trying to figure out where you're going to
go before you go there. And it's not just you,

(12:23):
it's all the people you know who are using Facebook
too. And so it's not just predicting where you'll go,
it's also predicting which people you may be running into,
because it's predicting those people are going to go to
that same place and whether or not you might encounter
one another. It can also use that to make suggestions
to add people on Facebook who are going to those

(12:44):
same places so that they become your friends online. Now
why does Facebook care who your friends are? Because the
more people who use Facebook and the more interconnected they become,
the more useful the information they generate for Facebook, and
that ends up becoming more valuable to the company. So

(13:05):
it is pretty creepy and invasive, and it doesn't have
to use the microphone. But when we come back, I'll
talk a bit more about these sound activated features and
what's actually going on, because there is some stuff we've
got to be worried about. But first, let's take a
quick break. When I opened this show, I talked about

(13:28):
how my phone could listen in on music and identify
the song even when the phone was in its locked mode.
Now that's because I have a Pixel 2 XL phone.
It's an Android phone. It's actually a flagship Google phone,
and there's a feature on the Pixel 2 that's called
Now Playing. You have to activate this feature, you have

(13:48):
to choose to opt into it. So I want to make
that clear. I chose to activate this feature. It's not
just active by default, and with it active, the phone
can identify music that's playing, and it can tell me
the title even when the phone is in its locked position.
So what gives Well, this is not as creepy and
invasive as it sounds at first glance, because this feature,

(14:12):
this is incredible to me, is actually entirely local to
the Pixel 2 phones. It works on the phone itself.
It's not consulting the cloud at all, it's not sending
any information. So how can that be possible? How can
all this information exist on the phone already? Well, let's
boil it down first, if you've ever played with any

(14:36):
digital sound recording software, you've likely seen sound recorded as
a waveform, a visualization of sound, and typically it's
pretty simple stuff like if you're using a very basic
sound recording system, you're mostly looking at changes in amplitude,
or volume in other words. So you see a continuous
series of peaks and valleys over the course of a

(14:57):
sound recording. Those represent the loudest and the quietest parts
of the recording, the changes in volume. You can also
graph frequency or pitch, and you can if you zoom
way in, see shapes in the waveform that indicate
specific phonetics and sounds. Anyone who has worked in audio
editing for a while can identify at a glance certain

(15:20):
distinctive sounds. Tari, my producer, can probably tell you just
by looking at a waveform of my recording which moments
represent the irritating mouth sounds she removes before publishing an episode.
It doesn't take long before you can do this yourself.
It's actually pretty easy to identify, say, a
hi-hat cymbal in a music recording, because it's very distinctive. Now,

(15:46):
that means that songs have these distinctive features like a
fingerprint that represent the sound of the song, and if
you can recognize the fingerprint, you can identify the song
even if you're not listening to the song at that moment.
And you could look at a print out of a
wave form of a song and you can try and

(16:06):
match it against a library of print outs. That's essentially
what the Pixel 2 is doing. The program runs in
the background, It activates when the sound profile indicates that
there's music present, so it then analyzes the sound that's
coming in through the microphone and it creates one of
these digital fingerprints I was just describing. Then, just

(16:28):
like you would with a crime scene fingerprint, the Pixel
2 will compare the digital analysis of the song that's
playing against a local database on the phone of fingerprints
that represent thousands of popular songs for your region. Now
exactly how many hasn't really been released, but supposedly in
the tens of thousands of songs range. And if the

(16:49):
Pixel 2 finds a match between the song that is
currently playing and the one that's in the database, it
returns the result. This works even if the phone has
cellular and WiFi data turned off, because again it's all local.
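That lookup step can be sketched in miniature. Real systems like Now Playing hash spectrogram peaks into compact fingerprints; the tuples of "peaks" below are purely illustrative stand-ins:

```python
# Miniature version of on-device song matching: compare a "fingerprint"
# computed from incoming audio against a local database. Real fingerprints
# are hashes of spectrogram peaks; these tuples are illustrative stand-ins.
local_db = {
    ("peak_a", "peak_b", "peak_c"): "Danger! High Voltage - Electric Six",
    ("peak_d", "peak_e", "peak_f"): "Birdland - Weather Report",
}

def identify(sampled_peaks):
    """Return the best-matching title, or None if nothing matches well."""
    best_title, best_overlap = None, 0
    for fingerprint, title in local_db.items():
        overlap = len(set(sampled_peaks) & set(fingerprint))
        if overlap > best_overlap:
            best_title, best_overlap = title, overlap
    # Require a strong match; noisier audio shares fewer peaks with the record.
    return best_title if best_overlap >= 2 else None

print(identify(("peak_a", "peak_b", "noise")))   # clean enough to match
print(identify(("peak_x", "peak_y", "peak_z")))  # no match -> None
```

Note that `identify` is a pure local lookup, which is why this kind of matching can work with the radios switched off.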
Now the now playing feature doesn't run constantly because that
would drain battery life like crazy. Instead, it samples the

(17:10):
audio approximately every sixty seconds, and it takes time to
match a song to an entry in the database. The
cleaner the audio, in other words, the less background noise
and less interference that's present, the faster this process tends
to be. This means that when songs transition from one
song to another, it can take a little bit before

(17:31):
the phone registers the change. It all depends on the
acoustic quality of the environment and where in this sampling
cycle the phone is at any given time, so that's
not quite as creepy because everything's local on the device.
It's not sending any data out anywhere else. It's not
listening to what I'm listening to and then alerting Google

(17:52):
to let them know, hey, Jonathan's once again listening to
the soundtrack to Be More Chill, which would be an
accurate observation for it to make, because I do listen
to that a lot. Anyway, you can use this feature
to learn more about the track, the artist, the album,
including potentially purchasing that music. And those features do connect

(18:13):
to the outside world through WiFi or cellular connections, but
that requires an extra step on the part of the user. Also,
Google pushes out updates to this database with the most
popular songs, and these are regionalized to reflect the country
you're in, because you're less likely to run into, say
a Peruvian pop song when you're in Scotland. The push

(18:35):
updates do happen over WiFi or cellular connections. But
this is just the reference data that analyzed music
gets compared against. An app like Shazam, on the other hand,
connects to the cloud, but you also have to activate
the app to have it listen to the audio, so
it's a user choice to have the app listen. So

(18:56):
this is more like a push-to-talk device, except
it's push-to-listen. Shazam is also analyzing music to
suss out a digital fingerprint for the audio, but it
can compare the sampled audio against a much larger database
consisting of millions of songs, rather than the tens of
thousands you would find in the Pixel 2 Now Playing feature.

(19:17):
More importantly, I think it's fair to say this isn't
a creepy use of the technology, since the listening feature
only activates on the user's command rather than just being
on by default. Now, this isn't that much different than
what virtual assistants are doing when you use them. Clearly,
the microphone on a virtual assistant like Google Home or

(19:38):
Siri or whatever, it has to be active all the time,
otherwise you wouldn't get a response when you used whatever
the keyword or phrase was to activate the assistant. I'm
going to try and avoid saying any of those phrases,
by the way, because I don't want those of you
who have those devices to deal with the frustration of
them going off in response to something I say. Now,

(20:01):
those words or phrases have a specific sound, just like
music does. In this case, we're talking about phonemes, which
are recognizable sounds found in language. So in English there
are forty four phonemes. The order and combination of those
phonemes are the key. So if you say something that
has those phonemes in the right order, or if it's

(20:23):
close enough, if it's in a noisy environment, this can
activate the virtual assistant. It's like a key fitting into
a lock. Now, if you're saying other stuff, it's like
the wrong key is inserted and nothing happens. It's only
when you say something that fits the lock that the
assistant activates. This process continues after activation. When you talk

(20:45):
to the virtual assistant, it analyzes your speech by phonemes.
Software processes those to figure out what words you are
actually saying. Well for the first step, that is, because
it's actually more complicated than that. So, for example, there
are homophones. These are words that have a similar sound
but different meanings and often different spellings. An easy example

(21:08):
is the number eight and the past tense of to eat,
such as I ate an entire bowl of cao. Mm
hmm okay. So those two words eight and ate sound
exactly the same, but they have different meanings. Now that
means the software can't rely on just the sounds you're

(21:29):
making when you speak to figure out what you mean,
it has to actually analyze syntax and context and make judgment
calls about what you are actually meaning when you say
these things. Sometimes it gets things right, sometimes it gets
things wrong. But don't be too hard on it. Because
humans misunderstand other humans all the time. Even when we

(21:50):
are both communicating in the same language, we
can misunderstand each other. Now, this is still just the
first step. You can think of this as essentially speech
to text. From there, you have to determine what
is actually being asked by the speaker, what is the
intent behind the words. If someone speaks French very slowly

(22:11):
to me, I might be able to spell out what
is being said phonetically, but that doesn't mean I understand
the actual content of what was spoken. And to complicate matters,
there are a lot of different ways to ask for
the same information. I might say what's the weather for
this week? Or will I need an umbrella today, or
one of a dozen other ways to inquire about the weather.
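That many-phrasings-to-one-intent step can be sketched with simple keyword rules. Real assistants use trained language models rather than keyword lists; the intents and keywords here are hypothetical:

```python
# Toy intent matcher: many different phrasings map to the same intent.
# Real assistants use trained language models; these keyword rules are
# purely illustrative stand-ins.
INTENT_KEYWORDS = {
    "get_weather": {"weather", "umbrella", "rain", "forecast"},
    "get_traffic": {"traffic", "commute", "drive"},
}

def detect_intent(utterance):
    """Map a transcribed utterance to an intent label."""
    words = set(utterance.lower().replace("?", "").split())
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:  # any keyword present -> match this intent
            return intent
    return "unknown"

print(detect_intent("What's the weather for this week?"))  # get_weather
print(detect_intent("Will I need an umbrella today?"))     # get_weather
print(detect_intent("How bad is my commute right now?"))   # get_traffic
```

Two utterances that share no words at all still land on the same intent, which is the whole point of this layer.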

(22:33):
The software has to be able to determine what the
intent was behind my question, and then there's another step,
which is matching intent with action. The assistant has to
respond to my request, and hopefully it does so in
a way that's relevant to whatever I was asking about
in the first place. So if I ask my virtual

(22:53):
assistant for an update on the weather, I'm not going
to be impressed if it instead tells me about the
traffic, or vice versa. And as assistants get connected
into more systems like security systems, lights, apps, and more,
the software has to send appropriate commands to these other
elements to produce the expected results. Now, this is all impressive,

(23:17):
and because it's impressive, it could be a little scary
when we think about assistants as hanging on our every word.
What, are they always listening? Are they always paying attention? Now,
they're always monitoring sound, but they're not doing so in
an effort to broadcast or record information. They are on
alert for that initiating phrase or word. They ignore everything else.
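That always-listening-but-ignoring loop can be sketched like this. A real detector matches phoneme patterns in the audio itself; this sketch stands in with plain text tokens, and the wake word is made up:

```python
# Sketch of a wake-word loop: the device constantly checks incoming sound
# for one key pattern and throws everything else away. Real detectors match
# phoneme sequences in audio; plain words stand in for phonemes here.
WAKE_WORD = "jumpstart"  # hypothetical activation word

def wake_word_monitor(audio_stream):
    """Yield only the utterances spoken right after the wake word."""
    armed = False
    for utterance in audio_stream:
        if armed:
            yield utterance      # this is the command sent on for processing
            armed = False
        elif utterance == WAKE_WORD:
            armed = True         # the key fits the lock: wake for the next phrase
        # anything else is discarded, not recorded or broadcast

stream = ["blah", "jumpstart", "what's the weather", "more chatter"]
print(list(wake_word_monitor(stream)))  # ["what's the weather"]
```

Everything before the wake word and everything after the captured command simply falls on the floor; only the armed utterance goes anywhere.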

(23:40):
More on that a little bit later. Now that being said,
there are ways in which someone could hack an assistant
or a phone, or really any connected device that has
a microphone in order to eavesdrop using that device's microphone.
Edward Snowden revealed that the NSA used such
tactics in the agency's surveillance efforts. Apps that have access

(24:03):
to your phone's camera and microphone for the purposes of
sharing video, audio, and related features can do some disturbing
stuff if they're compromised. They can also do some disturbing
stuff if they're not compromised, but if the party behind
it is malicious. Felix Krause made such an app as
a proof of concept for iOS devices. The app, like

(24:26):
many others, asked the user for permission to access the camera.
Krause stated that once a user agreed to this, the
app could access both the front and back camera anytime
the app was in the foreground of the iOS device.
It could take videos and pictures with no indication to
the user that such a thing was happening, and it
could upload that data to a remote server. It could

(24:47):
even run real time facial recognition software. Now does this
mean apps like Facebook's Messenger or YouTube are doing this? Well,
not necessarily, but it does mean it's at least possible
to do, and nothing is stopping a more, let's say,
ethically unconcerned app from doing just that. So what can

(25:08):
you do to protect yourself from bad actors? Uh, here's
the bad news. Not much. You could go without using
such devices and apps in the first place. That's pretty
darn restrictive. Krause recommended using camera covers to obscure the
phone's cameras when you weren't actively using them, or revoking
camera access to the various apps on the phone. And

(25:30):
that's about it. Yikes. Now, when we come back, I'll
cover a related topic that's been in the news lately.
But first let's take another quick break. Okay, so we
know it's possible to use cameras and microphones against people,

(25:52):
either with malware or what amounts to a security loophole
between handset hardware and apps. But there's something else we
need to chat about, and that's humans listening in on
what we assumed to be private conversations and messages. Now
here's the context. In August two thousand nineteen, several major
media outlets reported an upsetting revelation, namely that Facebook had

(26:17):
been sending out audio files that users were creating in
Facebook Messenger, for example. And these were audio clips sent
through Messenger itself, so it's akin to a private text
to a friend. And Facebook was sending these audio files
to a third party contractor to transcribe that audio. So
imagine having a private text message thread sent to a

(26:40):
complete stranger for review. It was similar to that, except
it was audio, not text. So what's actually going on? Well,
Facebook said this all had to do with users who
had opted into having their audio messages transcribed automatically. Essentially,
it was all about using the voice to text option
in Facebook. Now, according to Express Computer, this option didn't

(27:06):
really have a warning that let you know that those
audio files you were creating through this voice to text
feature would end up being heard by humans out
In fact, they said that the warning that would pop up,
or the notification that popped up said, turn on voice
to text in this chat using Facebook Messenger, and above

(27:31):
the no and yes buttons where you would choose one
of these options, Facebook further would describe the option: quote,
display text of voice clips you send and receive. You can
control whether text is visible to you for each chat. End quote.
So again it makes it sound like, oh, this is
all automated. If I use voice to text, I just

(27:52):
say a phrase, the text shows up. I might have
to make some adjustments to the text, maybe it has
misinterpreted one of the words or whatever. But sort of
a hands free approach to sending messages in Messenger. Lots
of apps use voice to text features, and in theory
it's a pretty great feature. You can dictate a message

(28:12):
to be sent to your friend without having to stare
at the screen and type or swipe on a keyboard.
Tons of folks use features like this if they want
to interact with an app while they're driving, for example,
to minimize the distractions they have as they putter around.
But you'll notice those messages don't seem to indicate anywhere

(28:34):
that the voice to text recordings could be sent to
a human being for review. Express Computer further explains that
even on a supplemental page explaining the voice to text feature,
Facebook fails to mention that human beings will be reviewing
that material. Instead. The supplemental page talks about how voice

(28:56):
to text uses machine learning to get better at interpreting
what you're saying, so that it becomes more useful to
you the more you actually use the feature. So the
concept here was that some voice recognition software would transcribe
this audio. Google Voice also used to do this for
voice messages. I remember getting voicemails from my mother, who

(29:17):
has a Southern US dialect as do I, but hers
is more pronounced. The Google Voice speech to text program
had problems interpreting my mother's messages, and frequently the transcription
would be hilariously off track, and most of the time
I wouldn't even be able to guess what the original
message was based off the transcription. It meant that I

(29:40):
would listen to the voicemail and then I would shake
my head a lot as I would read the transcription
at the same time and just see how far off
it was. This is a big challenge for voice recognition programs.
There are a lot of different dialects and accents. People
from different regions within the same country can sound very

(30:01):
different even if they're speaking the exact same language. If
you get someone from Savannah, Georgia, a native of Savannah, Georgia,
and a native from Boston, Massachusetts, they're going to be
able to have a conversation with each other, but they
will end up saying the same words very differently from
one another. And that's before you even start talking about

(30:22):
people who have a different native language, who have learned
English and have a foreign accent on top of the
English they speak. There's no hard and fast rule you
can create for a voice recognition program to follow to
interpret speech correctly throughout a language. Because there's so much
variation in how the words in that language are said,

(30:45):
training the model becomes a challenge. So one thing you
can do is you have a human being transcribe spoken
words and then compare the human transcription against the machine-produced
transcription in an effort to train your model to
be more effective. Humans are pretty good, though not perfect,

(31:08):
at figuring out what some other human says, assuming both
parties are fluent in the same language. By comparing these
two records against each other and then making corrections to
the model, computer scientists can tweak their voice recognition software
models to be more accurate. Now, ideally you would do
this before unleashing such a system on the public, but

(31:29):
that's not really that practical. There is no in-lab
project that is going to come close to generating the
amount of data and the sheer variety that you will
encounter out in the real world. Improving the model would
happen much faster with a larger sample of subjects using
the model, and a billion or so people is a

(31:50):
pretty darn big sample size. But that means sending these
audio files to humans in the first place. And Facebook
has said that the files were anonymized so that there
was no identifiable name or anything associated with each of
the audio files being sent for human review. But hey,
I hear you say. Earlier in this episode, you pointed

(32:12):
out how it's possible to really get an idea about
a person just from the other data they provide, and
you'd be right. These audio files had all sorts of
different types of content in them, some of it was
likely upsetting, disturbing, or inappropriate. Contractors who had been hired
to do the transcription came forward anonymously, I might add,

(32:34):
because they didn't want to get fired from their jobs,
and said they felt that the practice was an unethical one.
And media outlets looked into it and their conclusions were
pretty much the same. Right down the board, Facebook was
not transparent about what was happening with this audio, and
there were no clear indications to users that their audio
files might get sent to some stranger for the purposes

(32:55):
of transcription. Now, for its part, Facebook said it halted
the practice in early August two thousand nineteen, and third-party
contractors have said that that is true, that they
no longer are doing this work for Facebook. Facebook isn't
the only company to come under scrutiny for this kind
of thing. Google, Apple, and Microsoft have also been under
the microscope for very similar practices. Now, on the one hand,

(33:19):
it's understandable that these companies want to improve their voice
recognition capabilities. It's what makes these apps and products
useful, and training the models on this stuff makes them
useful to a wider variety of people. But the
privacy concerns remain and it's something that isn't just troubling
to users, but to the people actually being paid to

(33:39):
transcribe the stuff in the first place. Now, it would
be another matter if the companies were transparent about this practice.
If users knew that there's a chance a real, live
human being would be listening in on some of those
voice messages for the purposes of quality control for the
voice-to-text feature, maybe they wouldn't opt into using
voice-to-text in the first place, or they

(34:01):
might opt in and not care. In some cases, I'm
sure there'd be no shortage of people who would actually
say truly terrible things, hoping that some poor contractor would
have to listen to it all and check the audio
against the automated transcription, because some people are just plain nasty.
Don't be nasty. By the way, there are better ways

(34:21):
to entertain yourself than by making some other person's life miserable.
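As an aside, the quality-control comparison I described, checking a machine transcript against a human one, is usually scored as word error rate. Here's a minimal sketch in Python; the function name and the sample sentences are my own invention, not anything from Facebook's actual pipeline:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER: (substitutions + insertions + deletions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Classic dynamic-programming edit distance, computed over words instead of characters.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Human transcript vs. a hypothetical machine transcript:
human = "I'm going shopping for some new sneakers"
machine = "I'm going shopping for some new speakers"
print(round(word_error_rate(human, machine), 3))  # prints 0.143
```

Real evaluation pipelines normalize punctuation and run over huge corpora, but the core metric driving the "send audio to human transcribers" practice is just this word-level edit distance.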
Facebook could potentially face some serious charges based on this practice.
The company had settled with the Federal Trade Commission, or FTC,
earlier in the summer of two thousand nineteen. The settlement
was for an incredible five billion dollars, and it largely

(34:43):
revolved around the company's rather abysmal record with privacy. The
charges date all the way back to two thousand twelve,
when the FTC brought eight privacy-related allegations against Facebook.
And again, this isn't a big surprise. Zuckerberg had already
cavalierly proclaimed privacy dead a couple of years before that. Now,

(35:03):
in the settlement, Facebook agreed to adhere to some rules.
Those rules said that Facebook was prohibited from making misrepresentations
about the privacy or security of consumers' information, prohibited from
misrepresenting the extent to which it shares personal data, and
it required Facebook to implement a reasonable privacy program. Now

(35:24):
I'm no legal expert, not by a long shot, but
it seems to me that Facebook's failure to alert users
that their voice-to-text data could be sent to
non-Facebook employees for review is in violation of this agreement.
That Facebook agreed to these terms in July two thousand nineteen,
and then continued the practice into August, is a big problem.

(35:47):
Whether or not it will result in further legal action
against this company is unknown as I record this episode,
but it seems like it's at least possible. So I'm
gonna wrap this up. We know that microphones can listen
in on us without our knowledge. The NSA
worked on programs in the United States that did exactly that.
And while companies with virtual personal assistants tell us that

(36:09):
those assistants only activate when certain phrases are spoken, it's
also possible that that list of phrases could go well
beyond the ones published by the company. So, in other words,
I might know that to wake up my hypothetical virtual assistant,
I would have to say the alert phrase sky net awaken,

(36:29):
and then it pays attention. But what if there's a
whole laundry list of other words or phrases that could
wake it up so that it records or transcribes whatever
audio follows. What if, for example, the phrase shopping or
going shopping activates it so that whatever follows gets registered
by the device. So if I tell a friend tomorrow,

(36:50):
I'm going shopping for some new sneakers, the device has
registered the phrase new sneakers because it paid attention once
I said the words going shopping, and then I start seeing
ads pop up everywhere I go online for sneakers. Now,
is that something that's possible? Well, yeah, it's possible. That
doesn't mean it's happening, but it could be. It's also

(37:12):
possible that my other behaviors have indicated that I'm on
the lookout for some new kicks. Coincidence is a thing,
and it's frustrating because without seeing behind the scenes, it's
hard to draw any firm conclusions. Most of us, myself included,
have a limited understanding of exactly how much data we're
generating in our day to day lives and how that

(37:34):
data can be analyzed for patterns and predictions. We may
not even be aware that we're heading toward a particular
decision before an algorithm draws that conclusion, and it's spooky
and disturbing. But it doesn't necessarily mean that we're being
spied on by a microphone. It may mean we're just
broadcasting our decisions before we even know we've made a decision,

(37:58):
and it does indicate that there is some sort of
eavesdropping going on, just not necessarily audio eavesdropping.
It's more about all of our other behaviors that humans
don't pick up on, so we've never had to worry
about it before, but machines can analyze it at a
level that is disturbing. In fact, an actual study at

(38:19):
Northeastern University looked into the possibility of whether or not
phones were getting activated by clandestine phrases and listening in
on conversations, and it found that there was no evidence
that this was happening. They did find that a lot
of apps were taking screenshots of stuff on phones and
sending those screenshots to third parties, though, so you know,

(38:39):
that's also disturbing. But it doesn't appear that these devices
are actively listening to you all the time and recording
or transcribing or broadcasting that information anywhere. There's a lot
to lose from taking that approach. The problem is it

(38:59):
is something that is possible, and the other problem is
that there are other behaviors we're doing that are just
as revealing as recording what it is we're saying, if
not more so, and that without being aware of it,
we are just giving away more and more information about
ourselves and more and more control over our own lives.
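To put the earlier wake-word hypothetical in concrete terms, here's what an expanded trigger list might look like as code. To be clear, this is purely illustrative: the phrases are invented, real assistants use on-device acoustic keyword models rather than string matching on transcripts, and researchers who have looked for hidden triggers haven't found them:

```python
# Purely hypothetical: one published wake phrase plus an invented hidden trigger list.
PUBLISHED_WAKE_PHRASE = "sky net awaken"
HIDDEN_TRIGGERS = ["going shopping", "shopping"]

def audio_after_trigger(transcript):
    """Return whatever follows the first trigger phrase found, else None."""
    text = transcript.lower()
    for phrase in [PUBLISHED_WAKE_PHRASE] + HIDDEN_TRIGGERS:
        index = text.find(phrase)
        if index != -1:
            # In the hypothetical, everything after the trigger gets recorded.
            return text[index + len(phrase):].strip()
    return None

print(audio_after_trigger("Tomorrow I'm going shopping for some new sneakers"))
# prints: for some new sneakers
```

A scheme like this would be trivially easy to build, which is exactly why the question keeps resurfacing; the reassuring part is that studies like the Northeastern one found no evidence of it actually happening.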

(39:21):
And we're going to see more and more targeted ads
that seem super creepy because they're mentioning things that we
didn't think anyone knew about, because most people wouldn't pick
up on it. Fun times. So I don't think this
was a particularly... you know, um, I don't think this
show really helps allay any fears. It may just switch

(39:44):
fears from microphones to everything else. But I did want
to cover this because a lot of people have been
talking about it for the last few years, and these
transcription services have brought the whole conversation back
into the forefront. So I wanted to take an
opportunity to really tackle it here on the show. If

(40:05):
you have a suggestion for a future episode of tech Stuff,
send me an email, the address is tech stuff at how
stuff works dot com, or drop me a line by
going to tech stuff podcast dot com. You will find
there a link to all of our archived episodes, as
well as links to our presence on social media where
you can get in touch with us, and also a

(40:25):
link to our online store, where every purchase you make
goes to help the show. We greatly appreciate your support
and I will talk to you again really soon. TechStuff
is a production of iHeartRadio's How Stuff Works.
For more podcasts from iHeartRadio, visit the
iHeartRadio app, Apple Podcasts, or wherever you listen to

(40:48):
your favorite shows.
