Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:14):
Welcome to Tech Stuff. This is Tech Support. I'm Oz Woloshyn, and I'm here with Cara Price.
Speaker 2 (00:18):
Hey Oz. Hey, Cara.
Speaker 1 (00:20):
So today we wanted to talk about this ChatGPT feature, which is now defunct. Our friends at 404 Media had a story with the headline: nearly one hundred thousand ChatGPT conversations were searchable on Google. And as soon as that email hit my inbox, before I'd even read it, I forwarded it to you and to our producer Eliza, and I said, let's jump on this.
Speaker 3 (00:41):
Yeah. You know, part of it is that it taps
into this fear that we all have about our most
intimate thoughts being made public. This isn't like having a
private Instagram account. This is very much between us and
ChatGPT. It's a little bit like talking in our sleep.
And I think most people who have played around with
a chatbot have some questions or responses that they'd rather
the general public be blind to. I know I have
(01:03):
my fair share.
Speaker 2 (01:04):
Yeah.
Speaker 1 (01:04):
We did that piece recently with Kashmir Hill about AI induced psychosis and the guy who'd fallen into the rabbit hole by talking with ChatGPT about whether or not he might be living in a simulation. So I started talking to ChatGPT about this to see if I would also be taken down the rabbit hole, and then I was like, oh my god, I'm not sure if I want this to be made public at a later date. So yeah, OpenAI says they're now working with Google
(01:27):
to scrape these conversations off the web, but of course
some quick thinkers have already archived them.
Speaker 2 (01:34):
And I can't help but be rather
Speaker 1 (01:35):
curious about what it is that people are talking to ChatGPT about.
Speaker 3 (01:40):
I mean, obviously, we do have a segment at the
end of every Friday episode called Chat and Me about
how our listeners are really using their chatbots, and now
we have hundreds of thousands of additional responses to explore.
Speaker 1 (01:52):
Of course, there's a difference between how our listeners tell us they're using chatbots and the reality, which is apparent from these logs. One researcher actually created a data set of all the responses that were indexed by Google, and again our friends at 404 Media were able to take a look. Here to tell us about what everyone's asking ChatGPT is...
Speaker 2 (02:12):
404 Media's Joseph
Speaker 3 (02:13):
Cox. Joseph, welcome back to Tech Stuff.
Speaker 4 (02:16):
Hi, thank you for having me.
Speaker 2 (02:17):
Joseph.
Speaker 1 (02:18):
Let's start at the beginning. How is it that one hundred thousand ChatGPT conversations ended up on Google Search?
I thought that these conversations were private.
Speaker 4 (02:27):
Yeah. So this starts with an article on Fast Company on July thirtieth, and that outlet found that ChatGPT conversations were being indexed by Google. That is, as your listeners will know, Google is constantly going around the web and essentially grabbing content from websites. Of course, it can
(02:48):
use it to make its search engine. What was different here was that while ordinarily, when you're talking to ChatGPT, thankfully all of the content of that conversation is private, in this case, what some people had been doing was using, I think, a little-known feature where they could share the contents of that conversation. Now, maybe you want
(03:11):
to do that because you want to show your friend, wow, look at this really wacky, crazy thing that ChatGPT told me. Or maybe there's a business need, right, like, hey, I've done this with ChatGPT, now I need to show other people on my team. And you would select the share feature, and this would create a public, essentially a public web page version of that chat, and although
(03:35):
you can then send that to your friends or your co-workers, it can also be seen by Google, obviously, and OpenAI probably could have done some stuff to protect it there. But the result is that a bunch of these conversations are now publicly available and indexed by Google, and I seriously doubt that all of the people using this share feature really understood what they were getting into.
Speaker 1 (03:59):
Yeah, can you elaborate on that? Because I'm thinking about WhatsApp, for example, where there's like a forward button, or like on X, where I can share a link to a tweet. Is this like somebody thinks they're pressing a button to share an individual version of the transcript with another person, but in so doing is kind of making their whole
(04:21):
ChatGPT history visible to Google? Or what's the practical explanation of how this happened?
Speaker 4 (04:27):
Yeah, the users are making that particular conversation publicly available,
and it works in a very similar way to the
things you just outlined. I sometimes compare it a little
bit to a Google doc link where you will go
and you'll make that public and there's that setting you
can do that says Hey, anybody with this link is
(04:48):
going to be able to read your awful article draft. I mean, that would be my case, or whatever, or your private thoughts or whatever. But you don't then go and paste that link online, and Google takes steps so that it's not included in search engine results. Of course, if you want to post it on a forum or you post it on Twitter, that's going to be something else.
(05:08):
But that's usually how I think most people expect this
sort of sharing behavior to work. They expect that, well,
I'm going to just share it with one or two
people or you know, a dozen or whatever. They don't
expect typically that it's going to be available to anyone
on the Internet who knows where to look, or of
course anyone with Google now because Google has archived it
(05:31):
as well. It's sort of a big mix: the user is partly at fault for perhaps not fully understanding what is going on; of course OpenAI, for maybe not fully explaining what is going on and not taking steps to stop Google indexing; and then of course Google indexing it as well. Maybe blame is too strong a word, but there's a lot of blame to go around,
(05:51):
I think, to all parties.
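For the technically curious: link-based sharing of this sort usually rests on an unguessable token embedded in the URL, much like the Google Doc links Joseph mentions. Here is a minimal sketch in Python of how such a scheme generally works. The names, URL, and storage are hypothetical, not OpenAI's actual implementation.

```python
# Sketch of "anyone with the link" sharing. The only protection is that
# the URL contains a long random token nobody can guess. Hypothetical
# names throughout; this is not OpenAI's actual scheme.
import secrets

shared_chats = {}  # token -> conversation text (stand-in for a real database)

def create_share_link(conversation: str) -> str:
    token = secrets.token_urlsafe(16)  # 16 random bytes, ~128 bits
    shared_chats[token] = conversation
    return f"https://chat.example.com/share/{token}"

def fetch_shared_chat(token: str) -> str:
    return shared_chats.get(token, "Not found")

# The catch Joseph describes: once that URL is posted anywhere a crawler
# can reach, or the page is served without a noindex hint, "unguessable"
# stops mattering, because search engines simply index the page.
```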
Speaker 2 (05:53):
So this is one hundred thousand conversations.
Speaker 1 (05:55):
Do we know how many users those hundred thousand conversations represent?
And also, you know, what are some of the things in those conversations?
Speaker 4 (06:03):
Yeah, I don't think I've seen figures that drill down to how many users, but you're right, it's nearly one hundred thousand conversations in this data set the researcher scraped from Google. I mean, before this, some researchers were going through hundreds of conversations, and that was already bad enough, and of course newsworthy. What this researcher did was scrape
(06:24):
them en masse and put them into a data set. And
I'm actually looking at it now and there's a lot
of benign stuff in here. It looks like somebody is
making their first iPhone app and they're using ChatGPT
for that. There are others where people are clearly discussing
sensitive business materials, such as could you help me write
(06:45):
this contract? There is potentially, you know, some bank information
in here. I say potentially because it sure looks like
bank information. And then you have, I mean, you mentioned at the top, these sorts of delusional conversations that some people have with ChatGPT, and I'm sure there is some
of that in here. I have seen some people talking
(07:07):
about therapy. I have seen some people talking about relationship issues, such as one, it seems to be a man talking about his ex-girlfriend and wondering why she's not looking at his Instagram stories, that sort of thing, which I don't know if I would term...
Speaker 2 (07:23):
She's just not that into you.
Speaker 4 (07:26):
That means yes, I think ChatGPT was trying to say that, basically. So this is only what people have
decided to share, which is a very interesting caveat to
the data.
Speaker 1 (07:39):
They don't want to share it with the world, but
they've chosen at least one other person to share it with,
so therefore, by definition, it's not their most private use case.
Speaker 4 (07:47):
Yes, and maybe the researcher or others will be able to do some sort of deeper analysis on this than me. But that's interesting, in that: what are the sorts of things that people are willing to share with another person? And of course, you know, what does that tell us about the things they're not sharing? That being said, I
don't think anybody wants a security issue where we're actually
(08:07):
able to see all of that private data either.
Speaker 3 (08:10):
So this was something that was reported out a few weeks ago, as you said. Has there been any change, and how did OpenAI respond to the exclusive?
Speaker 4 (08:19):
So OpenAI has now disabled this, like, opt-in sharing feature, because the company actually said they don't think people fully understood what was going on. And then the company also says it is working with Google to remove
some of those indexed results. Because of course there's a
few things going on here. There's the exposure in the
(08:40):
first place, there's the sharing, there's the indexing by Google.
But even if Google does remove these search results, these
chats have been archived by this researcher, and I presume others as well. Like, I seriously doubt there's only one
or two people who grabbed all of this data. It's
very much an interesting privacy issue that I think researchers
(09:02):
want to look into and learn from.
Speaker 3 (09:04):
I don't understand why OpenAI seemed to think that this tool would be useful. Like, have you given that any thought?
Speaker 4 (09:10):
Yeah, I think that people do want to sometimes share the interesting or crazy or insightful stuff they get from ChatGPT. Now, OpenAI probably should have taken steps to ensure that people can share this in a much more private manner, maybe something like you have to add a particular Chat
(09:33):
GPT user to the conversation, then they can see it
in the same way you add somebody to a Google Doc, for example. That would be a little bit more laborious, there'd be a bit more friction there. But I'm just interested in why OpenAI did not take more steps
to protect this from being scraped by Google. It is
possible to share material online without it being touched by
(09:57):
search engines. You can ask search engines, hey, if you
come across this, please do not index it. I'm curious
why OpenAI did not take those steps, and I don't
have any insight either way. But the result is that
all of these chats have now been indexed on Google,
and I think that's pretty significant.
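For reference, the opt-out Joseph mentions is a real, standard mechanism: a page can carry a robots meta tag, or an X-Robots-Tag HTTP header, asking crawlers not to index it. Here is a minimal sketch, assuming a hypothetical Flask app serving shared chats; the route and lookup function are invented for illustration, not OpenAI's actual code.

```python
# Minimal sketch of the "please do not index this" signal Joseph
# describes, using the standard X-Robots-Tag response header. The Flask
# app, route, and lookup function here are hypothetical.
from flask import Flask, make_response

app = Flask(__name__)

def look_up_chat(share_id: str) -> str:
    # Stand-in for whatever storage layer actually serves the chat.
    return f"<html><body>Shared chat {share_id}</body></html>"

@app.route("/share/<share_id>")
def shared_chat(share_id):
    resp = make_response(look_up_chat(share_id))
    # Honored by major search engines: do not index this page, and do
    # not follow links from it. A <meta name="robots" content="noindex">
    # tag in the HTML achieves the same effect.
    resp.headers["X-Robots-Tag"] = "noindex, nofollow"
    return resp
```

The page stays reachable by anyone holding the link; the header only asks crawlers to keep it out of search results.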
Speaker 2 (10:14):
What do you think might happen next?
Speaker 4 (10:15):
What happens next is that I think other companies are
going to start checking whether they also have similar issues
like this. And I do want to stress like, this
is not the vast majority of ChatGPT conversations or anything like that. ChatGPT was not hacked, it wasn't breached.
There was a somewhat niche security issue, but because these
(10:38):
tools are becoming so so popular now, even a relatively
niche issue can actually impact a ton of people.
Speaker 3 (10:51):
After the break: how secure are AI chatbots? Stay with us.
Speaker 1 (11:11):
It's interesting because Sam Altman was recently on Theo Von's podcast, and he was sort of pointing out, to my surprise, some of the risks about the privacy issues in ChatGPT. He was saying, like, therapist conversations are protected by HIPAA, lawyer conversations are protected by attorney-client privilege,
(11:34):
and people assume that when they're talking with ChatGPT that maybe some of these protections apply, whereas in fact they don't. And I was kind of wondering why he, of all people, was out there on this topic. I did read some other reporting saying that it may be part of the lawsuit with the New York Times. The New York Times, as part of their discovery in the lawsuit against
(11:55):
OpenAI for copyright infringement, is demanding, I think, one hundred million OpenAI conversations for analysis. But I was
surprised to hear Altman out there on this. Nonetheless, can
you kind of take a step back and maybe reflect
on this story about the breach in the broader context
of how people are using chatbots and what chatbot makers
(12:18):
are incentivized to do or not do to protect their users?
Speaker 4 (12:22):
Yeah, so I haven't seen those comments. But to zoom out a little bit: Altman and other people in the space, they enjoy kind of having their cake and eating it too, where on one side they will warn about the dangers of AI. They'll say it needs to be regulated, it needs to be taken really very seriously, and also it
(12:43):
is coming and there's nothing we can do about it, while also building those tools at the same time and making a lot of money from it. They actually benefit from being on both sides of the conversation at the same time, and Altman and others very easily switch between those positions depending on the context in which they're talking.
So of course, you know, an AI developer can say
(13:05):
very very sensitive stuff is going on here and people need to be careful, and then on the other side they'll say, well, our technology is absolutely suitable for that because we take privacy very seriously, or whatever. I've just kind of got a little bit jaded by all of these companies playing both sides at the same time, and
that's why I think you need outside journalists, outside experts, policymakers,
(13:28):
activists who can probe it a little bit more, because every time I hear Altman or someone similar make these points about their own technology, I have to remember: yeah, but they're making it.
Speaker 2 (13:38):
Yeah.
Speaker 3 (13:39):
OpenAI is apparently trying to remove the shared content from search engines, but smart people like this researcher accessed and stored it while it was live. While they're using it for an altruistic purpose, I'm wondering if you think people should be concerned. Like, what if these chats do end up in the wrong hands?
Speaker 4 (13:56):
I don't think people need to necessarily be concerned about
this specific breach. I mean that being said, maybe there's
something really really bad in there and I simply haven't
seen it, and the researcher and others are going to
continue to dig through it. But people should absolutely be
careful with how they are using chatbots. I mean, maybe they used this now-disabled feature and maybe they're going
(14:18):
to be concerned about that. But putting that aside, you have to remember: every single command, every single prompt, every single sentence that you put into ChatGPT or any of these other ones is going somewhere. It's not just sitting on your computer. It's not being locally processed. It's going off to their systems, and ultimately you don't
(14:40):
really know what it's being used for. Maybe it's used for retraining and improving the system itself, or maybe there's some sort of quirk in its security
or privacy or sharing settings that ends up with it
now being publicly available. And I know that I'm a
little bit more extreme than others, but I would never
(15:01):
put sensitive information into one of these things. And I
know that plenty of companies are having to implement policies
where they tell employees, please do not put confidential information into a chatbot that we don't own. I think people
just have to be really, really cognizant of that. In
the same way that when we all first got smartphones,
(15:22):
we had to learn, oh, right, it's tracking my location
data if I turn location data on. I think we
need to remember and to learn, oh, when I put
this thing into ChatGPT, I don't know exactly where
it's going, and it could potentially bite me later if
I'm not careful.
Speaker 2 (15:38):
Yeah. And I think it's an important point.
Speaker 1 (15:40):
Just to think about the stakes of, you know, OpenAI or ChatGPT logs being indexed and available on Google: information that, you know, you share with a chatbot, that you may think is more or less harmless, could have, you know, identifying information or sensitive personal information, addresses or accounts or whatever it may be.
Speaker 2 (16:01):
And so I think there's this kind of almost
Speaker 1 (16:04):
willful ignorance, which many of us, including me, persist with despite knowing better, in terms of how important proper security practices around digital information are. And as you say, we're all of a sudden standing on the doorstep of a much scarier reality.
Speaker 4 (16:23):
Yeah, I would say that with security you really have
to be proactive rather than reactive after something has happened,
you know, your bank account got broken into or anything
like that. Sure, you can deal with it, but it's
going to be annoying, it's going to be hard, it's
going to be tricky, and maybe some people steal some
money from you, maybe somebody hacks into your company or
(16:44):
something like that. You really should do security proactively if you can. And that's really a thing that applies to everybody, which isn't to say that it should be on users all of the time. It really is up to the people who make these products, such as ChatGPT by OpenAI or whatever else, to put in these guardrails so people can't make these mistakes in the
(17:06):
first place.
Speaker 3 (17:07):
You were lucky enough to get a hold of this data set from this researcher. Do you know what the researcher is planning to do with the information?
Speaker 4 (17:14):
Not specifically, beyond analyzing it for trends, I believe, and seeing what is in there. Absolutely no criminal activity or anything like that. But again, that's not to say that other people may not be doing that as well. I can imagine a situation where, let's say, and this is a hypothetical,
(17:34):
but I'm sure I could find something that would reflect this in some sort of data set. Let's say you were using ChatGPT or something similar to make a quick prototype app for your company. In that, you include your username and password and access keys for the infrastructure of your company to make that app. It's all well and good, it works, and it accidentally gets shared in a database
(17:56):
like this. Someone who is malicious could then go, well, thank you very much for those access keys, I'm now going to break into XYZ company. And although we haven't seen that happen specifically with this data set, that sort of stuff happens constantly, where, you know, an engineer at a company, even a very junior one, will put those keys in
(18:18):
code which is accidentally exposed online. It's accidentally publicly available,
and that's how we end up with data breaches.
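Joseph's hypothetical maps onto a well-known anti-pattern, and the standard fix is simple: keep credentials out of the code entirely. Here is a minimal Python sketch; the variable names and key format are made up for illustration.

```python
# Hypothetical illustration of the leak Joseph describes. Anything
# pasted into a prompt or a shared chat travels with the code, so a
# hardcoded credential like this leaks the moment the chat does:
#
#     ACCESS_KEY = "AKIAEXAMPLEKEY123"   # never do this
#
# The standard mitigation: read secrets from the environment at
# runtime, so the code itself can be shared or pasted safely.
import os

access_key = os.environ["COMPANY_ACCESS_KEY"]  # set outside the code
```

With this pattern, the snippet can circulate freely; the secret lives only in the machine's environment or a secrets manager, never in the text.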
Speaker 1 (18:24):
Now, yeah, I mean, as AI is being marketed as a tool for work, obviously the leverage an individual consumer has versus OpenAI or Google is really limited, right? Like, you know, I can complain and holler and post on Reddit, and journalists like you can pick it up. But when, you know, Pepsi or Ernst and Young has
(18:45):
concerns about how its employees' chats are being handled by third-party companies, that perhaps, you know, can drive change more rapidly, given these are like big corporate spenders. So I'm curious: do you know anything about what the conversation is like, the kind of B2B conversations around operational security for LLMs?
Speaker 4 (19:07):
Mean I would also draw a parallel even just with
the intellectual property one, where a lot of these companies
weren't really paying attention until somebody was taking Mickey Mouse
doing some very strange things with AI with it for example.
And now of course we have the lawsuit you know
between Disney and mid Journey, for example, which is an
AI image generator engine. When it comes to security, I
(19:30):
don't know about the specific conversations, but it's absolutely something that people need to be educated about inside their companies. Funnily enough, about Disney: there was a breach of Disney, I think a year ago at this point, and that started because one of their employees downloaded a piece of software that they believed was some sort of AI agent
or some sort of AI generation tool. Hidden inside that
was malware which then stole passwords, and which then logged
into Disney's Slack and stole a mountain of data. And
it turns out the hacker behind this had been deliberately
putting malware into their own custom AI tools to try
to get unsuspecting people to download it. So this is
(20:13):
a real threat to anybody working, I think, in any sort of company. Hackers do not care really who you are. They only care what you may or may not have access to, and AI is just another consideration of that, whether that's the data that an employee is inadvertently putting into ChatGPT or a sketchy tool that someone may download.
(20:38):
You know, like, this is something that we have to
live with now.
Speaker 2 (20:40):
Joseph, thank you. Thank you, Joseph. Thank you so much.
Speaker 3 (20:58):
For Tech Stuff.
Speaker 1 (20:59):
I'm Cara Price, and I'm Oz Woloshyn. This episode was produced by Eliza Dennis and Tyler Hill. It was executive produced by me, Cara Price, and Kate Osborne for Kaleidoscope, and Katrina Norvell for iHeart Podcasts. Jack Insley mixed this episode, and Kyle Murdoch wrote our theme song.
Speaker 3 (21:15):
Join us on Friday for The Week in Tech. Oz and I will run through the tech headlines you may have missed.
Speaker 1 (21:19):
And please do rate and review the show wherever you
listen to your podcasts, and also send us a note
at TechStuff podcast at gmail dot com with any comments or suggestions.