Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:04):
Welcome to Tech Stuff, a production from iHeartRadio. Hey there,
and welcome to Tech Stuff. I'm your host, Jonathan Strickland.
I'm an executive producer with iHeartRadio. And how the tech
are you. I'm here to tell you something. You write
like a robot. But that's okay because I do too.
(00:25):
One of the founding fathers of the United States, James Madison,
wrote like a robot. Robots weren't even a thing when
he was writing back in the eighteenth century, all right,
so really, I guess it's more fair to say that
robots write like us. And while I'm having a little
bit of fun using the word robots, what I'm really
(00:46):
talking about is generative AI. You know, stuff like ChatGPT and Google Bard, that kind of thing. These AI powered chatbots write like humans, right? That's one of the big selling features of the chatbots. One, that they can understand a prompt that we give them, that they can understand what we mean when we give them a prompt,
(01:09):
and two that they then generate a response as if
it had been written by an actual person. But obviously
this also creates some challenges, some issues. So you might
remember that since ChatGPT became publicly available last year, when OpenAI opened it up and let people start playing with ChatGPT, there were people in education, teachers and
(01:34):
administrators, that sort of thing, who raised the alarm about the possibility that students could use ChatGPT and similar
tools to auto generate essays and stuff and thus bypass
school assignments. My robot wrote it for me. Beyond the
education sector, there are plenty of arenas where people are
(01:55):
worried that the less scrupulous folks out there will attempt
to pass off AI generated text as their own writing,
whether this is creative writing or business writing, whatever it
may be. So this then leads us to the concept
of AI writing detection tools, you know, some sort of
(02:19):
tool to determine if a piece of text originated from
a real human being or from that character that Haley Joel Osment played in that film about artificial intelligence. I
forget what that movie was called. Subsequent to the release
of these detection tools, we started hearing reports of teachers
(02:39):
failing students, sometimes an entire class of students, because the
detection tool indicated that the real source of the works that were being turned in by the students wasn't the students themselves, but AI. Now a lot of
students have actually come forward to argue that no, no,
they actually wrote those pieces themselves, that they authored that work,
(03:03):
they didn't use AI to do it, and that they
are the victim of false positives, that these writing detection
tools made a mistake, and as it turns out, at
least some of them, and likely a lot of them
were telling the truth. And we can say that because
these AI writing detection tools have abysmal accuracy rates, they
(03:25):
are worse than chance. That's how bad these tools can be.
So the success rate for an AI writing detector can
be so low that it has led some of the
companies to shut them down, and it has led a lot of critics to just dismiss the concept of an AI writing detection tool entirely. In fact, there are quite a
(03:48):
few who have argued that AI writing detection tools are
essentially snake oil. That there are companies that are making
what they say are reliable tools that can tell the
difference between text that was written by a person and text
that was written by AI, but really they're just peddling
a hoax or a scam, and they're trying to make
(04:08):
money selling these tools to various organizations like schools and such,
but in fact those tools don't work, or at least
they don't work very well. Even OpenAI, which is the company that is responsible for ChatGPT, they had
a tool that was meant to be a detection tool
(04:28):
to tell whether or not something was written by AI.
It was called AI Classifier, but they shut it down
earlier this year. Why? Because its accuracy rate was twenty-six percent. Twenty-six percent accurate, that is bonkers. That
means nearly three quarters of the time that detection tool
(04:52):
came up with the wrong answer. Either it gave a
pass to an AI generated piece, or it accused a
work that a human being actually wrote, like definitively wrote,
as being the product of AI. This brings us to
James Madison. James Madison wrote the US Constitution, and folks
(05:14):
have fed the US Constitution into these AI writing detection
tools and received a notification that this piece was very
likely written by AI, which obviously led to lots of
jocularity on the Internet, as people said, I knew it.
I knew that the founding fathers of the United States
of America were really robots from the future sent back
(05:34):
in time to create an ultra-capitalist society that preys upon the disenfranchised or something like that. There are a lot
of jokes about it, but the fact is no, it's
just that this writing detection tool is completely unreliable. So
you certainly cannot use these kinds of tools to justify
(05:55):
flunking an entire class of students when you know that
the reliability is so low. Now, I decided to do
this short episode about AI writing detection tools after reading
a couple of great pieces in Ars Technica. Those of y'all who listen to my show frequently know that I often reference Ars Technica because the folks there reliably post
(06:20):
great articles. So in this case, the author of both
pieces I read was Benj Edwards, that's B-E-N-J, Edwards. And at some point I probably should reach out to them and ask if they would like to join Tech Stuff for an episode to talk about something like generative AI,
because Edwards has done some really good work. Anyways, as
(06:41):
we think about the issue about how this generative AI works,
the underlying technology that powers generative AI, we start to
see why there's this big reliability problem. Why are we having such issues with an automated detection tool really determining
(07:04):
if something was written by a person or AI? And
it's because the tools like ChatGPT are built on top of large language models, also known as LLMs. And if we take a moment to really understand LLMs, then
we start to get a handle on why these detector
tools are so unreliable. So first off, let's actually talk
(07:27):
about a precursor to large language models. This would be
recurrent neural networks, or RNNs. Now I've talked a lot about neural networks on this show, but just as a refresher, a neural network is an attempt to create a
computer system or computer model that processes information in a
(07:48):
way that is similar to how our brains process information.
So you have layers of artificial neurons, or you can
think of them as nodes. These layers connect to other
artificial neurons. You have multiple connections from neuron to other neurons,
and you have layers that go from top to bottom.
(08:09):
You can think of it like at the top that's
where you put input and at the bottom that's where
you get output. So essentially, you feed information into the
model and then the information goes through a series of
operations in which data passes through these different nodes, and
the nodes make decisions based upon the input, and then
they send output to different nodes and eventually you get
(08:33):
the ultimate output. And sometimes that output is correct. It
gives you the answer that is correct. Sometimes it's wrong.
And typically what that means is that you then have
to adjust how those artificial neurons are making decisions. Those
neurons apply a sort of bias to input, we call
it a weight, so they will favor some types of
(08:56):
input over others in an effort to make a decision.
If they didn't, then the data would never go anywhere.
You would never be able to have it processed through
the system. So the weighting affects how the neuron actually
processes the data, where it passes it on to. So it may say, if value is greater than X, send to node A. If value is less than X,
(09:20):
send to node B. That could be a very basic weight.
X would be the weight in that case, and maybe
that would lead you to a correct outcome. So by
adjusting the weighting, you can change how these neurons make decisions.
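To make that weighting idea concrete, here's a minimal sketch in Python of a single artificial "neuron" making that kind of threshold decision. The weight values, the threshold, and the routing to "node A" or "node B" are all hypothetical, just to illustrate how adjusting a weight changes which way the data gets passed on.

```python
# Hypothetical single "neuron": weighs its inputs, compares the result
# to a threshold, and decides which downstream node receives the data.
def route(inputs, weights, threshold):
    # Weighted sum of the inputs: each input is favored or ignored
    # according to its weight.
    value = sum(i * w for i, w in zip(inputs, weights))
    # The threshold decision described above: greater than X goes one way,
    # less than X goes the other.
    return "node A" if value > threshold else "node B"

# Example: same inputs, different weights, different decision.
print(route([0.8, 0.2], weights=[1.0, 1.0], threshold=0.5))  # node A
print(route([0.8, 0.2], weights=[0.1, 0.1], threshold=0.5))  # node B

# "Training" is just nudging those weights until the decisions
# start coming out right more often than not.
```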
And if you build a neural network for the purposes,
let's give it a hypothetical. Let's say it's identifying pictures
(09:42):
of cats. It's always my go to. And you start
looking at the output and you see that it is
mistakenly saying that pictures of flowers are pictures of cats.
You would say, all right, the nodes in this artificial neural network are making the wrong decisions. The weighting is wrong in these nodes. I need to
(10:04):
go and start adjusting things so that I can start
to get it back to correctly saying whether or not
an image has a cat in it or doesn't. And
your goal is to train this model over and over
and over again until it gets better and better at
this task, so that then you can just send it
any raw data you like and not have to worry
(10:26):
about checking up on it afterward because its accuracy level
will be high enough to be reliable. That's your ultimate goal,
But there's a whole process of learning of training that
you have to go through first. Now, a recurrent neural network,
it's a little more specific than just an artificial neural network.
Recurrent neural networks use sequential data. These networks can and
(10:50):
do take information from earlier inputs into consideration when processing
a new input. Now, there's a different model, the convolutional neural network, CNN, not the news channel. This is the other big type of neural network where every time data goes into an input, it's like a blank slate. It's
(11:11):
its own thing, nothing about that decision is based upon any past decision. It's an instance by instance
kind of case. So you're starting from scratch. But with
recurrent neural networks, the network can actually incorporate past inputs
as part of how it processes a current input. But
(11:31):
one issue with these types of networks, the recurrent neural
networks is that they need a full sequence before they
can process the information. So when we're talking about text,
like if we wanted to process text through a recurrent
neural network, it would need to work over the entire
text before producing a result in order to understand things
(11:53):
like context. Sometimes this approach can lead to errors because
the model essentially forgets the stuff that was at the
beginning of the text by the time it gets to
the end, which sounds a lot like me honestly, where
I will finish a book and then I'll think, like
I'll have a discussion with someone about a book that
(12:13):
we've both read and they'll be like, Oh, I like
that part where in early in the book blah blah
blah blah blah, and it pays off much later, and meanwhile,
I'm thinking, I totally forgot that that happened earlier in
the book. I remember where we ended up, but I
don't remember how we got there. Recurrent neural networks can
fall into the same sort of trap, and so that
(12:34):
creates a bit of a hurdle when it comes to
things like analyzing text for the purposes of building natural
language systems. But I'll explain how that all started to
change in twenty seventeen. First, however, we need to take
a quick break to thank our sponsors. Okay, before the break,
(13:05):
I was talking about recurrent neural networks and how those
have certain limitations when it comes to the way they
process data because it has to be sequential. Well, in
twenty seventeen, a group of AI researchers who were working
specifically over at Google came up with an alternative
to this approach, and they published a paper, and the
(13:27):
paper's title was Attention is All You Need, in which
they suggested that you could do something differently from the
recurrent neural network approach for the purposes of analyzing stuff
like text. Their approach was what they called a transformer model.
While your old RNN would analyze text essentially a character
(13:49):
at a time, not even a word at a time,
but a character at a time, and thus that's sequential, right.
The sequential data is character by character. It builds this
up and then analyzes the whole thing. The transformer model
instead would tackle a sentence as a unit, as opposed to a character, or even treat an entire passage of text
(14:10):
as a single unit, and so it would analyze
this to understand the context of what was being said,
and that's a huge benefit. Getting a handle on
context is absolutely critical to understanding what someone means, because
words can have multiple meanings, right, and without context, we
can't really be sure which meaning someone intended. So here's
(14:33):
an example. The English word late. That can mean a
lot of things if you're an English speaker. So if
you're talking about the time of day, if you say
it's late, you usually mean it's getting close to night time.
You could say it's late at night, which means it's
actually close to morning time, or maybe it even is
the morning because it's still dark. And so you think
(14:55):
of it as night, but technically speaking, it's morning and
you're just saying it's late at night. If you're saying
somebody is late, you could either mean they are not
on time for some appointment, or tragically, you could mean
that this is a person who has passed away. They
are late. But you need the rest of the sentence.
(15:17):
You need that context to understand what meaning of late
was actually intended. So you need that contextual information to
be able to understand the whole thing. So transformer models
began to revolutionize certain types of AI applications, specifically in
(15:37):
the realm of natural language processing and generative AI, and
it's what led to the development of large language models
the LLMs. Essentially, a large language model is just a
huge transformer model. And to make a large language model,
you need a lot of text to train your model,
(16:01):
like a lot a lot. OpenAI trained its large language model known as GPT, which stands for Generative Pre-trained Transformer. They trained it on countless documents, millions and
millions of documents found across the web. Some authors allege
(16:22):
that the training material included copyrighted material and that the
authors did not give permission for their works to be
part of the information that fed into this model. That
leads into its own set of problems that are a
little bit beyond the scope of what I'm talking about today,
but they are big problems and they're ongoing. Now, Stephen
King argued that his works were clearly used to train
(16:45):
up large language models. A dead giveaway is if you
ask a chatbot built on top of a large language
model to recite passages from specific authors' works, and if it can do that accurately, like it's really giving
you an accurate representation of that text. Yeah, there's no
(17:06):
way it could have received that information without having trained on
the original text at least somewhere. Now, if it's just
making stuff up, that's different. That falls into the category
of hallucinations, which we might touch upon again before we
finish up this episode. Anyway, the benefit of feeding so
(17:30):
much information to a transformer model is that the transformer model,
the large language model, gets pretty darn good at sussing
out context. Even stuff that you would expect would trip
up an AI chatbot can become a breeze. You know,
you might think that slang or idioms could trip up
an AI tool, but then you have to remember that
(17:52):
these tools rely on essentially all the stuff that's on
the Internet, at least all the stuff that's publicly available
that's not locked behind something, and maybe even some stuff
that is locked behind something, as it turns out. And as such, that means that these models have trained with
data sets that originate from the same communities that are
(18:13):
creating the culture that generates certain slang and idioms in
the first place. So if your AI model is using
the same source material where these turns of phrase and
certain slang terms are originating from, well, of course
it's going to understand it because that was part of
its training, so it has that grounding. It's not like me,
(18:36):
where I am old. I don't understand slang that the
kids use these days because I'm not in those communities.
You wouldn't expect me to understand. I am definitely the
stereotypical out of touch old dude. So when I hear
people talking about, you know, people rizzing up, I'm like, wait, what?
(18:56):
And I have to look things up. And as we
all know, Urban Dictionary is not the most reliable of resources. It is frequently entertaining, usually in a way that is incredibly offensive, but it's not always accurate. Anyway, this ultimately
starts to lead us to why these AI writing detection
tools are not very good. The material that AI generates
(19:21):
is built upon how we communicate. It's built on
how we write. That's how it was trained. So it's
not like AI or robots, as I was facetiously saying
earlier in the episode. It's not like AI has a
different path toward writing than we do. The AI is
not following an established set of rules that's unique to AI. Right,
(19:43):
They're not saying, write this like artificial intelligence. So the
stuff that AI produces can come across as very human
and vice versa. Now, this does not mean that it
is absolutely impossible for someone like a teacher to tell
if something was written by AI or a student. If
(20:04):
the teacher is actually really familiar with the writing style
of that student or students in question, it's entirely possible
that the teacher might notice if that writing style were
to suddenly and maybe significantly change between assignments. This can
be a big ask, by the way, for certain teachers,
because class sizes can get huge depending on where you are,
(20:27):
and if you're talking about an overworked English teacher who's
teaching multiple classes and each class has got, you know,
thirty kids in it, it can be hard to really
build up a working knowledge and memory of the writing
styles of every single person in every single class. But
that is one way that teachers can tell. If teachers
(20:49):
read an essay and think, wow, you know, Robert didn't
write like this in the essay we did last month,
this is a very different approach to writing, and perhaps that's an indicator that someone else wrote the piece,
whether that was AI or maybe you know, another human being,
and that can be an indication something hinky is going on. Also,
(21:12):
I mean, obviously some people get sloppy. This happens a
lot too when people just aren't paying attention as they're
using AI to generate either you know, an educational assignment
or business or whatever. There have been so many examples
of how people have accidentally copied and pasted not just
(21:33):
the body of the text, but stuff that's outside the
body of the text, like it might even be a
little disclaimer saying it was made by AI, or it
could be a command like regenerate response. That's something you
find in certain chatbots, and that is just what
regenerate response means. It just means, hey, can you create
(21:55):
a new AI response to the initial prompt I gave you.
So I wrote a prompt, I had you generate a response.
I want you to create a whole new response based
on that original prompt. If you have regenerate response written
in your essay, that's a dead giveaway that you
(22:18):
copied and pasted that essay off of an AI chatbot.
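As an aside, that particular kind of slip is the one thing software really can flag reliably: just scan the submitted text for known chatbot artifacts. Here's a minimal sketch of that idea, and the list of phrases is only a hypothetical example of the sort of strings you might look for.

```python
# Hypothetical list of telltale chatbot artifacts that sometimes get
# copied and pasted along with the generated text.
TELLTALE_PHRASES = [
    "regenerate response",
    "as an ai language model",
]

def obvious_copy_paste(essay: str) -> list[str]:
    # Return any artifact phrases found in the essay (case-insensitive).
    lowered = essay.lower()
    return [p for p in TELLTALE_PHRASES if p in lowered]

print(obvious_copy_paste("...and in conclusion. Regenerate response"))
# ['regenerate response']
```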
So there are ways that teachers can tell the difference,
but it's not as granular as saying, oh,
this is clearly something that was written by artificial intelligence
versus this was written by a human. It's more like
this is different from what I have received before from
(22:41):
this particular student, or this contains obvious errors that reveal
that the student has used AI. Now, the AI writing
detection tools are at least claiming to use a couple
of strategies to try and determine if something was written
by AI or a human. So they're saying, we can
(23:04):
automate that process, and we can actually analyze a block
of text and give you a determination as to whether
or not that was made by AI or a human,
which suggests that maybe there is some sort of fundamental
difference between the way AI generates content and the way
people do. But these strategies that the AI writing detection
(23:27):
tools are built upon have fundamental flaws, and we know
that because we know the tools are bad. It was
bad enough for OpenAI to shut down its version
back in June. So this isn't like just us postulating
that these tools are bad. We know they're bad. We
know they create things like false positives. So knowing that
(23:49):
already they are unreliable, you then have to start asking, well,
why are they unreliable? What are the things that are
leading these tools to make these wrong determinations? And when
we come back, I'll talk about how Benj Edwards, in those Ars Technica articles, really kind of digs into two
main concepts that end up leading to these writing detection
(24:14):
tools trying to make a determination and why they are
fundamentally flawed. But first let's take another quick break. So
before the break, I mentioned that I was going to
(24:35):
talk about some strategies that Benj Edwards outlines in his Ars Technica articles, and they fall into two categories. The
first is called perplexity, and that really means how surprising
or perplexing are the word choices, how creative are the
sentences in a given piece of text compared to what
(24:59):
an AI language model would expect. So the thinking behind this is that
if a block of text seems to conform to the
same sort of stuff that the language model would produce,
then AI probably created the text. That's the idea. They're saying essentially that if the text is really similar
(25:22):
to what AI would create, then AI probably created it.
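Here's a minimal sketch of that idea, assuming we already have, from some language model, a probability for each word given the words before it (the numbers below are made up for illustration). Perplexity is basically how surprised the model is on average: predictable text scores low, weird text scores high.

```python
import math

# Hypothetical per-word probabilities a language model might assign to
# each word given the words that came before it.
predictable = [0.9, 0.8, 0.85, 0.9]     # "...go for a walk"-style text
surprising  = [0.9, 0.8, 0.85, 0.0001]  # same sentence ending in "zebra"

def perplexity(word_probs):
    # Perplexity is the exponential of the average negative log-probability
    # per word: low when the model saw it coming, high when it didn't.
    avg_neg_log = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(avg_neg_log)

print(perplexity(predictable))  # low: looks like what the model expects
print(perplexity(surprising))   # high: the model finds this text surprising
```

The detection tools described here essentially run this in reverse: low perplexity gets read as "probably AI," which, as we'll see, is exactly where the false positives come from.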
And let's think about how some tools use autocomplete to
help you write a text or sentence. Using a purely
hypothetical scenario to kind of get our minds wrapped around this,
Let's say that you were typing into something that has
autocomplete built into it, the sentence or the phrase I'm
(25:43):
going to go for a and then whatever tool you're
typing it into suggests the word walk as an autocomplete option. Well,
that would be because the language model that is powering
this autocomplete function has sampled millions of passages,
(26:04):
millions and millions and millions of documents, and has found
that the word walk has been the most common word
to follow the phrase I'm going to go for a
and so therefore it offers that as the suggestion, and
maybe it would even offer you a few options. Maybe
it would say walk, maybe it'd say swim. In the UK,
(26:27):
maybe it'd say curry. Who knows. But you know,
it would give you maybe a couple of different options,
but they would be the ones that would most likely
follow that phrase based upon the training material that that
large language model had used to build itself up. Right,
So if you were to measure the perplexity of the
(26:49):
sentence I'm going to go for a walk, it would
be very very low, very low perplexity because it's in
line with what the language model would expect. So the
thought is, if a passage in general has a very
low perplexity, these tools tend to suspect that the passage
as a whole could have come from AI. So let's
(27:11):
say that it had very high perplexity. Let's say that
instead of saying I'm going to go for a walk,
you said I'm going to go for a zebra, or "zeb-ra" if you're in the UK. Well, one, that doesn't really make any sense. But two, that would
be very perplexing, right, that would not be something that
the large language model would expect. And so if it
has high perplexity, then the writing detection tool is more
(27:35):
likely to say this was written by a human, because
what generative chat system would have made that sentence, And
he's like, no, sane robot would say I'm going to
go for a zebra. Clearly some human wrote this. Now,
the problem is these companies are training their large language
models on enormous amounts of human generated text. And unless
(27:56):
you're purposefully trying to be really a original in your writing,
that's a kind way of saying you're being a weirdo
as you're writing your sentences. Chances are a lot of
the stuff you're writing is going to have a fairly
low perplexity, unless you're trying to write in like the
milieu of humor or absurdity. If unless you're purposely trying
(28:18):
to do that, then chances are your perplexity is going
to be pretty low too. Particularly for very structured writing
like business writing or academic writing, that perplexity is going
to be very low. So unless you're prone to throwing
in very odd, random, weird sentences like William Shakespeare's Othello
(28:38):
is one of the great tragedies of English theater, and
also I enjoy shoving hot dogs through mail slots. Well,
there's a pretty good chance that an AI detector tool
is going to think that your human written, legitimate essay
was in fact an AI's work, because the perplexity would
likely be pretty low, again unless you're doing something really
(29:00):
avant-garde. So there's a fundamental flaw in the logic
of using perplexity as one of your metrics for determining
if something was written by AI versus a human. Benj Edwards also goes on to explain that another factor that
AI detection tools will take into consideration is one that's
called burstiness. Perplexity and burstiness make me feel like I've
(29:25):
fallen into a Lewis Carroll novel. But anyway, burstiness really
has to do with variability, particularly between sentences. So y'all
probably have noticed I have a tendency toward really long sentences,
and often with a lot of parentheticals thrown in there. Now,
if I also incorporate short sentences on occasion, breaking up
(29:48):
these very long sentences, this creates a lot more variety,
a lot more dynamic elements between my sentences, because I'm
switching back and forth between these very long, pontificating sentences
and then short ones to make a point. Maybe in
some sentences I use tons of adverbs to describe action.
(30:09):
Maybe in the next sentence I don't use any adverbs
at all. This is what creates that variability. The conventional
wisdom is that AI generated work is more uniform, it's
more consistent, it has less variability from sentence to sentence.
Your sentence length and complexity is going to remain more
(30:30):
or less the same throughout an entire passage. So if
you're able to quantify how dynamic a writing style is, the thinking goes, you could potentially determine if a human
wrote it or if an AI wrote that specific piece.
If it's not very dynamic, well that leads more toward AI.
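Here's a minimal sketch of how you might put a number on that kind of variability, just by measuring how much sentence lengths swing around their average. This is only one rough proxy for what these tools call burstiness, and the sample sentences are made up.

```python
import re
import statistics

def burstiness(text: str) -> float:
    # Split into sentences (crudely), count words in each, and report the
    # standard deviation of sentence length: bigger swings = "burstier."
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pstdev(lengths)

uniform = "The model is trained. The model is tested. The model is deployed."
varied = "I ramble on and on, piling up clauses and parentheticals for ages. Then I stop. Short."

print(burstiness(uniform))  # low variability: what detectors read as "AI-like"
print(burstiness(varied))   # high variability: what detectors read as "human-like"
```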
(30:51):
But that approach depends upon a couple of things that
are not always reliable. So first up, it assumes that
AI generated content is going to continue to show more consistency than the stuff that humans write, that it's going to continue to be this very consistent approach to sentence structure.
But the language models and the generative AI that are
(31:13):
built on top of the language models are growing more
sophisticated all the time. A lot of these companies that
make these language models are mining platforms like X, formerly known as Twitter, or Reddit, in order to train their
language models. They're reading these sort of idiosyncratic messages that
people write. Sometimes they're writing purposefully in a way that
(31:36):
is not consistent, and it can get to be a
little unpredictable. Well, if you're training your language model on
these things, then over time the language models and the
tools that are built on top of them begin to
reflect that training material. It means that we should expect
generative AI to start increasing variability in sentence structure, because that's
(32:01):
what we're training it on. You can't expect to train
it on one thing and it generates something totally different.
It's going to kind of mimic the material that was
used to teach it in the first place. So that
means you're going to see a reduction in the gap
between how AI creates text and how humans do. But
on top of that, again, for certain types of writing,
(32:25):
human authors may take a more structured approach and they
may purposefully reduce variability between sentences or unconsciously reduce variability.
That means that their writing is going to start looking
more like the stuff that these writing detection tools assume is a marker for AI generated content. If I were
(32:45):
to write a term paper, I would probably take a
more consistent, uniform approach to my writing style. That's not
to suggest that would be the right choice, right, Like,
I'm not saying that if you write a term paper
you need to have this very consistent, uniform approach because
they can get really boring to read papers that are
written in a style like that. But that would probably
(33:07):
be my inclination, like thinking in my head, I'd be like, I want to make sure I'm consistent, I'm academic, I am thoughtful, I'm methodical. That means that the work I
would produce would have this low burstiness because I was
purposefully doing it. Even if that was the wrong decision,
it would probably be the one that I would make because
I'd be working under the mistaken belief that this is
(33:29):
somehow more academic. So these AI writing detection tools are
looking for text that has low burstiness and low perplexity
before suggesting that AI had created that particular block of text.
But as we've talked about, humans write in that kind
of style too, particularly for formal writing, and so you
get a lot of false positives, like if you feed
(33:49):
the US Constitution to a writing detection tool, and it says, well,
AI wrote this. Well, a lot of stuff has been written about the Constitution, including passages from the Constitution.
The Constitution itself is clearly available on the web, so
it's obviously part of these large language models' training sets.
So of course it's going to reflect what's in the
(34:12):
training set. It was literally incorporated into it. So if
you're working backward from that logic, then your conclusion is, oh, AI wrote this, because it reflects what the language model
was trained on. Well, yeah, but that's because the language
model was literally trained on the material you were analyzing.
(34:34):
It becomes this sort of catch-22 situation.
So we cannot rely on these detection tools in large part. Now,
this doesn't even touch upon the challenges that non native
English speakers face with their writing. When they're writing in
English and these AI detection tools are used on their work,
they can face disproportionate bias when it comes to these
(34:57):
detection tools. They get a lot more false positives. So
you're already seeing a lot of false positives anyway, because
as we've discussed, the criteria being used by these AI
writing detection tools are faulty, because they're making assumptions that
humans are not writing in those styles when in fact
they are, and that AI is writing in one specific style,
(35:20):
when in fact, at least over time, it migrates away
from that. So you got a double whammy here. Now,
there are some applications of AI detection tools where it
works and it makes sense, just not in writing, but
for stuff like photo or video manipulation. AI detection tools
can still look for telltale signs that can indicate that
(35:45):
maybe what you're looking at has at least in some
part been created by a generative AI tool, right like
an image creation tool. Obviously, there are examples of this
where you take one look and you know immediately that
this was made by AI, because you look at it
and you're like, no one has that many fingers on
one hand, but there are other cases where it may not be so obvious.
(36:06):
It may be far more subtle to human perception,
but if you were to actually analyze the image deeply
with a very well trained AI detection tool, it could
indicate this was made by AI because of little subtle things.
Maybe it's inconsistent lighting, Maybe it's a blinking pattern of
(36:30):
a person in a video, things like that, Little things
that would be hard for us to spot as human beings,
but easy for a detection tool to spot. These AI
detection tools make sense. They're not necessarily foolproof or flawless,
but they have a better success rate than these tools do when it
(36:51):
comes to writing, because it's just not that clear cut
when we're talking about writing. This is unfortunate because teachers
may rely heavily on AI writing detection tools in order
to determine if their students are actually doing their own
work or not. If the teachers are unaware that these
detection tools are unreliable, they can make some really drastic
(37:13):
decisions that will have a huge negative impact on their students'
work and lives, and that's not really fair. Hopefully, the
educators out there are themselves educating themselves, to be repetitive,
about these tools and their unreliability, because otherwise they're going
(37:33):
to be punishing students and they can't justify it because
it's all based on a tool that has proven to
be unreliable at the get go, unless, of course, we're
talking about instances where someone has copy and pasted some
ridiculous part of an AI generated response that just gives
(37:54):
it away. That's a different case entirely, obviously. But yeah, I think it's important to understand the limitations of these tools. As we explore generative AI, and we look at the
pros and the cons and we consider the impact that
generative AI has on multiple segments of our lives, we
(38:15):
also have to really think about how do we know
when it's in use, and how do we know that
the tools we're using to make those determinations are actually
good tools. In the case of these AI writing detection tools,
it looks to me like you might as well not
even look at them. You are more likely than not
(38:37):
to get an incorrect answer, because again, we train these
generative tools to communicate very much the way humans do,
at least in certain use cases, and those use cases
typically are the ones where we're most concerned about whether
or not AI was put to use in the first place.
So really interesting articles over on Ars Technica. They lead
(39:00):
to this really deep discussion about generative AI, the limitations
that we have in detecting it, And obviously there are
a lot of other things we could touch on. I
mentioned copyright. That's a big one, because if AI can
regurgitate copyrighted works with no flaws, then that can be
(39:24):
a huge blow to authors, for example, or we talked
a little bit about hallucinations. Hallucinations are when an AI
tool does not have the information to be able to
determine what should come next in a sentence. You have
to remember when you really boil it down these AI
(39:46):
generative tools, what they're doing is they're following a very
sophisticated statistical model to determine what should come next in
its answer. So you give it a prompt and it's
referencing this incredibly complicated statistical model to say, all right,
what should I put as a response. Some of the
(40:07):
information involves things like the actual answers to questions, but
there are cases where the AI model may be unable
to identify what the answer to the question is, but
it still needs to answer your query. It doesn't have
the answer, so it makes it up, but following this
very sophisticated statistical model so that the answer it generates
(40:29):
appears to be valid even though it's just completely made up.
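If you want a bare-bones picture of that "what should come next" idea, here's a toy sketch. Real models use billions of learned parameters rather than a little lookup table, and the words and probabilities here are invented, but the loop is the same: pick a likely next word, append it, repeat, whether or not the result is actually true.

```python
import random

# Toy "statistical model": for each word, a made-up distribution over
# which word tends to come next. Real LLMs learn this from huge corpora.
NEXT_WORD = {
    "the":      {"capital": 0.6, "answer": 0.4},
    "capital":  {"of": 1.0},
    "of":       {"atlantis": 0.7, "france": 0.3},
    "atlantis": {"is": 1.0},
    "france":   {"is": 1.0},
    "is":       {"poseidonia.": 0.5, "paris.": 0.5},
}

def generate(word, steps=5):
    out = [word]
    for _ in range(steps):
        choices = NEXT_WORD.get(out[-1])
        if not choices:
            break
        # Sample the next word according to the model's probabilities.
        words, probs = zip(*choices.items())
        out.append(random.choices(words, probs)[0])
    return " ".join(out)

# The output always *sounds* fluent; nothing in this loop checks whether
# "the capital of atlantis is poseidonia." is true. That's the hallucination problem.
print(generate("the"))
```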
This is what we call hallucinations in AI. It's when
AI generates an answer in order to respond to a query,
but that answer is fabricated. It's a confabulation. That's another
word that some people are using rather than hallucination, and
(40:50):
it comes across as being very much legitimate because again,
these very sophisticated statistical models make it seem authoritative and knowledgeable.
The way the sentences are structured, it doesn't come across
wishy washy. It's not like maybe it's blah blah blah.
It ends up being it's blah blah blah and presented
in such a way that you feel like it's reliable,
(41:12):
even though ultimately it's not. That's another issue. It's related
to what we're talking about. And it also means that
as a student, or as a business writer or as
a lawyer, as one person found out earlier this year,
you should not rely on generative AI as your one
and only source for anything. Generative AI has even
(41:36):
been found to fabricate quotations from people. Obviously that's not
good either. There are lots of issues here. Anyway. I
hope that was some food for thought for y'all. I
hope you're doing well. I will talk to you again
really soon. Tech Stuff is an iHeartRadio production.
(42:01):
For more podcasts from iHeartRadio, visit the iHeartRadio app, Apple Podcasts,
or wherever you listen to your favorite shows.