Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:04):
Welcome to Tech Stuff, a production from iHeartRadio. Hey there,
and welcome to Tech Stuff. I'm your host, Jonathan Strickland.
I'm an executive producer with iHeart Podcasts. And how the
tech are you? I thought today we could do a
real quickie, because you know, sometimes it's nice just to
(00:25):
do a short thing to talk about a subject in tech.
And there are a whole bunch of Tech Stuff episodes
in which I have talked about the Turing test. So
there are a lot of different variations of the Turing test.
It's based off a thought experiment from Alan Turing, the
computer scientist, very influential, very important in World War Two.
(00:49):
He helped crack the Enigma code. And actually the movie
that sort of depicted his efforts in cracking Enigma is
called The Imitation Game. Well, that title is a reference to the
imitation game, which is the name Turing himself gave to the test. So he
kind of proposed this test when people would ask him
(01:09):
if he thought machines would be capable of thought. Now
keep in mind, like this is back in the forties
and fifties, do you think that machines will be capable
of thinking? And he said, I don't really think that's
a very interesting question. For one thing, I don't know
that there's any meaningful way to answer it. However, I
do think we can be a little more precise if
we think about it in terms of kind of a
(01:32):
thought experiment, a test. So imagine this is the situation
you find yourself in. You go into a room and
there's a computer terminal there, and that's it. You know,
you can't see into any other rooms or anything. It's
just a desk with a computer terminal. And you sit
down at this terminal and there's a little prompt there
(01:52):
that lets you get into a chat session. And you
enter into this chat session, and you have five minutes
and you can ask the person on the other end
of the chat session any questions you want within that
five minute time frame. And once those five minutes are up,
you're asked to determine was the person on the other
(02:13):
end of the chat an actual human being or was
it a computer program? Was it some form of artificial intelligence?
A bot is what we would call it today. And
if you are unable to determine whether the subject on
the other end of the chat is human or a
bot to any reliable degree, then you could say that, oh,
(02:35):
that program passed the Turing test, I could find
no way of telling the difference between that computer program
and an actual person. Turing suggested that, thanks to advancements
in computer science, people would eventually have at
best a success rate of around seventy percent at being
(02:56):
able to tell whether or not the quote unquote person
on the other end of the chat was a
human being or a computer program, and he said he
expected that to be possible in just a few years' time.
It took a little bit longer than that, but I
would say that with the sophistication we've reached with chatbots
these days, I think you could fairly conclusively say that
(03:17):
we've got programs out there that can quote unquote beat
the Turing test. Part of the problem is that Turing
was saying that in the future, these programs are going
to be sophisticated enough that they will fool people into
thinking it's another person. He wasn't saying, oh, you have
to meet this specific threshold for your system to have
(03:39):
achieved beating the Turing test. That would come afterward. Other
people would kind of create the criteria. But since then,
people have used the phrase Turing test to reference
essentially any kind of task designed to determine if a
machine has, or at least appears to have, the property
of intelligence, and when I say that, I mean really
(04:01):
general intelligence. But there's another specific use of Turing tests
that I would like to bring up today, and that
is the Completely Automated Public Turing test to tell Computers
and Humans Apart, which, once you turn it into an acronym,
becomes CAPTCHA. So these are those little tasks that
(04:25):
you occasionally encounter on certain websites, and they require you
to do something like you type in a string of
characters that are displayed on screen. They're usually deformed in
some way and up against a crazy background. Or you
might be given a big selection of images and told
to pick out all the ones that have a cat
(04:45):
in them or something. Or you might have to drag
a little picture of a puzzle piece into an image
where it fits into a very specific spot. All of
these are meant to separate out actual human visitors to
a website or service versus all the automated programs or bots
or whatever you might want to call them. So I
(05:05):
thought it would be fun to do a quick episode
on where CAPTCHAs came from and what purpose they serve
(I kind of touched on that already), but also how they
fit into the grand picture of artificial intelligence, because interestingly
they play a pretty important part. They have helped drive
the development and advancement of artificial intelligence, not necessarily in
(05:26):
a way that is helpful to everybody out there, but
it certainly has served as a way to get people
thinking about how to tackle certain AI problems. So our
story actually begins with the good old website Yahoo. Y'all
remember Yahoo. I mean, it's still a thing, but I
remember a time when Yahoo was practically synonymous with the Internet
(05:50):
for a lot of folks. You may not even remember
this if you haven't been on Yahoo in ages, but
once upon a time, Yahoo was sort of a portal
to the rest of the Internet. Yahoo was kind of
like a landing page. A lot of people had it
set as their homepage, so when they would go into
a web browser, they'd go right into Yahoo and you
would find articles there and all sorts of other links,
(06:14):
as well as chat rooms and of course the search
engine where you could search for other stuff online besides
the stuff that just popped up on Yahoo. Well, in
those chat rooms, moderators were running into a really serious problem.
The chat spaces were becoming invaded by bots posing as people.
(06:35):
Now this is in two thousand. The bots were not
particularly sophisticated, but they were creating a lot of spam. Like,
they were jamming up chat spaces with just spam messages
while people were trying to chat. In some cases, they
were gathering personal information of users in an effort to
exploit those users in some way or another. So Yahoo
(06:56):
didn't want this to keep going. It wasn't reflecting well
on the company. So they turned to the computer science
department at Carnegie Mellon University to ask, hey,
is there some way that we could, you know, kind
of like have a bouncer out front, a gatekeeper
if you will, that would allow humans into the various
(07:18):
systems so that they can make use of them the
way they were intended, but prevent all the robots, all
the AI programs, all the computer software or algorithms, however
you want to define them, and keep them from getting access.
So a team led by Manuel Blum, and including folks like John Langford,
(07:40):
Luis von Ahn, Nicholas Hopper, and others tackled this challenge,
so they needed to come up with a test. Now,
in an ideal world, the test would be a cinch
for a human being to complete, but it would be
a real stumper for algorithmically driven bots. And that is
(08:01):
the basic philosophy of CAPTCHA. Make a test that humans
find really easy to complete, perhaps even trivial, like it's
just a mild inconvenience, as they say, but for bots
it's like: turn away, you're never going to be
able to get this. Now, some of y'all might be
saying something along the lines of but Jonathan, whenever I
(08:25):
run into CAPTCHAs these days, they're sometimes really hard. Like
it's hard to see what they spell out. I'll try
and type things in three, four times and get kicked out.
And you're right, that is a problem. It is something
that actually is happening. It doesn't mean that you're not human.
If you're having like existential crises, I would like to
(08:46):
set your mind at ease by saying you're probably human.
I mean, I don't think I could say anything for certain,
but I feel fairly confident saying you're probably human. But
the reason why CAPTCHAs have become really difficult in some
cases anyway, with some specific types of CAPTCHAs, is
largely because other programmers figured out how to make better
(09:07):
automated programs that can parse and respond to CAPTCHAs. So
as one group of programmers figured out how to design
tools to defeat a CAPTCHA, the CAPTCHA designers would go
back to the drawing board to create new tests to
be more challenging for those bots, to say, well, they
got good at this, let's change these things and reintroduce
(09:31):
the CAPTCHA so that this will trip up those systems,
because while they're good at what we used to use
for gatekeeping, they've never run into this before. And unfortunately,
that sometimes means that the tests become more challenging for
human beings as well. It no longer is a case
(09:52):
where something is trivial for a human but difficult for robots,
at least for certain types of CAPTCHAs. And that's particularly
true if the human has some impairments, like if
they have color blindness, for example, or some other visual
impairment. There are real issues in making CAPTCHAs
that do what they're supposed to do, that is, weed
(10:14):
out all the non-humans but also be accessible to
all humans, even those who might have impairments that would
otherwise make it difficult or challenging to complete a CAPTCHA.
It is not an easy path to walk. We're going
to take a quick break. When we come back, I'll
talk more about the CAPTCHA story. We're back. So in
(10:46):
the early days of CAPTCHAs, they mostly took on the
form of distorted text that was printed over a busy background.
And the idea was that most automated programs would not
be able to recognize distorted text, because it would be
an image, not just plain text characters where a program would be
able to read, like, the underlying code used to generate the
(11:08):
letters and then say, oh, well, those are these letters, I
can replicate that and get through no problem. You had
to have something that was going to really stump them.
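To make that idea a little more concrete, here is a minimal sketch in Python of how a distorted-text challenge could be generated, assuming the Pillow imaging library is installed. The font, rotation angles, noise lines, and blur amount are all illustrative assumptions on my part, not the settings of any real CAPTCHA service.

```python
# A minimal sketch of the classic distorted-text CAPTCHA idea.
# Assumes Pillow is installed; every parameter here is an illustrative guess,
# not the configuration of any real CAPTCHA system.
import random
import string

from PIL import Image, ImageDraw, ImageFilter, ImageFont


def make_captcha(text=None, size=(200, 70)):
    text = text or "".join(random.choices(string.ascii_uppercase, k=5))
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)

    # Busy background: random gray lines to confuse naive OCR.
    for _ in range(8):
        draw.line(
            [(random.randint(0, size[0]), random.randint(0, size[1])),
             (random.randint(0, size[0]), random.randint(0, size[1]))],
            fill="gray", width=1)

    # Each character is drawn on its own small canvas, rotated a random
    # amount, and pasted at a random vertical offset so the word looks wavy.
    font = ImageFont.load_default()
    x = 15
    for ch in text:
        glyph = Image.new("RGBA", (30, 40), (0, 0, 0, 0))
        ImageDraw.Draw(glyph).text((8, 10), ch, font=font, fill="black")
        glyph = glyph.rotate(random.uniform(-30, 30), expand=True)
        img.paste(glyph, (x, random.randint(5, 20)), glyph)
        x += 30

    img = img.filter(ImageFilter.GaussianBlur(0.8))  # mild extra distortion
    return text, img


answer, image = make_captcha()
image.save("captcha.png")          # the challenge the visitor sees
print("expected answer:", answer)  # the string the server checks against
```

The server would keep the returned answer string and compare it against whatever the visitor types in. Now,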
image recognition is a pretty tricky science. I've talked about
it on this show before. Like, training computer systems to
recognize images takes a lot of time and effort and
(11:29):
lots and lots and lots of samples so that the
computer system can quote unquote learn what those images represent. Now,
it's one thing to teach a computer how to recognize
standard letters that are in a recognizable font. So if
the Internet only ever used one font and only used
one size of that font, then it would be relatively
(11:53):
trivial for those who want to defeat CAPTCHAs, because once
you train a computer vision system on what a lowercase
t looks like, for example, then the system would
recognize a lowercase t every time one popped up. But
of course, there are lots of different fonts and typefaces
on the Internet, and they come in different sizes and
colors and on different backgrounds. So teaching a computer system
(12:15):
what a Times New Roman lowercase t looks like against
a blank background doesn't mean it's also going to recognize
a lowercase t in some other font on some crazy background.
Plus maybe the t is a little wavy, a little distorted,
so distorting that text makes it more challenging for image
recognition systems, like they're looking for defining features to be
(12:36):
able to match the image of a letter with the
actual letter. You see, with humans, when we teach a human
what something looks like, it's a lot easier for them
to associate other things that look kind of the way
the first example did, but maybe not exactly the same.
So in other words, like, the example I always use
is coffee mugs, right. If I show you a coffee mug
and I say this is a coffee mug, and then
and I say this is a coffee mug, and then
I show you a second kind that looks totally different,
different color, different size, you know, whatever, maybe has different
writing on it, whatever it might be. And I say,
this is also a coffee mug. And then I show
you a third example that looks unlike the first two.
You could say, oh, okay, I get the idea. I
(13:18):
get the different features that make up what a coffee
mug is. I understand now. And now when I encounter
different types of coffee mugs, even though they might not
look anything like any of the other ones I've encountered,
I know, Okay, that's probably a coffee mug. Until someone says, no,
that's a teacup, and then your world is turned upside down.
(13:39):
But you get what I'm saying. Computers don't work that way.
With computers, if you teach one that an example is a thing,
it doesn't necessarily understand that similar but distinctly different versions
of that same thing fall into the same category. That
takes lots and lots of training. So the whole idea
of distortion was that this would make it very tricky
(14:03):
for most systems to be able to parse that information
and enter it reliably in order to fool
the CAPTCHA system. That doesn't mean it was foolproof.
Over time, those systems did get better at being
able to recognize those figures that were on screen, even better
than humans could in some cases, which is obviously a problem.
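One rough way to see that generalization gap, purely as an illustration and not something from the episode itself: train a simple classifier on clean digit images and then test it on rotated versions of the same digits. The dataset, the model, and the 25-degree rotation below are my own assumptions; the point is just that accuracy typically drops on distorted inputs the model never saw during training.

```python
# A rough sketch of the generalization problem: a model trained only on clean
# samples usually does worse on distorted versions of the same characters.
# Assumes scikit-learn, scipy, and numpy are installed; all choices here
# (dataset, model, 25-degree rotation) are illustrative.
import numpy as np
from scipy.ndimage import rotate
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()  # 8x8 grayscale images of handwritten digits
X_train, X_test, y_train, y_test = train_test_split(
    digits.images, digits.target, test_size=0.3, random_state=0)

# Train on clean, upright digits only.
clf = LogisticRegression(max_iter=2000)
clf.fit(X_train.reshape(len(X_train), -1), y_train)

# Clean test images: the model has seen digits like these before.
clean_acc = clf.score(X_test.reshape(len(X_test), -1), y_test)

# "Wavy" test images: the same digits, but rotated in a way the model never
# saw in training, so accuracy typically drops.
distorted = np.array([rotate(img, angle=25, reshape=False) for img in X_test])
distorted_acc = clf.score(distorted.reshape(len(distorted), -1), y_test)

print(f"accuracy on clean digits:     {clean_acc:.2f}")
print(f"accuracy on distorted digits: {distorted_acc:.2f}")
```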
(14:25):
Now, there have been lots of other CAPTCHA systems, not just the original CAPTCHA.
For example, there's one called Asirra. Asirra did something
I mentioned earlier in the episode. It would present the
visitor with a collection of photographs and they would include
cats and dogs, and it would ask you, okay, identify
the pictures that have cats in them. So that was
(14:47):
one way to get around this was that it wasn't
just figuring out text. It was differentiating between cats and dogs,
something that again computer systems couldn't do just natively. They
had to be taught how to recognize the features that
belonged to a cat versus those that belonged to a dog,
just the same as all other image recognition software. The
(15:08):
folks over at Google developed reCAPTCHA, and that actually served
a dual purpose. It was kind of sneaky. So with reCAPTCHA,
you would go to a website and you would be
greeted by some you know, kind of grainy text, and
you'd be asked to type it out. You'd actually get
a couple of different ones, not just one. And this
text was from scans made of physical books being digitized, so
(15:32):
in other words, books where they had put the page
down on a scanner and created a scan. So some
of these books were in, you know, pretty bad shape.
They weren't at all crisp, clear images. So for the first
word you'd be presented with, Google actually knew the answer
to whatever the word was. So let's say the word
(15:54):
is salamander and you type in salamander, and so Google says,
all right, I already knew that this scanned word is salamander.
This is obviously a person who has typed this in.
But the second image would be a scan from a book.
Maybe it'd be a really smudged one, like one that's
harder to read, and it would ask you, okay, what was
(16:16):
this word? Let's say it's surgeon and you type in surgeon. Well,
the secret sauce here is that Google didn't know that
that scanned word was surgeon. What Google was doing was
crowdsourcing the effort to figure out what the text
in this scanned image actually said. So if you and
(16:40):
thousands of other people all put the same word in
when you were encountering this particular scan, Google would say,
all right, that word is very likely surgeon, because, you know,
ninety eight percent of the people who were shown this
reCAPTCHA typed surgeon in. So now we know what that
word is, which meant that they could essentially transcribe these
(17:04):
digitized texts by using the crowd to do the work
for them. And that is kind of the heart of
where CAPTCHA and AI meet: CAPTCHAs have been used,
for one, to help train AI so that it's more effective.
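Here is a back-of-the-envelope sketch of that two-word scheme in Python. The function names, thresholds, and data structures are assumptions I'm making for illustration; this is not Google's actual implementation, just the gist of control-word verification plus crowd consensus.

```python
# A minimal sketch of the reCAPTCHA-style idea described above: one control
# word with a known answer verifies the visitor, and the visitor's guess for
# the unknown word is collected until the crowd converges on a transcription.
# All names and thresholds here are illustrative assumptions.
from collections import Counter

KNOWN_WORD = "salamander"  # control word whose transcription is already known


def verify_and_collect(control_answer, unknown_answer, unknown_votes):
    """Admit the user only if the control word matches; record their guess
    for the unknown word so the crowd can transcribe it over time."""
    if control_answer.strip().lower() != KNOWN_WORD:
        return False  # likely a bot (or a typo): deny access this round
    unknown_votes[unknown_answer.strip().lower()] += 1
    return True


def consensus(unknown_votes, min_votes=100, min_share=0.95):
    """Once enough people agree, treat the most common guess as the transcription."""
    total = sum(unknown_votes.values())
    if total < min_votes:
        return None
    word, count = unknown_votes.most_common(1)[0]
    return word if count / total >= min_share else None


votes = Counter()
for _ in range(98):                                    # 98% type "surgeon"
    verify_and_collect("salamander", "surgeon", votes)
for _ in range(2):                                     # a couple of outliers
    verify_and_collect("salamander", "sturgeon", votes)
print(consensus(votes))  # -> "surgeon" once the crowd largely agrees
```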
Like if you've encountered other Google ones where it's like
pick all the images here that have motorcycles in them
(17:27):
or stairs. Well, part of that is training Google's image
recognition systems so that they're more accurate. Right? Like, an
image recognition system might have trouble differentiating an actual, like,
stone staircase out in front of a building from a
pedestrian crosswalk, because, you know, you've got those broken
(17:48):
lines on a crosswalk, those could look like stairs to
a computer image recognition system. So by giving users the
task of, hey, identify all the examples in this
list that have stairs in them, Google starts to train
its own image recognition algorithms to be more effective and
(18:09):
more accurate. So in a way, we were essentially being
used as free labor to make these AI systems more accurate,
just so that we could get access to whatever it
was we were trying to visit, whether that was an
online shop or a chat room, or you know, whatever
it might be. So, yeah, we've been working for free, y'all.
(18:32):
Actually, in some cases we've been working for free
and denied access to tools that we wanted to use
because the CAPTCHAs were too hard for us to be
able to solve. But yeah, that's the quick story
about the history and evolution of CAPTCHAs. Clearly they're still
used today. Sometimes it's something simple like click this box
to prove you're human, that kind of thing, where it
(18:54):
requires you to take an action. Those obviously are much
more simple for humans to complete than for robots, so
those still follow the philosophy of the original CAPTCHAs. A
lot of other ones, though, get pretty tricky, to
the point where sometimes I'm discouraged from even going further
and visiting the website in question, just like, you know what,
(19:15):
I don't need to feel stupid because I couldn't find
all the fire hydrants in these photographs, so I'm just out.
But yeah, that's it. And like I said, it plays
a really important part with AI. It's kind of a
seesaw effect, right? Like, you create a barrier that AI
can't get over until it can, and then you have
(19:36):
to go back and create a harder barrier. And meanwhile,
the folks developing the AI keep making advancements so that the
AI gets more sophisticated and powerful over time. So yeah,
it's a delicate balance, and not everybody benefits. As I said, I hope
that that was interesting and informative to y'all. I hope
you're all doing well, and I'll talk to you again
(19:57):
really soon. Tech Stuff is an iHeartRadio production. For more
podcasts from iHeartRadio, visit the iHeartRadio app, Apple Podcasts, or
wherever you listen to your favorite shows.