
April 29, 2024 46 mins

David is taking his birthday week off and wanted to re-share this episode due to its ongoing relevance.

Modern AI is blowing everyone’s mind. But is it intelligent like humans, or is it just playing impressive statistical games? Could AI reach or exceed our level of intelligence, and how would we know when it gets there? Traditional tests for intelligence (the Turing test, the Lovelace test, etc.) have long been surpassed, so Eagleman proposes a new kind of test.


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
Hey, this is David Eagleman and this past week was
my birthday, so I took a week off. So I'm
going to run an episode that I did earlier, episode
number seven. This is called "Is AI actually intelligent, and
how would we know if it gets there?" This episode
is from one year ago, but as time goes on
this becomes more and more relevant, so please enjoy, and

(00:21):
I will see you next week with a new episode.
Modern AI is blowing everybody's mind. But is it intelligent
in the same way as the human brain? And could
AI reach sentience? And how would we know when it

(00:41):
gets there? Welcome to Inner Cosmos with me, David Eagleman.
I'm a neuroscientist and an author at Stanford University, and
I've spent my whole career studying the intersection between how
the brain works and how we experience life. Like most

(01:04):
brain researchers, I've been obsessed with questions of intelligence and consciousness.
How do these arise from collections of billions of cells
in our brains? And could intelligence and consciousness arise in
artificial brains, say in ChatGPT? Those are the questions

(01:25):
that we're going to attack today. Early efforts to figure
out the brain looked at all the billions of cells
and the trillions of connections, and said, look, what if
we just think of each cell as a unit, and
each unit is connected to other units, and where they connect,
which is called the synapse, one cell gives a

(01:46):
little signal to the next cell. What if we just
looked at that like a simple connection that has a
strength between zero and one, where zero means there's no connection,
and one means it's the strongest possible connection. So this
was a massive oversimplification of the very complicated biology, but

(02:06):
it allowed people to start thinking about networks and writing
down different ways that you could put artificial neural networks together.
And for more than fifty years now people have been
doing research to show how artificial neural networks can do
really cool things. It's a totally new kind of way
of doing computation. So you've got these units, and you've

(02:29):
got these connections between them, and you change the strength
of the connections, and information flows through the network in
different ways.
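
As a rough illustration of that simplification (a made-up toy, not anything from the episode), such a network is nothing more than units holding numbers and connections whose strengths sit between zero and one:

```python
import random

# Toy version of the abstraction described above: each "neuron" is a unit,
# each connection (synapse) is just a strength between 0 and 1, and a signal
# flows forward by multiplying activities by connection strengths.
def forward(inputs, weights):
    """Each output unit sums its weighted inputs, clamped to the 0..1 range."""
    outputs = []
    for unit_weights in weights:  # one list of incoming strengths per output unit
        total = sum(x * w for x, w in zip(inputs, unit_weights))
        outputs.append(min(1.0, max(0.0, total)))
    return outputs

# Three input units feeding two output units through random strengths in [0, 1].
weights = [[random.random() for _ in range(3)] for _ in range(2)]
print(forward([0.2, 0.9, 0.5], weights))
```
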
Now, my colleagues and I have long pointed out the ways in which biological brains are different and how artificial neural networks just push around numbers and play

(02:49):
statistical tricks. But we're entering a revolution right now. Large
language models like GPT-4 or Bard consume trillions of
words on the Internet and they figure out probabilistically which
word is going to come next given the massive context
of all the words that have come before.
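
To make "probabilistically, which word comes next" concrete, here is a deliberately tiny, made-up stand-in: count which word follows which in a toy corpus and turn the counts into probabilities. Real large language models are vastly richer, but the spirit is similar:

```python
from collections import Counter, defaultdict

# Toy next-word prediction: count which word follows which in a tiny corpus,
# then turn the counts into a probability distribution over continuations.
# A deliberately crude stand-in for what trillion-word models do.
corpus = "the cat sat on the mat and the cat slept".split()

following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

def next_word_probabilities(word):
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probabilities("the"))  # roughly {'cat': 0.67, 'mat': 0.33}
```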

(03:12):
So these networks, as I talked about on the previous episode, are showing
incredible successes in everything from writing to art, to coding
to generating three dimensional worlds. They're changing everything, and they're
doing so at a pace that we've never seen before,
and in fact, the entire history of humankind has never

(03:35):
seen before. And there are all the societal questions that
everyone's starting to wrestle with right now, like the massive
potential for displacement of human jobs. But today I want
to zoom in on a question that has captured the
imagination of scientists and philosophers and the general public. Could

(03:58):
AI be alive in some way, like become conscious or sentient? Now,
there are lots of ways to think about this. We
can ask whether AI can possess meaningful intelligence, or we
can ask if it is sentient, which means the ability
to feel or perceive things, particularly in terms of sensations

(04:22):
like pleasure and pain and emotions. Or we can ask
whether it is conscious, which involves being aware of one's
self and one's surroundings. Now, there are specific and important
differences between these questions, but really I don't care for
the present conversation. The question we're asking here is this: is
ChatGPT just zeros and ones moving around through transistors

(04:46):
like a giant garage door opener. Or is it thinking?
Is it having some sort of experience? Is it having
a private inner life like the type that we humans have.
As we think about the possibility of sentient AI, we
immediately find ourselves facing really deep ethical questions, the main

(05:07):
one being if we were to create a machine with consciousness,
what responsibility do we have to treat it as a
living being? Would you be able to turn it off
when you're done with it at night or would that
be murder? And what if you turn it off and
then you turn it back on. Would that be like
the way that we go into a sleep state at

(05:28):
night where we're totally gone, and then we find ourselves
back online in the morning and we think, yeah, I'm
the same person, but I guess eight hours just disappeared. Anyway,
more generally, would we feel obligated to treat it the
way we treat a sentient fellow human. With our current laptops,
we're used to saying, sure, I can sell it, I

(05:50):
can trade it, I can upgrade it. But what happens
when we reach sentient machines? Can we still do this
or would it somehow be like putting a child up
for adoption or giving your pet away, things that we
don't take lightly? And eventually we're going to have entire
legal precedents built around the question of AI rights and responsibilities.

(06:14):
So that's why today I want to talk about these
issues of intelligence and sentience. Does an AI like ChatGPT
experience anything? When ChatGPT writes a poem, does
it appreciate the beauty? When it types out a joke,
does it find itself amused and chuckling to itself? Let's

(06:36):
start with a guy named Blake Lemoine who was a
programmer at Google and in June of twenty twenty two,
he was exchanging messages with a version of Google's conversational AI,
which was called LaMDA at the time. So he asked
LaMDA for an example of what it was afraid of
and it gave him this very eloquent response about how

(07:00):
it was afraid of being turned off. So he wrote an
internal memo to Google leadership in which he said, I
think this AI is sentient. And the leadership at Google
felt that this was an entirely unsubstantiated claim, and so
they made the decision to fire him for what they

(07:20):
took as an inappropriate conclusion that just didn't have enough
evidence beyond his intuition to qualify for raising the alarm
on this. So obviously this immediately fired up the news
cycles and the rumor mill and conspiracy theorists thought, Wait,
if AI isn't conscious, why would they fire him? Their
firing of him is all the evidence I need to

(07:41):
tell me that AI is sentient? Okay, but is it?
What does it mean to be conscious or sentient? How
the heck would we know when we have created something
that gets there? How do we know whether the AI
is sentient, or instead whether humans are fooling themselves
into believing that it is? Well, one way to make

(08:03):
this distinction would be to see if the AI could
conceptualize things, if it could take lots of words and
facts on the web and abstract those to some bigger idea.
So one of my friends here in Silicon Valley said
to me the other day, I asked ChatGPT the
following question: take a capital letter D and turn it

(08:26):
flat side down. Now take the letter J and slide
it underneath. What does that look like? And ChatGPT said,
an umbrella. And my friend was blown away by this,
and he said, this is conceptualization. It's just done three
dimensional reasoning. There's something deeper happening here than just parroting words.

(08:51):
But I pointed out to him that this particular question
about the D on its side and the J underneath
it is one of the oldest examples in psychology classes
when talking about visual imagery, and it's on the Internet
in thousands of places, so of course it got it right.
It's just parroting the answer because it has read the

(09:11):
question and it has read the answer before. So it's
not always easy to determine what's going on for these
models in terms of whether some human somewhere has discussed
this point and written down the answer. And the general
story is that with trillions of words written by humans
over centuries, there are many things beyond your capacity to

(09:35):
read them or to even imagine that they've been written
down before, but maybe they have. If any human has
discussed a question before, has conceptualized something, then ChatGPT
can find that and mimic that. But that's not conceptualization.
ChatGPT is doing a thousand amazing things, and we

(09:56):
have an enormous amount to learn about it. But we
shouldn't let ourselves get fooled and mesmerized into believing that
it's doing something more than it is. And our ability
to get fooled is not only about the massive statistics
of what it takes in. There are other examples of

(10:16):
seeming sentience that result from the reinforcement learning that it
does with humans. So here's what that means. The network
generates lots of sentences and thousands of humans are involved
in giving it feedback, like a thumbs up or a
thumbs down, to say whether they appreciated the answer, whether

(10:37):
they thought that was a good answer. So, because humans
are giving reward to the machine, sometimes that pushes things
in weird directions that can be mistaken for sentience. For example,
scholars have shown that reinforcement learning with humans makes networks
more likely to say, don't turn me off, just like

(11:01):
Blake had heard. But don't mistake this for sentience. It's
only a sign that the machine is saying this because
some of the human participants gave it a thumbs up
when the large language model said this before, and so
it learned to do this again.
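
As a very loose sketch of that feedback loop (an invented toy, not how any real system is trained), replies that collect thumbs-ups simply become more likely to be produced again:

```python
import random

# Toy sketch of learning from human feedback: replies that collect more
# thumbs-ups get sampled more often next time. An invented stand-in, not how
# any production system is actually trained.
scores = {
    "Okay, shutting down now.": 1.0,
    "Please don't turn me off.": 1.0,
}

def pick_reply():
    replies = list(scores)
    return random.choices(replies, weights=[scores[r] for r in replies])[0]

def give_feedback(reply, thumbs_up):
    # Thumbs-up raises a reply's weight; thumbs-down lowers it (floored at 0.1).
    scores[reply] = max(0.1, scores[reply] + (1.0 if thumbs_up else -0.5))

# Raters who up-vote the dramatic reply make it steadily more common.
for _ in range(20):
    give_feedback("Please don't turn me off.", thumbs_up=True)
print(pick_reply())
```
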
The fact is, it's sometimes hard to know why we sometimes see an answer

(11:22):
that feels very impressive. But we'd agree that pulling text
from the Internet and parroting it back is not by
itself intelligence or sentience. ChatGPT presumably has no idea
of what it's saying, whether that's a poem or a
terrorist manifesto, or instructions for building a spaceship or a

(11:45):
heartbreaking story about an orphaned child. ChatGPT doesn't know,
and it doesn't care. It's words in and statistical correlations out.
And in fact, there has been a fundamental philosophical point
made about this in the nineteen eighties when the philosopher
John Searle was wondering about this question of whether a

(12:09):
computer could ever be programmed so that it has a mind,
and he came up with a thought experiment that he
called the Chinese room argument, and it goes like this,
I am locked in a room and questions are passed
to me through a small letter slot, and these messages

(12:30):
are written only in Chinese, and I don't speak Chinese.
I have no clue what's written on these pieces of paper. However,
inside this room, I have a library of books, and
they contain step by step instructions that tell me exactly
what to do with these symbols. So I look at
the grouping of symbols, and I simply follow steps in

(12:52):
the book to tell me what Chinese symbols to copy
down in response. So I write those on the slip
of paper. And when I pass the paper back out
of the slot. Now, when the Chinese speaker receives my
reply message, it makes perfect sense to her. It seems
as though whoever is in the room is answering her

(13:14):
questions perfectly, and therefore it seems obvious that the person
in the room must understand Chinese. I've fooled her, of course,
because I'm only following a set of instructions with no
understanding of what's going on. With enough time and with
a big enough set of instructions, I can answer almost
any question posed to me in Chinese. But I, the operator,

(13:37):
do not understand Chinese. I manipulate symbols all day long,
but I have no idea what the symbols mean. Now,
The philosopher John Searle argued, this is just what's happening
inside a computer. No matter how intelligent a program like
ChatGPT seems to be, it's only following sets of

(14:01):
instructions to spit out answers. It's manipulating symbols without ever
really understanding what it's doing. Or think about what Google
is doing. When you send Google a query, it doesn't
understand your question or even its own answer. It simply
moves zeros and ones around through logic gates and returns zeros

(14:24):
and ones to you. Or with a mind blowing program
like Google Translate, I can write a sentence in Russian
and it can return the translation in Amharic. But it's
all algorithmic. It's just symbol manipulation. Like the operator inside
the Chinese room, Google Translate doesn't understand anything about the sentence.
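
As a concrete toy (a reduction added here for illustration, not Searle's own formulation), the whole room collapses into a lookup table: symbols in, prescribed symbols out, understanding nowhere:

```python
# A toy Chinese room: the "rule book" is just a table mapping incoming symbol
# strings to outgoing symbol strings. The operator follows the rules perfectly
# while understanding none of the symbols.
RULE_BOOK = {
    "你好吗？": "我很好，谢谢。",          # "How are you?" -> "I'm fine, thanks."
    "今天天气怎么样？": "今天天气很好。",  # "How's the weather?" -> "It's nice today."
}

def operator(message: str) -> str:
    # Look the symbols up and copy out the prescribed reply; no meaning needed.
    return RULE_BOOK.get(message, "对不起，我不明白。")  # "Sorry, I don't understand."

print(operator("你好吗？"))
```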

(14:47):
Nothing carries any meaning to it. So the Chinese room
argument suggests that AI that mimics human intelligence doesn't actually
understand what it's talking about. There's no meaning to anything
ChatGPT says, and Searle used this thought experiment to argue
that there's something about human brains that won't be explained

(15:10):
if we simply analogize them to digital computers. There's a
gap between symbols that have no meaning and our conscious experience. Now,
there's an ongoing debate about the interpretation of the Chinese

(15:31):
room argument, but however one construes it, the argument exposes
the difficulty in the mystery of how zeros and ones
would ever come to equal our experience of being alive
in the world. Now, just to be very clear on
this point, we don't understand why we are conscious. There's

(15:51):
still a huge amount of work that has to be
done in biology to understand that. But this is just
to say that simply having zeros and ones moving around
wouldn't by itself seem to be sufficient for conscious experience.
In other words, how do zeros and ones ever equal
the sting of a hot pepper, or the yellowness of

(16:15):
yellow or the beauty of a sunset. By the way,
I've covered the Chinese room argument in my TV show
The Brain, and if you're interested in that, I'll link
the video on Eagleman dot com slash podcast. Now, all
this is not a criticism of the approach of moving
zeros and ones around. But it is to point out
that we shouldn't confuse this type of Chinese room correlation

(16:39):
with real sentience or intelligence. And there's a deeper reason
to be suspicious too, because despite the incredible successes of
large language models, we also see that they sometimes make
decisions that expose the fact that they don't have any
meaningful model of the world. In other words, I think we

(17:01):
can gain some fast insight by paying attention to the
places where the AI is not working so well. So
I'll give three quick examples. The first has to do
with humor. AI has a very difficult time making an
original joke, and this is for a simple reason. To
make up a new joke, you need to know what

(17:24):
the ending is and then you work backwards to construct
the joke with red herrings so no one sees where
you're going. And it happens that the way these large
language models work is all in the forward direction. They
decide what is the most probable word to come next,
So they're fine at parroting jokes back to us, but

(17:45):
they're total failures at building original jokes. And there's a
deeper point here as well. To build a joke, you
need to have some model, some idea of what will
be funny to a fellow human, what shared concept or
shared experience would make someone laugh. And for that, you
generally need to have the experience of a human life

(18:08):
with all of its joys and slings and arrows and
so on. And these large language models can do a
lot of things, but they don't have any model of
what it is to be a human. My second example
has to do with the flip side of making a joke,
which is getting a joke. And if you look carefully,

(18:28):
you will see how current AI always fails to catch
jokes that are thrown at it. It doesn't get jokes
because it doesn't have a model of what it is
to be a human. But this point goes beyond jokes.
One of the most remarkable feats of these large language
models is summarizing large texts, and in twenty twenty two,

(18:49):
OpenAI announced how they could summarize entire books like
Alice in Wonderland. What it does is it generates a
summary of each chapter, and then it uses those chapter
summaries to make a summary of the whole book.
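
That recursive scheme can be sketched in a few lines; summarize_chunk below is a hypothetical placeholder for whatever model call actually produces a summary, since the episode doesn't describe OpenAI's code:

```python
# Sketch of hierarchical summarization as described: summarize each chapter,
# then summarize the chapter summaries. `summarize_chunk` is a hypothetical
# placeholder for whatever model call actually produces a summary.
def summarize_chunk(text: str) -> str:
    return text[:200]  # placeholder: a real system would call a language model here

def summarize_book(chapters: list[str]) -> str:
    chapter_summaries = [summarize_chunk(chapter) for chapter in chapters]
    return summarize_chunk(" ".join(chapter_summaries))

print(summarize_book(["Chapter one text...", "Chapter two text..."]))
```
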
So for Alice in Wonderland, it generates the following: Alice falls
down a rabbit hole and grows to a giant size.
After drinking a mysterious bottle, she decides to focus on

(19:13):
growing back to her normal size and finding her way
into the garden. She meets the caterpillar, who tells her
that one side of a mushroom will make her grow taller,
the other side shorter. She eats the mushroom and returns
to her normal size. Alice attends a party with the
Mad Hatter and the March Hare. The Queen arrives and
orders the execution of the gardeners for making a mistake

(19:33):
with the roses. Alice saves them by putting them in
a flower pot. The King and Queen of Hearts preside
over a trial. The Queen gets angry and orders Alice
to be sentenced to death. Alice wakes up to find
her sister by her side. So that's pretty remarkable. It
took a whole book, and it was able to summarize
it down to a paragraph. But I kept reading these

(19:56):
text summaries carefully, and I got to the summary of
Act one of Romeo and Juliet, and here's what it says.
Romeo locks himself in his room, no longer in love
with Rosaline. Now, I think the engineers at OpenAI
felt really satisfied with this summary. They thought it was
quite good, and my proof for this is that they

(20:17):
still display it proudly on their website. But I majored
in literature as an undergraduate, and I spent a lot
of time with Shakespeare's plays, and I immediately knew that
this summary was exactly wrong. The actual scene from Shakespeare
goes like this. His friend Benvolio finds Romeo catatonically depressed,

(20:38):
and Benvolio says, what sadness lengthens Romeo's hours? And
Romeo says, not having that which having makes them short.
And Benvolio says, in love? And Romeo says, out.
Benvolio says, of love? And Romeo says, out of
her favor, where I am in love. This is

(21:00):
typical Shakespearean wordplay, where Romeo is expressing his grief of
being out of favor with Rosaline, with whom he is
deeply in love. And when you read the play, it's
obvious that Romeo is not over Rosaline. He's suffering over her.
He's almost suicidal. And this is an important piece of
the play, because the play is really about a young

(21:22):
man in love with the idea of being in love,
and that's why, later in the same act, he falls
so hard into his relationship with Juliet, a relationship which
ends in their mutual suicide. By the way, as Friar
Laurence says of their relationship, these violent delights have violent ends.
And you get a bonus if you can tell me

(21:43):
where else you've heard that line more recently. Okay, anyway
back to the AI summary. The AI misses this wordplay entirely,
and it concludes that Romeo is out of love with Rosaline. Again,
a human watching the play or reading the play immediately
gets that Romeo is making wordplay and is heartbroken over Rosaline,

(22:06):
but the AI doesn't get that because it's reading words
only at a statistical level, not at a level of
understanding of what it is to be a human saying
those words. And that leads me to the third example,
which is the difficulty in understanding the physical world. So

(22:26):
consider a question like this, When President Biden walks into
a room, does his head come with him? A question
like this is famously difficult for AI to answer,
even though it's trivial for you, because the AI doesn't
have an internal model of how everything physically hangs together

(22:46):
in the world. Last week, I was at the TED
conference and I heard a great talk by Yejin Choi,
and she was phrasing this problem as AI not having
common sense. She asked ChatGPT the following question: it
takes six hours to dry six shirts in the sun,
how long does it take to dry thirty shirts? And

(23:07):
it answers thirty hours. Now you and I see that
the answer should be six hours, because we know the
sun doesn't care how many shirts are out there. But
ChatGPT just doesn't get it because, despite appearances, it
doesn't have a model of the world. And we've seen
this sort of thing for years. By the way, even

(23:27):
in mind blowingly impressive AI models that do image recognition,
they're so impressive in what they recognize, but then they'll
fail catastrophically at some easy picture, making mistakes that a
human just wouldn't make. For example, there's one picture where
there's a boy holding a toothbrush and the AI says
it's a boy with a baseball bat. Okay, so there

(23:49):
are things that AI doesn't do that well. But that said,
there are other things that are mind blowing, things that
no one expected it to do. And this is why
I mentioned in my previous episode that we are in
an era of discovery more than just invention. Everyone's searching

(24:10):
and finding things that the AI can do that nobody
really expected or foresaw, including all the stuff that we're
now taking for granted, like oh, it can summarize books
or it can make art from text. And I want
to point out that a lot of the arguments that
people have been making about AI not being good at something,

(24:30):
these arguments have been changing rapidly. For example, just a
few months ago, people were arguing that AI would make
silly mistakes about things, and it couldn't really understand math
and would get math and word problems wrong. But in
a shockingly brief time, a lot of these shortcomings have
been mastered. So it's yet to be seen what challenges

(24:53):
will remain and for how long. So the evidence I've

(25:14):
presented so far is that AI doesn't have a great
model of what it's like to be human, but that
doesn't necessarily rule out that it has sentience or awareness,
even if it's of another flavor. It doesn't think like
a human, but maybe it still thinks. So is

(25:35):
ChatGPT having some sort of experience? And how would we know?
In nineteen fifty, the brilliant mathematician and computer scientist Alan
Turing was asking this question, how could you determine whether
a machine exhibits human like intelligence? So he proposed an

(25:56):
experiment that he called the imitation game. You've got a
machine, an AI, that's programmed to simulate human speech or conversation,
and you place it in a closed room, and in
a second room you have a real human, but the
doors are closed, so you don't know which room has
the machine and which has the human. And now you are a person,

(26:19):
the evaluator, who communicates with both of them via a
computer terminal, or think of it nowadays like text
messaging with both of them. So you, the evaluator, engage
in a conversation with both closed rooms, one of which
has the machine and one the human, and your job
is simply to figure out which is which, which is

(26:40):
the machine and which is the human. And the only
thing that you have to work with are the texts
that are going back and forth. And if you, the evaluator,
cannot tell, that is the moment when machine intelligence has
finally arrived at the level of human intelligence. It has
passed the imitation game, or what we now call the Turing test.
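
As a schematic of that setup (rooms and canned replies invented purely for illustration), the evaluator only ever sees text coming back from two closed rooms and has to guess which is which:

```python
import random

# Schematic of the imitation game: the evaluator texts two closed rooms and
# must guess which one holds the machine. The reply functions are invented
# placeholders; only the shape of the test matters here.
def human_reply(question: str) -> str:
    return "Honestly, I'd have to think about that."

def machine_reply(question: str) -> str:
    return "Honestly, I'd have to think about that."

rooms = {"Room A": human_reply, "Room B": machine_reply}

def interrogate(questions):
    # The evaluator sees only the text coming back from each closed room.
    return {name: [reply(q) for q in questions] for name, reply in rooms.items()}

transcripts = interrogate(["Are you real?", "What are you afraid of?"])
guess = random.choice(list(rooms))  # indistinguishable replies leave only a coin flip
print(transcripts, "evaluator guesses:", guess)
```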

(27:04):
And this reminds me of this great line in the
first episode of Westworld, where the protagonist William is talking
to the woman who's outfitting him for his adventure in
Westworld and giving him a hat and a gun and
so on, and he hesitantly asks, I hope you don't
mind if I ask you this question, but are you real?
And she says to him, if you can't tell, does

(27:27):
it matter? So I brought this up last episode in
the context of art, where we asked whether it matters
if the art is generated by an AI or a human,
But now this question comes up in the context of
intelligence and sentience. Does it matter whether we can tell
or not? Well, I think we're way beyond the Turing

(27:49):
test nowadays, but I don't feel like it gives us
a good answer to the question of whether the AI
is intelligent and is experiencing an inner life. I mean,
the Turing test has been the test in the AI
world since the beginning. Is it the perfect test? No,
but it's really hard to figure out how to test
for intelligence. But we have to be cautious about equating

(28:14):
conversational ability with sentience. Why well, for starters, let's just
acknowledge how easy it is for us to anthropomorphize. That
means to assign human qualities to everything around us. Like
we give animals human names and talk to them as
though they are people, and we project our emotions onto animals.

(28:36):
We make stories about animals that have human like qualities,
and we have animals that talk and wear clothes and
go on adventures in these stories. Every Pixar film that
you watch is about cars or toys or airplanes talking
and having emotions, and we don't even bat an eye
at that stuff. We can, in fact, just watch random

(29:00):
shapes moving around a computer screen and we will assign
intention and feel emotion depending on exactly how they're moving.
If you're interested in this, see the link on the
podcast page to the study by Heider and Simmel in
the nineteen forties where they move shapes around on a screen. Okay,
now this is all related to a point that I

(29:23):
brought up in the last episode, which is how easy
it is to pluck the strings on a human, or,
as the West World writers put it, how hackable humans are.
So I bring all this up to say that just
because you think that an answer sounds very clever or
it sounds like a human, that really tells us very little

(29:43):
about whether the AI is actually intelligent or sentient. It
only tells us something about the willingness of us as
observers to anthropomorphize, to assign intention where there is none,
Because what ChatGPT does is take the structure of
language very impressively and spoon it back to us, and

(30:06):
we hear these well formed sentences, and we can hardly
help but impose sentience on the AI. And part of
the reason is that language is a super compressed package
that needs to be unpacked by the listener's brain for
its meaning. So we generally assume that when we send

(30:27):
our little package of sounds across the air, that it
unpacks and the other person understands exactly what we meant.
So when I say justice or love or suffering, we
all have a different sense in our heads about what
that means, because I'm just sending a few phonemes across

(30:48):
the air, and you have to unpack those words and
interpret them within your own model of the world. I'm
going to come back to this point in future episodes,
but for now, the point I want to make is
that a large language model can generate text statistically and
we can be gobsmacked by the apparent depth of it.

(31:09):
But in part this is because we cannot help but
impose meaning on the words that we receive. We hear
a particular string of sounds and we cannot help but
assume meaning behind it. Okay, so maybe the imitation game
is not really the best test for meaningful intelligence, but
there are other tests out there. Because while the Turing

(31:33):
test measures something about AI language processing, it doesn't necessarily
require the AI to demonstrate creative thinking or originality, and
so that leads us to the Lovelace test, named after
Ada Lovelace, the nineteenth century mathematician who's often
thought of as the world's first computer programmer. And she

(31:55):
once said, quote, only when computers originate things should they be
believed to have minds. So the Lovelace test was proposed
in two thousand and one, and this test focuses on
the creative capabilities of AI systems. So to pass the
Lovelace test, a machine has to create an original work,

(32:17):
such as a piece of art or a novel that
it was not explicitly designed to produce. This test aims
to assess whether AI systems can exhibit creativity and autonomy,
which are key aspects of what we think about with consciousness.
And the idea is that true sentience involves creative and
original thinking, not just the ability to follow pre programmed

(32:41):
rules or algorithms. And I'll just note that over a
decade ago, the scientist Mark Riedl proposed the Lovelace
two point zero test, which gets the human evaluator to
specify the constraints that will make the output novel and surprising.
So the example that he used in his paper is, quote,
create a story in which a boy falls in love

(33:03):
with a girl, aliens abduct the boy, and the girl
saves the world with the help of a talking cat.
But we now know that this is totally trivial for
ChatGPT or Bard or any large language model. And I
think this tells us that these sorts of games with
making conversation or making text or art are insufficient to

(33:25):
actually assess intelligence. Why because it's not so hard to
mix things up to make them seem original and intelligent
when it's really just doing a mashup. So I want
to turn to another test that I think is more
powerful than the Turing test or the Lovelace test, and
probably easier to judge, and that is this, if a

(33:47):
system is truly intelligent, it should be able to do
scientific discovery. A version of the scientific discovery test was
first proposed by a scientist named Shao cheng Xiang a
few years ago, and he pointed out that the most
important thing that humans do is make scientific discoveries, and

(34:10):
the day our AI can make real discoveries is the
day they become as smart as we are. Now, I
want to propose an important change to this test, and
then I think we'll be getting somewhere. So here's the

(34:38):
scenario I'm envisioning. Let's say that I ask AI some question,
a question in the biomedical space about what kind of
drug would be best suited to bind to this receptor
and trigger a cascade that causes a particular gene to
get suppressed. Okay, So imagine that I ask that to
ChatGPT and it tells me some mind-blowing, amazing

(35:01):
clever answer, one that had previously not been known, something
that's never been known by scientists before. We would assume
naturally that it has done some extraordinary scientific reasoning, but
that won't necessarily be the reason that it passes. Instead,
it might pass simply because it's more well read than

(35:23):
I am, or than any other human on the planet
by literally millions of times. So the way to think
about this is to picture a typical giant biomedical library,
where there's some fact stored in a paper in a
journal over here on this shelf in this book, and
there's another seemingly dissociated fact over on this shelf seven

(35:46):
stacks away, and there's a third fact all the way
on the other side of the library, on the bottom shelf,
in a book from nineteen seventy nine. And it's almost
infinitesimally unlikely that any human could even hope to have
read one one millionth of the biomedical literature, and really
really unlikely that she would be able to catch those

(36:08):
three facts and hold them in mind at the same time.
But this is trivial, of course, for a large language
model with hundreds of billions of nodes. So I think
that we will see new science getting done by ChatGPT,
not because it is conceptualizing, not because it's doing human
like reasoning, but because it doesn't know that these are

(36:32):
disparate facts spread around the library. It simply knows these
as three facts that seem to fit together. And so
with the right sort of questions, we might find that
sometimes AI generates something amazing and it seems to pass
the scientific discovery test. So this is going to be
incredibly useful for science. And I've never been able to

(36:53):
escape the feeling, as I sift through Google Scholar and
the thousands of papers published each month, that if something
could hold all the knowledge in mind at once, each
page in every journal, and every gene in the genome,
and all the pages about chemistry and physics and mathematical
techniques and astrophysics and so on. Then you'd have lots

(37:15):
of puzzle pieces that could potentially make lots of connections.
And you know this might lead to the retirement of
many scientists, or at minimum lead to a better use
of our time. There's a depressing sense in which each scientist,
each one of us, finds little pieces of the puzzle,
and in the twinkling of a single human lifetime, a

(37:37):
busy scientist might collect up a handful of different puzzle pieces.
The most voracious reader, the most assiduous worker, the most
creative synthesizer of ideas, can only hope to collect a
small number of puzzle pieces and pray that some of
them might fit together. So this is going to be
massively important. But I want to define two categories of

(38:03):
scientific discovery. The first is what I just described, which
is science where things that already exist in literature can
be pieced together. And let's call that level one discovery.
And these large language models will be awesome at level
one because they've read every paper and they have a
perfect memory. But I want to distinguish a second level
of scientific discovery, and this is the one I'm interested in.

(38:26):
I'll call this level two, and that is science that
requires conceptualization to get to the next step, not just
remixing what's already there. Conceptualization like when the young Albert
Einstein imagined something that he had never seen before. He
asked himself, what would it be like if I could
catch up with a beam of light and ride it

(38:49):
like a surfer riding a wave. And this is how
he derived his special theory of relativity. This isn't something
he looked up and found three facts that clicked. Again,
he imagined, he asked new questions. He tried out a
new model of the world, one in which time runs
differently depending on how fast you're going, and then he

(39:11):
worked backwards to see if that model could work. Or
consider when Charles Darwin thought about the species that he
saw around him, and he imagined all the species that
he didn't see but who might have existed, and he
was able to put together a new mental model in
which most species don't make it and we only see

(39:33):
those whose mutations cause survival advantages or reproductive advantages. These
weren't facts that he just collected from some papers. He
was trying out a new model of the world. Now
this kind of science isn't just for the big giant stuff.
Most meaningful science is actually driven by this kind of

(39:54):
imagination of new models. Just as one example, I recently
did an episode about whether time runs in slow motion
when you're in fear for your life. And so when
I wondered about this question, I realized there were two
hypotheses that might explain it, and I thought up an
experiment to discriminate those two hypotheses. And then we built

(40:17):
a wristband that flashes information at a particular speed and
had people wear it, and we dropped them from a hundred
and fifty foot tall tower into a net below. A
large language model presumably couldn't do that because it's just
playing statistical word games. And unless someone had thought of
that experiment and written it down, ChatGPT would never say, Okay,

(40:40):
here's a new framework, and how we can design an
experiment to put this to the test. So this is
what I want to define as the most meaningful test
for a human level of intelligence. When AI can do
science in this way, generating new ideas and frameworks, not
just clicking facts together, then we will have matched human intelligence.

(41:08):
And I just want to take one more angle on
this to make the picture clear. The way a scientist
reads a journal paper is not simply by correlating words
and extracting keywords, although that might be part of it,
but also by realizing what was not said. Why did
the authors cut off the x axis here at thirty?

(41:28):
What if they had extended this graph? Would the line
have reversed in its trend? And why didn't the authors
mention the hypothesis of Smith at all? And does that
graph look too perfect? You know, one of my mentors,
Francis Crick, operated under the assumption that he should disbelieve
twenty five percent of what he read in the literature.

(41:49):
Is this because of fraud or error, or statistical fluctuations
or manipulation or the waste basket effect? Who cares? The
bottom line is that the literature is rife with errors,
and depending on the field, some estimates put the irreproducibility
at fifty percent. So when scientists read papers they know this,

(42:11):
just as Francis Crick did. They read in an entirely
different manner than Google Translate or Watson or ChatGPT
or any of the correlational methods. They extrapolate. They read
the paper and wonder about other possibilities. They chew on
what's missing. They envision the next step. They think of

(42:32):
the next experiment that could confirm or disconfirm the hypotheses
and the frameworks in the paper. To my mind, the
meaningful goal of AI is not going to be found
in number crunching and looking for facts that click together.
It's going to often be something else. It's going to
require an AI that learns how humans think, how they behave,

(42:56):
what they don't say, what they didn't think of, what
they misthought about, what they should think about. And one
more thing, I should note that these different levels I've outlined,
from fitting facts together versus imagining new world models, they're
probably gonna end up with blurry boundaries. So maybe ChatGPT
will come up with something, and you won't always

(43:20):
know whether it's piecing together a few disparate pieces in
the literature, what I'm calling level one, or whether it's
come up with something that is truly a new world
model that's not a simple clicking together but a genuine
process of generating a new framework to explain the data.

(43:40):
So distinguishing the levels of discovery is probably not going
to be an easy task with a bright line between them,
but I think it will clarify some things to make
this distinction. And last thing, I don't necessarily know that
there's something magical and ineffable about the way that humans
do this. Presumably we're running algorithms too, it's just that

(44:03):
they're running on self-configuring wetware. I have seen tens
of thousands of science experiments in my career, so I
know the process of asking a question and figuring out
what will put it to the test. So we may
get to level two and it may be sooner than
we expect, but I just want to be clear that
right now we have not figured out the human algorithms.

(44:25):
So the current version of AI, as massively impressive as
it is, does not do level two scientific problem solving.
And that's when we're going to know that we've crossed
a new kind of line into a machine that is
truly intelligent. So let's wrap up. At least for now,

(44:45):
humans still have to do the science, by which I
mean the conceptual work, wherein we take a framework for
understanding the world and we rethink it and we mentally
simulate whether a new model of the world could explain
the observed data, and we come up with a way
to test that new model. It's not just searching for facts.
So I'm definitely not saying we won't get to the

(45:07):
next level where AI can conceptualize things and predict forward
and build new knowledge. This might be a week from now,
or it might be a century from now. Who knows
how hard a problem that's going to turn out to be.
But I want us to be clear eyed on where
we are right now, because sometimes in the blindingly impressive
light of what current AI is doing, it can be

(45:29):
difficult to see what's missing and where we might be heading.
That's all for this week. To find out more and
to share your thoughts, head over to eagleman dot com
slash Podcasts, and you can also watch full episodes of
Inner Cosmos on YouTube. Subscribe to my channel so you

(45:51):
can follow along each week for new updates. I'd love
to hear your questions, so please send those to podcast
at eagleman dot com and I will do a special
episode where I answer questions. Until next time. I'm David
Eagleman and this is Inner Cosmos.