Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:05):
AI seems like it burst out of the gate a
few years ago, but is it actually the latest chapter
in a three hundred year trajectory to turn thought into math?
Speaker 2 (00:16):
Can the mind be captured with equations?
Speaker 1 (00:19):
Why do current AI models need petabytes of data but
a child can learn from just a few examples? Why
does AI have jagged intelligence, meaning it looks brilliant in
one moment and then it does something totally nonsensical? In physics,
we have various laws, like the law of gravity or
the laws of motion, and today we're joined by cognitive
(00:43):
scientist Tom Griffiths from Princeton to talk about whether we
are moving towards nailing down laws of thought. Welcome to
Inner Cosmos with me, David Eagleman. I'm a neuroscientist and
author at Stanford, and in these episodes we sail deeply
into our three pound universe to understand why and how.
Speaker 2 (01:05):
Our lives look the way they do.
Speaker 1 (01:23):
One thing that distinguishes Homo sapiens from all our cousins
in the animal kingdom is that we watch the world
around us and we try to abstract patterns from it.
For example, you might watch the way that a stone
falls to the ground and maybe you see a tree
branch fall, and maybe you see a glacier and one
(01:44):
day a huge wall of ice falls off it, and
pretty soon you start seeing an underlying similarity to the
way that things move. And eventually someone very, very smart
comes along, like Isaac Newton, and summarizes all this in
the law of gravity. And then the same smart guy,
Newton, comes up with the three laws of motion. And
(02:05):
then another smart person is Einstein. He figures out the
conservation of mass and energy, which seems to be another
ironclad law, and then we have the laws of thermodynamics
and electrostatic laws, and all of this speaks to the
great success that we've had as a species in figuring
out the lowest level of code that's running in the universe.
(02:28):
But for most of human history, the concept of a
thought has felt like the most intimate thing we experience
and the least tractable thing to study.
Speaker 2 (02:39):
What a thought is and how it occurs.
Speaker 1 (02:42):
That seems to live in a different category of mystery
from how an object falls. Why? Well, it's because the
thought pops into your head and somehow it carries memory
and expectation and language and often a feeling. But it
feels vaporous and private. It feels like the one thing
(03:02):
that will forever escape formal description. But what's interesting is
that for centuries people have tried. There's always been a
deep human urge.
Speaker 2 (03:13):
To ask whether thought has laws to it?
Speaker 1 (03:16):
In other words, does the mind have principles that you
can write down? Does reasoning have a grammar to it?
Can you describe intelligence in a language that's precise enough
that once you understand the rules, you can begin to
build with them, like build artificial intelligence? Most of us
are old enough to remember that this question of AI
(03:38):
once lived in philosophy seminars and math departments, but now
it's sitting at the center of our economy.
Speaker 2 (03:46):
Okay, so what is thought?
Speaker 1 (03:48):
Can we capture it in formal systems like laws or equations?
Do different parts of intelligence come from logic, from learning,
from uncertainty, from memory, from prior knowledge, from living inside bodies,
from living inside our cultures?
Speaker 2 (04:06):
From the particular.
Speaker 1 (04:07):
Constraints of being a human animal with a short lifespan
and limited bandwidth. Our guest today is someone who lives
right at the intersection of all these questions. Tom Griffiths
is a professor at Princeton, where he directs the Computational
Cognitive Science Lab and the Princeton Laboratory for Artificial Intelligence.
(04:28):
He has spent years asking how minds work through the
different lenses of math and computation and learning. And he's
the author of a wonderful new book called The Laws
of Thought, which traces the long history of thinkers asking
are there rules to this? Can we understand what human
thinking is? In his book we get the lengthy arc
(04:51):
of minds trying to understand mind. This begins millennia ago
with Aristotle, who wondered whether logic itself could be mathematized,
and Tom follows the trail through the architects of symbolic reasoning,
through the birth of computation, through the rise of neural networks,
through the realization that probability theory might serve as a
(05:13):
language for our beliefs about things. Along the way, in
his book, a picture emerges that there may not be
just a single tool for capturing the mind, but instead
there are different ways of trying to tackle the problem,
and each one sheds light on a different aspect of cognition.
So we're going to talk about ourselves human minds, and
(05:36):
we'll talk about AI: what kind of intelligence is this
and what is missing? Here's my interview with Tom Griffiths.
Speaker 3 (05:48):
As soon as you turn thought into math, it becomes
something that machines would be able to do. And so
our modern AI systems are really a consequence of, you know,
that thought that people were having hundreds of years ago,
of being able to turn thought into something that can
be expressed in mathematical terms.
Speaker 1 (06:05):
And so one of the things that I loved about
your book, by the way, is that you really tell
stories of all the thinkers.
Speaker 2 (06:12):
You dive into the lives, you tell them with real color.
Speaker 1 (06:15):
If you were going to start with one thinker that
you think is the most important, who would that be.
Speaker 3 (06:19):
There are a couple of people who have this sort
of enduring influence throughout the book. One of them is Leibniz,
who kind of started this enterprise in some sense. He
was really trying to take the idea of logic as
expressed by Aristotle and turn it into math, but ultimately
failed in doing that. But along the way he also
discovered the calculus, which turned out to be really important
when people wanted to make neural networks that could learn
(06:41):
from data. It turns out that the trick for doing
that is actually a trick that Leibniz had figured out
all that time ago. And then another key figure here,
as might be suggested by the title of the book,
is George Boole, who was a nineteenth century mathematician. He
was a school teacher for most of his life and
did a lot of like serious math on the side
(07:01):
that, you know, had a big effect on the
history of mathematics. But he was really the person who
then first solved that problem that Leibniz had posed. And
in addition to the impact that that work had, he's
also the great grandfather of Geoff Hinton, who was one
of the people who played an important role in developing
these algorithms for learning in neural networks. And so
(07:24):
you could make an argument that without Boole we would
be a fair way back from where we are today.
Speaker 2 (07:29):
You know.
Speaker 1 (07:30):
Interestingly, when most people think about Boole, they only know
about Boolean numbers. They know about zero and one binary numbers,
and that's essentially the extent of it. But he
was quite celebrated in his life, right, even though he
was a headmaster and not formally involved as a professor.
Am I correct about this? He nonetheless was quite recognized
(07:51):
as a mathematician.
Speaker 3 (07:52):
Yeah, he became a university professor later in his life,
but spent most of his life as a teacher and
a headmaster. But yeah, he won a gold medal
in mathematics from the Royal Society, a very prestigious award,
and you know, was this amazing person who was having
these high level correspondences with the leading mathematicians of the
day while holding down his job running a small school.
Speaker 2 (08:16):
Yeah.
Speaker 1 (08:17):
Now, in the book, you essentially use three different frameworks.
What phenomenon does each framework explain?
Speaker 2 (08:26):
Unusually well?
Speaker 3 (08:27):
The three frameworks I talk about in the book are what
I call rules and symbols, which is what we've been
talking about. This kind of like approach that stems out
of logic, where the idea is that you're going to
be able to write down some rules that characterize the
structure of thought, and by following those rules, you end
up with interesting consequences. The second approach is networks, features
and spaces.
Speaker 4 (08:47):
Right.
Speaker 3 (08:48):
This is neural networks, which you can kind of think
about as a system for doing computation when you start
representing things as points in a space. Right, So if
you start to think about you know, every object that
you could see in the world is not being something
that's described by rules, but being described by a location
along some dimensions. You need to have a way of
(09:08):
talking about how to map between those spaces, and neural
networks solve that problem. And then the third is
probability and statistics. And probability theory is really powerful because
it is the complement to logic, where logic tells us
how to go from things that we know to be
true to other things that we're equally certain are true.
(09:29):
Probability theory tells us what to do when we're uncertain.
So if we get some information we want to draw
a conclusion, but we're not able to draw that conclusion
with perfect certainty, Probability theory tells us how to do that,
and it tells us how to combine our sort of
background beliefs, the other sources of information we have, our
biases, in with the data that we see in a
(09:52):
way that helps us to explain how it's possible to
learn from small amounts of data. And that's one thing
which is still something that discriminates human learning from the
learning that's done by AI systems today.
Speaker 1 (10:01):
Okay, great, so we're going to dive into each of
these three lenses. But just before we do, do you
see the AI conversation today over-indexing on one of
these lenses over the others.
Speaker 3 (10:14):
I think there's a lot of emphasis on neural networks,
which are fundamentally the sort of engineering technology which is
making possible the creation of our chatbots and the other
sort of big AI systems that are deployed. I think
that potentially misses out the importance of these other threads,
right, where one thing that's important to remember is that
(10:37):
those neural networks are being trained on what is essentially
a system of rules and symbols. They're being trained on
human language, which is symbolic and rule like in various ways,
and they're being trained on code, which is even more
symbolic and even more rule like, And those things together
provide some of the substrate for developing the kind of
intelligence that they demonstrate. And then the way that they're
(10:59):
trained is by learning to predict the next token, right,
the next word or part of word, based on what
they've seen so far. And that way of training them
is actually using probability theory. So that's a probabilistic problem
because you're making a guess about what the next thing
is going to be based on the things that you see,
and so that's an important ingredient in their success as well,
is that they're essentially learning to approximate a big probability distribution.
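
As a rough illustration of "learning to predict the next token," here is a minimal sketch in Python. It is not how chatbots are actually trained: it simply counts which word follows which in a tiny made-up corpus and turns those counts into a probability distribution. The function names and the corpus are hypothetical, chosen only for this example.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_distribution(counts, prev):
    """Turn the raw counts seen after `prev` into probabilities."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

# Toy corpus (an assumption for illustration, not real training data).
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
dist = next_token_distribution(model, "the")
print(dist)  # probabilities for the words that followed "the"
```

A real system replaces the count table with a large neural network and conditions on far more than one previous word, but the output has the same shape: a probability for each possible next token.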
Speaker 1 (11:22):
So let's dive into the first one, rules and symbols.
So take us back to the original urge. Why did
early thinkers believe that this could be used to explain thinking.
Speaker 3 (11:35):
I think a lot of the draw of rules and
symbols was that that really was, in some way what
mathematics was to people, right, So Leibniz, part of the
reason why he wasn't able to solve this problem of
figuring out how to turn thought into math is that
what he thought math was, or the kind of math
that he was trying to use to solve that problem,
(11:56):
was really arithmetic, right, And arithmetic was kind of like
the model that they had for a mathematical system. So
you can think about ideas being added together or subtracting
one idea from another, and really thinking about the operators
that you're using as being the things that are sort
of coming from this familiar mathematical language.
Speaker 4 (12:11):
And so I think part of.
Speaker 3 (12:13):
The reason that we end up with that approach is
because of the kind of math that has been successful
in other settings, right where we need to do arithmetic
to you know, that's a good description of certain.
Speaker 4 (12:25):
Kinds of things that human minds do.
Speaker 3 (12:27):
Boole had the insight that you needed a different kind
of algebra in order to describe thought, and then that's
what leads to modern mathematical logic. But it's still in
this kind of symbolic language, although Boole also talked about
probability theory as being important for capturing languages as well.
So I think it's really more about what are the
kinds of mathematical systems that it was sort of straightforward
(12:48):
to formalize, and that gave us something that we could
try to map thought onto. And that's what we do
as scientists is often taking mathematical systems that mathematicians have
defined for us and then saying, oh, I think this
mathematical system maps onto the thing that I want to understand,
and so trying to establish that correspondence, and that
then allows us to derive its consequences.
Speaker 1 (13:09):
So speaking of rules and symbols, thinkers like Newell
and Simon, they popularized this idea of goals and subgoals.
What did that viewpoint get exactly right about human problem solving.
Speaker 3 (13:23):
So now we're fast forwarding a bit right from we
have Boole figuring out the structure of logic. That turns
into you know, lots of people then sort of turn
that into a sort of mature theory of logic. You
get Alan Turing kind of turning this into a theory of computation,
thinking about what an abstract mathematician is doing when they're
doing something like logic, and thinking about how you can
(13:43):
make a machine do that. And then we have people
starting to realize that, you know, as digital computers are
being developed, maybe those provide a good model for how
thinking works in general, and then trying to use a
computer as a sort of foundation for you know, thinking
about things like how people might solve problems. And so
(14:06):
Allen Newell and Herbert Simon were influential cognitive scientists who
did exactly that. They had this idea that maybe there
is a way that you could make computers smarter by
using insights from human cognition, but also get a better
understanding of what humans are doing when they're solving problems
by using the sort of ideas that come from things
like computer programming, and so they set up you know this,
(14:29):
you know, when we're trying to solve a problem or
prove a mathematical theorem or play a game of chess,
they set this up as a problem of searching through
a tree of possibilities, where what you're doing is making choices,
and then each of those choices gives you a new
set of choices, and each of those choices gives you
a new set of choices, and the hard thing is
finding a path through those choices that leads to the
point that you want to end up at. And so
(14:51):
that's something where you can take inspiration from how human
mathematicians solve problems. You can take inspiration from the kind
of you know, tricks like working backwards from the end
towards the start. Right, Those were principles that they were
able to use to try and explain these aspects of
human cognition as well as making the machines work better.
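
The "tree of possibilities" idea can be sketched in a few lines of Python. This is only an illustrative toy, not Newell and Simon's actual programs: the puzzle (reach a target number from 1 by adding one or doubling), the limit of 100, and all function names are assumptions made up for this example. It searches forward from the start, and also backwards from the goal using the inverse moves.

```python
from collections import deque

def search(start, goal, moves):
    """Breadth-first search through the tree of choices; returns a path of states."""
    frontier = deque([[start]])
    seen = {start}
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if state == goal:
            return path
        for nxt in moves(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None  # no path found

def forward_moves(n):
    # Each choice opens up new choices: add one, or double (capped at 100).
    return [m for m in (n + 1, n * 2) if m <= 100]

def backward_moves(n):
    # Working backwards: undo "add one", and undo "double" when n is even.
    options = []
    if n - 1 >= 1:
        options.append(n - 1)
    if n % 2 == 0:
        options.append(n // 2)
    return options

forward = search(1, 21, forward_moves)
backward = search(21, 1, backward_moves)
print(forward)
print(backward[::-1])  # the backward path, reversed into start-to-goal order
```

In this toy, working backwards tends to have fewer options at each step (an odd number can only have come from "add one"), which hints at why that trick can make the search easier.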
Speaker 1 (15:08):
Okay, but then one of the things that happened is
that at least one of these attempts had ballooned into
twenty five million rules. And so what does that teach
us about the shape of human intelligence.
Speaker 4 (15:21):
This rules and symbols enterprise.
Speaker 2 (15:24):
Right.
Speaker 4 (15:24):
The sort of appeal that this had was that maybe
one day.
Speaker 3 (15:27):
You could just write down all of the rules that
you need to write down, and then you've characterized how
intelligence works. Right, So it's just a matter of getting
enough rules in a way that's very reminiscent today, right
of you know, the way that our modern AI systems
are being made is by training them on more and
more data, right, feeding in more and more language. There
was a hope that you could just like, yeah, like
(15:48):
document all of the rules that you need to capture
the structure of human knowledge. And so that led to
you know, companies being started to try and engage in
that enterprise, ultimately I would say, unsuccessfully, but giving us
some kind of characterization of like particular subsets of human knowledge.
And so I think the thing that came out of
(16:08):
that enterprise was revealing that maybe you need something more
than just rules, right, that maybe thinking about logic as
a basis for our model of intelligence was missing something.
It's an approach that worked really well for certain kinds
of problems like doing arithmetic, playing games or chess, but
it didn't work very well for other kinds of problems
like figuring out what you're seeing in the world, or
(16:31):
actually learning language or these other kinds of things.
Speaker 1 (16:34):
And so this is what leads to your second lens,
which is neural networks. And you talk about these as
having you know, a boom and bust history. So, first,
what happened in the last decade that allowed them to
turn into the dominant paradigm.
Speaker 3 (16:49):
The big breakthrough in the last decade was really about
being able to make bigger neural networks that could
be trained on more data in a way that could scale, right,
and so bigger here means what? An artificial
neural network is a set of units that are communicating
with one another. They're communicating along weighted connections, a sort
(17:13):
of you know, imagine like how neurons are connected in
your brain, and those neurons are connected to one another
and sending each other signals. An artificial neural network is
basically simulating that kind of structure inside a computer. And
so for a long time, the sort of the history
of neural networks has been one of people figuring out
how to make bigger neural networks work. So the very
(17:33):
first you know, learning neural networks. They had a learning
algorithm that worked for one layer of weights, and then
there was a breakthrough in the nineteen eighties that meant,
now you had a learning algorithm that could work for
multiple layers of weights, but it didn't work for very
deep neural networks with lots of layers of weights
because, and I can sort of explain the technical reasons
behind it, but you know, sort of like the basic
(17:54):
algorithm didn't quite work. And so the big breakthroughs of
the last you know, ten to fifteen years have been
about coming up with ways to take those algorithms and
actually make them work for neural networks that are bigger
and bigger and deeper and deeper, that are able to
easily learn more complex functions and can do so from
massive amounts of data in a way that means that
(18:14):
they're able to discover sort of complex relationships between things
that are necessary to produce intelligent behavior.
Speaker 1 (18:20):
And so, what do these neural networks capture about cognition
that symbols missed, especially in terms of things like similarity
and fuzziness and graded concepts.
Speaker 3 (18:32):
Fuzziness is a really good way of describing it. It's
that you know, if you ask somebody, you know, whether
something is a piece of furniture, they're going to say,
you know, if you show them a chair, they'll say, yes,
definitely a piece of furniture. If you show them a rug,
they'll say, yeah, maybe a piece of furniture. Right, it
doesn't sort of fit with our, you know, we sort
(18:52):
of have a prototypical idea of what furniture is, which
contains things like chairs and tables and ottomans and these
other kinds of things, and then rugs and treadmills, and you.
Speaker 4 (19:02):
Know, like these are things that maybe
Speaker 3 (19:03):
are in this category, but maybe aren't, right? And so
we need to have a way of thinking about concepts
that's not just the sort of yes or no, true
or false one or zero that logic would give us.
We need to have something which has that fuzziness in it.
One way of getting fuzziness is by thinking about a
concept in terms of points in space, right where you
could think chairs are here in one location, rugs are
(19:26):
here in another location, and maybe what it is to
be a piece of furniture is to just be in
some part of that space, and how close you are
to that part of the space is like how good
you are as an example of that kind of furniture.
And so as soon as you think in those terms,
you have a new problem, which is with our rules
and symbols. We knew how to do computation, we knew
how to describe thinking. Thinking was a matter of applying
(19:47):
the rules and, sort of, you know, repeating that process.
But we don't have a way of doing computation in spaces.
And that's what neural networks give us. So you can
kind of think about a space corresponding to the activation
of the units inside this neural network.
Speaker 4 (20:00):
How much you know, how much input.
Speaker 3 (20:02):
Each neural unit in that neural network is receiving, and
how much of a response it's making that characterizes some
kind of space. And then the neural network gives us a
way of mapping from the inputs that it's getting to
some output.
Speaker 4 (20:14):
So you could put in you know.
Speaker 3 (20:16):
Your picture of a chair, and it maps that to
some point in space, and then it sort of
produces an output that corresponds to, yes, this is
a piece of furniture. And because those outputs can now
be continuous values, you can capture the fuzziness and other
kinds of things that you want for your concepts.
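
To make the "points in space" picture concrete, here is a small Python sketch. It is deliberately not a neural network: it places a few items at invented coordinates and scores how good an example of furniture each one is by its distance from a prototype point. The two feature dimensions, the coordinates, the `sharpness` parameter, and the function name are all assumptions for illustration.

```python
import math

# Hypothetical 2-D feature space: (how much you sit or put things on it,
# how much it stands up off the floor). The numbers are invented.
FURNITURE_PROTOTYPE = (0.9, 0.9)

items = {
    "chair": (1.0, 0.9),
    "table": (0.8, 1.0),
    "rug": (0.3, 0.0),
    "treadmill": (0.2, 0.6),
}

def membership(point, prototype=FURNITURE_PROTOTYPE, sharpness=2.0):
    """Graded membership in (0, 1]: 1.0 at the prototype, decaying with distance."""
    distance = math.dist(point, prototype)
    return math.exp(-sharpness * distance)

for name, point in items.items():
    print(f"{name}: {membership(point):.2f}")
```

The output is a continuous value rather than a yes or no, so a chair scores close to 1 while a rug scores much lower without being ruled out entirely. A trained neural network would learn both the space and the mapping from data instead of having them written in by hand.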
Speaker 1 (20:31):
And so, in what sense are these modern systems, these
artificial neural networks learning, and in what sense are they
doing something that's maybe categorically different from how children learn.
Speaker 3 (20:44):
This is a fundamental question, right, That's the kind of
thing that we cognitive scientists think about a lot, and
I think that AI researchers are starting to care about
a lot too, which is, you know, what are these
sort of meaningful differences between human minds, human brains, and
what we're building in these AI systems or these sort
of artificial brains. I think one very salient difference is
(21:07):
the amount of data which is needed for a human
to learn language compared to the amount of data you
need to put into a neural network. So if you
take a system like ChatGPT, right, one of these chatbots,
those systems are trained on the equivalent of something like
five thousand to fifty thousand years of continuous speech. There's
sort of massive amounts of data that are going into
that system. So it's like on the order of a
(21:28):
thousand or ten thousand times as much data as a
human child might get in order to learn language. And
the reason is that those artificial neural networks are really
kind of like undifferentiated learning machines. You can take that
same kind of neural network, you can get it to
learn all sorts of different kinds of things. It works
really well for learning language, but you can use it
to learn something about vision or something. You know, you
can sort of take all sorts of problems and give
(21:49):
it to them and it can learn how to do that.
And so as a consequence, they have weak, what we call
in cognitive science and machine learning, inductive biases. They're not biased
towards any particular solution to the learning problem, and
human brains have stronger inductive biases for things like learning language. Right,
(22:09):
we're sort of disposed towards certain kinds of things, which
are human languages. The things that we call human languages
are the things that we're disposed to learn. And as
a consequence, you know, we're able to sort of narrow
down the space of possibilities in a way that means
that we're able to learn from less data.
Speaker 1 (22:36):
Okay, this makes a great segue to the third lens.
So you talked about rules and symbols, and you talked
about artificial neural networks. The third part of your book
is about probabilities and statistics. So why did probability become
an attractive candidate language for thinking about cognition?
Speaker 3 (22:56):
Probability theory is a good way of answering certain kinds
of why questions that we might have, right so, and
the reason is that it's a way of characterizing how
a rational agent should make an inference. So all the
way back in the eighteenth century, British nonconformist minister, the
Reverend Thomas Bayes had this radical idea that you could
(23:17):
talk about, you know, again, like take a mathematical system
probability theory, which we would use for describing what happens
when you roll dice or flip coins right, sort of
you know, sort of language of gambling and saying, oh,
in fact, that mathematical system might also be a really
good system for describing how beliefs work. And so what
he was interested in was if you think about a
belief as you know, a degree of belief, right, you
(23:38):
can say, oh, I think it's going to rain tomorrow,
and I'll put that on a scale which goes
from zero to one, where zero is you know, not
going to rain, and one is one hundred percent it's
going to rain.
Speaker 4 (23:47):
Tomorrow.
Speaker 3 (23:48):
Right, That is a belief that you've expressed in the
form of a probability. And now if you, you know,
wake up in the morning and look out the window
and you see gray storm clouds, you've got a new
piece of data. You need to revise your beliefs, and
probability theory actually tells you how to do that. It says,
you know, for each hypothesis, right, so our hypotheses here
(24:09):
are it's going.
Speaker 4 (24:10):
To rain or it's not going to rain.
Speaker 3 (24:12):
Right, You're going to modify that belief based on how
likely the data is that you saw if that hypothesis
were true. So, because gray storm clouds are more likely
if it's going to rain that day, we should increase
our belief that it's going to rain. And as a consequence, well,
we'll end up with a number that's a little bit
higher than the number we had before.
Speaker 4 (24:30):
And probability theory tells us how to do that.
Speaker 3 (24:32):
There's a principle of probability theory called Bayes' rule, after
the Reverend Thomas Bayes, that tells you how to take
your original beliefs and then turn them into the beliefs
that you get after seeing data. And that turns out
to be exactly the tool that you need to answer
these kinds of questions about how inductive biases work. Right, So,
how is it that children are able to learn from
(24:53):
less data than neural networks? Well, it's a consequence of
you know, these things that we can describe using different
probabilities being assigned to different hypotheses, where hypotheses correspond to
the structure of the languages that are being learned.
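
The storm cloud example can be written out as a short Python sketch of Bayes' rule. The specific numbers (a 0.3 prior on rain, and gray clouds being seen 80% of the time on rainy days versus 20% on dry days) are invented for illustration and do not come from the conversation.

```python
def bayes_update(prior, likelihood):
    """Return P(h | data) for each hypothesis h, given P(h) and P(data | h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(data), summed over hypotheses
    return {h: p / evidence for h, p in unnormalized.items()}

# Belief last night, before any new data (assumed values).
prior = {"rain": 0.3, "no_rain": 0.7}

# How likely gray storm clouds are under each hypothesis (assumed values).
likelihood = {"rain": 0.8, "no_rain": 0.2}

posterior = bayes_update(prior, likelihood)
print(posterior)
```

Because gray clouds are more likely under the rain hypothesis, the belief in rain goes up after the update. Starting from a different prior gives a different posterior from the same data, which is one way to picture how inductive biases shape what is learned from a small amount of evidence.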
Speaker 1 (25:05):
And when people call humans irrational, what changes if we
look at mistakes as resource limited inferences.
Speaker 3 (25:17):
This is one of the things that I explore in
my own research is this question of how we should
actually think about rationality for real agents, right, And this
is relevant if you're building an AI system or if
you're just trying to understand human behavior. So I think,
like I said, probability theory gives us a characterization of
what you should do as an ideal rational agent, But
(25:38):
that assumes that you have infinite computational resources. We mere
humans don't have infinite computational resources, nor do our AI systems,
And so you can ask what should a rational agent
do if they don't have all of the computational resources
that you might need? And then out of that you
get the answer, which is that, you know, you should follow
an algorithm, follow a strategy that makes sense given the resources
(26:02):
that you have. That's what it means to be rational
in those circumstances where you're sort of doing the best
job you can of approximating probabilistic inference given those resource constraints.
And so some of the things that people do when
people do weird things, and we do lots of weird things,
and we don't always follow probability theory, those are things that we
can understand as us, you know, running into those resource
limitations and then coming up with, you know, reasonable strategies
(26:25):
for trying to approximate the right answer.
Speaker 2 (26:28):
So, if we look.
Speaker 1 (26:28):
At probability being the grammar of uncertainty, one thing we
know is that our prior expectations matter. And one of
the things I've been obsessed with and doing a lot
of research on and talking a lot about on the
podcast is the way that all of us drop into
the world and our cultures influence us and our language
and our moment in time and our neighborhood, and this
(26:51):
leads to people being quite different on the inside. Is
this something that you think about sometimes about how we
develop our priors differently based on, you know, where
we grow up.
Speaker 3 (27:05):
Yeah. So priors is that Bayesian language, right, for talking
about the beliefs that you have before you get data
that you then update into what we call posterior probabilities
that are informed by those data. And so yeah, I
spend a lot of my research time thinking about these
kinds of questions of you know what are these sort
of prior distributions for humans? How do we acquire good
(27:28):
prior distributions for solving different kinds of problems. One thing
there is that calling it a prior makes it sound
like maybe it's something you're born with, but in fact,
it just means.
Speaker 4 (27:37):
It's before you get data.
Speaker 3 (27:39):
Right, So when you're seeing that storm cloud in the morning,
you had a prior probability from last night, and then
that prior probability was informed by everything else that you know, Right,
The priors are all of the biases and knowledge that
we bring to bear when we're trying to make an inference.
And so yeah, I think understanding that is
a big part of the project of understanding human cognition.
Speaker 2 (27:57):
So let's zoom the camera out.
Speaker 1 (27:59):
We've talked about the three lenses that you describe in
the book. Now, you also point out that we have
a lot of constraints like finite lives and limited compute,
and limited bandwidth, and so how do these constraints sculpt
human intelligence?
Speaker 3 (28:13):
I think this is really important to just thinking about
the moment that we're in where there's a lot of
anxiety around AI, right, And I think if you think
about intelligence as a kind of one dimensional quantity, you
can kind of imagine that you know, humans are somewhere,
our AI systems are somewhere.
Speaker 4 (28:28):
It seems like they're approaching us.
Speaker 3 (28:29):
Maybe they're going to overtake us, and then, oh my god,
what is going to happen when that happens, Right, We're
just going to become redundant. There's not going to be
any jobs. Everything is going to fall apart. And so
that's a consequence of having a particular conception of what
intelligence is, which is this kind of one dimensional way
of thinking about it. And I think there's a different
way of thinking about it which gives us a little
more flexibility and maybe a little more hope in the
way that we think about what's going to happen with AI,
(28:50):
and that is thinking about intelligence as being an adaptation
to the kinds of computational problems that a system has
either evolved or been trained to solve, right? And so
for human beings, those computational problems are shaped by the
constraints that we operate under. And a lot of those
constraints come from our biology, right that we, as you said,
(29:11):
have limited lives, have you know, limited compute resources what
we can carry around inside our heads, have limited bandwidth
for communication. We have to like make noises at each other,
or wiggle our fingers or you know, somehow use our
bodies to transfer data from one human mind to another
human mind. It's very inefficient. And so those constraints are
things that mean that human intelligence takes a particular shape,
(29:34):
which is we're able to learn from limited data because
we have to because we don't live that long. Right,
You can't rely on getting five thousand years of language
data or multiple human lifetimes of you know, chess playing
or whatever it is. Right, you have to be good
at using the resources that you have in ways that
are efficient. And so that's kind of like deciding what
(29:56):
to think about, being able to recognize when a problem
has a structure that you've seen before, being able
to you know, sort of like become sort of automatic
in using certain kinds of patterns of thinking and strategies
for solving problems, really trying to make it as easy as.
Speaker 4 (30:11):
Possible for us to use the resources that we have.
Speaker 3 (30:13):
And then you need to develop capacities for trying to
circumvent those bandwidth constraints in order to be able to
do things that transcend what any individual human can do,
and that means developing things like language, writing, societies, LLCs,
you know, all of the sort of libraries, right, institutions,
(30:35):
all of the theory of mind, right for reasoning about
what someone else might be trying to communicate to you.
All of this stuff is actually sort of like human
stuff that's a consequence of these constraints. And so as
we make AI systems that are smarter, those AI systems
are in turn being shaped by what they're being trained
to do and what constraints they operate under. But those
(30:56):
constraints are different from the ones that humans have. They
can you know, get more data, they can get access
to more compute, they don't have bandwidth limitations.
Speaker 4 (31:04):
You can just copy.
Speaker 3 (31:05):
A you know, a state of an AI system across machines.
You can use the same data to train multiple AI systems,
and all of those things mean that I think, rather
than being sort of on one axis where we're sort
of thinking about better and worse, it's more that there
are many axes that we can think about intelligent systems
developing along, and we're just going to end up in
(31:25):
a state where we have human intelligence and we have AIs,
and they're going to be meaningfully different from one another,
rather than things that are sort of directly competing in
terms of the capacities that they have.
Speaker 2 (31:35):
Yeah, I agree with you on that.
Speaker 1 (31:37):
When you think about the way that humans beat machines
on data efficiency, what do you think that means is
missing architecturally from our AI systems.
Speaker 3 (31:48):
I think it's actually it's a great question, and the
way I would express it is not in terms of architecture.
So it's actually in terms of a different part of
a neural network. So when we think about this problem
of inductive bias, right, which is we're you know, what
a system is sort of disposed towards learning. As I said,
(32:09):
our neural networks, the way we normally set them up,
have pretty weak inductive biases. They can learn all sorts
of things. The inductive bias that a neural network has
is constrained by its architecture, but it's also constrained
by where it starts out in the space of the
settings of all of those weights. And normally the default
is that you set up your neural networks so those
weights start out really small, close to zero, and then
(32:31):
they sort of grow away from that as it starts
to learn how to do things. We've had success in
taking neural networks that are architecturally identical but setting them
with different initial weights in order to create an inductive
bias that enables rapid learning. So we actually use
a technique called meta learning, which is a method from
machine learning where you take the same neural network architecture
(32:53):
and the same initial weights, and you use it to
learn to solve lots of different problems, like you can
use it to learn lots of different languages, say, from
limited data, and then you try and optimize the initial
weights of the neural network to make it so it
can learn all of those languages better. Using the same
kinds of algorithms we use for training the weights of
the neural network when we have these giant data sets,
you can instead use those algorithms to train the initial
(33:15):
weights of a neural network for a small data set,
for lots of small data sets, And when you do that,
you end up with a neural network that has an
inductive bias that makes it possible to learn from small
amounts of data. And so that's the kind of thing
we've been exploring in my lab is can we find
a way of taking exactly these same neural network architectures
and just starting them out in a different place that
(33:36):
maybe aligns better with the kinds of things that humans do. Okay, well,
this is a really good segue to what I wanted
to ask you, which is, if you're looking at rules
and symbols is one sort of math to describe the mind,
and artificial neural networks is another kind of math, and
probability is another. What does an optimal hybrid look like?
Given that no single, no one of these describes everything
(33:58):
about what's going on with minds, So what does the
hybrid for an AI system look like
Speaker 2 (34:04):
In twenty twenty six?
Speaker 3 (34:06):
The place where I end up in the book is saying that
these different kinds of math really do all fit together
in an interesting way, and in order to understand that
we can talk about different levels of analysis when we're
trying to make sense of an information processing system. So
the most abstract level, this is an idea that was
introduced by the computational neuroscientist David Marr. He said, the
(34:27):
most abstract level is just thinking about the problem that
the system is solving and its ideal solution, right? And
I think logic and symbolic systems and probability theory give
us a good way of characterizing the kinds of problems
that minds have to solve, right? You know, probabilistic
inference because we have to make these uncertain inferences. And
then logic as a way of characterizing the kinds of
(34:49):
things that are in the world that have this rich
structure of you know, like a sort of combinatorial structure
that you get from from having symbols and rules that
combine together with things like language and dance and all
of these you know, structured things. Even you know, if
you look at trees, you can see they have like
recursive structures that are expressed in them. Right, So these
kind of occur in nature and are important to be
able to understand. And then at the level below that
(35:11):
you have how the system solves those problems, right, like
what algorithms it might use, what representations it might use,
And then below that it's you know, how that's actually
implemented in some kind of physical system, right, and artificial
neural networks give us a kind of story at those
levels where we can think about them as being a
good general purpose system for learning to approximate the things
(35:35):
that probabilistic inference tells you to do, and learning
to approximate the structure that's contained within those symbolic systems.
So I actually think, you know, the kind of story
that we have right now that's emerged out of these
advances in AI is actually a pretty good story for
how we could think about human minds working. The thing
that's missing, most important thing that's missing is this kind
of aspect of inductive bias, right where we haven't been
(35:59):
able to capture what human inductive biases are like in
machines and so that you have these meaningful differences that come.
Speaker 4 (36:03):
Out of that.
Speaker 3 (36:05):
But it's not a bad place for thinking about how
these pieces might fit together to give us an explanation
for how it is that minds work.
Speaker 1 (36:12):
So along these lines, which AI benchmarks feel to you misleading?
Speaker 2 (36:18):
And how would you make better benchmarks?
Speaker 3 (36:22):
So, in general, I'm not a huge fan of benchmarks,
because I think benchmarks are useful as an engineering tool,
but I, as a cognitive scientist, don't just want to
know how well something is doing something. I want to
know how it's doing that thing right and how it
might be sort of messing that up right. So when
we are designing experiments as cognitive scientists, we don't just say, oh,
(36:46):
here's one hundred math problems. Go do these one hundred
math problems and we'll get a score. We say, let's
choose a set of math problems so that which answers
people give us tell us about the misconceptions that they
have in a way that we can then diagnose, oh,
you know, this is why this person is thinking this
particular thing. And so I think there's lots of room
for coming up with better ways of evaluating our AI
(37:06):
systems that look more like cognitive science experiments, where we're really
targeting understanding what's going on rather than just trying to
get some brute sort of you know, performance score.
Speaker 1 (37:16):
Okay, good, And you have talked about curiosity as a
computational problem. So how do you think about what curiosity
is and how we might measure real curiosity in a machine?
Speaker 3 (37:30):
What problem is curiosity trying to solve? Yeah, this is
this is a good question. You can you can ask
this kind of question that we call rational analysis. Right,
if you have a system that's solving a problem, you
know what's what's the problem?
Speaker 4 (37:44):
What's the ideal solution?
Speaker 3 (37:44):
Okay, so for curiosity, we've argued, and this is work
with Rachit Dubey,
Speaker 4 (37:49):
Who is at UCLA, that
Speaker 3 (37:53):
One way I think about curiosity is that you're trying
to find things that are good at increasing your long
run probability of being able to solve problems in the future. Right,
So you know, it's sort of like you want data
for which the derivative of your total knowledge is
(38:13):
high relative to that particular data point. And so that
explanation captures some of the things that happen in human cognition,
where you know, in some circumstances, we're interested in the
newest thing, something we've never seen before, Right, But in
a lot of circumstances, those things aren't the things that
grab our attention. It's more things that maybe we've seen
(38:34):
a few times, and you know, we just sort of
noticed that they're starting to occur. If something happens once,
you just say okay, that was weird, and you sort
of dismiss it. But when something happens a few times
and it's unfamiliar to you, you say, okay, maybe I
need to figure that out. Right, And something that happens
all the time, you're not that curious about. That's just
the thing that happens all the time. And you can
explain that by thinking about this sort of derivative, right,
(38:55):
where if something just happens once, you shouldn't be interested
in it. Because it just happened once, it's probably never
going to happen again. If something happens a few times,
that's a clue that it's probably going to happen again
in the future, and you've not seen it enough to
actually know what's going on, And so paying attention to
that is good in terms of that derivative of your
future knowledge. And if something happens a lot, then it's
(39:16):
probably happened enough that you know something about what's going
on and it's not that interesting, right, And so so
that sort of sweet spot ends up being around the
things that are sort of like just happening
enough times that you're starting to realize, oh, this might be.
Speaker 4 (39:27):
A thing that I need to pay attention to.
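The "derivative of future knowledge" argument can be made concrete with a toy model. The sketch below is an assumption-laden illustration, not the model from the work mentioned above: the function names, the saturating knowledge curve, and the pseudo-count of 3 are all hypothetical choices made to show the shape of the argument.

```python
def recurrence_probability(times_seen: int, pseudo_count: float = 3.0) -> float:
    """Assumed estimate of how likely the thing is to come up again."""
    return times_seen / (times_seen + pseudo_count)


def knowledge(times_seen: int, half_point: float = 3.0) -> float:
    """Assumed saturating curve: how much you know after n exposures."""
    return times_seen / (times_seen + half_point)


def curiosity_value(times_seen: int) -> float:
    """Chance it matters later times what one more look would teach you."""
    marginal_gain = knowledge(times_seen + 1) - knowledge(times_seen)
    return recurrence_probability(times_seen) * marginal_gain


# Compare something seen once, a few times, and all the time.
for n in (1, 2, 3, 10, 50):
    print(n, round(curiosity_value(n), 4))

most_interesting = max(range(0, 100), key=curiosity_value)
```

Under these assumed curves the value is low for one-off events (they probably will not recur), low for things seen constantly (one more look teaches little), and highest for things seen a handful of times.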
Speaker 1 (39:42):
If you had to bet on one capability that's going to
unlock a broader intelligence, unlock a jump to that.
Speaker 2 (39:50):
What's your candidate?
Speaker 3 (39:52):
I actually think the biggest obstacle at the moment is
more about generalizability of intelligence rather than any specific capacity, right.
And so people in the AI world talk about jagged intelligence, right,
the sort of phenomenon where you have an AI system
that can do something that's really smart and impress you,
and then five minutes later does something that's really dumb
on a problem that's like right next to it, and like,
(40:13):
if it's able to solve that first problem, it seems
obvious that it should be able to solve the second problem.
Speaker 4 (40:17):
And you're just like, what happened? You know, why did
it go wrong there?
Speaker 3 (40:20):
And so that lack of generalizability is also a consequence
of these kinds of inductive biases, right, So these human
inductive biases that steer us towards a solution and let
us learn from limited amounts of data, they constrain the
kinds of solutions that we find. The kinds of
solutions that we find are the ones that are sort
of like generalizable at least to us, right. They are
(40:40):
things that kind of make sense where if someone's able
to do one thing, they'll be able to do the
other thing. And because the AI systems are approaching these
problems just in a completely different way from a different
starting point and then getting tons of data that's allowing
them to sort of approximate what the human solutions are.
But they're coming at it from another angle. That's the
thing that makes them jagged. It's that they don't
have sort of these same compatible inductive biases that
we have that are informed by having evolved in certain
(41:03):
environments and having had experience of the world, and you know,
all of these other things that are part of what
it means to you know, sort of learn anything as
a human being. And so because they are coming with
this different set of inductive biases, they're very influenced by
their training data, they end up doing things that are
sort of inscrutable to us because they are, you know, yeah,
(41:24):
like coming at these problems in a way that doesn't
make sense to us. You know, from the starting point
that humans come from.
Speaker 1 (41:30):
After writing this book, what do you think we understand
now about minds that we didn't understand let's say, a
decade or two ago.
Speaker 3 (41:39):
So it's funny because when I have taught this material
for, you know, twenty years at this point, I
normally start my cognitive science classes saying, you know, welcome
to cognitive science. This is going to be different from
your other science classes. Normally, when you take a science class,
someone is going to stand up and say, Okay, here's
all the things that we figured out. Here are the
answers to the questions. And in cognitive science, it's more
(42:02):
that we figured out how to get better at asking
the questions.
Speaker 4 (42:05):
We haven't answered them. We don't.
Speaker 3 (42:06):
It's not like you have a consensus across the whole
field about what those answers look like. And so I
think that's important that we're still very much And this
is what got me interested in cognitive science in the
first place. You know, still a field that has deep
mysteries and lots of opportunities to learn and discover interesting things.
But I think over the last ten years, like so,
as I was working on this book, I wrote the
(42:28):
first chapter, and I had that disclaimer in the first
chapter and said, okay, look, you know I'm not promising
you answers. Well, well, we're going to see if we
can get a good handle on the questions. But by
the time I got to the end of the book,
right after that sort of process of working on it
for years, I felt like things that actually, you know,
(42:49):
me going through the process of writing it and exploring
all these things and thinking about how they fit together,
but also just where the field was, you know, having
moved forward, I actually started to feel like, actually, these
things do fit together in a way where you can
see the glimpses of what answers are going to look
like in a way that I think really wasn't there
ten years ago. And it's that story of Okay, we
sort of know what the goals are, right, we know
(43:11):
what the right mathematical systems are for describing what intelligent
systems should be doing. You have these ingredients of symbolic
systems and probabilistic inference, and we've discovered that in fact,
you can get a remarkably long way just using these artificial
neural networks to learn to approximate those things, and so
that demonstration I think has shown first of all, that
language is an extremely good substrate for intelligence, right, in
(43:36):
a way that I think people had not anticipated before
large language models, and that you can make big neural
networks that can learn to approximate really complex probability distributions.
And so it gives us some of these ingredients for
seeing how what originally were three very different views of
the mind might start to fit together to make something
that's a little bit more of a unified whole.
Speaker 1 (43:57):
Excellent. And when you wrote the book, what struck you
as the most beautiful idea in the whole quest, in
the whole history of this?
Speaker 3 (44:07):
Gosh. Okay, I mean, I'm a big probability theory fan,
so you're gonna get me endorsing Bayes' rule,
which I really do think is, like, when
you learn it, taking a probability class, it's just
like it's just a dumb principle of probability theory. But
when you make this move of saying probability theory isn't
just about dice and cards, it's about you know, beliefs,
(44:30):
it suddenly becomes a very deep and insightful sort of principle,
And in the book, I also show probability theory kind
of subsumes logic, like everything that's a valid logical inference
is also a valid inference in probability theory. Probability theory
just kind of extends the semantics of logic to these
cases of uncertainty. So to me, I think that's a
that's a that's a big one. I kind of like
(44:50):
that's where I live.
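One way to see the claim that probability theory extends logic is to treat modus ponens as the limiting case of the law of total probability. The sketch below is a minimal illustration under that reading; the function name and the example probabilities are assumptions made for the example.

```python
def prob_b(p_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)."""
    return p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)


# Logical case: "A is true" and "A implies B" are both certain.
# Whatever we assume about the not-A branch, B comes out certain too.
certain_cases = [prob_b(1.0, 1.0, x) for x in (0.0, 0.5, 1.0)]

# Uncertain case: the same formula still gives a graded answer.
graded = prob_b(p_a=0.9, p_b_given_a=0.95, p_b_given_not_a=0.1)
print(certain_cases, round(graded, 3))
```

With certain premises the formula reproduces the logical conclusion exactly, and with uncertain premises it returns a degree of belief instead of breaking down.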
Speaker 1 (44:51):
Yeah, excellent, And and somebody, if we do have a
mature physics of thought, let's say fifty years from now,
what is that change from us in terms of education,
in terms of the way we build machines.
Speaker 3 (45:05):
So I think this is this is exactly where we
can go, right, which is, once you figure out the
scientific principles of a domain, you can start to think
about how to do engineering right. So like you know,
when you're an engineer and you go to engineering school,
you take physics, right, and then you learn in your
physics class what these principles are, and then you take
your applied engineering classes, which are like taking those physical
(45:27):
principles and telling you how to build a bridge right
and explaining that you know not in terms of heuristics
for what makes a good bridge, but in terms of
those fundamental physical principles. So I think that's a thing
that's incredibly exciting here is that as we start to
converge on what these laws of thought look like, it
gives us the opportunity to do a much more sort
(45:48):
of science based form of engineering applied to human cognition,
thinking about how do we make an optimal you know,
sort of learning environment, how do we support human decision making.
That's something that I work on in my lab is like,
how do we put computation into human environments to overcome
(46:08):
whatever computational constraints we have as individual decision makers and
help us make better decisions. And how do we understand,
you know, the kinds of things that people are doing
in a way that allows us to then sort of
like make suggestions about, you know, how they might do
them better. Right, And so I think there's a there's
(46:30):
a lot of potential for you know, sort of human
upside as we start to be able to answer these
scientific questions.
Speaker 1 (46:41):
That was my interview with Tom Griffiths. To quickly summarize
his framework, Tom sees three major scientific approaches that all
try to capture the mind. You've got rules and symbols,
you've got artificial neural networks, and you've got probability theory.
These are very different approaches, and each one has delivered
(47:01):
something a little different. Rules and symbols give us language
like machinery where pieces can be assembled and reassembled into
complex ideas. Artificial neural networks they give us graded concepts,
meaning ideas can be fuzzy, and probability theory gives us
a language for dealing with uncertainty. Now, what's interesting is
(47:24):
that human minds seem to traffic in all of these modes.
We use structured symbols, we also use graded concepts. We
also revise our beliefs as new evidence comes in. And
part of that is that we move through the world
with prior beliefs shaped by our history, our culture, our language,
our neighborhood, our moment in time. So none of these
(47:46):
models by themselves are the final answer. And what this
means is that, like most scientific stories, this is one
about humility. Tom's book illustrates how every generation arrives with
some new formalism, some new piece of math, some new
model that's powerful enough to illuminate an area of mental
(48:08):
life and for a moment it feels like, hey, the
whole mystery is finally collapsing. But then the spotlight widens
and we see more terrain. So what I love about
this conversation is that it can leave us with a
sense of progress and a sense of wonder. At the
same time, we feel a convergence of different fields, and
(48:29):
we can also feel how large this subject remains. Cognition
is still a field in motion, So let's look at
the big picture. When the field of physics matured, we
could then build bridges and airplanes and power grids because
we had firm principles to build on. So once the
(48:50):
laws of thought come into clearer view, what becomes possible
for education and for decision making, and for tools that
help us reason more effectively. So here we are at
a very cool moment in history where the old dream
of formalizing thought has escaped the library and shown
(49:13):
up on everyone's laptop. The big thinkers of centuries ago
could sort of squint and see the outline of the project,
and now we're living much more squarely right in the
middle of it. If there truly are laws of thought,
they're going to teach us about our machines, but more importantly,
they're going to teach us about ourselves, because although it's
(49:35):
sometimes tempting to view the mind as a ghostly exception
to the universe, the mind, presumably is part of the universe,
and it is lawful and wondrous and discoverable, and every
step towards understanding it enlarges the human story. Go to
(49:56):
eagleman dot com slash podcast for more information and to
find further reading. Join the weekly discussions on my substack,
and check out and subscribe to Inner Cosmos on YouTube
for videos of each episode and to leave comments. Until
next time, I'm David Eagleman, and this is Inner Cosmos.