Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Blake (00:00):
And actually that's a really good way of phrasing it, because I think what's so fascinating about human learning, and this is really what marks not just humans but, I would say, broadly sort of generalist species, is that we seem to be able to actually define our own cost functions.
Matt (00:23):
You just heard a little sound bite from my interview today with Blake Richards. Thank you, Paul Middlebrooks, for giving me that idea. I've been watching his podcast called Brain Inspired. If you like our podcast, give him a shot at braininspired.co. Welcome to the show. This is the Numenta On Intelligence podcast, and today we're going to have another Interview with a Neuroscientist.
(00:44):
So stay tuned and we will get right into it.
All right, welcome to another episode of Interview with a Neuroscientist. I'm Matt Taylor with Numenta, and today I'm really excited to have Dr. Blake Richards here with us. He's an associate fellow at the Canadian Institute for Advanced
(01:04):
Research. Hello Blake.
Blake (01:06):
Hi Matt.
Matt (01:07):
Great to have you here.
I've been following your work for a while, and I'm interested in the ideas you are bringing to the field. As an observer of, like, Twitter and the neuroscience community for the past couple of years, I feel like you're part of this sort of new wave of neuroscientists coming up with some new ideas, and not just
(01:27):
about the science but also about processes and protocols. How do you think the field is changing right now?
Blake (01:33):
Yeah, that's a good question, because it definitely feels like it's changing, and it's not always easy to put your finger on exactly what is changing. I think the way that I would articulate what's happening right now is that we are actually seeing neuroscience,
(01:53):
or at least parts of neuroscience, morph into something that's almost more akin to what cognitive science was back in the day. That is, a truly interdisciplinary field of research that incorporates not only the components of biology that are relevant to understanding how brain cells
(02:13):
communicate with one another, but also components of computer science and philosophy and psychology, in order to try to get a grasp of what we might call sort of general principles of intelligence and general principles of behavior that are
(02:34):
important for understanding the ways in which any agent, whether an animal or in fact an artificial agent, works. And that's quite different from what neuroscience was when I started as a Ph.D. student, you know, a little over a decade ago, when it was really more kind of a sub-branch of biology, with a bit of
(02:58):
psychology thrown in occasionally.
Matt (03:00):
So it definitely is broadening a lot, it seems. That's one point. And you think that's because, like, to understand the general principles of the brain you have to think broader than just the molecular biology level, right?
Blake (03:13):
That's right, exactly. I think that's part of it. And I think it's also a result of the realization, more broadly in biology altogether, that biological systems are so complex and their operations are so non-trivial
(03:33):
that you really have to bring to bear any tool that you can to understand them. And it's not really viable to simply... look, I think what the practice was in neuroscience for many years, and what some people still do to some extent, is what I call, you know, neuro stamp
(03:54):
collecting, where you basically just try to get as many facts about the brain and its operations on a biological level as possible. And there's this hope that, you know, the more facts we accumulate, at some point we'll have something like an understanding of the brain.
But, you know, Daniel Wolpert, a researcher who was at
(04:18):
Cambridge, I think he's moved to Columbia now, has a great bit about this that he gives in his talks sometimes. There's a very famous neuroscience textbook by Kandel and a few others called Principles of Neural Science. It's the textbook that many of us receive when we first start in the field.
(04:42):
Daniel Wolpert has this plot where he shows that the number of pages of Principles of Neural Science keeps increasing year after year according to a linear function, and he points out that if we were actually uncovering principles of neural science, presumably the book wouldn't have to keep
(05:02):
growing and growing, because all it is at this point in time is an accumulation of potentially unrelated facts about the brain. So what people are starting to desire, and why we're seeing this shift towards broader ways of thinking about the brain, is something more like true principles. And the way that Daniel Wolpert puts it is, you know, we'll know we're being
(05:25):
successful in neuroscience in the coming decades if we can start to actually shrink the number of pages in the Principles of Neural Science textbook.
Matt (05:32):
Right, that makes sense. When I'm reading a neuroscience paper (I sometimes read neuroscience papers in a drive to try and understand all of the real biology behind the theory), there are so many ways you can just go down rabbit holes and be lost forever. You know, you can
(05:52):
spend your whole career studying this one particular aspect of intelligence.
Blake (05:52):
That's right.
Matt (05:53):
It's amazing.
Blake (05:54):
Yes.
Exactly, and that's what many people have done in the past. Historically, you'd kind of pick your specialization and your particular circuit, and you would study the hell out of it. So you would be the expert on, you know, the synaptic physiology
(06:15):
of the Schaffer collaterals in the hippocampus, or something like that.
And, you know, that made sense in some ways; I think the impulse behind it was a good one. The idea being that you really want to fully understand these systems, and, you know, these are complicated systems, so why not
(06:36):
take decades to study this one little circuit? But yeah, if you don't actually end up uniting that with other things that we're learning about the brain, and with broader principles that we might derive from artificial intelligence or psychology, then, you know, how can you actually
(06:56):
say that you've gained an understanding of the brain beyond just the stamp collecting, as I say.
Matt (07:02):
Right.
We've got to put the facts together in some cohesive story about how it all works, how all the things work. And that, in some ways, you know, at the end, involves imagination, it involves theorizing.
Blake (07:17):
That's right.
Exactly.
And I think it's something which many neuroscientists are uncomfortable with, and it's why we sometimes see some pushback against this slightly new direction in neuroscience. Part of what is required
(07:37):
to develop the kind of cohesive, broader picture of how the brain is working is occasionally not incorporating certain biological facts into the ways that you're thinking about something, because there are just too many to wrap your head around trying to make it all work. And I think that makes some people uncomfortable, because it
(07:59):
means that occasionally we're ignoring some components of the biology that we know exist. Even though, you know, we know it's true, we're kind of like, well, we're not going to think about that for our broader model right now. And that's something not everyone is comfortable with.
Matt (08:13):
Maybe we can explain it. We know something like this is happening, and we might know why it needs to happen, but not how.
Blake (08:20):
Right.
Yes.
Matt (08:22):
So I'm afraid of getting too deep here, but you're a Doctor of Philosophy, so why not? I like to talk about reality, especially how it applies to artificial intelligence, as, you know, the world perceives A.I. right now.
Blake (08:38):
Yeah.
Matt (08:38):
And so I love this idea that Max Tegmark introduced me to, this external reality that just exists. It's sort of like the ground truth; it's what's out there. And all of us intelligent beings have an internal reality, which is really just a model, based on our experience with reality, of what we think it's like.
(08:58):
And they're all sort of wrong and distorted. You know, it's just our sensory perception over time of what we think is out there. And in order for us to communicate with each other, we have to establish sort of a consensus reality where we can share ideas, and we can say 'red' and you know what I mean, and I can say two plus two equals four and we know what that means. You know, this sort of accumulated knowledge is in this
(09:22):
consensus reality.
And when you talk about AI, I mean, if we're going to create intelligence sort of in our image, if we're trying to learn how the brain works and we think we can turn around and reverse engineer it and create something like that, it goes against this idea that some people want to make explainable AI. They want to know, you know, exactly why an AI made a decision,
(09:45):
and it always bothers me, because from the perspective of biology, we can't do that with biology. So how can we expect to do that with, you know, machine intelligence in the same way?
Blake (09:54):
Quite. Yes, I agree. That's a really good point, and I think the complaint that current deep learning systems in AI are uninterpretable or unexplainable is certainly a funny one whenever it comes from neuroscientists, because I am personally completely convinced
(10:19):
that the brain is probably equally uninterpretable and unexplainable. Certainly, you know, I think Konrad Kording, a neuroscientist at UPenn, articulates this well. You know, when you actually go looking for, oh, does the brain respond to
(10:39):
this stimulus, does the brain respond to that stimulus, etc., basically you can find almost anything you want in almost any brain region if you look hard enough, and interpreting that is almost impossible. Arguably the only way to interpret it is to come back to principles of optimization, in the same way that, you know, we can with deep nets.
(11:00):
You know, that's what always happens when people say that we can't understand deep nets. We do understand them. We understand that they're optimizing on particular loss functions. We understand the learning algorithms that enable them to optimize in that way. And so we can say very clearly why they developed the representations they developed. We just can't articulate exactly what their solution is to the
(11:21):
problem in human-readable format. And it's entirely possible that the brain is the same way: whether as a result of evolutionary optimization or of learning during an individual's lifetime, the specific wiring of our neural circuits that lets us do the things that we do may or may not be human-interpretable, and
(11:41):
there's no reason to expect that it would be, really. So why would we expect the same for deep neural networks?
Matt (11:48):
Something Jonathan Michaels said, he was on the program a while back, and I asked him what it is to grab a cup, because he studies motor commands in monkeys. And he's like, what is that? Is that motion to grab a representation of grabbing a cup? How do you come up with that? His answer was basically: you could bring together every time you've grabbed a cup and every joint experience you've ever had
(12:09):
in your entire life, and that's what it is. How do you convey that to another person? That's sort of the level of information we're trying to capture.
Blake (12:16):
Right.
Quite right.
Yeah.
And I think that, you know, the difficulty with all this stuff is that there aren't actually simple, easy-to-verbalize ways of describing what it is to pick up a cup, or what it is to successfully navigate somewhere, or what it is to,
(12:38):
say, perceive an object. There are very, very abstract mathematical descriptions that we can give, but that's not what many people who are complaining about the lack of interpretability are looking for. What they want is a simple, few-sentence description of what's going on, and that just might not exist.
Matt (12:57):
Maybe it will exist in a
consensus reality that we create
with these intelligent systems over time.
Blake (13:03):
Yeah possibly.
And so I think what's interesting about what you're saying there is that arguably, you know, part of what happens with human beings is that we make some of our actions interpretable, quote unquote, by virtue of the stories that we tell each other about why we did
(13:25):
something or other.
Right?
Matt (13:26):
Right.
Blake (13:27):
And I think the funny thing is, often these stories are false. One of the things that we know, that there's some evidence for research-wise, is that, you know, we will kind of generate post hoc explanations for our actions even though the
(13:47):
experimentalist knows that they've manipulated you in such and such a way. And the fact is, I suspect that's happening constantly. I think that, you know, we are often engaged in various behaviors where the ultimate reasons for why we do the things we do might be almost completely unexplainable.
(14:07):
But we tell each other these post hoc stories, and then that becomes our shared reality. So, you know, 'I went to the store and bought some ice cream because I was stress eating,' quote unquote. The exact computations behind that are surely far less
(14:27):
interpretable than 'I was stress eating.'
Matt (14:30):
Which is an interesting segue into an essay I wanted to talk about, which was cost functions, or loss functions. I don't know why stress eating happens, but, you know, that's sort of, that's a feeling of need in your brain somewhere.
Blake (14:44):
Yes quite.
Matt (14:47):
We've had discussions on Twitter about this, but I think there are some in my audience who may not be familiar with that term. Could you maybe give a 30,000-foot definition: what is a loss function?
Blake (14:58):
So a loss function is just a way of quantifying learning. When we talk about learning, learning necessarily implies some kind of normative improvement. Right? If you are learning, you're getting better at something, and if you want to quantify getting better at something, then
(15:22):
you need to identify some number, some function, that is a measurement of how good you currently are at whatever it is you're trying to learn. And the term we use in machine learning to describe these functions is loss function, or cost function. And so then learning can be defined as anything which
(15:46):
reduces a loss function.
Matt (15:49):
Right.
So I have a background in software engineering, so I can think of this as a function that takes input and gives an output. In this example, what would a sample of that input be? And the output would be, you know, how good it is, right?
Blake (16:04):
That's right.
Exactly.
So the input would be the current settings for the agent. In the case of a neural network, it would be the current synaptic weights for the agent, and the output is this measurement of how good it is now.
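To make that input/output picture concrete, here is a minimal sketch in Python. The mean-squared-error form and all of the names are illustrative choices on my part, not something specified in the interview:

```python
import numpy as np

def mse_loss(weights, inputs, targets):
    """A loss function in the sense described above: the input is the
    current state of the system (here, the weights of a one-layer linear
    network) and the output is one number measuring how badly it is
    doing. Learning is then anything that reduces this number."""
    predictions = inputs @ weights   # the network's current guesses
    errors = predictions - targets   # how far off each guess is
    return np.mean(errors ** 2)      # a single scalar: lower is better
```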
Matt (16:23):
Now can we abstract that
even further?
I like to think about video games; obviously I play a lot of video games. If you think about a cost function for Pong, like for an AI player, could I think of that as, like, all the input being the location of the ball as it moves, and then the loss function judging
(16:44):
whether the paddle prevents the ball from going past it or not?
Blake (16:50):
Roughly, but I think the way we would probably approach it in an actual AI system is one step more abstract. So the input would be the current policy, as it were. That is to say, the current set of actions that you would select
(17:11):
as a Pong player based upon the screen that you're provided.
Matt (17:16):
Oh, so like all possible things you might do.
Blake (17:20):
Exactly. All possible things you might do in response to all possible inputs. And then the output would be a measurement of the average score that you would get in the game. And so in this case, rather than a loss function, it's what we'd call the inverse of a loss function:
(17:42):
we want to increase our score, so you want to see that improve over time.
Matt (17:50):
It's like an optimization
function or something.
Blake (17:52):
That's right- an
optimization function- precisely
right.
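As a hedged sketch of that one-step-more-abstract view: the thing being evaluated is the whole policy, and the score is averaged over games. `play_pong` is a hypothetical simulator assumed to return a final score; nothing here comes from an actual library.

```python
def average_score(policy, episodes=100):
    """The input is the whole policy (a mapping from screens to actions)
    and the output is the average score it earns over many games."""
    total = sum(play_pong(policy) for _ in range(episodes))  # hypothetical simulator
    return total / episodes

def loss(policy):
    # Higher score is better, so the corresponding loss function is
    # simply the negated (inverse) score, as discussed above.
    return -average_score(policy)
```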
Matt (17:56):
So, again, I keep thinking about video games. Could I also think of this in terms of behavior? Like, if I'm playing Mario Kart, or Pole Position depending on how old you are, and I'm controlling a car, can I even define that environment with a loss function? If I want to say: I want to stay on the road, I want to go around the track as fast as possible, and I don't want to hit
(18:17):
things. Is this working in that scene too?
Blake (18:21):
Yup, exactly. So again, you know, the way that we approach it in machine learning is in this very sort of high-level way, where you say: okay, for all possible situations in this car game, what actions would you take at this point in time? And then you would get some score based upon that, such that
(18:45):
your score would go down if you ever drove off the road, and it would go up for, you know, how rapidly you were able to go around the track, or whatever. And that is your loss function, then.
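One possible sketch of such a driving loss, again with a hypothetical `simulate_race` simulator and made-up penalty weights, just to show how the pieces Blake lists would combine:

```python
def driving_loss(policy, episodes=50):
    """Score goes down (loss goes up) for leaving the road, and faster
    laps mean lower loss. The 10.0 weighting is an arbitrary choice."""
    total = 0.0
    for _ in range(episodes):
        stats = simulate_race(policy)            # hypothetical simulator
        total += 10.0 * stats["off_road_count"]  # penalty for going off road
        total += stats["lap_time_seconds"]       # reward speed: shorter is better
    return total / episodes
```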
Matt (18:58):
In your brain, though, I can imagine evolution provides loss functions over a long period of time, you know, like behaviors that express themselves in order to help the animal survive, right? Those are coded in genes, and those are going to be stored, well, I mean, they're going to be expressed in older parts of the brain, is that right?
Blake (19:17):
Well, so, when we talk about the loss functions that govern evolution, what's interesting there is, effectively, what we're talking about: the central loss function for evolution, of course, is the likelihood that your genes will propagate to the next generation.
(19:40):
So the output of the loss function is the likelihood that your genes will propagate to the next generation. The input to that loss function is effectively your current physiological state, and evolution is about shaping your physiology in order to maximize the probability that you're going to propagate your genes to the next generation.
(20:03):
So that specific loss function itself isn't encoded in your DNA, but your DNA has ultimately been shaped by this process of optimization on this loss function over time.
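A toy illustration of that framing, evolution as very slow optimization over a population, where "physiology" is just a list of numbers and the fitness function stands in for the likelihood of propagating your genes. Every detail here is invented for illustration:

```python
import random

def evolve(fitness, population, generations=100, noise=0.05):
    """Mutate each 'genome', then keep the variants that score best on
    the fitness function. No individual learns; the population does."""
    for _ in range(generations):
        offspring = [
            [gene + random.gauss(0.0, noise) for gene in parent]
            for parent in population
        ]
        pool = population + offspring
        pool.sort(key=fitness, reverse=True)  # selection on fitness
        population = pool[:len(population)]   # population size stays fixed
    return population
```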
Matt (20:17):
So the example I'm thinking of, trying not to be crude: all biological systems have to excrete waste, and there is behavior in animals to excrete waste where you're not collecting food. Is that something that is at those low levels of the brain, or is that something that you think is learned?
Blake (20:38):
Right.
Well, so, OK, so then we start talking about the intersection between the sort of learning that is evolution, because you can view evolution as a type of learning, because it is this optimization.
Matt (20:54):
A very slow type of learning.
Blake (20:56):
That's right.
And a type of learning that doesn't occur in an individual but instead occurs in a population.
Matt (21:02):
Exactly.
Blake (21:03):
So evolution is this very slow learning that occurs over a population, and then within all of our brains we also have learning algorithms that help us as individuals to learn. And what I think is interesting is that part of what has probably happened over the course of evolution is that one
(21:26):
of the things that came out of our evolution was that it was beneficial, from the perspective of the evolutionary cost function, for our brains to also optimize on some other cost functions. And sometimes, you know, our behaviors can seem a little bit weird
(21:48):
with respect to our survival, because even though it might have been beneficial in the long run for us to be optimizing on these other cost functions internally, at the end of the day they might not always agree with the evolutionary cost function. And so the example I always give that way is drug addiction.
(22:10):
So in all likelihood, we think that, you know, the brain seems to have a cost function that is some kind of reward-maximization cost function. Right? You as an animal are going to do stuff that helps you to maximize
(22:30):
the probability of obtaining rewards. And the difficulty then, of course, is that if you take something that's very basically, intrinsically rewarding, like heroin, that cost function might drive you into a new behavior: to just do whatever you can to get as
(22:50):
much heroin as possible, even though that's not beneficial for the evolutionary cost function of you propagating your genes to the next generation.
Matt (23:00):
It's sort of like shorting a circuit.
Blake (23:01):
Yeah that's right exactly
a sort of short circuit.
Exactly.
And, you know, that's not to say that you didn't evolve that reward-maximization cost function for evolutionary purposes. Because on the African savanna that was probably a pretty good cost function to be optimizing on, but for a modern human, maybe not so
(23:27):
much.
Matt (23:27):
There are certainly examples of us humans enhancing our evolved cost functions. For instance, to use your example, you know, don't excrete where you eat: at some point we decided it would be good if we started washing, a process which increased our lifespan considerably.
(23:49):
We learned that behavior. I mean, it's almost like these cost functions, once they emerge, they're memes. They turn into memes.
Blake (23:56):
Yes right.
Right.
And actually that's a really good way of phrasing it, because I think what's so fascinating about human learning, and this is really what marks not just humans but, I would say, broadly sort of generalist species, is that we seem to be able to actually define our own cost functions.
(24:18):
So, you know, for example, some people will just get obsessed with getting really good at a particular random task, right? Like, they will decide that they really want to be an expert on, I don't know, different types of lager or something like that, and it's not immediately clear what cost
(24:43):
function they're optimizing on besides this arbitrary one of being able to distinguish different types of lager. But they do it, right? And so we seem to have this ability to define our own cost functions, in a way that makes us incredibly flexible as an animal, and which, again, can sometimes seem to go against our evolution
(25:04):
but probably, in its origins, was beneficial for our ancestors somehow.
Matt (25:10):
We're pretty much making it up as we go along at this point: defining our own functions, doing whatever we want. I mean, performance art is a beautiful thing to behold when it is done right. And it's a cost function. And if it's appealing to the general public, they get accolades for it. I mean, they're basically defining beauty with one of
(25:35):
these cost functions. It's amazing.
Blake (25:35):
Yes, that's right. And, yeah, so actually, to come back to your memetics point, I suppose I got off track. What I think is interesting about your point that way, and this ties back to your point about shared reality, is that arguably what happens in human society is that we develop joint, shared cost functions.
(25:56):
So, you know, we all decide that what we really want is, you know, whatever, like particular B.C. house music, or, as you say, like performance art with certain characteristics that are hard to find.
Matt (26:12):
Certain type of politics
or whatever.
Blake (26:14):
Yes that's right exactly.
And so that then becomes the thing that we're all optimizing on, because we're obsessed with these sorts of shared memetic goals that we develop.
Matt (26:28):
Wow.
All right.
I didn't know where this was going to go, but we got pretty deep. That's awesome. OK, well, let's talk about deep learning. We haven't really touched on it a whole lot yet. You've done a lot of work in deep learning. My audience may not be the most proficient in the subject. I think, you know, the HTM audience is more towards the
(26:49):
neuroscience than the hobbyists and the engineers. So maybe you could talk about back propagation in simple terms. Can you define back propagation for us, and why doesn't it work biologically? Because that's one question it'd be great to explain.
Blake (27:03):
Sure, yeah. OK, so I'll start by just defining deep learning. So deep learning is a particular approach in machine learning that has two basic tenets. The first is that you should try to have minimal intervention
(27:26):
from the programmer, meaning you should hardwire as little as possible and have the system learn as much as possible. So this is in contrast to more traditional approaches to artificial intelligence, which are sometimes referred to as good old-fashioned AI, or GOFAI.
Matt (27:47):
Like expert systems or
very finely tuned applications.
Blake (27:51):
That's right.
That's right.
Where you as the programmer say: OK, computer, here's the way I want you to act, here's the logical chain of reasoning that I want you to engage in, here's your understanding of the world as programmed by me; go behave intelligently, please. The deep learning philosophy says: no, you as the programmer
(28:14):
should do as little hardwiring as possible, and you should basically just focus on the development of learning algorithms that allow your agent to use the data that you provide it to figure out for itself exactly how to behave.
Matt (28:30):
A noble endeavor for sure
yeah.
Blake (28:33):
So then the second tenet of deep learning, which distinguishes it from quote unquote shallow learning, is the idea that what you want to do is not only learn as much as possible but also have what we call a hierarchical system
(28:56):
where you process data in a series of stages or modules, and you also ensure that your learning algorithm is adapting every one of those stages. So the analogy that deep learning people were ultimately
(29:18):
building off of, that they were ultimately inspired by, was how our own brains work. So even though it's an oversimplification of what goes on in our brains, to some extent you can say that when data arrives at our retina, it then gets processed by a series of stages, where each stage of the processing in our brains
(29:40):
identifies ever more complex, kind of abstract, features of the image that we're looking at. So in the early stages of processing, your brain identifies various lines and edges; it then assembles that into an understanding of various joints and shapes; and then that gets fed into areas that identify more abstract object categories,
(30:03):
etc., etc. And so the deep learning approach was inspired by this and said: we're going to have that same kind of multiple stages of processing, and, per the first part of the philosophy, we're going to learn every part of that. And that's what distinguished deep learning from some of the other quote unquote
(30:24):
shallow approaches that were popular at the time deep learning really took off, such as support vector machines and kernel machines and related stuff. Those systems would have multiple stages of processing, but typically only the final stage of processing was where any learning occurred, and all the early stages of processing
(30:45):
were hardwired by the programmer or followed some predetermined mathematical formula; only the final stage was learned.
Matt (30:57):
Sort of a mash up of the
old way and the new way.
Blake (31:00):
That's right.
Yeah.
So really, what distinguished deep learning was: we're going to have these hierarchical processing stages, and we're going to learn it all.
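To illustrate the structural difference Blake draws, here is a minimal sketch; the sizes and the ReLU choice are mine. A two-stage network where the shallow recipe learns only the final readout, while the deep recipe learns every stage:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stages of processing, a miniature version of the hierarchy above.
W1 = rng.normal(size=(784, 256))   # stage 1: raw input -> features
W2 = rng.normal(size=(256, 10))    # stage 2: features -> output

def forward(x):
    h = np.maximum(0, x @ W1)      # stage 1 (fixed or learned, see below)
    return h @ W2                  # stage 2 (the readout)

# "Shallow" recipe (kernel-machine style): stage 1 stays hardwired,
# and only the final readout is learned.
shallow_trainable = [W2]

# Deep learning recipe: every stage of the hierarchy is learned.
deep_trainable = [W1, W2]
```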
Matt (31:09):
Right. So what is back propagation? Is that the 'learning it all' part?
Blake (31:14):
You got it. So the back propagation of error algorithm is a learning algorithm which provides you with, to date, probably the best guarantee that anyone's ever been able to develop that every stage of your processing hierarchy is going to be
(31:35):
optimized based on a given cost function.
Matt (31:38):
So that's just not biologically feasible, right? There just couldn't possibly be that many connections, is that the argument?
Blake (31:45):
Well, no, actually. In fact, a big part of my research is that I believe that the brain also does this. I believe strongly that our brains optimize every stage of the hierarchy, and they do so in a way that guarantees that the cost functions that we're optimizing are reduced by virtue
(32:09):
of the changes that happen in every part of our brains. Where we say that back propagation is biologically infeasible, it's because back propagation is really just one specific way of implementing something known as gradient descent. So gradient descent is the following idea.
(32:30):
Let's say we've got this cost function that we've discussed, where the input is the current state of our system and the output is a measure of how well we're doing according to our learning goal.
Matt (32:42):
So the complete state, all
of the neurons in the system.
Blake (32:46):
All of the neurons and all of the synapses, that's right. So we take all the neurons and all the synapses, that feeds into our loss function, and we get out this number that measures how well we're doing on our learning task. The gradient descent approach says: OK, the way we're going to learn in this system is we're going to try to estimate the slope of
(33:13):
our loss function. So you can think of it in kind of abstract terms: think of the loss function as representing the height of a hill, and your position, in kind of, you know, GPS coordinates, as
(33:34):
representing the current state of your network that you're feeding into your loss function.
Matt (33:39):
This is, of course, a very high dimensional thing.
Blake (33:42):
Very high dimensional, that's right. So you're not moving in a two dimensional space, which is what you're doing when you're looking at GPS coordinates, but instead you are moving in, say, a 10 million dimensional space. So you've got this hill, effectively, in a 10 million dimensional space, and in the same way, like, let's say you were a blind
(34:05):
person who was trying to descend a hill, how could you do it? Well, one potential way of doing it would be to basically try to figure out, just by feeling around a bit, which direction the slope was going, and always just walk downhill.
Matt (34:24):
So local analysis sort of.
Blake (34:26):
Yes, exactly, a local analysis. So if you always just look at the slope where you are and you go downhill, eventually you're guaranteed to converge to a local minimum. At some point you're going to reach something that is the bottom of some valley. Now it might not be the absolute bottom of the hill, but you're
(34:49):
guaranteed to approach a local minimum anyway.
Matt (34:52):
That's a great explanation
by the way.
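The blind-hiker picture maps directly onto numerical gradient descent: nudge each coordinate, feel which way is downhill, step that way. A minimal sketch (the two-dimensional bowl is just for demonstration):

```python
import numpy as np

def feel_slope(loss, w, eps=1e-5):
    """Estimate the slope the way the blind hiker does: nudge each
    coordinate a little and feel how the loss changes."""
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (loss(w + step) - loss(w - step)) / (2 * eps)
    return grad

def descend(loss, w, lr=0.1, steps=1000):
    """Always walk downhill: repeated small steps against the slope."""
    for _ in range(steps):
        w = w - lr * feel_slope(loss, w)
    return w

# Example: descend a bowl-shaped hill whose bottom is at (0, 0).
w_final = descend(lambda w: float(np.sum(w ** 2)), np.array([3.0, -2.0]))
```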
Blake (34:56):
Good.
Now, what's interesting, and this is the reason that gradient descent is such a powerful approach: if you consider, you know, a two dimensional or three dimensional landscape, it's very easy to get trapped in local minima that are very far from the global minimum, and it's something that concerned many
(35:20):
people when gradient descent approaches were first developed in artificial intelligence. Now imagine doing the same, you know, hill descending in a ten million dimensional environment. In order for something to be a true local minimum, it has to be a minimum in ten million directions, and the
(35:41):
probability of that happening is actually relatively low. So people have done analyses to show that, in fact, what's interesting about gradient descent is that the higher the number of neurons you have, the more synapses you have, the less likely it is that you're going to get trapped in local minima, and thus the better it is to do gradient descent.
(36:04):
So what we've kind of discovered in AI is that, in fact, these gradient descent algorithms work better the larger the system, the more we scale it up.
Matt (36:15):
These things are so high dimensional, it seems like you never really settle on anything if it's a dynamic system.
Blake (36:22):
Right. So, in fact, what you can show is that, basically, sometimes what can happen with these algorithms is that they'll get trapped in what's called a saddle point. And that's where you've got a local minimum in a few directions but a non-minimum in other directions. And if you happen to get trapped at exactly the middle point of this saddle, then your algorithm can get stuck.
(36:44):
But people have worked out a variety of tricks to get past saddle points, and with those tricks in place, basically the only time that your algorithm ends up converging is when it gets pretty close to what we think is something like a global minimum of the function, essentially.
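The transcript doesn't name the tricks, so as one commonly used example (an assumption on my part, not necessarily what Blake had in mind), here is momentum, which keeps the parameters coasting through the flat middle of a saddle where the raw gradient alone would stall:

```python
import numpy as np

def descend_with_momentum(grad, w, lr=0.01, beta=0.9, steps=200):
    """Gradient descent with a velocity term that accumulates a running
    direction of travel and carries the parameters past saddle points."""
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v - lr * grad(w)  # accumulate a running direction
        w = w + v                    # coast along it
    return w

# A classic saddle: f(x, y) = x**2 - y**2 has zero gradient at the origin.
saddle_grad = lambda w: np.array([2 * w[0], -2 * w[1]])
w_final = descend_with_momentum(saddle_grad, np.array([1e-6, 1e-6]))
# x converges toward 0 while y accelerates away: the saddle is escaped.
```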
Matt (36:58):
So this gradient descent, this trying to find the best place to be in that n-dimensional space, is what back propagation enables, because we can see the complete state space. So now, enter ideas about apical credit assignment and how that could work.
Blake (37:19):
Right. Right. So, to be clear, back propagation is one possible way of doing gradient descent, and what my lab has been proposing is that there... well, we know there are other ways of doing gradient descent, and I am personally convinced by the idea that our own brains do
(37:40):
something like gradient descent. But there are a variety of reasons that the specific details of back propagation are just biologically infeasible, and one of those is that, in order for back propagation to work, you have to do a full pass forward through your hierarchy and then do another pass backward through your
(38:02):
hierarchy, and these need to be two separate things that you do. And there's no evidence that our brains engage in this sort of separate forward and backward pass.
Matt (38:14):
Right.
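To see why those are two distinct phases, here is a minimal two-stage network trained with one step of back propagation; the forward sweep must fully finish before the separate backward sweep can begin. All sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(4, 1))
x, target = rng.normal(size=(1, 3)), np.array([[1.0]])

# Phase 1: a full forward pass through the hierarchy.
h = np.tanh(x @ W1)
y = h @ W2

# Phase 2: a separate backward pass, pushing the error back stage by
# stage. This distinct second sweep is the part with no obvious
# counterpart in brain physiology.
dy = 2 * (y - target)            # gradient of squared error at the output
dW2 = h.T @ dy                   # gradient for the top stage
dh = dy @ W2.T                   # error sent back down the hierarchy
dW1 = x.T @ (dh * (1 - h ** 2))  # gradient for the bottom stage (tanh)

W1 -= 0.01 * dW1                 # only after both passes do weights change
W2 -= 0.01 * dW2
```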
Blake (38:14):
What's interesting, though, is that when we look at the physiology of the neurons in our brains, a lot of the feedback that arrives at the principal neurons of our forebrain, which are called pyramidal neurons, a lot of the feedback that arrives at a pyramidal neuron is in its
(38:34):
apical dendrite, which is a special dendritic compartment that these cells have that basically goes up towards the surface of the brain. So what my lab's been interested in is the idea that these apical dendrites might actually provide a way of integrating some of the
(38:56):
feedback information that you need to do gradient descent without disrupting the ongoing processing that's happening in other parts of the cell. And that in this way, you could estimate the gradient of your cost function without having to do separate forward and backward passes or disrupting the processing that's
(39:17):
occurring.
Matt (39:17):
So instead of duplicating, or doing the pass twice, you're adding an additional sort of computational unit to each neuron.
Blake (39:25):
That's right.
Exactly.
So if each neuron has its own little computational unit where it can calculate its gradient information, then you don't have to worry about the separate forward and backward passes.
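A cartoon of that idea in code, loosely inspired by the segregated-dendrite proposal but not the published model; the learning rule here is a made-up local delta rule standing in for the apical gradient estimate:

```python
import numpy as np

class TwoCompartmentNeuron:
    """The basal compartment drives ongoing output; the apical
    compartment separately turns top-down feedback into a local
    learning signal, so no distinct backward pass is needed."""

    def __init__(self, n_inputs, rng):
        self.w = rng.normal(size=n_inputs)

    def forward(self, x):
        self.x = x
        self.rate = np.tanh(self.w @ x)  # basal drive -> firing rate
        return self.rate

    def apical_update(self, feedback, lr=0.01):
        # Feedback arriving at the apical dendrite acts as a local
        # gradient estimate, without interrupting forward processing.
        error = feedback - self.rate
        self.w += lr * error * (1 - self.rate ** 2) * self.x
```

In this sketch, the neuron keeps processing through forward() while apical_update() nudges the weights from feedback, all within the one cell.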
Matt (39:39):
So I just want to relate this to HTM, because that's my audience, you know. We talk a lot about distal basal dendrites, and a cell sort of having its own computation that can predict or change its behavior. It's similar to that, I think...
Blake (39:54):
So, you know, this is something that I really like about the model that you guys are building at Numenta. Thinking about things in this way, where you say, okay, what might this dendritic compartment be contributing to learning, and how might that be a distinct computation, is something that has rarely come into artificial intelligence but which I suspect
(40:15):
is critical to understanding what's going on in the brain. Because just when you look at the diversity of shapes of neurons in the brain, it's pretty clear that the brain is using these different dendritic compartments to do different computations somehow. And so that's got to be part of the answer.
Matt (40:35):
Absolutely.
So with these new ideas, how can we change current deep learning frameworks? Because this is sort of going to the core of what a neuron is, the definition of a neuron. How can deep learning change to incorporate these new ideas? Do you see a path forward?
Blake (40:54):
Yeah, so I think that probably the most important thing, in terms of incorporating some of these ideas, is hardware implementations, potentially. Because, you know, the fact is that gradient descent works so
(41:15):
well that, and this drives some people who are purists nuts, gradient descent works so well that what we've seen over the last few years in AI is just an explosion of people saying: OK, well, I'm just going to define this cost function and this particular architecture, and then I apply gradient descent to it, and
(41:35):
voila, I have now got a new state of the art on some task. And to some extent there's no reason to expect that to stop anytime in the near future. It will probably peter out at some point. But as it stands, we're still seeing continued improvements from just applying gradient descent to new problems and new
(41:56):
cost functions and new architectures.
Matt (41:57):
Five years ago, I wanted an app where I could take a picture of a leaf and it would tell me what kind of plant it was. That exists now because of gradient descent, I imagine.
Blake (42:06):
That's right. Precisely, precisely. And so, you know, I think where that might end up failing a bit more, though, is if you're actually trying to, you know, actually build a circuit that does deep learning in
(42:28):
hardware for you, not just, you know, simulating it on a GPU. Then maybe you'd want to think about potentially having circuits where you've got different compartments where your gradient signals are being calculated, or predictive signals like you guys have are being calculated. And this might end up being a much more efficient
(42:51):
architecture for running deep learning systems.
Matt (42:54):
Do you think there are software changes that can be made to current deep learning frameworks that are in production right now that can incorporate these things, or is this going to be like the next phase of AI development that incorporates it?
Blake (43:09):
I think it would be more like the next phase. As I said, I think there are a variety of things that neuroscience can still teach deep learning as it stands, and we've seen some of that with respect to the incorporation of things like memory and attention and other things. But really, I think in terms of some of these ideas about
(43:31):
dendrites and how they're going to help, it's only going to be when we come to the sort of hardware systems that it might be useful. Because, you know, at the end of the day, I suspect that's why the brain did it as well, because it was having to implement it in hardware.
Matt (43:47):
That makes perfect sense.
Blake (43:48):
And it couldn't just toss, you know, numbers around to wherever it wanted at any time.
Matt (43:51):
Yeah, that's a good insight there. Well, this has been a great conversation, really. I have one more question for you. Are you a theoretical neuroscientist?
Blake (44:02):
Yes I think I am.
Matt (44:03):
That's great. It's hard to find those sometimes.
Blake (44:07):
Yes.
Matt (44:08):
People who admit they are theoretical neuroscientists; so it's nice to see some people claim that, because I don't think there's anything wrong with it.
Blake (44:15):
No, indeed. And I think that's part of the shift, and this is maybe a good place to come full circle. Part of the shift we're seeing in neuroscience, towards this broader perspective that incorporates things like machine learning and other parts of mathematics into neuroscience rather than just taking this very biological approach, is that we need to shift to something a
(44:37):
little bit more like physics, in terms of having people who are theoreticians, who are really just thinking deeply about the data that's coming in and trying to integrate it, in order to generate mathematical models that can really guide our experiments and guide the hypotheses that we're generating.
Matt (44:56):
Especially with all the data that is right around the corner, these theories are going to be dated or invalidated pretty quickly.
Blake (45:04):
Yep that's right.
That's right.
Matt (45:06):
It's an exciting time to be in the field, honestly, and it's been a pleasure to talk to you, Blake. Thanks for being on the podcast.
Blake (45:12):
Yep thank you.
It was a real pleasure, Matt.
Matt (45:15):
Thanks for listening to the Numenta On Intelligence podcast. For more information on our company or our mission, visit numenta.com.