
October 17, 2019 (28 mins)

We're back!

After a summer hiatus, we are back with a new episode.  Host Matt Taylor talks to Numenta VP of Research Subutai Ahmad about the effort he has been leading in applying Numenta research and HTM principles to deep learning systems. 

This episode is also available as a video. 


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Matt (00:00):
Welcome to the Numenta On Intelligence podcast. I'm Matt Taylor, Community Manager, and today I'm talking to Subutai Ahmad, our VP of Research. I've worked with Subutai at Numenta for almost eight years now, and I have the utmost respect and admiration for his drive, his thoughtfulness, and the calm, studious atmosphere he

(00:21):
brings into every situation. It was a pleasure to talk to Subutai about his most recent work, applying the ideas of HTM to constructing deep learning networks. I hope you'll enjoy it. Okay, Subutai, thanks for joining me.

(00:44):
I had some questions for you about what you've been working on in the domain of sparse representations and machine learning.

Subutai (00:49):
Sure.
I'm happy to be here.

Matt (00:51):
One of the questions I've been getting recently, especially on Twitter, is: why are we looking at deep learning? That seems to run counter to everything we've talked about for the past 10 years. So why are we looking at the deep learning domain at this point?

Subutai (01:04):
Yeah, this is a change for us this year compared to previous years. We've been very focused on the neuroscience, understanding neuroscience principles and creating theoretical models of experimental data, and we published a lot there. Early this year we published the frameworks paper, which put a lot of that

(01:25):
together into a consistent framework and scaffolding that ties a lot of our findings into one place. And I felt at that time it was the right time to start thinking about, okay, we've done all of this neuroscience research, can it actually apply to practical machine intelligence systems? Initially the answer wasn't really clear, but as we started

(01:49):
looking into it more and more, it looked like we actually can. Rather than starting from scratch, we can take a more incremental approach: look at existing machine learning and machine intelligence systems, start incorporating neuroscience principles, and keep expanding and improving on the functionality there. And with deep learning in particular, there are some

(02:12):
really big fundamental problems despite all of its successes. It has a lot of really key issues that need to be solved to get to truly intelligent systems.

Matt (02:21):
Right.

Subutai (02:21):
And the neuroscience work we've done so far, I think, could really impact the deep learning world and practical systems. So that's really the reason we started moving this way.

Matt (02:32):
So that's an interesting subject to dig into. Can you talk about some of these inherent problems in deep learning, and what kinds of things we see in neuroscience that can help in those situations?

Subutai (02:43):
Yeah, and everything I'm going to say is generally acknowledged within the field as well. Deep learning has been incredibly successful, but it's nowhere near the capability of human intelligence, which we think of as extremely flexible and dynamic.

Matt (03:01):
Well, voice recognition is really good, but voice understanding is really still very bad.

Subutai (03:06):
Yeah. Some of these things we've talked about for a long time, like the idea that a system should be continuously learning. Today's deep learning systems are very static and rigid. You train them and then that's it. They're actually not really learning systems; they're trained and then they're static.

Matt (03:26):
We used to call that online learning; they're not online, not continuous. I mean, people try hacks or ways to batch train them or get them to train at certain intervals, but that doesn't address the underlying problem, which is that, like in our brains, our synapses are constantly updating as we learn, with every time step.

(03:47):
Right?

Subutai (03:48):
Yeah, exactly. And in the deep learning world this is an active area of research. They have this thing called catastrophic forgetting, where as you learn new things it's very easy to forget the old stuff unless you really pay special attention to it. But from the neuroscience work we've done, continuous

(04:08):
learning is relatively straightforward. We've shown this in different isolated settings in the past. If you have really sparse representations, a better neuron model, and a predictive learning system, you can learn continuously without forgetting the old stuff.

Matt (04:24):
And the neuroscience, down at the synapse and neuron level, it's been well known for a long time how those things learn.

Subutai (04:31):
Exactly. And more and more is being learned all the time there. Another big thing is robustness. Deep learning systems are known to be very fragile and sensitive. If you start adding noise or other things, their accuracy drops quite a bit.

Matt (04:48):
Adversarial attacks, perhaps?

Subutai (04:50):
Adversarial attacks are sort of the extreme example of that. But we know the human brain is not as sensitive as these networks are. And what we can show is that if you have really high dimensional, sparse representations, those networks can be a lot more stable against noise than

(05:10):
dense systems.

Matt (05:11):
So let's talk about that. That's one of the topics I wanted to break apart. The idea of high dimensionality is one, because that's a topic in itself, and then the sparseness within there. I think it's hard for some people to think about dimensionality from a mathematical perspective, because we're always living and breathing in three dimensions, you know, through time. But when we talk about high dimensionality in deep

(05:34):
learning, or in neural networks generally, what does that really mean? Like if you take an image, how many dimensions does one image have?

Subutai (05:42):
Yeah. The dimensionality we're talking about is the dimensionality of the vectors at play. If you think about a neural network, you have a bunch of neurons or units that are projecting to another set of neurons. At any point in time, you have a vector of activity, and if you have 10,000 neurons, then that's a 10,000 dimensional

(06:05):
vector, right? And it's feeding into another layer, which is represented by another big vector.

Matt (06:12):
So these neurons, in our brains and in deep networks, are trying to judge somehow some activity across this huge amount of space. This huge space.

Subutai (06:21):
That's right. We might be living in a three dimensional world, but the information in these neural networks lives in these incredibly high dimensional spaces, and that's what it offers.

Matt (06:34):
So why does sparsity help? That's really what we're trying to nail down. Why does it help to make a deep network sparse?

Subutai (06:43):
Yeah. From a robustness standpoint, first, what do robustness and stability mean in this context? If you add noise to the input, you want the output of the layer to not change much, to be pretty stable and insensitive to the noise that's coming in.

(07:06):
Right. And what happens with dense networks is if you change something in one place, it affects everything, whereas with sparse networks, because you have mostly zeros, changes in one place tend not to affect the representation as much. So this is a case where if you have a vector of outputs,

(07:28):
mostly zeros but with some non-zeros, and your connectivity or weight matrix is also sparse, then changes in the input are going to have very little impact on changes in the output, under certain conditions. And the main reason is that mostly these things are zero, so most of the time it's only a small subset, small

(07:49):
clusters of points, that are really impacting one another.

Matt (07:52):
Is it a stabilizing effect on the whole network?
Sort of?

Subutai (07:56):
Yeah, I think so. As long as you're in the right kind of mathematical regime, as dictated by the equations, you get representations that are incredibly stable and incredibly resistant to interference from random stuff going on.
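To make that concrete, here's a quick numerical sketch (not from the episode; illustrative Python assuming binary vectors with a fixed number of active bits) of why random high-dimensional sparse patterns rarely interfere, while dense ones overlap constantly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1000, 1000

def random_binary(n, k, rng):
    """Binary vector of dimension n with exactly k active bits."""
    v = np.zeros(n)
    v[rng.choice(n, size=k, replace=False)] = 1.0
    return v

# Average overlap (dot product) between two independent random patterns.
for k, label in [(20, "sparse, 2% active"), (500, "dense, 50% active")]:
    overlaps = [random_binary(n, k, rng) @ random_binary(n, k, rng)
                for _ in range(trials)]
    # With high sparsity, two random patterns share almost no active
    # bits, so one is very unlikely to be mistaken for the other.
    print(f"{label}: mean overlap {np.mean(overlaps):.1f} of {k} bits")
```

The expected overlap is k²/n, so the sparse patterns share well under one bit on average, while the dense ones share roughly 250.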

Matt (08:09):
Right. So can we talk about the idea of sparsity in the realm of deep learning now? We're going to get there in this podcast. Can we talk about the difference between the connections between layers and the activations? We can just talk about hidden layers, because there's a generic way to apply sparsity to connections and to activations. Can you break that apart?

Subutai (08:30):
Yeah. There are really two sets of vectors we're talking about. One is the vector of activations: at any point in time, the outputs of all the neurons in a layer can be represented as a vector. Then that layer projects to another layer, and if you look at one of the units in the destination layer,

(08:56):
there's another vector of weights that indicates what the connection is to each of those input units. When you have an input and you want to find the output of one of these units, you do a vector multiplication, a dot product, between the weights and the activations, and that gives you the output.

(09:16):
So there are two places you can be sparse. You can be sparse in the activations, so most of the units are zero in the incoming layer. Or the weights themselves could be sparse, so most of the connections are actually zero; only a small number are connected.

Matt (09:35):
Okay, because if it's zero, there's nothing to multiply.

Subutai (09:39):
That's right. And if a weight value is zero, it doesn't matter what the input value is for that unit; it's not going to have any impact on the output.
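As a concrete sketch of the computation just described (illustrative only; the sparsity levels are arbitrary), a unit's output is a dot product in which every zero activation or zero weight contributes nothing:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in = 1000

# Sparse activations: ~90% of the incoming units output zero.
activations = rng.random(n_in) * (rng.random(n_in) < 0.10)

# Sparse weights for one destination unit: most possible connections
# don't exist, i.e. their weights are fixed at zero.
weights = rng.standard_normal(n_in) * (rng.random(n_in) < 0.10)

# Only positions where BOTH the activation and the weight are non-zero
# contribute to the unit's output.
output = weights @ activations
active_terms = np.count_nonzero(weights * activations)
print(f"output = {output:.3f} from {active_terms} of {n_in} terms")
```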

Matt (09:48):
And that's an idea that goes back to the neuroscience, right? Could you talk about that a bit?

Subutai (09:55):
Yeah. It's been known for a long time that the neocortex is extremely sparse in just about every way you can imagine; there are more ways than these two, actually. The activity of neurons at any point in time is extremely sparse. As I'm talking to you and you are talking to me, you

(10:17):
know, the neurons in my neocortex are firing, but typically less than 2% of the neurons are firing at any point in time, and quite often it's significantly less than 1%.

Matt (10:28):
For the most part, no matter what's happening in your environment, it's always stable.

Subutai (10:32):
Well, it's always sparse. It could be moving around, but at any point in time it's really sparse. And that is an incredible level of sparsity; more than 99% of the neurons at any point in time are silent. Deep learning systems are not like that.

Matt (10:49):
Right.
They're very dense.

Subutai (10:51):
They're very dense. The other side of it is that if you look at the connections, the synapses from one layer to another, those projections are also extremely sparse. Most of the connections that could exist don't exist.

Matt (11:08):
Think about one neuron and all of its dendrites, and all of the thousands and thousands of synapses across those dendrites. If this cell body is waiting for some stimulus to respond to, that's a huge space to be observing. Right?

Subutai (11:21):
Yeah, exactly.

Matt (11:22):
Yeah, it makes sense.

Subutai (11:23):
Yeah. And we should get to the neurons and the dendrites themselves, because there you have other types of sparsity that come into play.

Matt (11:31):
Go for it.

Subutai (11:31):
Okay. So in the neocortex, neurons are a lot more complex than in deep learning. You alluded to the dendritic tree, the complex set of dendrites; that's where all the inputs to a neuron come in, onto the dendrites. It turns out those dendrites are very nonlinear and complex.

(11:54):
And particularly as you get further out from the neuron, the dendrites themselves are tiny, sparse computing devices. Isolated segments of the dendrites are recognizing sparse patterns and acting on their own, independent of the other parts of the dendrite.

(12:14):
So the neuron itself has lots of these tiny, sparse computers spread throughout the dendrites. This is something called active dendrites in the neuroscience literature. That's pretty interesting, and again, it's very different from how deep learning systems work.

Matt (12:31):
We call them coincidence detectors sometimes.

Subutai (12:33):
Yeah.

Matt (12:34):
But the idea is that each one of those little things could send a signal to the cell body to let it know something's about to happen.

Subutai (12:41):
Yes. Some part of the dendritic segment could detect a coincidence of other neurons firing, a sparse pattern that's coming in, and it could then initiate a dendritic spike.
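Here's a toy sketch of that kind of coincidence detection, loosely in the spirit of an HTM dendritic segment (my own illustrative names and threshold, not code from the episode or the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n_cells = 1000

# Sparse binary activity of a presynaptic population (~2% active).
activity = (rng.random(n_cells) < 0.02).astype(int)

# A dendritic segment has synapses to a small sample of cells; here it
# has previously "learned" a pattern drawn from the active cells.
segment = rng.choice(np.flatnonzero(activity), size=15, replace=False)

# The segment acts on its own: if enough of its synapses see coincident
# activity, it initiates a dendritic spike. The threshold makes a
# chance match against random sparse activity extremely unlikely.
THRESHOLD = 10
overlap = int(activity[segment].sum())
if overlap >= THRESHOLD:
    print(f"dendritic spike: {overlap} coincident inputs")
```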

Matt (12:53):
I don't want to go off on a tangent, but how could we apply that idea to deep networks?

Subutai (12:58):
Yeah, that's something we're looking into. I think that's going to be key to doing continuous learning, because you take two properties together. One is that these sparse vectors are very unlikely to interfere with one another; they're very robust. If you have a random sparse pattern, it's very unlikely to collide with another random sparse pattern, so you

(13:25):
have that: sparse patterns don't interfere with one another. And then you get to these dendritic trees, and each one is independently computing these sparse patterns. Because the neuron is continuously learning, it can learn new patterns dynamically in different parts of the dendritic tree without affecting the other sparse patterns,

(13:48):
because they're independent and mathematically they're highly unlikely to interfere with one another. And so you can have a continuously learning system that avoids this catastrophic forgetting problem. You'd just be adding new synapses and new things to one part of the neuron without affecting the other parts of the neuron.

Matt (14:06):
And potentially the old learning could be applicable to the new space.

Subutai (14:10):
Yeah, exactly. If there is a close match, it's going to be very applicable, because it's highly unlikely to happen by chance. So you would learn that, but most of the time you'd be learning other things. And this hopefully allows you to do continuous learning in a really stable way.

Matt (14:28):
It gets rid of the brittleness we talked about in deep learning weights, where you change the weights and it could throw everything off.

Subutai (14:35):
That's right. So that's one of the key principles through which the brain does continuous learning, I think.

Matt (14:43):
Great. So is there another type of sparsity we're going to talk about?

Subutai (14:47):
Yeah, so another type of sparsity: neurons are these independent, sparse computing devices, and the learning on a neuron is also very sparse. In a deep learning system, when you learn, pretty much every weight gets updated. But in a neuron, only the little segment that detected the pattern actually gets updated, right?

(15:09):
So the learning happens in a very sparse way on a neuron, and that's critical as well.

Matt (15:15):
Very different from gradient descent.

Subutai (15:18):
Very different. There are a lot of differences there. It's very localized, and this again helps continuous learning, because when you are learning a new pattern, you only update one part of the weights and you leave everything else untouched. And it's a tiny percentage of the entire neuron's set of synapses.

Matt (15:34):
Cool. So let's talk quickly about the paper. We haven't talked about this on the podcast, although it's been out for a while on arXiv.

Subutai (15:45):
It's on arXiv now, yeah.

Matt (15:46):
How Can We Be So Dense? is the title of the paper. What's the subtitle? I forgot.

Subutai (15:52):
I think I changed it a couple of times, I'm not sure. I think it's like the power of sparse representations, or the robustness of sparse representations.

Matt (15:59):
Yeah, something like that. We'll link it in the show notes. So let's talk about actually taking deep learning networks, typical architectures, and how we can convert them into sparse architectures. Like a CNN, for example, a standard CNN; what's the process of making that a sparse network?

Subutai (16:21):
Yeah, it actually turns out to be very straightforward. In the paper, we first go through the mathematics of sparse representations and show how they're very stable, and then in the second part of the paper we show how it can be applied to deep learning systems. In a convolutional network, you have two types of layers: the convolutional layers and the linear layers.

(16:43):
Basically, having the connection matrix be sparse is pretty straightforward. You just randomly initialize a whole bunch of those weights to zero and you keep them fixed at zero. So there's like a mask over the weights that maintains...

Matt (17:00):
Permanent masks for the whole life of the model? That's going to be permanent?

Subutai (17:04):
In this paper, what we did is create a single static random mask over each weight matrix, and we kept those weights at zero throughout. That's very straightforward.
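A minimal PyTorch-style sketch of that idea, a linear layer with a fixed random zero mask (the class name and details here are my own illustration, not the paper's actual implementation):

```python
import torch
import torch.nn as nn

class StaticSparseLinear(nn.Module):
    """Linear layer whose weight matrix has a fixed random sparsity mask.

    Masked weights start at zero and, because the mask also zeroes their
    gradients, they stay zero under plain gradient updates.
    """
    def __init__(self, in_features, out_features, weight_sparsity=0.5):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # One static random mask, chosen at construction, never changed.
        mask = (torch.rand(out_features, in_features) > weight_sparsity).float()
        self.register_buffer("mask", mask)
        with torch.no_grad():
            self.linear.weight *= self.mask
        # Zero the gradient on masked entries so they remain zero.
        self.linear.weight.register_hook(lambda grad: grad * self.mask)

    def forward(self, x):
        return self.linear(x)

layer = StaticSparseLinear(256, 128, weight_sparsity=0.5)
out = layer(torch.randn(4, 256))
print(out.shape)  # torch.Size([4, 128])
```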

Matt (17:15):
See, that'd blow some people away. You'd think that if you did that, from the deep learning perspective, you're ruining the network in some way.

Subutai (17:20):
Yeah. But in reality, it turns out that these deep learning systems are often very overparameterized; they have way more weights than they need. So you can actually do this, have a pretty sparse weight matrix, and still have it work. That gives you sparse connections. And the way we did sparse activations is very similar to

(17:43):
the way we did it with the Spatial Pooler in our HTM theory: we just look at each layer and select the top-k most active cells and keep those active, and the rest are set to zero.

Matt (17:58):
And this is inspired by the idea of minicolumn structures in the neocortex, those being sort of groups with similar proximal dendritic trees.

Subutai (18:10):
Yeah, it's almost identical to the Spatial Pooler, where we have local inhibition: if a unit is really strongly active, it's going to inhibit its neighbors. And in the neuroscience, in the neocortex, you have these inhibitory cells that form sort of local inhibitory

(18:31):
networks, and we think that's how the sparsity is in part created, or enforced, in the neocortex. So we have this k-winner-take-all network or system for each layer.

Matt (18:44):
So that's like the activation function.

Subutai (18:46):
That's the activation function.

Matt (18:47):
Instead of like a tanh or a ReLU.

Subutai (18:50):
Yeah. It's actually very similar to a ReLU, because in a ReLU anything above zero is kept active. Here, the threshold point is dynamically determined based on the activity of the other units.

Matt (19:03):
How sparse do you want it?
Right.

Subutai (19:04):
Well, that's a good question. If we were to match the neuroscience, we'd want it to be like 98, 99% sparse. In the paper we were closer to 80 to 90% sparse, so 10 to 20% non-zeros. I'd like to get to the point where we're much sparser than that.

(19:26):
And then you can do the same thing with the convolutional layers. It's slightly trickier because there's weight sharing and such, but you can apply the same basic idea.

Matt (19:34):
So what does it get you? What do you get from sparsifying networks?

Subutai (19:37):
Yeah. So first, here we have both sparse activations and sparse weights, which is extremely rare in the machine learning community. When I mention this to deep learning people, they're kind of surprised, like, how could that possibly work? It turns out it works really well. We've shown for three data sets

(20:01):
now that the accuracy level doesn't change; you can get the same level of accuracy that you do with dense networks. But when you start adding noise to the inputs, the sparse versions are much more robust to random noise than the dense versions are.

Matt (20:20):
Which is good, because for years we've said sparsity should help with noise.

Subutai (20:26):
Exactly. And if you know the math, it's not at all a surprise; it has to be that way. But it was kind of nice to see that even in a deep learning scenario you can maintain that property. So this just shows that, through sparsity, if both of these things are sparse, you get representations that are

(20:46):
inherently more stable than dense representations in a deep learning system.

Matt (20:52):
And with similar accuracies, yeah. What are the benchmarks we were working with? I know MNIST.

Subutai (20:58):
Yes. MNIST is kind of a basic one that you start with whenever you're doing something new. It works really well with MNIST, then we tried it with CIFAR-10. And we've also tested with audio, with the Google Speech Commands dataset, which is a dataset of one-word spoken commands.

(21:19):
The results hold for all three of those benchmarks. The other nice thing was we tried different network architectures too. One was a simple LeNet-style convolutional network. We also did it with VGG-19, which is a much more complex, much deeper convolutional network.

Matt (21:41):
You can apply the sparsity throughout-

Subutai (21:43):
throughout the network. And it works. We've also done it with a version of DenseNet, which we call NotSoDenseNet. DenseNets have been used in ImageNet benchmarks and larger benchmarks as well. And it works on all three of those different architectures.

Matt (22:01):
Well, it seems like with all these zeros, eventually we should be able to get some computational gains from that sparse multiplication. Right?

Subutai (22:11):
Yeah. Going back to the question of what the benefit of sparsity is, another big benefit is computational, because if there's a zero, you can ignore that piece. Traditionally it's been very hard to exploit that in machine learning, because GPUs are inherently not as good at handling sparse structures.

(22:34):
But the brain is extremely sparse, and because of that it's extremely efficient. I think the entire brain runs on 30 watts of power or something like that, and this is primarily, I think, due to the extreme levels of sparsity.

Matt (22:48):
It's like a light bulb. That's crazy, right?

Subutai (22:50):
Yeah. So in theory we should be able to get to extremely sparse structures. In the paper we lay out the level of non-zero computations going on compared to a dense system, and there's at least a 30 to 100x difference in the number of non-zero computations.

(23:11):
The trick will be finding the right hardware architecture, and we're starting to explore that. It's still too early to say anything definitive right now, but there are hardware architectures out there that could exploit sparsity, and if that happens, we think we could get tremendous computational benefits as well.

Matt (23:27):
But what about software? Aren't there sparse matrix multiplication software libraries?

Subutai (23:31):
Yeah, with software libraries you can get some benefits, and we're looking at that as well. But to really run large networks, you need hardware acceleration, and it's going to be difficult to run really large scale stuff purely in software.

Matt (23:45):
Is anybody else in the deep learning world focused on sparsity? Would they want hardware that runs sparse calculations?

Subutai (23:53):
I think so. There are a bunch of people looking into this already. What's nice to see is that, particularly over the last year, there's been a resurgence of interest in sparsity, so I think our timing is really good. As far as I know, no one has really looked into robustness with sparsity, and doing both sparse activations

(24:14):
and sparse weights seems to be really rare. But yeah, there are definitely a bunch of other labs looking into this as well.

Matt (24:22):
Good, we'll be in good company. So what does the future of research look like for Numenta right now? We're sort of all in on deep learning at the moment. Do you have further plans after you go through sparsity? I know we've talked about continuous learning in the past. What else are you working on?

Subutai (24:42):
Yeah, so I would say we're not so much focused on deep learning as we're focused on practical machine intelligence systems, and currently deep learning is the best example of that. In terms of our research roadmap, you want to start with sparsity: incorporating sparsity everywhere and showing it's robust and showing it's fast.

(25:02):
Those are the first steps. Then adding this notion of active dendrites and a more complex neuron model would allow us to think about continuous learning. And then, just as we did with HTM, going to a temporal memory like structure where each layer has its own recurrence, a recurrent set of connections.

(25:24):
What this will allow you to do, again just like in our old temporal memory, is not only continuous learning, but doing it in an unsupervised manner.

Matt (25:33):
Recurrence meaning connections to itself, right? So one layer having connections to itself. That's temporal memory or sequence memory in HTM.

Subutai (25:41):
Exactly. And the way we did it in HTM, I think, could really apply to deep learning systems as well: the system is constantly making predictions about what's about to happen, and then when you get the actual data about what happened, that serves as an error signal. You can immediately do unsupervised learning on that, and that can be done in a continuous learning setup

(26:04):
because you're using sparse representations and these active dendrites. So now, all of a sudden, you have something that doesn't require as much labeled training data, can really deal with streaming sensory inputs, and is constantly learning through these predictions.
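As a toy sketch of that predictive loop (entirely illustrative; a simple delta-rule predictor, not Numenta's algorithm), each step predicts the next input and uses the actual input as the error signal for a local, unsupervised update:

```python
import numpy as np

rng = np.random.default_rng(3)
n, lr = 50, 0.01
W = rng.standard_normal((n, n)) * 0.01   # recurrent prediction weights

x_prev = rng.standard_normal(n)
for t in range(200):
    x_t = np.roll(x_prev, 1)             # stand-in for streaming sensory input
    prediction = W @ x_prev              # predict what's about to happen
    error = x_t - prediction             # actual input serves as error signal
    W += lr * np.outer(error, x_prev)    # local update: no labels, no backprop
    x_prev = x_t
print(f"final prediction error: {np.linalg.norm(error):.3f}")
```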

Matt (26:20):
Would we still have to apply gradient descent over top of all this as we go?

Subutai (26:25):
Yeah, that's a great question. In the brain we don't have a strict backpropagation-like structure or learning system. So over time, as our systems become more and more cortical, I think the need for backpropagation will lessen. To do this predictive learning, if it's happening

(26:46):
at every layer independently, you don't need to do backpropagation as much there. We might still have gradients flowing through for a while, just because currently there's no better way to create really scalable systems. But over time, as we incorporate more and more principles from the brain, hopefully we can

(27:07):
get rid of that too.

Matt (27:08):
That sounds exciting.

Subutai (27:09):
That's when it's not deep learning anymore.

Matt (27:10):
Yeah, I guess not.

Subutai (27:11):
Yeah, exactly.

Matt (27:13):
It's still neural networks.

Subutai (27:14):
It's still neural networks, and that's fine. I mean, neural networks are supposed to model the brain, because that was the whole reason they came into being. So, yeah.

Matt (27:22):
Yeah. Well, that's great, Subutai. Anything else you want to talk about while you have this opportunity?

Subutai (27:29):
No, I think from the external world, as you kind of said, it seems like, oh, why are we jumping on the deep learning bandwagon? That's really not what we're doing. We feel very happy about where we are on the neuroscience, and now we're going and picking off all of those pieces, everything we've done there, and starting to implement them in practical systems. I think the timing is right for that, and I'm

(27:51):
really excited about it, because there's a lot of potential there.

Matt (27:55):
Well, thanks for your time, Subutai.
We're doing fist bumps now.
Take care.

Subutai (28:00):
All right, take care.

Matt (28:01):
Thanks for watching.

Subutai (28:02):
Bye.

Matt (28:09):
Thanks for listening to the Numenta On Intelligence podcast. My name is Matt Taylor. I am the community manager and engineer for Numenta. My guest today was our VP of Research, Subutai Ahmad. If you liked this content, you should also check out our YouTube channel; I've been live streaming our research meetings and journal clubs every week.