Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Christy (00:09):
You're listening to Numenta On Intelligence, a monthly podcast about how intelligence works in the brain and how to implement it in non-biological systems. I'm Christy Maver, and today I'll be talking with Francisco Webber, CEO and co-founder of Cortical.io. Cortical.io is a strategic partner of Numenta that specializes in natural language understanding. In this episode, Francisco and I talk about the spark that
(00:32):
started it all for him while watching a YouTube video of our co-founder, Jeff Hawkins, and how their approach differs from other machine learning models. As a reminder, if you want to keep up with the latest Numenta news, subscribe to our newsletter, which you'll find on our website, numenta.com, and you can follow us on all things social at Numenta. All right, hope you enjoy this conversation with Francisco
(00:54):
Webber.
Christy (01:00):
Hi, this is Christy Maver, and you're listening to the Numenta On Intelligence podcast, and I have a special guest here with me today: Francisco Webber, CEO and co-founder of Cortical.io, which is one of our strategic partners. Cortical.io is a biologically inspired natural language understanding company. So Francisco, thank you so much for joining me today.
Francisco (01:23):
Thank you for having
me.
Christy (01:25):
Yes, absolutely.
I think natural language processing and natural language understanding seem to be such a hot area of interest right now. And you guys are doing something really unique. So talk to us a little bit about Cortical.io and what you all are doing.
Francisco (01:42):
Yeah.
So, in fact, we are focusing very much on going sort of deep into the understanding of language, because the situation we're in is that we have, of course, this data explosion in general, and what makes it worse and worse is that the balance of the data we need to
(02:03):
work with shifts more and more from the numerical, transactional part, which we already handle pretty well with database technologies and all these kinds of things. More and more, text data becomes the key asset in many businesses. For example, an insurance company.
(02:24):
The only foundation of their business is text. It's contracts, it's descriptions of the things that they insure, and things like that. So the whole business is basically processing text on a very large scale, and by becoming bigger and bigger, companies have more and more customers and end up with huge amounts of text, and
(02:47):
the only way out of this, for the moment at least, is to hire a lot of people who read the text and then do something with it. And that's where the potential lies: streamlining business processes, getting them really efficient and really quick, very similar to what happened on the transactional side. We just need
(03:11):
the tooling for that. And so the interest in natural language understanding has been growing a lot recently. But I can assure you, we have been struggling with this intensively for 30, 40 years already.
Christy (03:27):
So your whole approach is that you're doing natural language understanding based on the brain, right? And obviously that's the tie-in to Numenta. So what does that mean exactly? How is it different from what most approaches are doing?
Francisco (03:40):
So this is all about trying to be different in the first place. In my previous sort of projects and endeavors, we were trying to apply just the state of the art in doing this kind of processing of text information, but it turned out, after many
(04:02):
years for me and many, many years for the whole scene, if you want, that by applying this very same principle, we can achieve quick results. So we have managed to develop things like search engines and so on. But when you look closely, you see that the capabilities of
(04:26):
these engines are limited: a 10-year-old, in general, is able to be more precise and more specific than you could be with a huge computation cluster.
The interesting thing is that although there was so much effort in research and so on, the improvement from what you
(04:47):
get easily to "how can I improve this" is so expensive that it seems this whole effort will never actually reach levels that come close to what you could expect from a human.
Christy (05:01):
And that's why people
just hire many humans.
Francisco (05:04):
Of course.
I mean, there was always this part of the business environment where they try brute-force methods to just throw in more money, more power, more anything. But in that case it turns out that this doesn't really work. You even see very large companies struggling, basically.
(05:24):
Like Facebook, for example, currently struggling with just doing very basic functions of "find me text where someone is talking badly," or hate speech, or things like that. Yeah. As for a human, I could probably ask my six-year-old son to be a filter, and he would perform better than any algorithm.
Christy (05:50):
Is that because as humans, we are processing language based on meaning, as opposed to a keyword or--
Francisco (05:58):
Absolutely.
I mean, it's all about the meaning. That's one thing. Yeah. It's all about how do I capture meaning, how do I represent meaning? These are all big questions. I mean, one of my favorite topics, the representational problem, is a key problem in trying to get computers to be more like humans.
(06:20):
On the other hand, in the sort of traditional research strategy, if you don't yet understand the system that you want to study or that you observe, what you do is record the behavior of the system, you try to get as much of it as you can, and you build a statistical model of what you have seen.
(06:41):
And by having the statistical model, you can then play around with it and try to figure out how it actually works internally. With a system like language, which covers such a huge space in terms of content, in terms of variations, yeah, this seems to take forever.
(07:02):
Yeah.
And so the question for us was, how could we improve things substantially? And the only way to look for a solution was to drop this whole statistical modeling in the first place. But of course, what are you going to do?
Christy (07:22):
Then what do you do?
Francisco (07:22):
And so I keep saying the only known reference implementation for a well-functioning language engine is the human brain. So the answer has to be there. And the brain is not an infinite structure, so the problem can't be infinite in that sense.
(07:46):
And the fact that my personal training was in medical school in Vienna made this sort of natural connection of trying to see how nature has tried to overcome this. And that was also when I first came across the work
(08:06):
of Jeff. It was a YouTube video of a talk he gave at Almaden, I think it was, and it was, in fact, when the talk came to sparse distributed representations. That was the moment that was sort of the spark for me, to see the connection between what he's doing and what could be useful in terms of text processing.
Christy (08:27):
So can you talk a little bit about sparse distributed representations specifically? Some of our listeners might know that's a fundamental concept in what we're doing, and it's all about how the brain represents information, right? So it sounds like that was kind of the launching point for you too. Can you talk about sparse distributed representations from your point of view and what they mean in
(08:49):
language?
Francisco (08:50):
You can start by imagining a very simple situation. You see a cat, and you have a lot of contextual information: you know what kinds of sounds a cat makes, you know how it feels when you touch its fur, and so on, and all of that is triggered in the moment when you see a cat or when you hear a
(09:14):
cat, for example. You have this 360-degree experience, or view, of the cat. Even if you don't see it, you just hear it, it might not look exactly the same, but there is very little that the cat could do that would surprise you. You're prepared for a lot of aspects that are related to
pets.
Christy (09:36):
It wouldn't bark, it
wouldn't...
Francisco (09:37):
Yeah, exactly.
So very obviously, the way a cat is represented in your brain is basically a sum of all these experiences. And in the end, there couldn't be anything else. I mean, everything your brain has is just recorded
(09:57):
experiences. So whatever the brain wants to represent has to fundamentally be built out of components of these experiences, which typically come from our senses, and that's how we sort of naturally experience...
Christy (10:11):
In moving through the
world.
Francisco (10:11):
Yes.
So basically what happens is that when you see a cat, you hear a cat, or, in language, you read the word cat, then all these experiences of seeing, hearing, feeling, smelling, all the senses that are involved, are evoked
(10:32):
simultaneously. And this simultaneous evocation of all these stored parts shows that the definition of how a cat is represented is a distribution of many little events around cats.
Christy (10:48):
Right, and there's meaning embedded in each of those, as opposed to the brain saying, oh, cat equals 0, 1, 0, 1, 1...
Francisco (10:56):
So when you hear, see, or read the word cat, everything in your brain basically triggers all the memories that are related to cats. This representation keeps using your whole brain. At the first level, it's your sensory input, so you have mental images that come up, or mental
(11:17):
sounds, but it can also be secondary. You can be remembering your mother talking about the cat in your childhood. It could be sort of a hierarchically stacked set of consequences that you've experienced around cats. So basically it uses all the different levels that are
(11:39):
available in the brain. So basically it uses the whole cortex to represent everything. And this idea of using a vast space as a whole for basically every piece of information that's there, that's what this sparse distributed representation basically defines.
(12:01):
So when we compare that to a computer, for example: if you want to store more information, you need to build up a long sequence of memory cells, and in each memory cell you put some data. If you have more data, you have to extend the number of memory cells you have.
(12:21):
If you store in a memory that doesn't get bigger, where you always use the same space and just change the pattern in which you store things, then it's very efficient. It's a very simple principle, but when you think it through, it's astonishing how many complex aspects you will discover if you consider that to be the actual processing.
(12:45):
And the other thing is, of course, that this distributed representation also needs to be sparse so that you can differentiate between many different information states, if you want. And we have found out, mathematically I would say, that there is a relationship between how much
(13:08):
information you can store in this kind of memory and the degree of sparsity that you have. If the representation is not sparse enough, then you keep losing a lot of information constantly, and if it is sparser, you have more and more space to find different combinations.
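To make the sparsity-capacity relationship concrete, here is a minimal sketch in Python. The grid size follows the 128 x 128 Retina example discussed later in the episode; the 2% activity level is an assumed figure for illustration, not a number from the conversation.

```python
from math import comb

# Illustrative only: the number of distinct patterns a fixed-size binary
# representation can hold grows combinatorially with the choice of active bits.
n = 128 * 128        # 16,384 possible positions (the Retina example below)
w = int(n * 0.02)    # assume roughly 2% of the bits are active (hypothetical)

patterns = comb(n, w)            # distinct sparse patterns of w bits out of n
digits = len(str(patterns)) - 1  # order of magnitude
print(f"{w} active bits out of {n}: about 10^{digits} possible patterns")
```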
Christy (13:25):
Which is exactly how the brain processes information. At any given time, very few neurons are actually firing.
Francisco (13:32):
And this distributed representation, just to give you an example, allows you to do something that in straightforward logic is not possible. It allows two people to have a perception of something which is per se different, which is this old
(13:52):
problem. We are all different, but we are very similar at the same time. So how do you store something that is different and similar at the same time?
So you have seen your cats, you have heard your cats, and so on. And I have seen my cats. But still, we can agree on what, for example, the word cat actually means, even if we have different
(14:15):
actual mental representations. The trick is simply that because we are very similar, we have very similar bodies with two arms, two legs...
Christy (14:26):
Same senses...
Francisco (14:27):
Same senses and so on. There are still many experiences that we actually share, and as long as your representation and my representation have sufficient overlap, we can agree on what a cat is. That's the simple principle: it's not about being equal in
(14:49):
the representation, but about having sufficient overlap. So what you need is a mechanism that makes overlap the determinant for what is actually stored there.
Christy (15:00):
So your representation of cat might involve a memory of your mom? Mine might not, but it doesn't matter that those two pieces are different, because the core concepts overlap.
Francisco (15:10):
Exactly, exactly.
We have sufficient overlap.
It's interesting, because when you have a sparse representation, that overlap can be, in absolute counting, pretty small or astonishingly small, and it can still trigger sort of common experiences and the ability to exchange thoughts on
(15:31):
this topic.
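As a minimal sketch of what "sufficient overlap" could look like in code (the bit positions and sizes below are invented for illustration, not Cortical.io's actual encodings): two sparse representations are modeled as sets of active positions and compared by counting the positions they share.

```python
# Minimal sketch: overlap between two sparse binary representations,
# modeled as sets of active bit positions. All numbers are made up.

def overlap(a: set, b: set) -> float:
    """Fraction of the smaller representation's active bits shared by both."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

# Two hypothetical "cat" representations built from different experiences:
your_cat = {3, 17, 256, 1024, 4096, 9000, 12001}
my_cat = {3, 17, 901, 1024, 5120, 9000, 15000}

# The shared bits are few in absolute terms, yet enough to agree on "cat".
print(overlap(your_cat, my_cat))
```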
Christy (15:32):
So you basically have come up with a way to represent language where you're representing it based on meaning.
Francisco (15:40):
Exactly.
So what we did, basically, is reverse processing. Obviously, we humans interact with each other by having our cortices exchange information, and in order to do this, we have learned a way of encoding the
(16:00):
state of the cortex, and that encoding is called language. Yeah. So it's basically like computers have an encoding to communicate over the Internet. We have a sort of network layer that is carried by language, and the way this encoding happens is of course
(16:21):
intrinsically related to how the brain actually works. If your brain works on transistors, then your encoding has to be based on something that transistors can do. And the same for us: we can only encode using a method that our "transistors," if you want, can actually do.
(16:42):
And that's precisely what comes out of the work of Jeff. He described the sparse distributed representation, for me at least, as a set of constraints. That was basically the starting point, and we went back from the language and said, okay, what must have been the processing steps to end up with this representation?
(17:05):
And that is basically what semantic folding describes.
Christy (17:09):
And semantic folding is really what Cortical.io has come up with.
Francisco (17:14):
Yeah, it's basically using a lot of concepts that we have known in computer science for quite some time, but it uses them in a new combination or a new setup. So we haven't invented new, I don't know, transistors, but we
(17:34):
are using transistors in a different way than we have been using them so far.
Christy (17:39):
One of the examples that I like... and you actually have a lot of great demos on your website that people can go in and play with.
Francisco (17:47):
Yes, to experience
what that actually means.
Christy (17:49):
Yes, because essentially you're making language computable, right? Based on meaning. So one of the examples that I like is, if I say the word apple to you, especially sitting here in Silicon Valley, you don't know if I'm talking about the fruit or the company down the street, right? And the brain can hold both of those representations at the
(18:09):
same time, which means all of the things that I know about apple the fruit and all of the things I know about Apple the company are firing. But then if I say apple and I pull out a Red Delicious, then your Steve Jobs neuron is not going to fire.
Francisco (18:26):
Language is the way it is because of our brains. So were our brains a bit different, the structure of language would be different. And for example, one of the reasons why nearly every language has something like a sentence is basically to create a context for each word in the sentence, to allow you to
(18:47):
effortlessly disambiguate. If, for example, I use the word smell together with the word apple, within a fraction of a second any Steve Jobs thinking would go away, because that is not probable to be related. It's much more probable that the apple that falls from the tree has something to do with smell.
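One way to picture that disambiguation mechanism, as an illustrative sketch rather than the actual product code: each sense of a word is a set of active context bits, and the other words in the sentence vote for the sense they overlap with most.

```python
# Illustrative sketch of context-based disambiguation with sparse bit sets.
# All bit positions are invented; real semantic fingerprints are learned.

APPLE_FRUIT = {10, 11, 45, 200, 901}       # orchard, smell, taste contexts
APPLE_COMPANY = {300, 301, 512, 777, 950}  # computer, Steve Jobs contexts
SMELL = {10, 45, 63, 200, 840}             # scent-related contexts

def context_score(sense, sentence_context):
    """Count how many context bits the candidate sense shares with the sentence."""
    return len(sense & sentence_context)

print("fruit:  ", context_score(APPLE_FRUIT, SMELL))    # high overlap wins
print("company:", context_score(APPLE_COMPANY, SMELL))  # near zero overlap
```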
Christy (19:08):
Yes.
Francisco (19:08):
Even on a sort of linguistic level, you can observe similar phenomena. Again, with the example of apple: if you make the plural, if you put an s at the end, apples, immediately there is no computer coming up in your mind.
Christy (19:25):
There's no company.
Francisco (19:27):
That is just to say, for example, to show you the difference from the standard approach: with statistical natural language processing, because you are confronted with this huge combinatorial space, you have to make your model simpler, because otherwise the computer never finishes computing.
(19:47):
With statistical approaches, we very often discard that information. We say, in order not to end up with too many words, let's cut down all the apple, apples and so on to some common root, and then you lose that information. And by losing the distinction of plural or singular, it makes the
(20:10):
disambiguation even harder. And that is the reason why you have this quality ceiling in a statistical model: you just throw away a lot of important information in order to make it computable. But if you could find a way that actually uses all these
(20:31):
aspects and makes them computable, then we could try and go further. And SDRs, sparse distributed representations, are actually the solution to this. "Sparse distributed representation," especially for a German speaker, is tough to pronounce, so we decided to call this a semantic fingerprint.
(20:54):
I also think it better describes how we use it. So depending on where you are in the area that the semantic fingerprint covers, the bits that you find there stand for different contexts, semantically different contexts. So you might have all the sort of different sounds the cat
(21:18):
makes in one area of the fingerprint, the different types of fur that the cat could have in another area, and looking at the whole gives you the representation of what cat means, or what cat could mean.
Christy (21:32):
Okay.
And we'll put links in the show notes, but just to give a visual for people that are listening, a semantic fingerprint is essentially a matrix of how many cells?
Francisco (21:43):
So, in nature that must be huge, millions by millions. In our technical implementation, we have found out that an extent of about 128 x 128 possible positions is a very useful size.
(22:04):
But theoretically you could choose any size; it's a deliberate choice how big you want to have it. Okay, so for standard language we use this 128 x 128, which gives you roughly 16,000 bits, every bit being a feature, which means a semantic context. So at every location of my 2D extent, it's like a little
(22:27):
square with those 16,000 dots in it. Every position stands for different topics, and the arrangement, and that is crucial for being practically useful, is generated automatically, without needing any human input in the first place. So it's unsupervised, basically.
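For readers who like to see a data structure, here is a minimal sketch of what a semantic fingerprint could look like in code under the assumptions in this conversation: a 128 x 128 grid with a sparse set of active positions. The class and method names are hypothetical, not Cortical.io's API.

```python
# Minimal sketch of a semantic fingerprint as a sparse set of grid positions.
# The 128 x 128 grid follows the example in the conversation; the class and
# method names are hypothetical, not Cortical.io's actual API.

from dataclasses import dataclass

GRID = 128  # 128 x 128 = 16,384 possible positions

@dataclass(frozen=True)
class Fingerprint:
    active: frozenset  # flat indices in the range [0, GRID * GRID)

    def sparsity(self) -> float:
        """Fraction of the grid that is active."""
        return len(self.active) / (GRID * GRID)

    def overlap(self, other: "Fingerprint") -> int:
        """Number of positions active in both fingerprints."""
        return len(self.active & other.active)

    def position(self, index: int) -> tuple:
        """Map a flat index back to (row, column) on the 2D grid."""
        return divmod(index, GRID)

fp = Fingerprint(active=frozenset({3, 17, 1024, 9000}))
print(fp.sparsity(), fp.position(1024))
```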
Christy (22:48):
Okay.
So I could have a semantic fingerprint for the word cat, where each bit represents my experience of a cat, or I could have a fingerprint representing a document, where each bit represents topics of the document, or it could be a sentence, right?
Francisco (23:07):
Yes.
And an important point, and that's also sort of a differentiator from the state of the art: in order to generate this topology and those contexts, we don't use the data that we actually want to process, because the big problem in AI is that we always hit a ceiling that is typically called, "to
(23:29):
solve this, we need world knowledge." So in order to solve certain things, especially if you come close to what humans do, there has to be some prior. And the big question was always, okay, how do we bring this prior in and use it to describe what the actual data shows?
(23:50):
So what we humans do is we go to school, we go to university. If I want to become a medical doctor, I read books and I listen to medical doctors speak, and I learn the language and the thinking based on the language for that domain.
(24:11):
And once I have done this, then I can work in the hospital, and I can start reading patient records and understanding what is meant there. But it's not by reading 300,000 patient records that I become a doctor.
And in order to bring in this prior world knowledge that is needed to have what we call semantically grounded
(24:31):
information, we have to bring in training material that is not actually the material I want to work on. And this might sound complicated, but in fact it makes things easier, because reference material is something we have systematically produced as a human civilization for a very
long time for one reason,because we need to teach the
next generation the findingsthat we have done so far.
And this has been refined to thedegree that you nowadays have, I
don't know, a couple of hundred,probably key publishers who are,
whose job it is to gather thisreference information to build
(25:19):
it up in an adequate way. So interestingly, it has to be built up in a way that makes it easy for humans to build up that map in their brain about that topic.
And if you look carefully, you see that an author who tries to write a textbook carefully structures the text.
(25:40):
He makes titles, paragraphs, sections, he puts certain words in bold or italic. And all these aspects are in fact an encoding mechanism for capturing the whole ontological systematics that you find in basically every domain.
Okay?
And when we try to extract this, we say one context is a sentence
(26:04):
because a sentence stands for a certain fact, but it's also a sentence that appears under a certain title. So we add the title to the sentence, and the title is under a certain section that is also titled. So we take the whole sequence of titles with the actual sentence
(26:25):
and all of that becomes one specific context, and the context stands for all the words that are actually involved. And so there you see that the whole treatment of the training data already tries to be intelligent, instead of having sort of the brute-force approach where we say we don't care what
(26:48):
the actual structure of the data is; if it isn't enough, we just take more of it, which is the statistical approach.
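As a sketch of the kind of context extraction described here, building one training context from a sentence plus the stack of titles above it, the function below is purely illustrative; the structure and names are not Cortical.io's implementation.

```python
# Illustrative sketch: one training "context" built from a sentence plus the
# stack of chapter/section titles above it. Names and structure are hypothetical.

def build_context(title_stack, sentence):
    """Combine a sentence with its enclosing titles into one context record."""
    text = " ".join(list(title_stack) + [sentence])
    return {
        "titles": list(title_stack),
        "sentence": sentence,
        "words": text.lower().split(),  # the words this context stands for
    }

context = build_context(
    ["Cardiology", "Arrhythmias", "Atrial fibrillation"],
    "An irregular heart rhythm is often detected on an ECG.",
)
print(context["words"])
```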
Christy (26:55):
Yes.
Francisco (26:55):
But it shows that by just trying to be a little smart, you can be so much more efficient that in many cases we could create semantic maps in domains where there hasn't been any machine learning so far, because there isn't enough material.
Christy (27:13):
Wow.
So I want you to talk about some of the use cases that you're engaged in.
Francisco (27:18):
So fundamentally, what every solution we create for our customers does is, first, we define the semantic space that the customer works in. So if the customer, for example, happens to be a bank, and we do have a couple of banks as customers, their
professional language is let'ssay a English investment
bankerish, let's call it.
So we try to find typicalliterature that people who end
up in that position learn.
So we take financial textbooks,uh, textbooks about the
investments and so on, uh, andby ingesting that, we define the
(28:02):
kind of language that these documents, for example, are written in. And based on that semantic space, which we call, it's a bit ambiguous, but we call it a Retina.
Christy (28:14):
Retina
Francisco (28:14):
This is historical, because we always say that it is like looking at the words. And this specific Retina is now used to convert any given document in that domain into a fingerprint, into this sparse distributed representation. And the goal of it is that if I have any two fingerprints, I can
(28:39):
measure how similar the texts are that they were generated from. So let's imagine you have a phrase that says "done deal," and you have another phrase that says "signed contract." Any average banking person would say, okay, these are similar.
Christy (28:57):
These mean the same
thing.
Francisco (28:58):
But they don't use the same words, right? And our Retina in fact converts both of them into fingerprints where you have an overlap of, let's say, 35 percent, which typically says they are very similar. Okay? And the nice thing about the fact that we have this two-dimensional fingerprint representation is that we not
(29:20):
only can count how many common bits, how many bits are set at the same position in both of the fingerprints, but also where this overlap is. Because I do have this topology, I can inspect it, and I can find that there is a region where the two overlap and the contexts behind it are about investment, money transfers,
(29:45):
legal aspects and things like that, which are typical for the context of "done deal" and "signed contract." And the interesting thing is that making two pieces of text comparable is the atomic function out of which you can build nearly everything in language.
So if I can compare two documents, for example, I can
(30:09):
start searching through document collections, because if I have 100 million contracts, I want to make them searchable. I just convert each of the contracts into a semantic fingerprint. I allow you to type in what you are interested in, literally typing in "I would be interested in contracts that are about whatever," and then make a fingerprint of your
(30:32):
description of what you are looking for. Then I just need to see which of the documents has the biggest overlap with the fingerprint of the user's query.
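A minimal sketch of that retrieval loop, reusing the set-based fingerprints from the earlier examples. All names and data here are hypothetical, and real fingerprints would come from a trained semantic space (Retina) rather than hand-written sets.

```python
# Illustrative semantic search: rank documents by how many fingerprint bits
# they share with the query fingerprint. All fingerprints here are hand-made.

def rank_by_overlap(query_fp, doc_fps):
    """Return (doc_id, shared_bits) pairs, best match first."""
    scored = [(doc_id, len(query_fp & fp)) for doc_id, fp in doc_fps.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

documents = {
    "contract_001": {5, 87, 911, 4000, 7042},  # "signed contract" style contexts
    "memo_417": {12, 330, 2048, 9999},         # unrelated contexts
}
query = {5, 87, 911, 5120}  # fingerprint of a query like "done deal ..."

for doc_id, score in rank_by_overlap(query, documents):
    print(doc_id, score)  # contract_001 ranks first despite sharing no keywords
```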
Christy (30:41):
Rather than having X
many humans either comb through
text
Francisco (30:46):
Read, or by matching, which is what we do now in search engines, by matching the keywords: what words have you been using for your query, and in what other documents do these words appear?
Christy (30:57):
But in the example you just gave, the words didn't overlap at all, so that wouldn't help.
Francisco (31:00):
So that's precisely the problem. If the words that you used for your query appear in the text, yes, chances are high that the text is relevant to you, but that doesn't mean that only these texts are relevant to you. There might be much more relevant documents, but they just don't use the same wording. This problem is in fact very frequent, because very often you
(31:24):
have less skilled people searching for information that was written by more skilled people. So there is an incompatibility of language, and you are missing a lot of information by default. A sort of typical building block in our solutions
(31:44):
is the ability to do semantic search.
Francisco (31:47):
And semantic search in general can be defined as a mechanism that allows you to find information or documents, for example, that do not necessarily contain a word from the query. And this mechanism itself can again be used in many
(32:08):
different environments.
So banks might want to search through large collections of contracts. Pharmaceutical companies might want to search through collections of scientific publications for doing exploration, finding new molecules. Or car producers
(32:31):
might want to index their car manuals to allow you, while you sit in the car, to find out what that light means or how that function works. I mean, a modern car is so complex that you end up having a handbook of like 1,500 pages.
And we have actually made a use case with a car producer in
(32:54):
Germany, and there the case of the incompatibility of language is very evident, because the car user is an average person and the car manual was written by experts on cars. And so we created a system that was trained on automotive
(33:14):
engineering concepts as well as a chat about cars by private persons, by aficionados, sort of. Yeah. And we used both language components and mixed them in one system.
And in the end, you could search through the handbook by, for
(33:36):
example, typing in"where's thedonut?" and it would put you on
the page where the spare wheelis actually packed in the car,
basically to demonstrate, uh,the meaning of a word that in
principle has nothing to do withcars, for people who know about
cars, could be mapped to theactual term and concepts that
(33:57):
you want.
So, yeah, this is a typical...
Christy (34:03):
So lots of different
use cases across different
industries.
Francisco (34:04):
In business terms, this is one of the challenges, in fact: that it's a technology that is so horizontal.
Christy (34:15):
Well every business has
text.
Francisco (34:18):
And there again, we are used to always differentiating ourselves from state-of-the-art approaches. A typical machine learning approach to solving a business problem is to try and get more and more specific until the brute-force model does a good job.
(34:42):
So you can use machine learning to solve the business problem of discriminating between cats and dogs. But if you want, for another customer, to discriminate between lions and tigers, you need to rebuild the whole model and repeat the whole effort once again, and if others
(35:04):
want to have the difference between Indian and African elephants, it's yet another model again.
In our approach, once we have generated the semantic space, let's say for investment bankerish, nearly all the use cases that we find in the domain can use this semantic
(35:27):
space.
So for the customer, it's a very efficient way of consolidating systems. They can start, for example, to build a semantic space about the products that they sell for the marketing department, in order to represent marketing information
(35:47):
properly and so on. And with the same semantic space, they could then go and start a support system to help customer care in the call center find the support documents more easily, because it's the same language, it's the same products.
And that is sort of one of the business
(36:10):
advantages of our approach: you can build up a system that reuses everything that has been built so far, even if it's in a completely different use case.
Christy (36:20):
I should mention Cortical.io has been a strategic partner of Numenta's for three-plus years now. And it's really great to see the evolution of how you're applying this in so many different ways.
Francisco (36:33):
We are still in the very infancy of this, because what I actually want to do is get to the point where we not only capture the semantics of things by using fingerprints, but we also start to learn about the grammar of things, which is defined by the sequence of those fingerprints. And
(36:57):
when I take the word sequence, immediately one thinks, of course, of the sequence learner that we have with the HTM system.
So the idea is to take the next step and not only take the lexical semantics, which is the level on which we work now. And it's astonishing how much you can do by just properly
(37:21):
understanding this, but the next level would be to take the sequence into account, and then we can also handle language aspects that purely depend on grammar.
So I do think that by applying HTMs to our semantic fingerprints, there is a lot of potential to get
(37:43):
things like machine translation, like speech-to-text conversion. All of these things, I think, can be substantially brought to the next level, but there's a lot of research still to go.
Christy (37:55):
Right.
So the story continues.
Francisco (37:57):
Absolutely.
Christy (37:58):
So where can people go if they want to find out more? We'll put a link to your website.
Francisco (38:04):
Yeah.
So we try to have a lot of content on our website, so there is a lot to read there. There are references, there are lectures from people who speak about important aspects, besides Numenta people of course, and interactive demos where you can actually play with words
(38:26):
and get a feeling of what it means.
Christy (38:29):
I've spent quite a bit
of time on those demos.
Francisco (38:33):
We also have the functionality, as it is right now, accessible on a public API. So if you want programmatic access to the functionality, you can request a key to use our API for free, and you can also do more complex programmatic experiments.
(38:54):
And we regularly get email and communication from people when they manage to solve tricky problems using the standard, not specifically trained, semantic space that we offer there.
Christy (39:11):
Nice.
And you're based in Vienna, Austria, but you also have offices here in San Francisco and in New York.
Francisco (39:18):
New York.
Yeah.
So currently, besides Europe, the US is of course our biggest market. That's the reason why we have started the offices here. And that's also what makes me travel a lot, as you can imagine.
Christy (39:35):
Well, we're always happy to see you, and I'm particularly happy that you stopped by today. So thank you so much for your time, Francisco.
Francisco (39:42):
Thank you for having
me here.
Christy (39:44):
And thanks for
listening.