
March 23, 2021 42 mins

In this episode we talk about all things Bayesian. What is Bayesian inference and why is it the cornerstone of Data Science?

Bayesian statistics embodies the Data Scientist and their role in the data modelling process. A Data Scientist starts with an idea of how to capture a particular phenomenon in a mathematical model - maybe derived from talking to experts in the company. This represents the prior belief about the model. Then the model consumes data around the problem - historical data, real-time data, it doesn't matter. This data is used to update the model and the result is called the posterior.

Why is this Data Science? Because models that react to data, and refine their representation of the world in response to the data they see, are at the heart of what the Data Scientist does.

We talk with Dr Joseph Walmswell, Principal Data Scientist at life sciences company Abcam, about his experience with Bayesian modelling.

Further Reading

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 16 March 2021
Interview date: 26 February 2021





Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Jason (00:00):
Hello, and welcome to the DataCafe. I'm Jason.

Jeremy (00:03):
And I'm Jeremy. And today we're talking about
Bayesian inference.

Jason (00:17):
Bayesian inference, based on Thomas Bayes. Did I mention this to you, Jeremy - were you with us when we visited his grave? In London?

Jeremy (00:25):
Yes, yes. In a local London cemetery.

Jason (00:27):
Yes, Bunhill Fields.

Jeremy (00:28):
That was a real point of pilgrimage, almost, for any data scientist.

Jason (00:34):
Yeah, it's an amazing cemetery actually, just to visit as a tourist. And he is the person who came up with Bayesian statistics as a really cool area of statistical inference. So what is Bayesian inference?

Jeremy (00:47):
Bayesian inference is, I think, one of the sort of go-to approaches as a data scientist. And it really sort of reflects the ethos, almost the philosophy, of data science in a very simple and easily understandable theorem and approach. So basically, the

(01:07):
inference, it really starts with a model or hypothesis about a particular data set, and then allows you to update that model as more data comes in. So basically, you've got this lovely scenario, it's almost well beyond its time, of almost a streaming data set, where you've got streaming data

(01:30):
coming into your model. As the data comes in, you take each batch or each data point, and you change your model in response to that data, and update your model to be hopefully more realistic, more relevant to the data that you're actually seeing. So it's this whole concept of a model and updating it in response to the data you're seeing. So really,

(01:54):
super relevant to data science, I think.

Jason (01:59):
And crucially, it depends on the probabilistic modelling that we're bringing in here. So what's the probability of something happening, let's say, and if I have another data point that indicates it's likely to happen, I'm going to come up with a higher probability that, yeah, it's more likely to happen, because I've now learned an additional piece of information that makes me change my mind, or, if it's a model,

(02:23):
makes the model head towards the actual truth.

Jeremy (02:26):
Yeah, all these models that we've been talking about are statistical; they're formed around probabilities. And as a result, the probability of seeing a particular outcome or output from your model is just that, you know, it's between zero and one. It gives you a level of confidence, maybe, that you're seeing something that is very likely, or is

(02:51):
very unlikely to happen. Again, it typifies the data scientist experience, which is that these things are rarely given in true-and-false, on-off states; they are more often than not outputs that are probabilistic, that, you know, have a measure of uncertainty about them.

Jason (03:10):
Yeah, as with every model, it's never going to be a perfect representation of what can happen in the world. But you want to have a high confidence in the model, and you want to have an answer that gives you a high probability of something happening, and then you can react to that. And I guess if your model is flipping a coin, it kind of doesn't matter

(03:31):
having a model at all. What difference does it make if your coin has come up heads or tails? Pick an outcome.

Jeremy (03:39):
But your model there might be that it's unbiased initially. So it's equal, and therefore really simple. But, you know, if the data starts to come in, we've all had situations where you start flipping a coin and you go, hang on a minute, I've had 10 heads in a row. What's up with this coin? Maybe it's not a fair coin. Maybe it's a biased coin.
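
(A quick illustration of the kind of update Jeremy is describing: if you put a uniform Beta(1, 1) prior on the coin's probability of heads and then see 10 heads in a row, conjugacy gives the posterior directly. The prior choice and the code below are a sketch for illustration, not something discussed in the episode.)

from scipy.stats import beta

# Prior belief about P(heads): Beta(1, 1) is uniform, i.e. open-minded about any bias.
prior_heads, prior_tails = 1, 1

# Data: 10 flips, 10 heads.
heads, tails = 10, 0

# Beta-Binomial conjugacy: posterior is Beta(prior_heads + heads, prior_tails + tails).
posterior = beta(prior_heads + heads, prior_tails + tails)

print(f"Posterior mean for P(heads): {posterior.mean():.2f}")              # about 0.92
print(f"P(the coin favours heads, p > 0.5): {1 - posterior.cdf(0.5):.4f}")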

Jason (04:00):
Yeah. And this is where it gets really confusing, actually. Because at any one instance of flipping the coin, if it's not a biased coin, then your chance of picking heads or tails doesn't change, right? You know, just because you've had nine out of 10 heads, and you're going to flip it that 10th time, you don't have a higher likelihood of it being heads if it's an unbiased coin. But it's really confusing, because what I

(04:25):
learned about Bayes when I first picked up a book on probability was the Monty Hall problem. If people have heard of it, it's a really kind of fun anecdote about Bayes, and how probabilities can affect your decision making. Have you heard of it?

Jeremy (04:41):
It's all goats and cars to me?

Jason (04:42):
Yeah. I think the premise is you're in the game show and there's three doors presented to you, and behind one of the doors is the winning car as the prize, right, and behind the other two doors is a goat! You're going to be travelling home on the goat. Okay. They ask you to pick a door and you pick a door; they don't open the door. But they say hold up, we're going to open

(05:05):
one of the other doors, okay, and show you how you picked. Let's say you pick door number one; they open door number two, and it turns out that would have been a goat. Now, do you want to stick with your original choice of door number one, or change your mind and pick door number three?

Jeremy (05:19):
Okay. And everyone always says you shouldn't change; why would it matter? That's what I think would be the popular decision in this, wouldn't it?

Jason (05:28):
Why? Why would it matter? You just now have two doors instead of three, so your odds are 50:50 now, right, instead of three chances? But no, it's really interesting, because you still have the information that you had when there were three doors. But because you picked one, and they've now shown you one of the others was a goat, the probability of the other door being a car is actually now

(05:50):
two thirds instead of the one third.

Jeremy (05:55):
So you've had to update your model, basically, in the presence of a data point which has come in, which was one of the doors being opened, exactly, and shown to be a goat, right?

Jason (06:04):
Yeah. And it's really counterintuitive. But when you work it out, if you draw the situations and scenarios on paper, you can see why that's the case. But it's really telling, just as a way to explain how important retaining your new information is when you go forward with your next decision.
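
(For anyone who wants to check Jason's two-thirds claim, here is a small simulation of the game as he describes it. It assumes the host always opens a goat door that you didn't pick; purely an illustrative sketch.)

import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        # The host opens a door that is neither your pick nor the car.
        opened = random.choice([d for d in range(3) if d not in (pick, car)])
        if switch:
            # Move to the one remaining unopened door.
            pick = next(d for d in range(3) if d not in (pick, opened))
        wins += (pick == car)
    return wins / trials

print("Stick with the first pick:", play(switch=False))   # about 1/3
print("Switch doors:             ", play(switch=True))    # about 2/3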

Jeremy (06:24):
Yeah, that's really interesting. I think it feeds into a lot of what we've talked about previously, which is your data: it should be used for decisions, it should be used for actual outcomes, to drive, you know, an event, an impact, which you want to have as a result of the data science. So

(06:44):
in that sense, it's really, really helpful.

Jason (06:46):
Yeah, and the mathematics works, as we see. Where else does this apply in the world of data science? One of the examples we talked about was spam email detection.

Jeremy (06:55):
Yeah, there's a lot of companies doing this sort of streamed work, really. I mean, email you can think of as a stream into an organisation. And it turns out that there's a lot of nefarious actors out there who are trying to get people to take, you know, maybe bad decisions or poor decisions based on the email that they're getting. I mean,

(07:18):
this is no surprise to anybody who gets a tonne of email, like you and I do. You know, it used to be financial schemes, or other things. But, you know, it's become more nuanced. There's different categories of malicious email; there's this sort of thing called spear phishing attacks, where someone pretends to be your boss and says, oh, I'm locked out of the office,

(07:41):
and I can't get to my computer, would you mind sending me that key report, you know, that sensitive piece of data that I've asked for, the employment record of a colleague or something, would you mind sending that to me, I'd be really grateful. And if they fake the email address in a sufficiently clever way, sometimes even spoof it to the email server, then it can look really, really authentic. And

(08:06):
they can get quite a lot of information out of some poor, unsuspecting employee in the company. And then, you know, obviously bad things happen as a result. Yeah. So the outcome there is, can we use our Bayesian tools to update our belief about a given email and say, well, I've seen these features in this email, and as a result, I now believe that this

(08:31):
may be a spear phishing attack, or it may be some other type of spam that I'm trying to prevent. And maybe I ask my users, you know, occasionally: can you just tell me whether that was a good email or a bad email? Can you tell me whether it's spam or not? And then I'm starting to get data in the form of new emails, I'm starting to get corroborating classification from my users, and between

(08:52):
those datasets, I'm starting to update my belief model about what an actual spam email looks like, in the context of trying to prevent this sort of thing.
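
(The episode doesn't name a specific algorithm here, but one common way to turn this belief-updating into code is a naive-Bayes-style update over email features. Every number below is invented for illustration; in practice the feature probabilities would be estimated from emails your users have labelled.)

# Prior belief that any incoming email is spam (invented figure).
prior_spam = 0.2

# P(feature | spam) and P(feature | legitimate), assumed independent given the class.
features = {
    "asks for sensitive data": (0.40, 0.02),
    "sender domain mismatch":  (0.50, 0.05),
}

spam_weight = prior_spam
ham_weight = 1 - prior_spam
for name, (p_given_spam, p_given_ham) in features.items():
    spam_weight *= p_given_spam
    ham_weight *= p_given_ham

posterior_spam = spam_weight / (spam_weight + ham_weight)
print(f"Updated P(spam | observed features) = {posterior_spam:.2f}")   # about 0.98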

Jason (09:03):
So every time I hit report junk, I'm actually labelling a data point.

Jeremy (09:07):
Yeah,

Jason (09:08):
Yeah,

Jeremy (09:08):
You're doing everyone a really good service there, Jason.

Jason (09:12):
Yeah. And we've seen this as well in medical testing. So what's really kind of innocuous about the spam email example is, it doesn't really matter if a spam gets through to my inbox; I probably recognise it as a human as something that's a phishing attempt. You know, maybe the likelihood of me sending

(09:36):
somebody my credit card is low if I'm questioning, if I'm a cautious user of my emails. But in the medical realm, we kind of have the same setup: you can run a test for a medical process, but you can also then see whether you have what's called false positives or false negatives in your results. And

(09:59):
there can be a more important side effect there.

Jeremy (10:03):
Absolutely. And I think it introduces a number of really useful concepts from, again, the data science, statistical perspective and way of thinking about problems. I mean, you know, we're all getting very familiar with types of COVID tests at the moment. And the implications, obviously, of having a positive or a

(10:24):
negative test in any of these sorts of regimes are quite serious; they are quite impacting on an individual. So, you know, it really does matter. In the UK, at the moment, there's a programme to roll out a type of COVID test called the lateral flow test to all of the secondary schools, which

(10:45):
are going to be then putting these tests in place on, you know, a weekly or biweekly basis. And this is a test which has a good true positive rate. But it also has quite a high false positive rate, especially if you're asymptomatic, apparently.

Jason (11:04):
So I wanted to set that up a bit, actually, because what you're about to get into is always really potentially confusing. When we think of a medical test, there's supposed to be only two answers, which is either a test positive or negative, but there's at least four answers, four main answers, because it depends also on me, whether I have the

(11:29):
condition or not. So say you are testing me as somebody who doesn't have COVID, and I can either have a positive or negative result. But then if you test somebody who does have COVID, they can also have either a positive or negative result, so you get your different combinations of possible results. And that's where the rates come in, with regards to specificity or

(11:49):
sensitivity of your test; I always find these confusing in my classification matrix. But if we have somebody who has COVID and has a positive test result, then that's a true positive, and that's high sensitivity, hopefully. Yep. But you're talking about this test that maybe doesn't have a high sensitivity.
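
(One way to keep the four outcomes straight is to write the confusion matrix down. The counts here are invented purely to show how sensitivity and specificity fall out of it; they are not figures from the episode.)

# The four outcomes Jason describes, as a tiny confusion matrix (invented counts).
TP = 75    # has COVID, tests positive  -> true positive
FN = 25    # has COVID, tests negative  -> false negative
FP = 3     # no COVID,  tests positive  -> false positive
TN = 897   # no COVID,  tests negative  -> true negative

sensitivity = TP / (TP + FN)   # of those who have it, how many does the test catch?
specificity = TN / (TN + FP)   # of those who don't, how many correctly test negative?

print(f"Sensitivity (true positive rate): {sensitivity:.0%}")   # 75%
print(f"Specificity (true negative rate): {specificity:.1%}")   # 99.7%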

Jeremy (12:10):
Yeah, I think it's something like, I read on the government website, it's something like 75% sensitive in the case where you have just anybody with COVID. So you don't know if they've got a very high propensity to infect, a very high viral load, or not. The good news is it's much better if people have a high viral load and they're

(12:32):
infecting loads of other people; then you find out very quickly, which you sort of hope...

Jason (12:35):
It's easier to detect.

Jeremy (12:36):
Yeah, right. Right, exactly. And so you're absolutely right, you've come across this concept of the rather aptly named confusion matrix very quickly, which is these four states that you just outlined there of, you know, true positives and false positives. And you have to sort of take a step back and go, hang on a minute, which one's that, every time.

Jason (12:54):
Right now on the podcast. After practising, it's still doing my head in.

Jeremy (13:00):
Yeah, me too. So for these particular tests, these lateral flow tests, you've got this true positive rate. But obviously, if you hit a false positive, that means that someone's going to be potentially isolating for, you know, 10 days, two weeks, something like that. That's quite a high life impact for what is an error in the test, I suppose you could say. The recommendation in some

(13:20):
countries, and in some situations, is that you should go: okay, so you take a lateral flow test, which is nice and easy, you can do it at home, it comes up with an answer in 20 minutes, half an hour or something. And then if you come up with a positive, you should then go on to one of the more precise tests, something with a higher sensitivity. So that would be the PCR test; I think that has a better outcome, but it's more expensive,

(13:41):
and it takes longer to get the result. So what you find in these testing situations is there's a lot of context around the test that you have to take into account. And you have to become an epidemiological expert, almost, in the test to really get to the bottom of this. And in the context of a school running the sequence of tests, or having

(14:02):
all of these test results come in, because you're supposed to report whatever the result is, true or false, once you've taken them, you want to know, right, I've got a model for my current belief that we have a COVID outbreak in my school. That would be really important to know, given this stream of data that's constantly being updated. And that's where I think this Bayesian approach and mindset can be super helpful.

Jason (14:24):
Yeah, and more than that, with Bayesian thinking we could have more information to bring in about who it is you're testing. I was listening to a report recently saying that there had been some links to obesity, but they don't want that to overtake the really important one, which is age. And

(14:46):
if you're in certain age brackets, that's where you're more likely to have a bad reaction to COVID.

Jeremy (14:52):
Yeah, so they've noticed these correspondences. That's very much a probabilistic association of: if you have COVID and you have one of these features, if you like, if you are older, or you are...

Jason (15:07):
Age isn't a condition, right, I was gonna say.

Jeremy (15:11):
Not a lot you can do about that, as far as I know. But then, you know, you have a higher probability of it turning out not so good for you. That's, I think, quite a useful way of thinking about this. It would be quite good, actually, to talk about the sort of actors in this formula, in this way of constructing the Bayesian world, in the context maybe of

(15:32):
your COVID testing. So, I mean, you've got the notion then, and this is why it's so important, I think, to data science, you have the notion of: I have a model, I have a belief about whether I have this infection, personally or in my school. And I have lots of data that is coming in on a daily basis that I'm using to inform it. So the outcome of that process is what we're interested in;

(15:53):
it's: what is my update to my current understanding of whether I have COVID? That's called my posterior distribution; that's my posterior. And in that context, that's made up of a few other actors in this, one of which is the likelihood that I have COVID at all. So that would be the probability of

(16:15):
having that set of data that I have, that set of tests, given the belief that I have in whether I have COVID, or whether my school has COVID. So that's the likelihood agent in this process. And then you've got your prior, which is where you were before you started this whole thing. It's like, well, do I believe I have COVID or not? You know,

(16:37):
maybe there are some symptoms going on, so you have an internal suspicion that you have COVID, or you don't have COVID. So that's your prior, and that will get fed into this machine to give you your output. And then finally you have: what's the probability that I was going to get those test results anyway, just randomly, or, you know, looking at the whole population, I suppose; what's my estimate that

(16:58):
I would have that particular sequence of test results at all, given the nature of the test, and that's where you have to understand the test so carefully, and that's your marginal likelihood. So there's lots of these elements that go into it. And then you stick them all together, and you end up with what I said at the beginning, which is this posterior probability that you have COVID, based on the data. And the power is, it allows you to update that

(17:21):
incrementally, almost on a streaming basis: as you get new data, you can update your belief about that outcome.
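
Written out, the pieces Jeremy lists slot into Bayes' theorem. Using H for a hypothesis (say, "there is an outbreak in my school") and D for the observed data (the stream of test results), a notation chosen here for illustration:

\[
\underbrace{P(H \mid D)}_{\text{posterior}}
  = \frac{\overbrace{P(D \mid H)}^{\text{likelihood}} \;\; \overbrace{P(H)}^{\text{prior}}}
         {\underbrace{P(D)}_{\text{marginal likelihood}}}
\]

Each new batch of data updates the posterior, which can then serve as the prior for the next batch.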

Jason (17:29):
And right at the individual level: before I ever heard of COVID, I would never have had reason to think that I have it. That's my prior. And then I start to learn what COVID is. And suddenly, maybe I've got a tickle in my throat or a bit of a cough, or I'm losing my sense of smell. And I want data now, so I go and get a test. And that test will either tell me whether I do

(17:52):
or don't have COVID. But it's not actually 100% guaranteed that the test is accurate. So I update my posterior: my belief now is, oh, I very well might have COVID, because I've gotten a positive test result from this quick and easy test that's good to roll out. But I'm going to get a second one. Because again,

(18:13):
more data means I can update my understanding of whether I do or don't have COVID, and be even more convinced that the result is true. Just at an individual level.
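
(The sequence Jason walks through, prior suspicion, then one test, then a second, is just the same Bayes update applied repeatedly. The sensitivity, specificity and starting belief below are invented for illustration, loosely echoing the figures mentioned earlier in the episode.)

def update(prior, positive, sensitivity, specificity):
    """One Bayesian update of P(infected) from a single test result."""
    if positive:
        like_infected, like_not = sensitivity, 1 - specificity
    else:
        like_infected, like_not = 1 - sensitivity, specificity
    numerator = like_infected * prior
    return numerator / (numerator + like_not * (1 - prior))

belief = 0.10   # prior: a tickly throat and a bit of suspicion (invented)
belief = update(belief, True, sensitivity=0.75, specificity=0.997)
print(f"After one positive test:  {belief:.2f}")    # about 0.97
belief = update(belief, True, sensitivity=0.75, specificity=0.997)
print(f"After two positive tests: {belief:.4f}")    # about 0.9999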

Jeremy (18:24):
Yeah, absolutely. That's quite a nice example, because even those suspicions that you had, even those sort of expressions of symptoms, count as data in this model. You know, you start off from the point of view of no, I definitely don't have it. And then suddenly you get a particular cough, or you lose your sense of smell. And that's data; maybe not data that you write down or tell anyone about, but it is still data that you're

(18:45):
aware of. And so that starts to update your belief model. And then you add in the test as well, and that's more data, and that's more persuasive, maybe.

Jason (18:53):
Nice. And you had an interview with somebody who knows a lot about this, especially from an algorithmic point of view: Dr. Joseph Walmswell, a principal data scientist at Abcam. Let's hear what he told us.

Jeremy (19:09):
I'm joined in the DataCafe today by Dr. Joseph Walmswell, who's principal data scientist at life sciences company Abcam. Welcome, Joseph.

Joseph (19:19):
Thank you, Jeremy.

Jeremy (19:20):
And we just had a really interesting talk from you today, around the area of Bayesian inference and all that goes with that. And I really wanted to start with how, in your view, Bayesian inference and Bayes' formula really relate to data

(19:42):
science from your experience.

Joseph (19:44):
Well, I suppose I'd start by saying it doesn't relate enough. So there's quite a divide still between people with, say, a mathematical statistics background and then practising data scientists. That's understandable, given that there are a lot of data science methods that aren't really

(20:07):
mathematical at all, like random forests, which are very effective. And if you've got a toolbox of very effective methods, why would you want to be, as you might see it, unnecessarily constraining yourself by constructing parameterized models? And that's fair enough, but you are then faced with a serious difficulty, firstly, when

(20:30):
you need to construct a parameterized model where the parameters are important, rather than just the ability to make a prediction. So an example of this might be if you were, say, an ecommerce company, and you're trying to understand what drew people to your website, and then what actions caused people to buy things. Now, being able to predict something is one

(20:53):
thing, but you want to understand the causal structure of what's going on underneath. So there, a model rather than a black box can be useful. Then the other point where I think data science can learn something from Bayesian statistics is in understanding that knowledge is effectively probabilistic. So you might set up your neural

(21:17):
net, for example, to classify an image, as it might be, and then out comes your result: it's a cat or it's a dog. But that's not really what these deep learning black boxes are capable of doing; the model will think, with some probability, that it's a dog or a cat. And there, understanding that, and then

(21:38):
understanding what sort of probability distribution is really going on, is important. So, to be more specific, say you're trying to do forecasting with a neural net. If you're forecasting, say, something that's fairly big numbers over a fixed order of magnitude, then the standard neural net approach of trying to optimise for mean squared error will probably work quite well. But if you're trying

(22:01):
to forecast, say, small numbers, as it might be sales of a product line that doesn't move very quickly, that might sell, say, one unit this week and zero units next week and, say, three the other week, then using mean squared error naively in your neural net is probably going to give you worse results than if you thought, well, this is effectively a

(22:23):
Poisson process: my neural net behind the scenes is going to take in all the information I know and then come up with some best guess at what's going on, but the way I relate that best guess to the different outcomes matters. So if I wanted, for example, to calculate how likely it is

(22:43):
that I will sell, say, one or two units, I should make my calculations on the assumption that I've got my Poisson mean, and then I can use the Poisson distribution to do that calculation. So I suppose it boils down to: data science often does need Bayesian methods without realising it.

Jeremy (23:01):
Yes. I mean, the thing that strikes me about Bayes, just as a sort of philosophy as much as a tool, at least initially, is that you've got this idea of a core that represents the model I'm creating, as informed by the data that I have access to. And as a data scientist, you

(23:23):
know, we like to think we can create sort of beautifully general models, but really, we only have access to the data that we're given. And that's really all we have to go on, until we get more data, until we discover more knowledge about the system. So, you know, in the neural net example, the neural net's only really as good as the training data you've

(23:43):
historically given it, to be able to tune those parameters and get it into a trained state. But if you then expose it to more training data, it might get better, it might become overfitted; it would change state, but it's all dependent on that data. And I like Bayes from the perspective of it being dependent on the data that's explicitly sort of

(24:06):
in there, right at the heart of it. Is that something you've taken advantage of?

Joseph (24:10):
Yes, yes, I agree with that. And then I'd add also that Bayesian reasoning is, well, it's human reasoning. It is how our brains actually work. We have a prior belief about a situation, we get some data, we update it, and we have a new belief based on combining the two. And here's a nice example of

(24:31):
how this practical Bayesian reasoning intersected with what appeared, at least, to be a very effective black-box neural net algorithm, right back in the 80s, when the Department of Defense in the United States funded a project at a particular American university to build a neural net that would take images

(24:51):
of East German forests and then predict whether or not there's a Warsaw Pact tank column in it. And the idea is that this algorithm could be loaded into an automatic camera mounted on a NATO tank that could be scanning the surroundings all the time, and would then identify various possible hazards to the tank commander. And they were very happy to begin with, when this achieved 100% accuracy. And

(25:16):
they did it all very well, they had a specific test set set aside, and they were getting 100% accuracy on the test set. And the Bayesian brain would probably say something like, well, my prior belief about the effectiveness of this classifier is such that 100% accuracy is just not highly plausible

(25:40):
at all; I just don't believe that. My prior probability is that there is a certain possibility that the algorithm is doing something wrong somewhere, and I don't know what it is, but that's why it's being accurate. And it turned out that what happened was the people who provided the training data had photographed the German forests without tanks on

(26:02):
a sunny day, and the German forest with tanks on a cloudy day. So all the neural net was really doing was telling you...

Jeremy (26:08):
Brilliant! Yes, of course. So just spotting the light conditions in the photo and going, there must be a tank, or there isn't a tank.

Joseph (26:19):
Yeah.

Jeremy (26:20):
I like that. And then, I think you alluded to it there, you've got this idea in Bayes of there being a prior model, a prior sort of belief about the world, about the data set, about the problem you're considering, which is informed by the data set. And then, you know, Bayes nicely provides you this other update mechanism for

(26:43):
saying, right, well, I had that prior model, that was my belief. This is what I believe to be true: I believe there was a tank in that forest. But now I'm being given more data, and now I can update that and say, well, there's only a tank when the sun's out, or there's only a tank when I can see metal glinting, maybe, in

(27:04):
the photograph.

Joseph (27:07):
Yes, that is the great charm of Bayesian inference: that your state of knowledge is captured by your posterior. Once you have that, you can then disregard how you came to that state of knowledge. So you don't need to store all your previous data points when you rerun your model; you just store your posterior. Of course, in practice, that's easier said

(27:28):
than done if your posterior is in the form of a bunch of samples from Monte Carlo methods, rather than a function. If that's the case, then starting the inferential process using that as a prior is not easy; you'd have to put some sort of kernel density estimator on it. It's possible

(27:49):
that you might be better off running it on the entire previous data set. There's a lot of interesting work there about filtering samples and trying to approximate a prior based on a sampled posterior.
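
(The posterior-as-prior idea, and the wrinkle Joseph mentions when the posterior only exists as samples, might look roughly like this. The "previous posterior samples" are simulated here just so the sketch runs end to end.)

import numpy as np
from scipy.stats import gaussian_kde

# Stand-in for samples of a parameter from a previous MCMC run (simulated for illustration).
rng = np.random.default_rng(0)
previous_posterior_samples = rng.normal(loc=2.0, scale=0.5, size=5000)

# Fit a kernel density estimate so the old posterior can be evaluated as a smooth
# function and reused as the prior for the next round of inference.
prior_from_posterior = gaussian_kde(previous_posterior_samples)

theta_grid = np.linspace(0.0, 4.0, 5)
print("Approximate prior density on a grid:", prior_from_posterior(theta_grid))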

Jeremy (28:03):
I liked your reference then to essentially how human beings really update their belief, and they do it based on their observations, that sensing of their environment. I think that's a really nice analogy, and probably why, I guess, Bayes has been such a popular go-to for scientists, and especially now data scientists, over the last two or three decades.

Joseph (28:26):
It is, yes. And I think what we'll see is more of a melding of the traditional statistician with the data scientist. So there are people who run Bayesian neural nets, for example, where you have not merely an output function that's probabilistic, but the parameters themselves that make up the

(28:46):
neural net conceived of probabilistically, rather than just optimised to optimise the output. Amazon have a very good forecasting package that runs on their SageMaker platform, where you can set the output probability distribution to get a great variety of things. So you could, if you're

(29:09):
dealing with count data, you could use the Poisson distribution; if you're dealing with overdispersed count data, you could use the negative binomial.

Jeremy (29:17):
So where do you see this going? You mentioned a couple of techniques earlier around kernel estimation. What's the next step for someone really wanting to get into Bayesian inference and use this in an exciting way in their work?

Joseph (29:31):
Well, one thing we haven't mentioned at all is the problem of model choice. So Bayes' theorem applied to parameter estimation comes with the notion that the chosen model, it might be linear regression, is your given for everything. So the probability of the data is the probability of the

(29:52):
data given the model, the likelihood, yes; and the prior is the probability of the parameters given the model. And even for something like linear regression, you might have the choice between fitting with a straight line or fitting with a quadratic, and the quadratic would probably give you a better

(30:12):
fit under most circumstances, because you've got one more free parameter to play with. But that doesn't necessarily mean it's the best model. Now, this is where data science can help. Because the formal Bayesian approach, as you're well

(30:33):
aware, Jeremy, is that you calculate the model evidence for the two different situations: you calculate the probability of the data given the model by integrating the posterior, and then you use the fact that the probability of the data given the model is proportional to the probability of the model given the data. Now, integrating the posterior is

(30:54):
even harder than sampling from it. And there are some interesting ways to do that. So you could take an end run around the problem by modifying your Monte Carlo sampling process to jump between different models, for example, different parameterizations. And if the two parameterizations are not so different that the likelihood

(31:14):
is very different, then a jump will have some probability of being accepted. I did this for my PhD at one point; it was about looking at star clusters and deciding how many different age populations were there. So it was a question of the right model as well as the right parameters: how many clusters, how many populations, as well as how old they were. So it's quite interesting. But tricky. It's

(31:37):
very tricky to tune properly. Whereas the data scientist would say, at this point, that you're just overcomplicating it: you just have your testing data set, you measure your model accuracy based on that, and then you pick your best parameterization based on that. And most of the time, I'd agree with this.
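
(Joseph's "data scientist's answer" to model choice, scoring each candidate on held-out data rather than computing the model evidence, might look like this. The data is synthetic and the straight-line truth is an assumption of the sketch.)

import numpy as np

rng = np.random.default_rng(42)

# Synthetic data from a straight-line process plus noise (invented for illustration).
x = rng.uniform(0, 10, size=80)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)

# Hold out a test set and let held-out error choose between the competing models.
idx = rng.permutation(x.size)
train, test = idx[:60], idx[60:]

for degree, label in [(1, "straight line"), (2, "quadratic")]:
    coeffs = np.polyfit(x[train], y[train], degree)
    test_mse = np.mean((np.polyval(coeffs, x[test]) - y[test]) ** 2)
    print(f"{label:>13}: held-out mean squared error = {test_mse:.2f}")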

Jeremy (31:58):
So if in doubt, keep it simple. Yeah, seems like a nice mantra to attack most data science problems with. Excellent. Joseph, thank you very much for joining us in the DataCafe today. That was really exciting.

Joseph (32:10):
Thank you, Jeremy.

Jason (32:15):
Joseph said something really cool in that interview about how Bayesian reasoning is human reasoning. And it really stood out to me, because it's actually what we were talking about earlier on: this idea that I bring in my own data. And as a human, I respond to my environment by gathering data. My senses are

(32:36):
what's gathering the data, but then we're trying to translate that way of reasoning into a theorem, into logic, into algorithms, and then apply it and test it in the real world for these examples, like the tank that he talked about.

Jeremy (32:49):
Yeah, as I said, it's sort of a theorem that almost comes out of the future, from Thomas Bayes's perspective, because it speaks of having this constant stream of data that you're able to process and then update your model, your algorithm, your decision, based on data that you're seeing. And

(33:12):
that's exactly the sort of architecture that you get in modern machine learning models, where maybe you're being fed by updating datasets, or sort of user clicks from a website, or whatever it is that's feeding your feature set. So it chimes with the human process of learning and adapting, from a child right through to an adult.

(33:35):
Yeah. And it also, I think, works from the perspective of modern-day modelling and data science tooling, almost.

Jason (33:46):
Yeah, that's actually it: when we were growing up, a child plays for the sake of experience, interacting with the environment. And when we build our models, we talk about running them in a sandbox, or playing in the sandbox, scientifically: what's the use of that for the model? And what data do we need to add? Or how do we tweak or fine-tune it?

Jeremy (34:06):
So one of the things that occurred to me when talking with Joseph was how he talks about the problem he had in selecting the model, from the perspective of a Bayesian sort of way of approaching a problem. I thought that was quite a nice sort of piece of honesty, almost, from Joseph, because what you have, when you're

(34:29):
constructing a Bayesian model as a data scientist, is that you have this decision to take. And it's not just a decision around a set of parameters. It's a decision about what model should I apply in the first place? Should it be a Poisson model or a binomial model? Or should it be normally distributed, or gamma

(34:51):
distributed, or something like that, you see; but you've got all of these many, many possible models to choose from. Whereas what he said was, a data scientist would say, oh well, I'll just throw a random forest at it, or I'll just throw a neural network at the problem, and I'll get it to learn the pattern that is emerging from the data that way. So I liked that. But it occurred to me that even something simple like regression

(35:14):
has a sort of Bayesian element to it.

Jason (35:17):
Yeah, and even before we would get into complicated models, you can see it when we apply linear regression: you have a certain stability to the model based on the current dataset that you have, you can update it with more data, you can add more data points, and you can then refit it, and you get a new updated version of that model. And so in the case

(35:38):
of linear regression, maybe you're classifying a trend in the data. And maybe that trend has shifted because of some unknown, or maybe there's some reason to go and investigate what that unknown is, if something has caused a shift in your data. And I think Joseph also talked about the effect of outliers, and whether you need to account for them. If an outlier is going to dramatically shift your model, maybe it

(36:01):
wasn't stable in the first place. And you need to look at the distribution in your beliefs, and look at how stable the model is based on how much data you have, or whether the outlier is actually really interesting and you've got to figure out what's causing it.

Jeremy (36:17):
That's a really good shout. And sometimes the outlier is an artefact of the collection process. Or sometimes it's an artefact of the sensor, or it may just be data that has got mangled along the way, who knows. But it can be a nice way of picking up that kind of thing. Again, coming back to what Joseph was pointing towards, which is that sometimes you can get a better fit from a more complicated model.

(36:39):
But that may not be what you want; you might actually want to be in the constraints of a slightly simpler model, in order to cut through that kind of noisy data situation, because otherwise, very classically, you get an overfitted model that isn't going to be a good predictor for anything in the future.

Jason (36:56):
And I think there was some modelling that happened around the trend of COVID, where you add in exactly what Joseph said, another variable, so that you can make it a quadratic. But if you extend that into the future and treat it as if it's a forecast, you see the effect of that quadratic

(37:16):
fly off in one of the directions that you don't have any data for; it's not constrained. And it's no longer valid, you can't be using this as a forecasting tool just because it fit really well in the interval where you did have data.

Jeremy (37:31):
Yeah, interesting. I think there are lots of pitfalls to using a statistical model, where you have to have an understanding, I guess, of the underlying dynamics sometimes, of what you're looking at, to be able to make some of those initial modelling decisions. But when you do have that understanding, when you do have that training, then it's

(37:53):
enormously powerful. And it can be a tremendous benefit to the data scientist to have that level of insight and that experience; whereas if you just use a sort of machine learning toolkit, where you have maybe a black-box neural network, typically, that you're throwing at the problem, maybe that

(38:13):
doesn't come through, and you run a much greater risk of accepting data points as legitimate, and as affecting your output function, and your output classification if that's what you're doing, as a result of not having that greater depth. So that balance, I guess, between the simplicity of doing something where you say, I'm deliberately not going to

(38:36):
try to understand this system, I'm just going to throw a box at it, versus the extra insight and understanding and depth that you can bring when you say, I've got a very, very strong hunch that this is a Poisson-distributed process, and I'm going to base my modelling on that. And that gives me probably a much more convergent, accurate process, much more quickly.

Jason (38:58):
Something else that Joseph mentioned, I'll ask you: how should we bring more of this way of thinking into data science, and, I guess, overcome or see where's the benefit versus the situation you just mentioned, about taking something off the shelf? Which is valid in many cases where I just want to see, you know, the usefulness on the current static dataset, and

(39:21):
there doesn't need to be a bigger understanding of what's going on, because I've got quite a self-contained problem. Let's say it doesn't have a medical outcome, like we talked about with the COVID testing.

Jeremy (39:33):
I think the power, from the data science perspective and the way of thinking about a problem, when you're using this Bayesian update inference rule, if you like, comes from being able to recognise the fact that your data is not static, typically. It's very rare

(39:57):
that you're given a problem where we say, here's a body of knowledge, and it's never going to change again, we just want to know: should we go left? Should we go right? Should we spend a million dollars? Should we spend $100 million? And that's it. It's more often the case that you're given situations where the data changes, where what was true yesterday may not be true

(40:20):
tomorrow, because the data has shifted, and maybe the model has shifted as well. And that, I guess, is where things can get quite challenging from the perspective of a Bayesian model, if you were assuming it was one type of model and then the very essence of that has modified. So it really embodies the concept of change with respect to data, and

(40:41):
especially drastic change, which, you know, we've seen anyway with the pandemic of late, in demand figures going haywire and all kinds of societal behaviour changing dramatically. Where you see that, you have to be super careful with that coming into any model, be it Bayesian or otherwise, as to

(41:04):
how that's going to affect the future operation of that model. Maybe you want to sandbox it a little bit, maybe you want to put a mark around that data set and say, I wouldn't put too much belief in this set of data if I were you, because the probability that we have a pandemic is, I hope, really small usually. So it speaks to a lot of that kind of dynamic

(41:29):
streaming of data and how you react to change in that. So yeah, I like it from that perspective.

Jason (41:35):
That's really cool. And thanks, Jeremy. I think, hopefully, some people listening today will have updated their own ideas about what Bayesian inference is, and come away with a new idea for how it can be employed in data science.

Jeremy (41:47):
Thanks Jason.

Jason (41:49):
Thanks for joining us today at the DataCafe. You can like and review this on iTunes or your preferred podcast provider. Or if you'd like to get in touch, you can email us, Jason at datacafe.uk or Jeremy at datacafe.uk, or on Twitter at datacafepodcast. We'd love to hear your suggestions for future episodes.