Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
Welcome to episode one oh three of the Live with
the Maverick podcast. The theme for today's discussion is deep
learning and we are very excited to have with us
a return guest, Ron Richman.
Speaker 2 (00:13):
Ron Richman.
Speaker 1 (00:14):
Ron is a qualified actuary and insurtech founder who
is a thought leader in the actuarial data science movement.
Speaker 2 (00:22):
So welcome back, Ron.
Speaker 3 (00:25):
It's a real pleasure to be here, and thanks for
having me back on the show.
Speaker 2 (00:29):
Very excited to have you back.
Speaker 1 (00:30):
And honestly, we've had maybe one or two, I think you may actually be only the third repeat guest, and we only have a guest back when I know there's, you know, lots of great potential for content, and I just really appreciate some of the things you're doing in the profession, helping to modernize it, and all the research papers you're writing, and we're going to have a chance
(00:51):
to showcase some of those later.
Speaker 2 (00:53):
So just very excited to have you here.
Speaker 3 (00:54):
Dom, it's great to be here.
Speaker 2 (00:57):
Great. So I'd just love to give you an opportunity to introduce yourself.
Speaker 3 (01:01):
So Dom, like you mentioned, I'm Ron Richman. I am an actuary with a very significant interest in how the profession can modernize, how we can do things better. I've been in the P&C part of insurance for about seventeen years now, and have held various roles, most recently Chief Actuary at Old Mutual Insure, before stepping out
(01:24):
to do something entrepreneurial. Before that, I was within the AIG Group, where I was Chief Actuary and Chief Risk Officer for their Africa entities. So I've had a pretty good view of actuarial practice as it pertains to property and casualty insurance, and hopefully some of those insights that I've garnered over the years we can share here today.
Speaker 1 (01:45):
Yes, I'm looking forward to exploring those, and you have a very multifaceted background, as you've spoken to. And before we get into the details, you just mentioned that you recently started a new entrepreneurial venture, an insurtech. So how would you describe this venture?
Speaker 3 (02:01):
Thanks, Dom, for the opportunity to discuss it a little bit. So I'm now almost two weeks into this new venture, so things are becoming a little bit more defined as the time passes. Basically, as I think we're going to discuss in the rest of this episode, I've had this real interest in applying AI, applying deep learning and techniques from machine learning, to a lot of actuarial practices. And
(02:24):
what I hope we'll do within this insurtech is to provide some software to actuaries who'd like to be able to exploit this more obviously and more easily within their day-to-day work, and also to provide consulting on some of these themes. So I think this union between actuarial science and some of these modern technologies is really fruitful,
(02:44):
and by making that available through consulting, I think we can benefit large parts of the insurance industry. So hopefully you'll hear more about that in the coming weeks and months.
Speaker 1 (02:55):
That's exciting, and it's good to know. Those who might be familiar with me know that I'm in the software space as well, and having just come back from the CAS annual meeting and seeing all the activity around software and what's happening with AI, I think it's an exciting time for actuaries and for other risk professionals to be in the space. So you talked about
(03:16):
some of the goals and some of the opportunities. I'm curious to know, whenever someone starts an entrepreneurial venture, other than just a pure opportunity, you know, what motivated you to start something on your own? Because previously you were already doing some of this research, and I think you were still tapping into those talents and exploring those
(03:37):
opportunities within the profession. So why now, and what really gave you that push and the conviction to move out on your own?
Speaker 3 (03:44):
I think, I mean, just as something that's always guided me well, sometimes you have to listen to that inner voice that you have telling you. And maybe that sounds very natural, but I think sometimes, with the way that we process information and figure out which way to go, you just have to listen to those signals. So why now? I think a
(04:06):
lot of things came together. And obviously it's not just about the evaluation of upsides and downsides in a Monte Carlo simulation, you also have to think qualitatively about whether it makes sense. And just with the world we live in, with the amount of increasing interest and the amount of technological focus on AI, it felt right to do it
(04:28):
at this point in time. So I'm glad that I got the opportunity, and yeah, I suppose sometimes you'll only be able to evaluate whether it was the right choice down the line.
Speaker 1 (04:38):
Now one more, one more follow-up on that, because, you know, insurtech obviously has been a hot topic in the industry. I also recently attended a session on that at the CAS Annual Meeting. If you were to think of your particular startup, and of course it's going to be heavily focused on AI, but is there a particular lane or sub-focus that you're going after in
(05:00):
terms of your particular venture?
Speaker 3 (05:03):
Yeah, so, Dom, I think for me, something I've been very passionate about throughout my career is just how can we do things better as actuaries? In our normal processes, whether it's reserving for liabilities or calculating capital requirements, how can we do that better, not only to provide better results for management, but also to make sure that we're doing
(05:24):
the best job that we can? And I think there's almost been a technological shift, maybe similar to the technological shift when spreadsheets came about at the beginning, I suppose, of the modern actuarial profession. I feel like today that technological shift is happening around us, with better use of large language models, and these models becoming commoditized and quite
(05:44):
cheap, or really thorough methodologies in machine learning and deep learning for things like uncertainty quantification. And because these things now exist, I think that there's a really strong business case to be made to say that some of the things actuaries have always done, those things can be done better, faster and with better results
(06:06):
for stakeholders, and that's, for me, I think, a key motivating factor. I'd love to see the actuarial profession move forward and thrive in an age of AI and not get left behind. And I think that part of the work we'll be doing in the startup is making sure that every actuary has got access to some of these tools and some of this thinking.
Speaker 1 (06:26):
Great. And would you say, it sounds very P&C heavy, and I know that I led the discussion in a P&C-heavy direction, but is it agnostic to line of business, or would you say it's more focused on the P&C lines?
Speaker 3 (06:38):
So we definitely have a product in the life insurance space, actually two products in the life insurance space. That was my first introduction to working as an actuary many years ago, and there are some elements of life insurance practice as an actuary where, again, you can apply some of these tools. So we'll definitely be looking at the life insurance space as well.
Speaker 2 (06:58):
Okay, good to know.
Speaker 1 (07:00):
Well, the first conversation that we had, and I mentioned you're a repeat guest, was on artificial intelligence, that's AI, and machine learning, and it was more foundational in nature, to help actuaries understand what the technologies are and how we could begin to start using them. This conversation is going
(07:21):
to take things a step further. Before we get into the meat of deep learning, I still do want to ask a fundamental question on AI, which I think will help us to more easily navigate the conversation, and I've actually been looking forward to this question. So the term AI itself, it's broad, and to understand its capabilities, I think it's helpful to disentangle AI itself from traditional
(07:43):
methods and also to understand key differences along the AI spectrum. So how would you distinguish between the following? I want to mention four terms which I'm sure actuaries and data scientists hear very often. So statistics, that's the first one, statistics, machine learning, deep learning, and generative AI. How do we distinguish between those four?
Speaker 3 (08:06):
Sure. So I'm going to give you my opinion, and I'll tell you how I think about it, which is quite orientated to what is the goal when you're doing a statistical analysis versus when you're building a deep learning model. And maybe as we work through the different goals and the different intentions that practitioners can have, that will help to explain the mental model I've got about
(08:28):
how to think about this. So I think, just as one caveat before we start, one has to remember that these fields are incredibly broad. So even within statistics, I'm thinking of the famous paper by Leo Breiman called Statistical Modeling: The Two Cultures, and we can maybe provide the reference after we chat. Basically, in that
(08:49):
discussion he talks about a more data-driven approach to building models using black boxes versus a traditional statistical modeling approach. But what's really important is he calls it two cultures in statistics. So when we say the word statistics, we're really talking about a very broad discipline that definitely has overlap with machine learning and deep learning. So I just wanted to state that caveat at the outset.
(09:11):
The way I think about statistics, in the way we're often introduced to it as actuaries, is that we're talking about making inferences. Can we make an inference about whether one population mean is different from another population mean? What's the best way of making those inferences? Is it by building a linear regression model? Is it by running a randomized controlled trial? There are all these different techniques that we have around making
(09:33):
inferences about things we might care about in the world. And that's really, for me, something that distinguishes statistics from the other disciplines that we're going to briefly discuss. That's very foundational in the history of statistics. It's about making inferences.
Machine learning, I think, has got a very different approach, where the primary goal of machine learning as a field, more or less, to me, seems to be making good predictions. It's either that you've got a lot of previous outcomes of a particular process, and that process could be how does a P&C policy develop or what sort of claims arise on a motor coverage, and as long as you've got a lot of data, the machine learning approach basically says,
(10:14):
let's take that data and, using some sort of algorithm, come up with predictions. So the key distinguishing idea between inferential statistics, like I described, and machine learning is that one's a field focused on inference and the other one's focused on prediction. You can also try to use your inferential models for prediction. Sometimes you find out they're not so good. I remember
(10:36):
reading a blog post a few weeks ago about how someone tried a Kaggle competition using the generalized additive models from their statistics background, and was quite disappointed when they didn't perform particularly well at prediction. So I think understanding the goal, and understanding the tools that arise as a result of working towards that goal, is really important.
(10:57):
Let's move on to the next rung of the hierarchy. So what distinguishes deep learning from machine learning? In my mind, deep learning is really all around this concept of representation learning. How can you get an algorithm to gain some sort of understanding or, in a very narrow sense, cognition about
(11:20):
the world? So, when you think about a classical deep learning algorithm, like a convolutional neural network applied to images, what that network is able to do when you feed an image in is come up with a hierarchy of concepts and say, okay, this is a picture of something that looks like a four-legged animal as opposed to a four-legged table, right? And it builds this hierarchy
(11:41):
of things with a certain sense of semantic meaning. So that's what distinguishes deep learning from machine learning for me. It's that idea of building up a cognitive representation of the data that you give it. And then finally, generative AI,
in my mind, is a subset of deep learning, or at least it is today. The whole idea of generative
(12:02):
AI is you can take deep learning as a discipline, but instead of making a prediction, you can predict the parameters of a probability distribution. So let's think a little bit about a large language model like ChatGPT. What it's doing is, you're giving it a prompt, which you can think of as your covariates or your rating factors if you want to use pricing actuary terminology, and
(12:24):
what it's doing is predicting a probability distribution to say, what's the most likely next word that's going to come up in the sentence, and then you sample that word and you put it back. So what generative AI is all about is a deep learning model that's predicting distributions as opposed to just making single predictions like point estimates. So Dom, yeah, happy to take your feedback, but that's
(12:46):
how I think about what each of these disciplines is trying to accomplish and do, and for me, that's what leads to the distinguishing characteristics.
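To make that last point concrete, here is a minimal sketch of the sampling loop Ron describes: predict a distribution over the next word, sample from it, and feed the result back in. This is not code from the episode; the vocabulary, probabilities, and function names are purely illustrative stand-ins for a real language model.

```python
# Minimal sketch of "predict a distribution, sample, put it back" - illustrative only.
# A real large language model would replace next_word_distribution with a
# transformer forward pass over a huge vocabulary; here everything is made up.
import numpy as np

rng = np.random.default_rng(42)
vocab = ["claims", "premium", "will", "increase", "decrease", "."]

def next_word_distribution(prompt_words):
    # Stand-in for the model: returns a probability distribution over the vocabulary.
    logits = rng.normal(size=len(vocab))
    return np.exp(logits) / np.exp(logits).sum()   # softmax

prompt = ["the", "expected"]
for _ in range(6):
    probs = next_word_distribution(prompt)
    word = rng.choice(vocab, p=probs)              # sample the next word
    prompt.append(str(word))                       # put it back into the prompt
print(" ".join(prompt))
```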
Speaker 1 (12:55):
That's very helpful. And there's something, a subset of that, I want to discuss. You talked about the fact that statistics sometimes overlaps with machine learning, and speaking of that overlap, generalized linear models, GLMs, in my opinion, my humble opinion, they're somewhat ambiguous. Oftentimes I'll see articles contrasting GLMs with machine learning models, and on the surface,
(13:22):
that seems like a viable comparison, because GLMs, they have a closed form.
Speaker 2 (13:28):
They're closed-form,
Speaker 1 (13:30):
multiplicative solutions predicated on distributional assumptions. So, for instance, you may fit a frequency model using a Poisson distribution, a severity distribution using a Pareto or, you know, some size-of-loss distribution. Yet at the same time, they're sometimes, oftentimes,
(13:53):
treated in a similar fashion to a machine learning model in terms of how they're trained. So one question I'll put to you is, are GLMs themselves considered statistical models or, you know, AI slash machine learning models?
Speaker 4 (14:09):
So I would say that, at least using the categorization that I've given you before, it would be very difficult to construe GLMs, even regularized GLMs, as AI, unless your marketing team is in overdrive.
Speaker 3 (14:23):
And the reason for that is there's no concept of representation learning in GLMs, except at a very basic level. So for me, that rules out the possibility of them being AI. Let's talk about the distinction between statistics and machine learning. I think you can build a GLM using a very classical statistical mindset, where you perhaps have a
(14:43):
set of data and you want to build the model to perhaps investigate different causes of claims.
Speaker 2 (14:51):
So what you do is you use.
Speaker 3 (14:52):
All your data, you fit a GLM without regularization, and
you might look at P values for example, to say,
well are some of these curve passions significant or not?
And on that basis, that's how you're going to build
your model. That's a very statistical way of doing it.
You can take exactly the same model form, which would
be a bunch of cover its multiplied by coefficients, and
(15:15):
that gives you an outcome, and you can take a
completely different lens. Instead of using all your data, you
might decide to use cross validation, or you might decide
to split your data into a learning set and a
test set, and that enables you to measure how Willissen's
model going to predict, because if you remember, the whole
point of machine learning is generally as a field, is
to make good predictions, right, you might regularize that model
(15:38):
and aggressively remove coefficients that don't add to the predictive
capability of the model. So I think a GLM is
just a model methodology, but what you use it for
determines whether or not you're thinking more about a statistical
application or if you're thinking about a machine learning application,
which is prediction. So I think it's both, and maybe
that's what leads to some of the ambiguity. I think
(16:00):
something very interesting from an actural perspective is when we
started out using regression models, probably the field of machine
learning wasn't as developed, so I think people to quite
a statistical perspective on these models and that's now turning
into much more of a machine learning focused with really
the same model form. And again I think that's an
(16:21):
evolution that's happening in the actual profession.
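As a rough illustration of those two lenses on the same GLM form, here is a hedged sketch using simulated data (not Ron's workflow, and the variable names are illustrative): the same Poisson regression fit once for inference, looking at coefficients and p-values on all the data, and once for prediction, with regularization and a held-out test set.

```python
# Same model form, two lenses: inference (p-values, all data) vs prediction
# (train/test split + regularization). Data here is simulated for illustration.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import PoissonRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_poisson_deviance

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))                        # e.g. age, vehicle power, density
y = rng.poisson(np.exp(0.1 + 0.3 * X[:, 0] - 0.2 * X[:, 1]))

# Statistical lens: fit on everything, inspect coefficient significance.
glm = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit()
print(glm.summary())                                   # p-values, confidence intervals

# Machine learning lens: hold out data, regularize, measure predictive accuracy.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
ml_glm = PoissonRegressor(alpha=0.1).fit(X_tr, y_tr)   # L2-regularized Poisson GLM
print(mean_poisson_deviance(y_te, ml_glm.predict(X_te)))
```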
Speaker 2 (16:25):
Excellent. That's very helpful.
Speaker 1 (16:27):
Something I want to ask, because I know you're in this space and you're pushing the application of artificial intelligence, machine learning, et cetera.
Speaker 2 (16:36):
To actuarial practice.
Speaker 1 (16:38):
Now, something that's certainly important, that I see in the software space, is that you get exposed to the idea of deployment. So it's one thing to build and to develop a model, but what you find, at least what I've found in practice, is that a large percentage of the models that are developed are not ultimately deployed in production. So, from your perspective,
(17:00):
I mean, I'm in the US, I'm primarily focused on this side of the world. On your side of the world, whether it's the UK, Africa, some of those geographies, what are you seeing regarding adoption and deployment of machine learning and deep learning models?
Speaker 3 (17:16):
Quite frankly, so, I think with deep learning there is a lot of interest. I think there's a lot of academic interest. I don't think it's yet at the stage where there are good systems for deployment, at least in the actuarial world, so I think that holds back a little bit of the adoption, at least in the part of the world that I'm in. I think if you talk about more
(17:37):
traditional GLM-style models, yeah, a lot of models get built, only some get deployed, and only some of the ones that are deployed are ever monitored and checked after that. I think the happy middle is the machine learning space. Lots of companies have deployed GBMs in practice, GBMs being generalized boosting machines, which are a great technology
(18:03):
and a great machine learning approach for predictions with tabular data. I think some companies just run on GBMs these days and have completely given up on GLMs. I think other companies are taking a more cautious approach, perhaps upgrading some of their GLMs to GBMs where that might make sense. Sorry, I gave the wrong name, I was thinking of generalized
(18:24):
linear models. It's gradient boosting machines, not generalized boosting machines.
Speaker 2 (18:29):
Yes, so yeah, I got it.
Speaker 3 (18:31):
What are you seeing in the States around deployment of particular model classes?
Speaker 1 (18:37):
I mean, I think there are a couple of considerations, because the first thing you have to think of is line of business. So within property and casualty, of course, the personal lines space is a little bit more highly regulated. And what's interesting, just actually getting back from the CAS annual meeting, is we actually did have a roundtable with someone from
(19:00):
the Connecticut Department of Insurance who's actually involved with the NAIC and some committees there, and one of the questions I asked is, are you seeing more submissions for tree-based and more advanced models, things that are outside the run of GLMs, and he's saying yes, that there's an uptick. So certainly there
(19:25):
is an increased appetite, and also, with myself working in the software space, certainly interest in at least exploring them. I will say that companies appear to be moving cautiously. They're not necessarily saying, okay, oh, this is great, let's run with it. You know, there's a lot of change management around that. So I would say that, at best, it's cautious in terms of adoption. Deployment itself,
(19:49):
I think that's another question, which might be a little bit harder to answer. I still do think that the majority of those models are not necessarily deployed. There are also some that, if they're not used directly, for instance in the personal lines space, if they're not used directly for a rating plan, they may be used
(20:10):
as an internal benchmark as well. So that's another way that companies will use some of the more advanced models, if they're not filed, where they would have to be.
Speaker 3 (20:18):
To the question, absolutely, yeah. So maybe I can make one comment. I think there's an inherent conservatism within the insurance industry. There's a lot of money at risk, and generally it's preferable to do things that have always worked. For example, even if you think about the distressed risks or the more specialized motor insurers for people
(20:41):
who otherwise can't get coverage with normal insurance, there's a certain way of doing things, and that permeates the industry because it's worked for so long. So I think that natural inclination to being conservative with new technologies is a good thing, as long as it doesn't lead to suboptimal outcomes, whether for policyholders or shareholders as well.
Speaker 1 (21:04):
Yeah, and I genuinely think that, in time... I mean, a couple of things I want to say, which hopefully will come through over the course of the episode, is that when it comes to the more advanced models, it's not as simple as just displacing some of the older models. So there may be cases where the GLM may be more appropriate, possibly because it's
(21:25):
more easily explainable, because the scope is more narrow in terms of, whether it's the line of business or the context, maybe it's just sufficient to have that explanatory power that you need. So I think that's something else that's important to note as well, is that, like with any kind of technology or software tool,
(21:48):
for instance, Microsoft Excel is still around. You know, there's more advanced software, but Excel still holds its own with things of a certain scope. So I've always liked to take that very nuanced approach to the technology and to the software. Agreed. Now, I did mention the theme of today's session is deep learning. So
(22:09):
let's get a little bit deeper, no pun intended, into the topic. So one of the most common deep learning models is neural networks. I think you might have spoken about one of them maybe briefly earlier, but let's.
Speaker 2 (22:21):
Let's recap that.
Speaker 1 (22:22):
How do you describe a neural net? If we're to talk about neural nets, just how do you describe that particular type of deep learning model?
Speaker 3 (22:30):
So, just coming back to that concept I mentioned before, representation learning, where a model comes up with its own almost cognitive representation of the data that it's, in inverted commas, understanding, that's really what neural nets exhibit. Neural nets are one of the most classical algorithms that exhibit this property of representation learning. And the way that
(22:54):
neural nets do this is almost by coming up with their own description of the input data. So the way I think about neural nets, just to describe them, is to think of them as many GLMs or many regression models stacked on top of each other, and the earlier ones in that stack are the ones closer to the data. What they're doing is they're taking in the input data
(23:16):
and coming up with a new representation that's optimal for making predictions with. So let's say you're talking, again, in the pricing context. If you think about when you build a GLM, you might have an interaction between the past number of claims in the last five years and the age of the driver. If you're going to build the GLM, you have to specify that manually. But something that neural nets
(23:36):
excel at is finding those combinations of variables where you need to amplify in the presence of both of them, or almost downweight sometimes, if needed. And that's really what a neural net is. Like I said, it's just layers of regressions that are stacked together to come up with its own representation of what's important in the input data. And
(23:57):
then it's effectively just a GLM on top of those new variables that the model has learned in order to make the predictions. And often you'll see those diagrams, I'm sure if I mention it a lot of your listeners will be aware of them, with all of those circles which represent variables, and then those connecting lines which represent regression weights. And really what that diagram is is
(24:18):
just a diagram of stacked regression models. So hopefully that helps a little bit.
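A minimal sketch of that "stacked regressions" picture, written in plain NumPy so the structure is visible; the data, layer sizes, and weights here are made up for illustration rather than a fitted model. Each layer is a set of regressions on the previous layer's outputs, and the final layer acts like a GLM on the learned representation.

```python
# A tiny neural net written as stacked regressions in plain NumPy - illustrative only.
# Each layer: new_features = activation(old_features @ weights + bias), i.e. many
# regressions whose outputs become the covariates of the next layer. The last layer
# is effectively a (Poisson-style) GLM on the learned representation.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 4))          # 8 policies, 4 raw rating factors

W1, b1 = rng.normal(size=(4, 6)), np.zeros(6)    # first stack of 6 regressions
W2, b2 = rng.normal(size=(6, 3)), np.zeros(3)    # second stack of 3 regressions
w_out, b_out = rng.normal(size=3), 0.0           # final "GLM" on learned features

h1 = np.tanh(X @ W1 + b1)            # learned representation, layer 1
h2 = np.tanh(h1 @ W2 + b2)           # learned representation, layer 2
expected_frequency = np.exp(h2 @ w_out + b_out)  # log link, like a Poisson GLM
print(expected_frequency)
```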
Speaker 1 (24:23):
That helps. And yeah, I've seen the diagrams, and the first time that you look at them it can be a bit daunting, I imagine, but with more study, you know, it starts to make more sense.
Speaker 3 (24:32):
Absolutely.
Speaker 2 (24:34):
You know, something that we mentioned in the preamble.
Speaker 1 (24:36):
I've heard this before, not from firsthand experience, but just from people who have worked with these types of models, is that in the past neural nets themselves have been criticized for being over-parameterized. So what precautions can be taken to reduce the risk of over-parameterization for neural nets?
Speaker 3 (24:55):
Absolutely, maybe I'll say a few things about that. So firstly, what is over-parameterization? It's very much a concept that started in statistics, where, if you're aiming for a simpler and more parsimonious model, you try not to increase the number of variables that you might have. So let's say you've got fifteen input variables, you probably, in a normal linear regression, don't want to take the interaction of each
(25:17):
of those variables together. You'll get a very large number of variables going into the model, and that's what we'd classically call over-parameterized. I think something that the first generations of neural nets suffered from was this idea of over-parameterization. And when I talk about generations of neural nets, if I remember my history, we're actually on something like the third generation of neural nets now.
(25:39):
You had your very first ones in the nineteen fifties, called perceptrons, which weren't really even useful for regression modeling. Your second wave of neural nets, which was probably within the eighties and nineties, were very similar to regression models. They maybe overfitted, and you'd be very worried about the number of parameters. But in this third
(25:59):
generation of neural nets, I think we can safely say this time is different. And the reason it's different, and over-parameterization is maybe less of an issue, is that we've come up with new deep learning model forms. So you're no longer using the old model forms from the eighties and nineties, those diagrams with the circles and the lines connecting them. There are all sorts of new methods of
(26:21):
designing and fitting neural networks which inherently mitigate the over-parameterization problem. And that's why you can have large language models, which are based on a model architecture called the transformer, which can scale up to even hundreds of billions of parameters, and you can fit these models without necessarily overfitting, and
(26:42):
that's because the model form itself mitigates over-parameterization and mitigates the downside consequences of over-parameterization, which are, hey, your predictions just won't be particularly good. Maybe also just to mention that, besides all the unique architectures which have now been developed and are available in three or four lines of Python code these days, another thing is
(27:04):
a whole range of regularization techniques. So we've mentioned this concept of regularization in this discussion, but basically it's a way of modifying the fitting process of coefficients in a statistical or machine learning model. And there are very highly effective regularization methods that are available for deep learning. So I'd say concern around over-parameterization is one hundred percent valid
(27:27):
if you're dealing with the simplest neural networks that you might come across today. But as these things get more advanced and you're layering in all the modern advances, that concern kind of vanishes and goes away.
Speaker 2 (27:39):
That's helpful. So it sounds like there's been some evolution.
Speaker 1 (27:42):
The form has evolved a bit, and, you know, that challenge is being commonly addressed.
Speaker 3 (27:51):
So, Dom, if I could make one suggestion, if any of your listeners are interested, there's a fantastic paper called Dropout by Geoffrey Hinton and a few of his co-authors, and dropout has become one of the leading methods for avoiding over-parameterized neural networks. There are actually some very nice analogies in the paper as to
(28:12):
how the technique works. So if I can recommend that to anyone who's looking for some weekend reading, that can be a fun place to start.
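For a sense of how little code that kind of regularization takes in practice, here is a hedged sketch of a small network with dropout layers, assuming TensorFlow/Keras is installed; the layer sizes, dropout rate, and loss are arbitrary placeholders rather than a recommended design, and this is not code from the paper.

```python
# Minimal sketch of dropout as a regularizer - illustrative, not a recommended design.
# During training, Dropout randomly zeroes a fraction of the learned features each
# batch, which discourages the network from over-relying on any single one.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),                 # 10 rating factors (placeholder)
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.2),                # drop 20% of units while training
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="exponential"),  # positive output, e.g. a frequency
])
model.compile(optimizer="adam", loss="poisson")
model.summary()
```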
Speaker 2 (28:20):
Excellent.
Speaker 1 (28:21):
So when we talk about neural nets and deep learning, you know, coming back to the beginning of the conversation and some of the things I mentioned around adoption, deployment, and, in the bigger picture, what you're trying to do with being able to apply some of these more modern techniques to actuarial practice, there are three trade-offs
(28:44):
that I typically think of when we go to a more advanced model beyond, say, the GLM. There's accuracy, of course, the potential for the model to be more accurate. There's complexity; of course, with more accuracy could come complexity, looking at when you start to incorporate non-linear relationships, for example. And then also the explainability. I guess those
(29:07):
two go hand in hand, explainability to internal stakeholders, regulators, et cetera. So when you think of deep learning models, not necessarily limited to neural nets, but when you think of that trade-off between the accuracy, the complexity, and the explainability, how do you think of that trade-off? Because I know sometimes they're marketed, and even me
(29:27):
being in the software space, when I think of machine learning models, sometimes I have to be a bit more nuanced in terms of how I discuss them, because I don't want to promise just guaranteed more accuracy. I know that there's a trade-off there among those three things.
Speaker 3 (29:42):
So can I add one extra dimension? Now we're working in a four-dimensional space, which is speed of production. Okay, another interesting one. So let's just run through yours, and then maybe we can mention mine very briefly. I do think a lot of research, and a lot of my own practice, has shown that machine and deep learning
(30:04):
models do outperform on an out-of-sample basis if you compare them to GLMs. So I think, on the mention of the trade-offs, you can get greater accuracy with these models. Of course, you can take some of the lessons as to what's driving that greater accuracy and go back and enhance your GLM, which is also a possibility. So it's also important
(30:27):
to say they're not completely orthogonal. If you've got a really well performing machine learning model, that can then inspire you to make some changes to a GLM. That can also be a really powerful methodology. I think a lot of machine learning models, without a doubt, are more complex, and the way you can maybe think about complexity is, can you reason and rationalize your way through just what
(30:48):
this model is doing? And generally, to hold in your head all the different bits of a deep learning model is extremely difficult, whereas a GLM, more or less, it's a linear model form. You can get the coefficients and more or less work out what's going on. So I think you're absolutely right that machine and deep learning models have that increased complexity. Explainability is an interesting one, because
(31:11):
it's quite easy to take machine and deep learning models and modify them so that they become inherently explainable. So in the past few years, together with Mario Wüthrich and some other co-authors, we've come up with this idea of what we call the LocalGLMnet, which creates a GLM for every particular policy that you're making a prediction for, and in that case you've got almost perfect explainability.
(31:36):
So maybe it's a little less clear that machine and deep learning models are less explainable. And I'll make a counterpoint that I often make to people in the other direction, which is, how explainable are GLMs? And there are two components to that. Let's say you've got a fifty-variable GLM for personal lines pricing. Just understanding the impact of those
(31:57):
fifty variables and holding them in your mind at once is not as easy as if you've got four or five rating factors, right? And another component is that one has to remember, when you fit a GLM, you're not getting marginal effects. So if you're looking at a one-way analysis in pricing, which is, what is the relationship between claims frequency and age, you often see a decline from young ages into middle ages. That may very
(32:20):
well not be there, just because of the different variables which are correlated with each other. So you're no longer looking at marginal effects, and I'd argue that we're not that good at really understanding marginal effects in GLMs either. And then the last dimension, the one that I added, is just around efficiency and how quickly it takes to produce a highly accurate model. And I think in that case
(32:43):
the machine and deep learning models clearly win, because to build a really good GLM by hand, the old way that actuaries have always done it, that can take an extremely long time, and you can be sitting there for three months playing around with different combinations of variables, whereas if you have your fitting routines sorted out, you can fit a deep learning model in maybe a minute or two minutes. So I think, exactly as you said, there are
(33:05):
a number of trade-offs, and it really depends what you are aiming for. If you're in a market where accuracy doesn't matter and you have to do a rate filing explaining the rationale of why your coefficients look the way they do, probably a deep neural network is not the right answer. But if your accuracy really matters and you need to get something out quickly, a well-tuned
(33:28):
GBM training pipeline might be what you're after. And I think being open and transparent about what your models can do, and where the very different model classes lie in that set of trade-offs, I think is really important.
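As a rough sketch of what a well-tuned GBM training pipeline can look like in a few lines (simulated data, an arbitrary parameter grid, and scikit-learn's histogram-based GBM; this is illustrative, not Ron's setup):

```python
# Hedged sketch of a GBM training pipeline: cross-validated hyperparameter search
# plus a held-out test set to check out-of-sample accuracy. Data is simulated.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(20000, 5))
y = rng.poisson(np.exp(0.05 + 0.4 * X[:, 0] - 0.3 * X[:, 1] * X[:, 2]))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

search = GridSearchCV(
    HistGradientBoostingRegressor(loss="poisson"),
    param_grid={"learning_rate": [0.05, 0.1], "max_depth": [3, 5]},
    cv=5,
    scoring="neg_mean_poisson_deviance",
)
search.fit(X_tr, y_tr)
print(search.best_params_, search.score(X_te, y_te))   # out-of-sample check
```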
Speaker 2 (33:44):
That's very helpful.
Speaker 1 (33:45):
Just one more thing I'll add, the reason I brought up the accuracy. And bear in mind I was not the one who developed this model, but someone once told me about a model that was developed, I think, in Python, one of the open source languages, run to come up with reserving indications, and I think it came up with an answer of a range of zero to a billion,
(34:06):
and I was thinking, okay, well, that's not very helpful, something with that much variance and variability.
Speaker 2 (34:11):
So it's just something I thought of, because, you know.
Speaker 3 (34:16):
Yeah, I think the one thing that's clear, and maybe this is why I was emphasizing it before, is you have to have a good model fitting pipeline, because you can get really poor results from machine and deep learning models if they're not fit properly. So just because you're using a popular open source library like XGBoost in a nice open source language like R, if
(34:38):
you're not doing it right, you're going to get results that are not particularly impressive. And I think that's exactly what you're pointing to, is you can produce really poor results with machine and deep learning. Whether it's harder to do that kind of thing with GLMs is maybe a conversation for a different day.
Speaker 2 (34:55):
Yeah, that's helpful.
Speaker 1 (34:57):
And one of the things I want to shift gears to now is, I want to start to discuss some of the takeaways from your papers. And I think this is a great opportunity, because some of the papers that you write, they're very technical and they have some great insights, and I think we have a good opportunity to just get straight to the source, you know,
(35:17):
speaking to the author, and be able to understand the value in those. So one of your recent papers discusses smoothness and monotonicity constraints for neural nets using ICE, the individual conditional expectation.
Speaker 2 (35:34):
So the first thing is.
Speaker 1 (35:35):
Like, what was your motivator motivation for writing this paper?
Speaker 3 (35:39):
Yeah, so Dom, that's a nice place to start. So I think what we've accumulated over the years in the actuarial profession is a set of desirable characteristics for actuarial models. Often we know that certain relationships should be monotonic. So, for example, if a person's had a significant number of
(35:59):
claims in the past, that's generally a good predictor of the number of claims on a personal lines policy that they'll have going into the future, and you generally want that relationship to be monotonic. What do I mean by monotonic? If you've had more claims in the past, your prediction should increase, it shouldn't decrease, right? So that's one sort of desirable characteristic that you might want for models. Another
(36:23):
desirable characteristic, that you often find more in life insurance practice, is the idea of smoothness. So in life insurance practice, when you derive mortality rates from a set of data, often you've got a whole bunch of noise that you don't really believe is there. You might think the reason that you've got all of that noise is just because you don't have enough data. So what life actuaries will generally do is smooth out, or graduate, those mortality tables.
(36:47):
So the point of this paper, using the ICEnet, was: how can we incorporate some of these desirable characteristics into neural net modeling? And that's really what the whole idea was. It's important to say, with things like GLMs and GBMs, you can impose monotonicity constraints quite easily.
(37:07):
You can smooth GLMs, but you can't smooth GBMs yet. So what I tried to do in this paper was give neural nets the best of all characteristics, by coming up with a general methodology for enforcing any sort of constraint that you'd like on the outputs of a neural network.
Speaker 2 (37:27):
Now, regarding the ICEnet, the individual conditional.
Speaker 1 (37:30):
If I'm saying it right, the individual conditional expectation. For those of us who may not be familiar with that expression, what is that?
Speaker 2 (37:37):
And why was that the technique that you used?
Speaker 3 (37:41):
There's this whole field around post hoc explanations of black box models. So if you're working in the field, you may have heard of SHAP, which is a very popular technique for doing that, but it's a little bit more complex all in. What individual conditional expectation is, is, let's say you pick a particular personal lines policy that you've
(38:03):
got a pricing model for, and you want to understand how the model behaves as you increment one of the variables up or down. So let's say you start off with a driver who is eighteen. What's the minimum driving age in the States? Sixteen? Okay, so let's go with someone who's sixteen. You might want to say, as they go from sixteen to seventeen, eighteen, nineteen, twenty, all the
(38:24):
way up to old age, what will the premium that you give them be? How is that premium going to change with that particular variable? So it's a very obvious and simple, intuitive model interpretability analysis that you can do. And the whole idea that we took in this ICEnet paper is, when you're training an ICEnet, in addition to making predictions, you also make predictions across all of those
(38:48):
variable dimensions, like we've just mentioned, perhaps age, and make sure that the predictions all relate to each other in a sensible way. So it's actually a really simple idea. Yeah, there's quite a lot of code that you need to implement that, but the results are actually quite nice. You can now smooth out and enforce monotonicity on neural nets in a very intuitive and hopefully appealing way that speaks
(39:09):
to the sort of analysis you might perform on a model anyway.
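A minimal sketch of the two ingredients just described: an ICE profile for one policy, plus simple penalties for non-monotone or rough profiles. This is not the authors' ICEnet code: in the paper such penalties enter the neural network's training loss, whereas here they are only evaluated on an already-fitted model, and the data, model, and names are illustrative.

```python
# Illustrative only (not the authors' ICEnet code): (1) an ICE profile, obtained by
# re-scoring one policy while sweeping a single variable, and (2) simple penalties
# for non-monotone or rough profiles.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform([16.0, 0.0], [90.0, 5.0], size=(1000, 2))      # [driver_age, past_claims]
y = 0.05 + 0.04 * X[:, 1] + rng.normal(0.0, 0.02, size=1000)   # toy claim frequency
model = GradientBoostingRegressor().fit(X, y)

policy = X[0].copy()                            # one policy of interest
claims_grid = np.linspace(0.0, 5.0, 26)         # sweep past_claims over its range
grid = np.tile(policy, (len(claims_grid), 1))
grid[:, 1] = claims_grid
ice_profile = model.predict(grid)               # ICE curve: prediction vs. past claims

# Penalties: squared decreases (the relationship should be increasing here) and
# squared second differences (roughness), as stand-ins for the paper's constraints.
monotonicity_penalty = np.sum(np.clip(np.diff(ice_profile), None, 0.0) ** 2)
smoothness_penalty = np.sum(np.diff(ice_profile, n=2) ** 2)
print(monotonicity_penalty, smoothness_penalty)
```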
Speaker 1 (39:14):
So it sounds like, if I'm getting this right, the key takeaway is using that particular technique to impose the smoothness and monotonicity, essentially improving the results, being able to achieve smoothness and to get monotonicity where you may have variables jumping around, you know, when
(39:34):
they should be going in one direction.
Speaker 2 (39:36):
Would that be fair to say?
Speaker 3 (39:38):
That's a brilliant summary. And maybe the other interesting thing that one can say is that it's not just about good actuarial modeling practice. Often, if you've got variables in your model that jump around quite a lot, you can upset all sorts of stakeholders, most importantly your policyholders. So there may be some variables where, just
(39:59):
for commercial reasons, you want to impose that sort of smoothing or otherwise sensible relationships. So I hope the idea of the ICEnet can allow actuaries to use neural nets in a way that is both more actuarially but also commercially appealing.
Speaker 1 (40:19):
Yeah, that's helpful. And I hope folks take the time to read these papers, this paper and a couple of others that we'll discuss. So, continuing the conversation on deep learning: applying classic actuarial concepts to deep learning. That's something that you've experimented with in recent years. That's not just the topic of this episode, but, you know, you've done a lot of research on this. So, taking a step back,
(40:39):
what does actuarial deep learning look like, if now we're combining the concept of deep learning
Speaker 2 (40:48):
With actuarial practice.
Speaker 1 (40:49):
If we coin that expression, actuarial deep learning, what does that look like to you?
Speaker 3 (40:54):
So, Dom, I really like that term, and I might borrow it, if I may, as well. I think a lot of what one has to be aware of, when you start getting into this actuarial data science field, is that most of these models were developed outside of insurance. Insurance isn't the primary focus; it doesn't occupy a primary place
(41:17):
in the thinking of the people developing these models. So, for that reason, a lot of the things that we need, particularly within actuarial practice, just haven't been accounted for. We've just spoken about smoothness and monotonicity. There's also a positive aspect to it: over many years of actuarial practice, we've kind of figured out how to build good models in insurance markets. And an example of that might
(41:39):
be the idea of credibility. Just because you've got one policy with a particular amount of claims experience, don't give that data point full credibility. Look at a wider portfolio and blend between different sources of information. So I think the way I describe actuarial deep learning is: how can you bring in all of these good ideas and good practices that have been developed within the actuarial discipline,
(42:05):
and how can you apply them within this fantastic new modern technology that we've basically all got access to? It's bringing together the good parts of both and making sure that the models that you produce are fit for purpose if you want to deploy them for actuarial purposes.
Speaker 1 (42:22):
I'm glad that you mentioned credibility. We're talking about classical
actuarial concepts and how you blend those with deep learning models,
and that brings me to your next paper. I know
you have several papers, but we're primarily going to touch
on three in this episode. The second paper, the Credibility Transformer,
and I'm quoting from the paper itself. You demonstrate a
(42:44):
novel credibility mechanism which leads to predictive models that are
superior to state of the art deep learning models. That's
what you claim in the paper itself. So the two questions I have are: what is a credibility transformer, and how does it enhance deep learning models?
Speaker 3 (43:02):
So let's start with what transformer models are. Transformers are probably the most important model class within deep learning. They were developed in a twenty seventeen paper called Attention Is All You Need by a bunch of researchers at Google; the lead author's surname is Vaswani. And basically this technology now underlies all of the large language models
(43:23):
that you see today. In practice, it's all a bunch of transformer models, maybe amplified and made bigger than the original one, but that's what a transformer model is. And the insight that we had, when you look closely at the structure of a transformer model, is that you can actually restructure the model to take account of credibility between
(43:44):
two things, and those two things are, what is an individual set of policy characteristics, so if you're talking again in the pricing context, it might be past claims history, age of the driver, where they live, et cetera, versus what is the overall portfolio experience. So if you think of your classic credibility formula, it's generally a blend between a more individual set of experience and a portfolio set
(44:08):
of experience. And what we were able to show is that if you structure your transformer right, you can actually have your transformer perform that credibility calculation for you, which is a blend between the portfolio's experience and the individual's experience. And what do we mean that it actually enhances the state-of-the-art transformer model? So we took
(44:30):
an open source data set and we fit a very complex deep learning model, a transformer, to it, using all of the tricks of the trade from the large language model world. I'm not going to go into those in too much detail, but basically all of the good modeling practice that has developed since that twenty seventeen paper on transformers.
(44:52):
And what we found is that, okay, just fitting one of these models does lead to quite accurate results, but applying this credibility mechanism within the transformer improves even over those, and in some instances the jump in improvement in accuracy is similar to the difference between a simple neural net and a GLM, so quite significant. And
(45:16):
that's really what we showed in that paper. So, for me, that's almost become a case study that I like to cite about how you can bring traditional actuarial thinking, and there's nothing more traditional than credibility theory, into the latest cutting-edge models, and a good reason for doing it is not only that you can then get actuarial stakeholders to
(45:37):
understand what's happening in a much more intuitive and foundational way, you can actually get real boosts in predictive accuracy from doing that.
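For listeners newer to credibility theory, here is a hedged sketch of the classic blend Ron is alluding to, which the Credibility Transformer learns to perform internally between individual and portfolio information; the numbers and the Bühlmann-style weight below are purely illustrative and are not the paper's mechanism.

```python
# Classic credibility blend - illustrative numbers only.
# prediction = Z * individual experience + (1 - Z) * portfolio experience,
# where Z grows with the volume of individual experience.
def credibility_blend(individual_mean, portfolio_mean, n_observations, k=50.0):
    z = n_observations / (n_observations + k)   # Buhlmann-style credibility weight
    return z * individual_mean + (1.0 - z) * portfolio_mean

# A risk with little experience stays close to the portfolio; more data earns more weight.
print(credibility_blend(individual_mean=0.30, portfolio_mean=0.10, n_observations=5))
print(credibility_blend(individual_mean=0.30, portfolio_mean=0.10, n_observations=500))
```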
Speaker 2 (45:46):
Now, what was the reception for that, you know?
Speaker 1 (45:48):
Are you seeing people... I know the paper, I think, is fairly recent, at least within the past year or so, but are you familiar with anyone who's tried to implement that in practice?
Speaker 3 (45:57):
So we will be releasing open source code to allow people to implement that in practice, so people can try it out, and next week I'll be presenting it at the Actuarial Society of South Africa's annual convention. So I hope that there will be some interesting feedback from participants. But it has been received quite well, particularly by some audiences
(46:19):
in South Korea and Japan.
Speaker 1 (46:21):
That's excellent, and I think that's the essence of this episode. Even though the theme is deep learning, it's not just about deep learning, because, like you said, deep learning in a vacuum is a concept or a construct that may or may not be relevant to actuaries without that broader insurance and risk management context. So I like that blend of actuarial theory with deep learning.
(46:45):
That's what we're trying to get at in this episode. Absolutely. Now, to the third and final paper that we're going to discuss today. Something that certainly is near and dear to the Maverick brand is the actuarial profession itself. You recently published an
(47:06):
AI vision for the actuarial profession for the CAS E-Forum. So what can the actuarial profession do to capitalize on this era of neural nets and generative AI?
Speaker 3 (47:18):
Yeah, I think, Dom, there were a few main points I tried to make in the paper, and maybe it's more accurately called a manifesto. I think the one is that we should embrace AI in the actuarial profession. We've got this fantastic history and legacy of deep knowledge of financial statistics, and there's no reason to be averse to
(47:42):
a new technology which may do certain things we've done classically and may allow us to do those same things better. So I think one thing I'm trying to advocate for in that paper is, let's be AI friendly, and let's look at it not so much as a threat, but as an opportunity to enhance what we've always done. And if we take this concept of actuarial deep learning, or
(48:04):
what I'm calling in the paper the AI-enhanced actuary, the vision that I have, and what I hope the profession will respond to, is that we blend the best parts of what the actuarial profession has always been with the opportunities and new technologies enabled by deep learning. That's really what it's about. And I think, if I summarize the point that I'm making in that paper, I think we
(48:26):
stand today at a little bit of a crossroads. We either can embrace this new technology and become leaders in our traditional fields, whether it's insurance or investments or pensions, and maybe we can even go outside of our traditional fields as we build our deep professional expertise with these models. Or, I suppose, there's a downside scenario where
(48:47):
we don't jump on the wave with our surfboard and surf, and the opportunity passes us by, and instead of an AI-enhanced profession, we may be competing with other professionals using AI. So, again, what I'm advocating for is, I think we need to seize the upside, because there are also some significant downside costs of not doing that. And really the vision is one of
(49:11):
blending AI with the traditional principles, practices, and the legacy that we have as being part of a great, old, historic profession.
Speaker 1 (49:23):
I love the vision, and I think you used the term AI-enhanced actuary, which I think is very simple and encapsulates what you just said. It's blending the two. I have a couple of follow-ups. When I was reading through the paper, I noticed a few things. One is, you talked about reserving, and you talked about an actual versus expected analysis that would typically accompany a
(49:45):
reserve review, and you talked about, I guess, using AI to augment that and come up with observations. And the first thing that came to my mind is, you know, thinking of reserving actuaries who've done that for several decades, I'm just curious, and I know maybe we haven't had enough time to observe this, but what do you think the reception to that thought process is? Because that's just
(50:08):
something that I know has been bread and butter for reserving actuaries. Do you feel like they're going to feel like their territory is being encroached on?
Speaker 3 (50:14):
Yeah, I mean, I think, reserving actuaries, a large part of my career has been in reserving. I think a couple of interesting things. Reserving actuaries are under immense stress and pressure as the demands on them increase, perhaps particularly outside the US. So you've had the introduction of Solvency II, so you have to produce
(50:35):
a set of reserves on your Solvency II classes of business using the Solvency II specification. At the same time, if you are in an IFRS 17 compliant jurisdiction, you also then need to produce a set of IFRS 17 compliant reserves. So more and more is being asked of reserving actuaries on the one hand, and at the same time better data is becoming available. You
(50:58):
might now be able to reserve a multi-line book per peril, or you might reserve for different states instead of just doing one overall reserving triangle. So the overwhelming feeling I see among reserving actuaries that I deal with is, people know they have to do more, but they don't really have the time or resources to do it. I think that's exactly where AI can plug in. Instead of you having
(51:19):
to puzzle over reserving AvE analyses coming from fifty lines of business and work out which accident years problems are coming from, if you've got a trustworthy AI system, it can immediately pinpoint which lines of business have got a problem, which accident years it's coming from, and draw the parallels among lines of business. So, for example, is there an
(51:40):
issue of social inflation, or is there a particular issue with claim severity in several lines? That can immediately put you in a position where you understand what's happening in your triangles much faster than you otherwise could have. So again, I think it comes down to, what's our attitude? Is it about enhancing the great work
(52:01):
we've always done, or is it around, okay, the machines are taking over? And I hope, and I think, our attitude should very much be the former rather than the latter.
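A toy sketch of the kind of triage he describes, flagging line of business and accident year cells where actual emergence deviates materially from expected; the data, threshold, and column names are made up for illustration, and a real system would of course be far richer than this.

```python
# Toy actual-vs-expected triage - illustrative only, with made-up data.
# Flags line-of-business / accident-year cells whose actual emergence deviates
# from expected by more than a chosen threshold.
import pandas as pd

ave = pd.DataFrame({
    "line_of_business": ["motor", "motor", "liability", "liability", "property"],
    "accident_year":    [2021,    2022,    2021,        2022,        2022],
    "expected":         [100.0,   110.0,   80.0,        85.0,        60.0],
    "actual":           [103.0,   128.0,   82.0,        109.0,       58.0],
})
ave["deviation_pct"] = (ave["actual"] - ave["expected"]) / ave["expected"]

flags = ave[ave["deviation_pct"].abs() > 0.10]      # e.g. flag >10% deviations
print(flags.sort_values("deviation_pct", ascending=False))
```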
Speaker 1 (52:12):
Yeah, and I fully agree, and I agree with the thought process. I do certainly think, and it's not necessarily a bad thing, I think that when we have these types of changes, like you said, you talked about the vision of using AI to speed up some of these observations, so you don't have to spend the time going through the triangles, you know. But I think in reality you're
(52:34):
going to have, I think there are going to be folks who may be exposed, because perhaps that's the way they've always done things, that's what they were focused on. And you may find that there are some folks who were spending the majority of their time doing some of the things that AI can now do. So, certainly, even though I think that will not necessarily be the majority of folks, there will certainly be some.
(52:56):
The second thing I wanted to ask about, and I think you mentioned this, you talked about the event that we don't embrace it. Now you have people in data science, people who may not have the domain knowledge, or maybe they do have some domain knowledge, and they're unconstrained by the requirements of professional guidance or regulation.
I think this one is interesting because I really do think,
(53:21):
and this topic I think came up several times in
the annual meeting, that's really where I think actuaries
Speaker 2 (53:27):
add the value, is the context around the modeling.
Speaker 1 (53:29):
I think you can have the most sophisticated techniques, but if you don't have the right context for the models themselves... I think that's where actuaries add a lot of the value, being able to put it in the context of the domain and in the context of the business and the decisions.
Speaker 2 (53:42):
So that's an opinion I guess that I share.
Speaker 1 (53:45):
I don't think it would necessarily be as simple as
if actuaries were not to embrace it then being unilaterally displaced.
Speaker 2 (53:52):
I think there could still be some displacement, but I
think it's one thing. Yeah, go ahead, sorry.
Speaker 3 (53:56):
I think what's interesting about that is, actuaries are great at providing the industry knowledge and the context for models. But if management finds that some of the data scientists down the corridor are making more money using more advanced models, that's when you're going to have a real commercial consideration, and the softer value-add that actuaries can currently bring
(54:19):
may not be enough. So I'd very much like to see a future where actuaries are the best modelers in the insurance domain, as well as being able to provide that full overlay of the professionalism, the ethical perspective, and the business context. And I don't think that there's any reason why we can't do that. Maybe it's just
(54:41):
an extra course you need to take, or an extra curriculum to go through. And I think that's again part of the peril versus the opportunity. If the opportunity is seized, you can have actuaries being the pre-eminent modelers in the insurance domain and perhaps even extending outside. But if not, and other modeling professionals can do a better
(55:03):
job, the commercials may lead to a different outcome. So yeah, I think that's exactly the upside and the downside scenario.
Speaker 1 (55:13):
Excellent. Well, if I were to recap what we talked about today, you know, we had a great opportunity to talk about, from the fundamentals of AI, looking at how statistics differs from machine learning and deep learning and understanding, you know, that spectrum of AI. We talked about deep learning and neural nets, which
(55:36):
is one of the most common examples, describing what that is, and talked about some of the ways of addressing the over-parameterization. We spoke about a few of your papers, which were specifically either applying actuarial concepts or addressing very specific things that actuaries would contend with through the context of
(55:58):
deep learning. And then we closed with a vision for the future. So I can't think of, well, you know, many episodes that have been, I think, more valuable than this, and I love the cutting-edge episodes where we're looking at things, where we're being forward thinking, talking about these cutting-edge
(56:18):
topics and being able to share that with the community, and not just share it, but share it in a very succinct, meaningful way. I think that is the most rewarding part of this. And I just want to thank you, Ron, again for joining me a second time and sharing your expertise. I'm very excited for this new chapter for you in terms of the entrepreneurial venture, and, as always, let's keep
(56:40):
in touch, and, you know, just wishing you the best as you move forward with things.
Speaker 2 (56:44):
Dom.
Speaker 3 (56:44):
It's always a pleasure to speak with you. I always leave inspired and looking forward to our next conversation. Keep doing the great work on your podcast; it's one of my favorites. So thank you so much.
Speaker 1 (56:55):
Will do, and I hope, actually I hope, that our next conversation is in person.
Speaker 2 (56:59):
I hope to get to that part of the world at some point.
Speaker 3 (57:01):
Absolutely, we look forward to welcoming you here.
Speaker 2 (57:04):
Excellent. Well, have a wonderful rest of the day, Ron. Take care. Bye bye.