Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Jason (00:00):
Welcome to the DataCafe.
I'm Jason.
Jeremy (00:03):
And I'm Jeremy. And
today we're talking about
failure.
Actually, have we realised we have a birthday to celebrate?
Jason (00:21):
We have a birthday? Yeah,
Jeremy (00:23):
We are one. How about
that?
Jason (00:25):
One year old on the air? It's so good. I didn't realise that a year has, like, crept up on us. And what a year it's been.
It's been such a pleasure
Jeremy (00:33):
It has, it really has. And I didn't know what to expect from this, but I'm blown away by the fact that it's been a year, the fact we've done, what, this is the 17th episode, is it? And...
Jason (00:44):
I don't even know
actually.
Jeremy (00:47):
17 episodes, not bad going. And I certainly didn't think we'd do that many. And it's been such fun. Loved it.
Jason (00:54):
Yeah. And a major thank you, and a shout out to all of our brilliant guests, they've been fantastic. And I know that you've engaged with them a lot. So thanks to them for their time.
Jeremy (01:04):
Yeah, they make it what it is. And it's been fantastic talking to all these really interesting people, certainly.
Jason (01:11):
And of course, our
listeners. Thank you to all of
our listeners tuning in.
Jeremy (01:15):
Absolutely.
Jason (01:16):
And now, what we're going
to talk about today is failure.
Jeremy (01:22):
I think this is good failure, sort of, anyway.
Jason (01:25):
Yeah, I think so, insofar as I've been reading these reports about why data science projects can fail, at least motivating it from a data point of view. So the data says that data projects can fail, and up to, what is it, 85% of big data projects fail, according to a Gartner report in 2017.
Jeremy (01:46):
Wow.
Jason (01:47):
Which is massive. Yeah. And I think they had a 2020 report that said CEOs say that, of the ones that land, only about eight percent of them generate value, so the actual success rate is really low. Gosh. And the reason why I question it is from the point of view of, yeah, just because something has failed doesn't mean you haven't got a lot of learnings that then mean you're more likely to succeed
(02:08):
next time round. And so that's obviously not captured in these reports.
Jeremy (02:13):
That's one way. But I mean, obviously, it can go the other way, which is you've had your fingers burned, and so you never go down that route again. Right. So true. Yeah, true.
Jason (02:22):
Yeah. And so I think what we can chat about today is some of the points that we could maybe avoid, or help people avoid, so that your own data science projects don't fail, potentially.
Jeremy (02:33):
Exactly. So although this is about failure, really it's about, let's see how we can learn so that we can succeed. I think that's the upside. That's the real upside. Yes. So I want to play a game with you today, Jason, if you'll indulge me.
Jason (02:48):
I love games.
Jeremy (02:51):
I have, I reckon, six pretty broad reasons why data science projects are prone to failure. And it can be combinations of those reasons, of course, it can be more than one. But I have six core sort of reasons why. And I call those the micro reasons, but they're really important. But then there's a big whopper. There's the big
(03:12):
reason why I think projects fail. So I want you, maybe from your own experience, or from experience you've read about, to have a crack at seeing whether you can get some of these, and we'll have a chat about that.
Jason (03:25):
Okay, so this is like data project bingo, and I have to hit them all. Okay. Oh yeah, we needed a card with all those same words on it, that should have been a one-year gift to us. Yes. Okay, so I just have to come up with something that immediately, to me, is something I would see as a possible
(03:45):
failure, from the exposure that I've had to projects. And number one on my list is probably engagement, right? Stakeholders need to be engaged: engaged with you as a solution provider, engaged with the problem, and engaged with the strategy of the company, because you want it to align with their overarching KPIs. And once they're on board,
(04:07):
great, that's like major obstacle number one.
Jeremy (04:12):
Yeah, that is definitely one of the key reasons why projects fail. I think if you don't have engaged stakeholders... I've worked on several projects where stakeholders have fallen out of love with the idea of doing the project, or never really bought into the idea of doing the project in the first place. Or sometimes they don't realise,
(04:33):
sometimes the person setting the project up has this great idea of, oh, we should talk to the data science team, they're really great, but they're actually nothing to do with the pathway to delivering that project. And so the people who actually have to work with the tools that you develop are going, sorry, what's this, where did this come from, we didn't ask for this. And so that's a real issue. And of course, they're just as much
(04:54):
stakeholders, in fact, really, they are the stakeholders in this case. So yeah, stakeholder engagement, number one, that's a really good get. Well done.
Jason (05:03):
Oh, I should get a coffee. Oh yeah, just sitting on that victory, like, great, I'll get another coffee here. And number two: the data itself has to be clean, or at least in some form that's usable. Absolutely. I was at a talk this week and somebody referred to their data,
(05:27):
I don't know what they were meant to call it, but they called it a swamp. It might have been a cloud, or it might have been a lake somewhere, but they referred to it as a swamp. And I was like, wow, yeah, there could be a swamp of data where people are saying, let's get some scientists in to look at this swamp. And it's like, oh, we need to prep the data better than this. Yeah.
Jeremy (05:49):
Yeah. How it's being collected, when it was collected, and whether that collection mechanism has changed, as it always, almost always, does over time. Who knows about the data? You know, I think organisations that do this well typically have curators, you know, who are really experienced, really savvy
(06:09):
people who look after, sometimes, individual tables, and they know everything about the columns, the metadata, the provenance of that data. Yeah. And they are the go-to people for that data set, which allows you to then say, oh, what can you tell me about
(06:31):
how we're collecting that longitude and latitude? And what level of accuracy do we have? And when we changed equipment, how did that affect the output? All of this stuff is just so important, but so often you only discover, when you're well buried within the project, quite how parlous a state it can be in.
Jason (06:54):
I've engaged with some data analysts who blow me away with their depth of knowledge of the data that they have.
Jeremy (06:59):
Right, right, exactly. Sometimes it's community knowledge, you know, that a team has that knowledge about the data, or some stakeholders have the knowledge. But it's rarely in one place, I find, you know, you often have to search and hunt high and low to find the people who really do know. So here we go, stakeholders and data, we have two good hits straight off the bat. Yeah.
Jason (07:20):
This isn't a bite episode anymore, though, we're going to be here for a while if I've a list to go through. And number three, one of the things that I think matters is the actual team, the teamwork, the skills that are brought together. We talked about the need for diversity of skills in a team before, em, you can bring in any amount of bias or a lack of
(07:46):
understanding if you just have a skills gap in your team, maybe, or just one person working in isolation.
Jeremy (07:52):
Yeah, yeah, you've hit another one here. So yeah, the science: does the team have that scientific knowledge, or don't they, maybe? You know, it's such a big subject, you can have a team of 15 people, and maybe you don't have coverage of, you know, transformers in NLP, you know, deep learning, neural networks. It requires a huge degree of expertise in a really
(08:15):
broad area. So I'm gonna roll into this as well, not just the coverage of the scientific method, but also the ways of working with that scientific method. You know, are you a team that goes, I want to go down this rabbit hole, I'm not going to come out until I've extracted every last percentage of goodness, or are you a team that is
(08:38):
prepared to do maybe quite shallow investigation initially, and then deepen it as necessary? Because I think a lot of the reason some projects go south is when individuals maybe decide they're going to just go after one particular problem that's fascinating them and not actually focus on the big picture. And the science becomes sort of almost more
(09:00):
interesting than the business problem. And that can be a bit of an issue I've seen, certainly.
Jason (09:06):
Yeah, it can suck people
in.
Jeremy (09:07):
Yeah, it can. Good stuff. This is going really well, this isn't gonna be too long. So, let's see, we've got science, we've got data, we've got stakeholders. What else do we think?
Jason (09:19):
Well, one bit we were talking about there made me think of deployment, like getting what you've built deployed in the business, and productisation. And one of those skills gaps could be the ability to actually get a model automated, whether it's versioned and tested and actually implemented, then, for your user, whoever that might
(09:40):
be: stakeholder, customer.
Jeremy (09:41):
Yep. Yeah, absolutely. There are many issues around this, and this is probably not one, this is probably several reasons, but if you don't have good engineering capability within or adjacent to your team, or within your project if you're working in a cross-functional team, then it's going to be a real struggle. And, you know, data
(10:04):
science teams that have good engineers within them are always more capable, I think, and are in a much stronger position just to demo what they've done to stakeholders quicker. And if you get demos, then you get feedback; you get feedback, you get an improved product, you get all that virtuous circle stuff going on, don't you? Something I'd call out particularly in this, which is a bit of a curse of the data scientist sometimes,
(10:27):
is that you're working on a problem, you're developing a model. It's a really nice model, but it's quite computationally intensive. And I've seen this several times, whereby the data scientist pulls out the solution after maybe half an hour, or an hour's worth of
(10:47):
processing from their model. And, well, you know, guess what happens: the stakeholder comes to them and says, what, an hour? I was hoping that was going to take two minutes. Yeah, well, I've only got 10 seconds for that result to come out for it to be useful. You know, the engineering doesn't just cover being able to deploy it, it's actually got to be done in a timely way as well, which is
(11:09):
really hard.
Jason (11:11):
I'm just going to run this notebook on my laptop in my locker overnight, and then the results come out.
Jeremy (11:17):
Exactly. And of course, you know, this is data science, right, it tends to be models which are consuming inordinate amounts of data. So, I mean, you know, there's a big scalability issue to address. It's not really one issue, it's probably two. Yeah.
Jason (11:30):
And which can be addressed. But yeah, it's not always immediately addressed. Yeah. So, any other guesses? How many, how many have I got?
Jeremy (11:36):
You've got four. You're doing really well. So it's two to go, and, you know, four guesses, but I have to say, no surprises here yet.
Jason (11:44):
No surprises. Yes. Maybe over-promising. And what I mean by that is sometimes you say, yeah, we'll build you, and this could be echoing the rabbit hole, we'll build you this amazing machine learning model. Turns out they just needed something quicker, like the demo that you just mentioned: do a demo, do a quick cycle, this is a deliverable. And something I've heard, or people have said, is, if you aren't
(12:08):
embarrassed by your first demo, then you've demoed too late. That's good. You over-promised something and you end up spending too long.
Jeremy (12:16):
We're just at that stage of a project at the moment. Yeah, we want to go back to the customer really quickly, and I'm thinking, this is so simple, it's getting embarrassing.
Jason (12:28):
Sounds like you're right
at that point. Yeah.
Jeremy (12:31):
That's a really good point, I like that one. I haven't got over-promising on my list; I guess I probably would have rolled that into stakeholders, but it could easily be ways of working. Yeah. Okay, that's good, I like it. So, we alluded to it a little bit, I'll give you a hint. We alluded to it a little bit with the scaling issue, because if a computation is needed in a particular time,
(12:53):
it's usually needed for a reason, right?
Jason (12:55):
Oh, is this outlining the
problem from the outset?
Jeremy (12:58):
Right, right. So this is: what is it feeding into, right? What does the downstream look like? What is the decision that we're actually influencing with our output? Yeah, I saw a nice thread on LinkedIn a couple of days ago, where a guy was going, oh, you know, so often, so often people are using their data science teams to generate
(13:20):
insight. And, you know, what really does that achieve, you know, if you're putting your insights into a dashboard? And he said, well, let me tell you exactly what happens: you produce some nice graphs, and they get put into a PowerPoint deck. And then a month later it gets presented at a meeting, and it's like, well, you're already a month behind the curve when you do that, but
(13:42):
probably two or three, given how long it took to run the tool and then get it across to somebody. So, you know, if you want impact from a data science team, you've got to design the forward process that takes that science and shows which decision is going to be influenced by it, and therefore, of course, shows the impact that then comes from
(14:04):
that.
Jason (14:05):
I like that.
Jeremy (14:06):
That's a big one for me, I definitely preach about that a lot.
Jason (14:09):
I was thinking of something similar when we were talking about stakeholder engagement, about the roadmap, that along that roadmap you were talking about to get impact, there'll also be a bunch of, like, gate points or decision points that may have a stakeholder who needs to be brought in to get their buy-in for the value to be driven. Yep,
(14:31):
yeah.
Jeremy (14:32):
Absolutely. So, I mean, there is overlap in these: the stakeholder engagement, yeah, with the decision, with the impact. These, of course, are all critical things that play together, and none of these tend to sit in isolation. So, the last one: we have five now, we've got one more to get. This is a slightly fiddly one, and it depends on the industry that you come from as
(14:53):
to whether you care about this, or rather, everyone cares about it, but whether it's uppermost in your mind, I think.
Jason (14:58):
Interpretability?
Jeremy (15:00):
Ah, well, possibly, yeah, actually, I might have to just give that to you. Yeah. Okay. Interpretability usually means that you want to know why. And if you want to know why the tool is producing the decision, typically there's a regulator sitting not a million miles away from you saying, you have to show me why you've made that
(15:21):
decision, maybe in a medical or healthcare setting, or in an ethical governance setting, or in a statutory setting. You know, there are lots of industries, finance, for instance, where there is a huge regulation framework that has to be not just carefully curated and adhered to, but it
(15:42):
has to be shown to be adhered to. You've really got to make sure that you're compliant with your regulator, with your industry, even with your company policies, of course. So all of that has to be part of writing the tool. And, you know, any data scientist will know, if they've done this for any length of time, that you fast become very, very aware of, and then
(16:03):
expert in, the regulatory frameworks that you're operating under.
Jason (16:10):
Yeah, that's really
interesting.
Jeremy (16:12):
So there's one last, one last challenge, which I think...
For right now, I just want my prize.
You get your prize. Okay, so it's not a challenge anymore, it's really just a last question. So those, I reckon, are the not-so-micro reasons for why data science projects fail. But there's something which I think ties all of these together, and the companies that succeed have
(16:34):
it, and the companies that struggle, anyway, shall we say, are ones which maybe don't have this particular property. Do you want to hazard a guess as to what that might be?
Jason (16:45):
Oh, like, I wanna say,
investment from the top.
Jeremy (16:48):
Yeah, pretty much. I mean, it's basically culture, I think, because you've got to have that cultural desire to embrace what can be, I think, a really radical departure from your usual ways of working, and also you have to feel reasonably comfortable with
(17:08):
quite abrupt change in the way that you do things. And that can be a big struggle for some companies, I think. Certainly, I've both worked for, and seen at a distance, companies that have really struggled with that cultural standpoint, where getting change into the company and adopting this new technology, disruptive
(17:29):
technology, and it is highly disruptive sometimes, can be a real struggle for them. I don't know if you've had any experiences like that.
Jason (17:35):
Yeah, that point you just made about disruption is pretty key. And I've heard people talk about needing to embed an allowance for curiosity, and the psychological safety that comes with being okay with, not so much failing, but experimenting, and your experiments may not deliver. But that's why you ran
(17:56):
the experiment: to see what needs to change. And any aspect of all the points we've hit on could be what needs to change.
Jeremy (18:03):
I love that, an allowance for curiosity. I think that's a really nice phrase. It ties up very nicely with how a lot of good teams execute their projects, which is to say, they don't say, we will produce this model and it will do this; they say, can we produce a model which actually impacts the company with a 5% improvement in whatever metric? And, you
(18:26):
know, it's just as okay to come out with a no to that question as it is with a yes to that question. Because, you know, both outcomes are likely, right? Yeah.
Jason (18:36):
Very good. That's been
really enjoyable.
Jeremy (18:38):
Well, I guess, I think it's safe to say that, you know, culturally, DataCafe has ticked all the right boxes. We're still going after a year, and we have a high allowance for curiosity on this show. So...
Jason (18:49):
Yes, very much so. Ah, looking forward to what the future holds. Thanks for joining us today at the DataCafe. You can like and review this on iTunes or your preferred podcast provider. Or, if you'd like to get in touch, you can email us at jason@datacafe.uk or jeremy@datacafe.uk, or find us on Twitter at @datacafepodcast. We'd love to hear your suggestions for future
(19:10):
episodes.
Can I, can I shout Bingo?
Jeremy (19:18):
You can shout Bingo. Excellent.