Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Narrator (00:07):
You're listening
to the Assurance Show.
The podcast for performance auditors and internal auditors
that focuses on data and risk.
Your hosts are Conor McGarrity and Yusuf Moolla.
Yusuf (00:20):
Okay, Conor.
So this is the first episode for 2022.
I need to keep reminding myself, keep thinking it's 2021.
And today we want to talk about the word analysis in
the phrase data analysis.
Why this came up is that - just in my own head, and I know in
various conversations we've been having, there's often
(00:42):
a difference of opinion or difference in understanding,
of what that phase of the data work that we do as
part of audits actually is.
So thought it would be good to have a conversation about it
and define it, either loosely or not, but understand where
it comes from and what it means.
(01:03):
And when we're having conversations with individuals
that we report to, or individuals that report to us,
and we're talking about analysis, what does it actually mean?
So if somebody says to you go away and do some analysis,
or if you're telling somebody go and do some analysis,
what does it actually mean?
Conor (01:18):
Sounds like one of
these topics that you've
been thinking about over the holiday break, Yusuf.
Yusuf (01:23):
Ah, maybe a little bit.
Conor (01:25):
At the back of your head.
Yusuf (01:26):
Yeah, just a little bit.
Through various engagements over the years, it's one
of those things that never really got properly defined.
And like I said, everybody's got the differences.
And so yes, I did think about it a little bit over the break.
Conor (01:37):
So as with most things,
we probably should start with
definitions or the history of the word or its etymology
and where that comes from.
So, where do we start with the word analysis?
Yusuf (01:46):
Etymologically, it
comes from the combination
of two root Greek words.
The first one is "ana", which is "up" and the
second one is "luein".
I hope I'm pronouncing it correctly, so any Greek
listeners, please, apologies in advance for not
pronouncing that correctly.
But "ana" for "up" and "luein", which means "loosen".
(02:09):
So when you combine those, in reverse order, it's "loosen
up" is where it started.
Conor (02:16):
And so when we're
thinking about that definition,
even in modern times, in terms of data analysis, what does
that sort of speak to for you in terms of loosening up
with respect to the data?
Yusuf (02:28):
So there's various
definitions, and there's
no sort of standardized definition for what analysis
is or what data analysis is.
Obviously various people have put their thinking
hats on and tried to come up with something.
But broadly speaking, it's breaking datasets
down into their individual components to understand
and explore and evaluate.
(02:48):
That makes sense in terms of where the word comes from.
So the word comes from loosen up and so breaking things up
into their individual components to be able to understand
those individual components.
Because obviously, datasets can be large,
either broad or deep.
And so you need to first understand what the
individual components that make the data up are.
(03:10):
And then breaking that up into those individual
components helps us to then analyze those individually.
Conor (03:15):
And obviously
as Assurance and Audit
professionals, quite a lot of the time the drivers for our
work, when we're looking at data, are either a risk has been
identified or an opportunity has been identified or there
has been some sort of problem.
And that then becomes the driver for somebody saying,
let's go and get some data and do some analysis.
Yusuf (03:36):
And then within
that broad phrase, data
analysis or analyzing the data, there's the various
steps that we'll undertake.
Firstly, understanding the business, then understanding
the data, then cleansing the data, preparing it,
joining it, matching it, modeling it, analyzing it,
exploring it, whatever.
So we'll talk about those in a sec.
But, yes, data analysis is the broad sort of umbrella term,
(03:59):
but then within a data analysis project or data analysis phase,
there's the analysis sub phase.
Often when we're breaking up data analysis work as part of an
audit, we'll say we've got the preparation phase, which
is the initial phase, the planning phase, then there's
the analysis phase, and then there's a reporting phase.
That isn't necessarily very distinct, because often
(04:20):
you're doing analysis in your preparation and you're doing
analysis in your reporting.
So when we say analysis phase, what exactly do we mean?
And that's really what we want to explore a little bit today.
Conor (04:31):
Where do we start?
So we understand that we've either got one of those
three things, a problem, a risk or an opportunity.
We want to use data to help us to understand, or to
work out what's going on.
And like you've just explained, loosening up the data
initially or decomposing it.
Where to from here?
Yusuf (04:49):
There's several words
that we use, and maybe we can
explore each of those words to understand what they mean to us.
It's difficult to get to an exact definition
of what analysis is going to be in an audit.
But if we look at the different types of words
that we use, we might be able to get fairly close.
So I've got a list of about a dozen words here.
Explore.
(05:09):
Examine.
Understand.
Profile.
Find patterns.
Match.
Check the system.
Run rules.
Evaluate hypotheses.
Check what happened.
Find what does this mean.
Ask the so what.
Find exceptions and anomalies.
Not in any particular order.
(05:32):
All of those things contribute to our analysis work.
So maybe we can explore each of those in a little bit of detail.
Conor (05:40):
Okay.
So I didn't get to jot those all down there as you were talking.
I just can't write that quickly.
I think the first one was explore.
So where to from here?
Yusuf (05:48):
So exploration
happens at all phases
of data analysis work.
Trying to get our business understanding, we will explore.
When we're trying to get our understanding of the
data, we will explore.
When we get the data, you know, collect the data and
bring it in, we will explore.
So exploration is one of those things that happens
across the data analysis phase or set of phases.
(06:10):
And this is about really understanding what is,
and isn't, in the data.
How much data we have, what the scope is.
What some initial patterns we can see are.
What sort of cleansing we need to do.
And then once we've done all of that, exploring what
the relationships between different datasets might be.
Once we've evaluated rules and tested some hypotheses,
(06:32):
it would be exploring the results as well.
So explore happens across a range of different
aspects of our analysis.
Conor (06:38):
And obviously one of
the most important things we
need to determine during our exploration is do we have
the right data to help us get to the audit objective that
we're actually looking at?
Yusuf (06:48):
That's right.
Yeah, that's exactly right.
So part of that would be a subset of explore
and that is understand.
We need to understand whether the data that we have will
enable us to answer the question and a subset of that then
would be profiling our data.
And that means getting a feel for what the range is of
the data that we're looking at, range in terms of the
(07:09):
highest and lowest values, earliest and latest dates.
How many different types of text we have, what
categories we have.
So that profiling is usually sort of summary type information
that we pull together from the data to understand and
detail what it is that we have and what we don't have.
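The profiling described here (highest and lowest values, earliest and latest dates, categories, and what is missing) could be sketched roughly as follows. This is a minimal illustration only: the records, the field names (`amount`, `paid_on`, `category`) and the `profile` helper are hypothetical and are not taken from the episode.

```python
from collections import Counter
from datetime import date

# Hypothetical payment records; the field names are illustrative assumptions.
transactions = [
    {"amount": 120.00, "paid_on": date(2021, 7, 1), "category": "travel"},
    {"amount": 15.50, "paid_on": date(2021, 3, 14), "category": "stationery"},
    {"amount": 980.25, "paid_on": date(2021, 11, 30), "category": "travel"},
    {"amount": 42.00, "paid_on": None, "category": "catering"},
]


def profile(rows):
    """Summarise value range, date range, categories and gaps in the rows."""
    amounts = [r["amount"] for r in rows if r["amount"] is not None]
    dates = [r["paid_on"] for r in rows if r["paid_on"] is not None]
    return {
        "row_count": len(rows),
        "lowest_amount": min(amounts),
        "highest_amount": max(amounts),
        "earliest_date": min(dates),
        "latest_date": max(dates),
        "category_counts": Counter(r["category"] for r in rows),
        # Rows without a date: part of "what we don't have".
        "missing_dates": len(rows) - len(dates),
    }


summary = profile(transactions)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A summary like this is the kind of thing that can be checked against the business understanding before any rules are run.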
Conor (07:26):
Maybe get some early
impressions about what might
be going on in the data in terms of the population.
But also to go back perhaps and determine if there's
any gaps or any sort of things that looked odd, even at the
outset, that we perhaps need to get a bit more context from
the client or the auditee on.
Yusuf (07:42):
We also look at patterns.
So we look to understand, and this is again, all
through the phases.
So during the initial phase, the pre analysis phase, and we're
slowly getting into why this is such a difficult phrase to
explain, but we're looking for patterns in our exploration.
So in our preparation phase, we're looking for what are
(08:02):
the different patterns that we're seeing technically within
the data, but then really the patterns we're looking for
when we're doing our analysis is trying to get a feel for
what the different types of transactions we're
looking at look like broadly.
So trying to identify what the flow might be or identify
what the relationships between certain types of transactions
(08:23):
would be, that enable us to see those patterns.
Conor (08:26):
And so when you're
looking at those patterns, even
from an early stage, would you have any expectations going
in before you commence your analysis, about what patterns
you may see in the data?
Yusuf (08:37):
That initial business
understanding that we obtained
would help us to predetermine some of the patterns that
we might want to see.
And then when we're looking for those patterns initially
through our profiling, we then want to determine whether we're
seeing exactly what we expected to see or not, because that
helps us to determine whether we actually have the right data.
(08:58):
And then when we're actually doing the analysis, we're then,
again, looking for those sorts of patterns to understand
whether the majority of what we're expecting is in
the majority of the data.
We usually don't expect to see more exceptions than rules.
And so the original rules around how certain things work is what
(09:19):
we want to see in the patterns.
And then understanding that pattern helps us identify
where those anomalies are.
But if there're more anomalies than real expected transactions,
then we may need to just tweak our thinking a little bit.
Conor (09:31):
Or indeed there may
have been a significant change
in business processes or something we're not aware of
that may have led to those large volumes of anomalies.
Okay.
What's the next word on your hit list there for analysis?
Yusuf (09:43):
Matching.
So, matching is where we look to join two datasets together.
Matching has two connotations.
One is a reconciliation, so matching the overall
transactions to a summary.
So the detail that we have to a summary.
But more importantly, there's the joining data up in
order to be able to properly understand what's going on.
So data normally resides in multiple datasets and we want
(10:05):
to bring that data together.
So either open data with proprietary data or proprietary
data with proprietary data.
To be able to more broadly analyze the exact transaction.
So the typical sort of use case for that would be where we
have master data of some form.
So that might be a list of customers or list of vendors
or list of employees or a list of system administrators.
(10:25):
And then we have the transactions that go below that
and maybe a transaction log.
And so matching would be bringing that data together.
And then we also want to then match that to
potentially external datasets.
So where we're bringing open data in and trying to match to
be able to extend the range of the data that we have beyond
the specific initial dataset.
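As a rough illustration of matching transactions to master data, the sketch below joins hypothetical payments to a hypothetical vendor list on a shared key and keeps track of what does not match on either side. The data, the `vendor_id` key and the `match_to_master` helper are assumptions made for the example, not something described in the episode.

```python
# Hypothetical master data (vendors) and transactions (payments).
vendors = [
    {"vendor_id": "V001", "name": "Acme Supplies"},
    {"vendor_id": "V002", "name": "Northwind Freight"},
    {"vendor_id": "V003", "name": "Dormant Co"},
]
payments = [
    {"payment_id": 1, "vendor_id": "V001", "amount": 250.0},
    {"payment_id": 2, "vendor_id": "V002", "amount": 75.0},
    {"payment_id": 3, "vendor_id": "V999", "amount": 410.0},
]


def match_to_master(transactions, master, key):
    """Join each transaction to its master record on `key`.

    Returns (matched, unmatched): matched rows carry the master fields too,
    unmatched rows are transactions with no master record.
    """
    master_by_key = {m[key]: m for m in master}
    matched, unmatched = [], []
    for txn in transactions:
        master_row = master_by_key.get(txn[key])
        if master_row is None:
            unmatched.append(txn)
        else:
            matched.append({**master_row, **txn})
    return matched, unmatched


matched, unmatched = match_to_master(payments, vendors, "vendor_id")

# The other direction: master records that never appear in a transaction.
used_ids = {p["vendor_id"] for p in payments}
unused_vendors = [v for v in vendors if v["vendor_id"] not in used_ids]

print("matched:", matched)
print("payments with no vendor record:", unmatched)
print("vendors with no payments:", unused_vendors)
```

Keeping the unmatched rows from both sides, rather than discarding them as an inner join would, leaves them available for follow-up.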
Conor (10:47):
So a word we're seeing
more and more now in terms
of data analysis in the audit sphere is blending of data.
Is blending the same as matching, or is that
slightly different?
Yusuf (10:55):
Matching is
a broad concept.
How we match data from different datasets together.
The technique that we would use would be a blending technique.
So joining and blending would be the sort of the underlying
techniques that we would use.
Now obviously everybody uses this terminology differently.
So, you may be listening to this and think, oh, no, I
don't think about it that way.
And that's fine.
They all have different meanings depending on how you use it
(11:17):
or how you've been taught.
But the way that I think about it anyway is matching
is the broader bringing data together and blending
is a way in which we do it.
One of the ways in which we blend is we join datasets
together through some sort of inner join or
outer join or whatever.
Conor (11:32):
Okay.
So what's the next concept we need to be thinking about
then when we're looking at this word analysis?
Yusuf (11:37):
Okay.
So this is where most of our minds go when we think analysis.
And that's answering hypotheses or executing on rules.
Depending on which way you go in terms of your approach
to doing analysis, it would either be I have a hypothesis
and I want to prove it or disprove it initially, or I
just go directly to rules.
Now whether you have a hypothesis or not, you are
(11:57):
going to devise some rules.
Because the way to answer the hypothesis is to break
that down into a rule.
The difference between starting with rules and starting with
hypotheses is where you want to end up and whether you're
going with an objective basis or whether you're going with just
a basic rule-based analysis.
We typically go hypothesis based first.
And then once we've identified exactly what the hypotheses
(12:18):
are that we're going to prove or disprove, we then
break that down into a set of rules that are specific
to those hypotheses.
So that's where we then go and say, does A plus
B plus C equal D?
Or if-this-then-that.
Or are there any situations where this particular master
data element doesn't link to this transactional data element
or this transactional data element happens after another
(12:39):
transactional data element.
So there's all sorts of different types of rules
that we have - master data rules, transactional
rules, blending rules.
Not necessarily the largest - so we think about analysis
as rules, but that isn't necessarily the largest part
of what we're going to do, which is why it's important
to understand more broadly what analysis entails.
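Two of the rule shapes mentioned here, "does A plus B plus C equal D" and "this transactional data element happens after another", could be expressed along the following lines. The invoice fields, the two rule functions and the `run_rules` helper are hypothetical, chosen only to make the idea concrete.

```python
import math
from datetime import date

# Hypothetical invoices; the field names are illustrative assumptions.
invoices = [
    {"id": "INV-1", "net": 100.0, "tax": 10.0, "freight": 5.0, "total": 115.0,
     "approved_on": date(2021, 5, 1), "paid_on": date(2021, 5, 10)},
    {"id": "INV-2", "net": 200.0, "tax": 20.0, "freight": 0.0, "total": 225.0,
     "approved_on": date(2021, 6, 1), "paid_on": date(2021, 6, 3)},
    {"id": "INV-3", "net": 50.0, "tax": 5.0, "freight": 2.5, "total": 57.5,
     "approved_on": date(2021, 7, 9), "paid_on": date(2021, 7, 2)},
]


def components_add_up(inv):
    """A + B + C = D: net, tax and freight should sum to the total."""
    parts = inv["net"] + inv["tax"] + inv["freight"]
    return math.isclose(parts, inv["total"], abs_tol=0.005)


def paid_after_approval(inv):
    """Sequencing: payment should not happen before approval."""
    return inv["paid_on"] >= inv["approved_on"]


RULES = {
    "components_add_up": components_add_up,
    "paid_after_approval": paid_after_approval,
}


def run_rules(rows, rules):
    """Return (row id, rule name) for every row that fails a rule."""
    failures = []
    for row in rows:
        for name, rule in rules.items():
            if not rule(row):
                failures.append((row["id"], name))
    return failures


exceptions = run_rules(invoices, RULES)
print(exceptions)
```

What falls out of a run like this is only a list of technical exceptions. Whether any of them is a real anomaly still depends on checking the data, the system and the business.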
Conor (12:57):
We've covered there
developing your hypothesis,
so you can prove or disprove it. And to do that, quite
often, then you need to, again, break that further
down into rules so that you actually run that testing.
What do we need to think about next then in terms
of the analysis umbrella?
Yusuf (13:13):
Yeah.
So what then falls out of those rules would be exceptions.
So exceptions to the rule, if you like, which we then need
to evaluate to understand whether we've disproved that
hypothesis for a particular set or we've proved that
hypothesis for a particular set.
And so exceptions - before exceptions become anomalies,
we need to do some system checks, so I'll talk about
(13:34):
those together - system checks and other types of checks.
So an exception, in the way in which we've been
talking about it over the years, is an exception
that we see technically within the data output.
We're expecting a certain result from a rule and 90%
of the data aligns with that.
But 10% of the data results in an exception. Before we translate
the exception to an anomaly,
(13:56):
we then need to understand whether that exception
is a data problem, firstly.
So we look at that, but then secondly, whether that
exception can be explained.
One of the ways in which we do that is we talk to the business
to understand what those exceptions mean. Often we don't
want to go and trouble them.
So, in an ideal world, we have access to the system from
(14:16):
which the data was extracted.
So we can go in and have a more detailed check as to
what the transaction entailed and why this exception
might have occurred.
So we take that exception and then we go and have
a look at the system.
We then talk to the business.
And then, you know, without getting into too much
detail here, we then come up with a set of anomalies
that will result from that.
(14:37):
And those anomalies then need to be analyzed further between us
and the business to understand why exactly did this happen.
So the exception checking is why did this happen
technically within the data?
And then we whittle that down to the anomalies, which
are the real exceptions, if you want to call it that, or
business exceptions, which we then go and evaluate.
(14:57):
Those are the types of things that we would do
broadly across an analysis.
And then there's three questions that we answer
as part of our analysis.
That is what happened - and so we do a whole bunch of steps
to understand what happened.
We then say, what does this mean?
And then, so what.
So the what happened was the earliest.
Looking backwards.
What does this mean is often helped by bringing
(15:20):
different datasets together.
And then the so what, is what do we need to do about this?
Or how do we need to report this
and what does management need to do about this?
All of these terms are part of our examination, which
is another term we can use as part of our analysis.
And loosening up our data and bringing it back
together is what defines our overall analysis effort.
Conor (15:44):
Okay.
So we've covered a lot of key concepts and words there
that pertain to the word analysis, including where
it came from in terms of its Greek etymology and so forth.
And you stepped us through the various things we need to
think about under the umbrella term analysis, but no data
analysis happens in isolation.
We don't just do one audit per year or one bit of
(16:06):
data analysis per year.
We need to learn lessons from that.
How do we take the word analysis forward to our work
more broadly as auditors?
Yusuf (16:17):
I think we need to
think of data analysis
as the overarching term, I guess, that we use to
evaluate that data, as opposed to a particular phase. And so one of
the things that I think I need to do a lot more of, and we need
to do a lot more of, is not use analysis as a term for a phase
(16:38):
in the data analysis itself.
Because I've been caught in that trap so many times.
Often you go there because you don't really know exactly
what you're going to be doing as part of a particular phase.
So you know you're going to be understanding, collecting data,
you're going to be reporting and visualizing it, but there's
this sort of gray in between where you're not exactly
sure what it's going to be.
(16:58):
So you just sort of slap on the terminology and it's kind
of like, I've got an analysis phase in my data analysis.
What does that really mean?
So being a bit more specific about what we're going to do and
thinking more deliberately about what we're going to do
upfront can help us avoid that and reduce the ambiguity that
goes with that particular phase in the data analysis.
Conor (17:19):
And obviously we
reflect always on the work
that we've done in the report that we've just completed.
If on that reflection of the analysis work that we've
done, we say, oh, this was actually a profiling stage,
or we were looking for patterns here, then that can
obviously inform how we do our data analysis in the future.
Yusuf (17:37):
That's right.
Yeah.
That sort of approach is
really good because it helps us define in future more
closely what we're going to be doing, and that's important,
so that there's a shared understanding amongst everybody
in the audit team as to what the specific process
we're undertaking right now is.
Because that ambiguity, we found, can create difficulty
in that understanding.
(18:00):
And it may result in us taking longer than we thought
we would have, or shorter than we thought we would
have, for certain phases.
So yes, looking backwards, we won't always get it right the
first time, but every time we look backwards, from our
retrospectives or whatever else it is that we do as part
of our end of audit work, try to define that a little
bit better so that the next time we have more definition.
Conor (18:19):
Yeah, and more definition
is good for managing all
stakeholder expectations around what exactly is required.
Yusuf (18:25):
Including your own.
Conor (18:26):
Discussion today
about the word analysis.
And we loosened round that word and broke it down into
what are some of the sub components that sit under it.
Some of the key ones we looked at were exploration,
understanding the data, looking for patterns within
it, profiling what that means and what you get from that,
matching versus blending, having a hypothesis based approach
(18:48):
and using rules to actually test for those hypotheses.
And then obviously all the validation of your
exceptions and moving towards true anomalies.
Yusuf (18:58):
Good stuff.
Thanks Conor.
Conor (18:59):
Thanks Yusuf.
Narrator (19:00):
If you enjoyed
this podcast, please share
with a friend and rate us in your podcast app.
For immediate notification of new episodes, you can
subscribe at assuranceshow.com. The link is in the show notes.