Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
Hi everybody, and welcome back to the Better Business Analysis
podcast with your host, Benjamin Walsh.
And today we're diving into a topic that every modern BA needs
to understand. That's right, this BA Bytes
episode will be focused on modern data management and
analytics. The Better Business Analysis
(00:22):
Institute presents the Better Business Analysis Podcast with Benjamin Walsh. Data is the backbone of decision making. It's the backbone of AI and machine learning, and as BAs we need to know how to work with it, analyse it and ensure its
(00:44):
quality. But with new technology like
data fabrics, modern analytical methods, and automated
pipelines, how do we keep up? Well, don't worry, I've got you
covered. In today's episode, we will
break into 10 things you need to know to get started with modern
(01:05):
data management as a business analyst.
So the first thing we need to really talk about is that number
one, data isn't structured anymore.
OK? So if you're in the world of, I guess, relational databases, which was new when I started in IT, or you are around cleaning data and putting it in spreadsheets
(01:28):
and so forth, the world is 10 steps ahead of you.
Once upon a time, data was mostly structured and there were neat rows and columns in a database. Think Excel if you haven't worked in databases before. But now we have structured, semi-structured and unstructured data. And before, we were trying to get that to be structured so we
(01:50):
could use it. But now techniques have kind of caught up, and also we are closer to source. And so what I mean by that is text, images and sensor things, so IoT sensor data, social media interactions: we need to be able to deal with all of those things. And to be honest, you can't structure all that information
(02:14):
and keep up with the structuring of that information in time to use it. So we need to better deal with that in those three formats we just talked about: structured, semi-structured and unstructured data.
And as BAs, we need to understand where data comes from
and how to work with these, I guess, diverse formats in terms
(02:37):
of collecting it, capturing it, processing, storing it,
transforming it. So we can get it into a form in
which it is consumable for whatever use case that we want
to use. So that might be back into data
pipelines or systems. It might be a Power BI report,
so business intelligence reporting.
(02:57):
It could be integration with other systems.
It could be back to consumers. So there are lots of different
consumption use cases and some of those are structured,
unstructured and semi-structured, and their formats differ as
well. That leads me into #2 the data
(03:18):
management life cycle. It is critical.
Data doesn't just appear and disappear.
It follows a life cycle. And some of these are quite
different. Gartner has one.
There is kind of some data governance forums that have a
good life cycle. You would have heard a version
of this. Here's one.
It's ingestion, storage, processing, transformation,
(03:41):
analytics and disposal. But there are different ways of
talking about that. So I said consumption, which I prefer over analytics; analytics is a consumption type.
So there is a really standard data management life cycle and
you can argue about whether or not data goes in a straight line
or whether it doesn't, whether or not it can go back through those
processes. But there are fundamental
(04:03):
building blocks. There's about five of them.
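To give a feel for how those building blocks chain together, here's a tiny Python sketch. The stage names follow the list mentioned in this episode (ingestion, storage, processing and transformation, analytics or consumption, disposal); they're one possible labelling, not an official Gartner taxonomy.

```python
from enum import Enum

# One possible labelling of the life cycle building blocks discussed here;
# different frameworks group and name these stages differently.
class Stage(Enum):
    INGESTION = 1
    STORAGE = 2
    PROCESSING = 3    # includes transformation
    CONSUMPTION = 4   # analytics, reporting, integration
    DISPOSAL = 5

def next_stage(stage):
    """Next stage in the idealised, linear life cycle (None after disposal)."""
    members = list(Stage)
    i = members.index(stage)
    return members[i + 1] if i + 1 < len(members) else None
```

As noted in the episode, real data can loop back through earlier stages; the linear `next_stage` here is just the idealised flow.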
And even with different terminologies and different organizations, they're really consistent. OK. And I would take a Gartner approach here and just not get involved in organizational distractions. Usually your government or best practice in private industry has already defined these things. Now knowing where data is in the
(04:24):
life cycle in those five or so steps helps BAs define
requirements and align stakeholders on expectations of
data quality, what they might need to do to collect the data
or what state it might be in. And I'm experiencing this right now with a very key client, and expectations, I tell you, are
(04:45):
all over the place. OK. And people don't realize we need to invest money, time, process change and engagement in order to enrich our data. #3 is important. And this is changing the game.
So even if you are maybe a data architect or a data analyst
(05:06):
working in a more traditional environment, a data warehousing environment, you need to know #3, which is data fabrics, OK? And they're changing the game. It's weaving data through pipelines. Traditionally, data was managed in silos. OK, so think of different blocks, maybe one block per data management life cycle step or
(05:27):
per application or per use case. So maybe you've got data in
Salesforce, maybe you've got some data in a data warehouse,
maybe you've got data in CRM Dynamics 365, maybe you've got
it in SQL databases, maybe you've got it in spreadsheets,
maybe you have it in survey forms.
This is a typical organization, right?
And so there is a way that we structure what we call that
(05:49):
subject area data or entity data to have a bit of an idea about where we should store things and why we use applications and
whatnot. And that's leading the way in
terms of application design. But we also need to be aware that there are simply very few places that are able to work in one monolithic system, like an ERP, SAP for example, and have that be
(06:13):
their only system. A lot of people move to that, but that has its own constraints. And so there's this acceptance that we're not always going to be in control of, I guess, the ecosystem of our data. So what our customers use, what
our data consumers want to use, the technical landscape that we
(06:36):
are exposed to. So we need to be able to connect
to this environment. And so therefore you need to use
a data fabric to do that. And that's a new approach that
integrates all data across systems into a unified
architecture. The architecture is still
unified at the high level, both business and technical, and it allows probably real time access and better analytics.
(06:57):
Now, I've said probably real time access; real time actually costs money. And when we say real time, it could be within a day. We're not talking about microseconds here. And things do take a little while to process through if you want them to be in the right state. So when we say real time, just be careful with that term, as it rarely means right now. This is something that BAs need to advocate for when discussing
(07:19):
modern data strategies. And I am currently writing a paper about this: to accept multi-cloud, maybe on-premise and cloud solutions, transitional states. And we need to really think about data fabrics as a solution there, as opposed to consolidation. #4, and we've touched on this: data fabrics.
(07:42):
OK. So the thing about data fabrics: there is a product called Fabric, Microsoft Fabric, which we'll get to in a minute, but it isn't the only product out there. But #4 is data pipelines.
OK. So think about these as pipelines in your house where you need water to go. That's a good analogy. And they're connecting to the main
(08:03):
pipe, which is also connected to other infrastructure that provides water to your house. Now these pipelines are like factory assembly lines, right? So you could think about them as an assembly line in a factory, or water going through, being routed through a pipe. And maybe it changes from fresh water to dirty water to hot water.
(08:26):
Data moves through various stages, OK?
And we need to extract, effectively extract, data. And there's a broader kind of high-level abstraction where the data management life cycle covers a few steps, which is like collecting and capturing the data and maybe getting it
(08:47):
into the state you want. Then we've got transformation, and then we've got load. And this is an old term: we call it ETL or ELT, depending on which way around you do the loading and the transformation. Now, understanding how these pipelines work: these are the technical capabilities needed to meet the life cycle we talked about, OK.
(09:08):
So I'm going to say that again: the data flows through the data management life cycle, OK. Now, more conceptually, they're both business and technical capabilities. But under the hood, if you like, in the glue that connects our business layer down to our technology solution, we have these steps, which are broadly now referred to as the data management steps.
(09:31):
And historically, they were talked about in terms of ETL. And understanding how those pipelines work, right, in both the new-world data management life cycle and the ETL world, which are kind of one and the same, just different terminology and groupings, will help you as a BA ensure that data is processed correctly and useful for decision making, right?
(09:53):
So I'll give you an example. I have collected data in a
survey. I've surveyed all my customers
about a new product that I have launched.
Now that product might be a web product and might be on my website, and that might be integrated with my CRM solution, which is, say, HubSpot. Now I may have a greater
(10:15):
architecture than just those components, but let's just keep it simple here. I may have sent out a, I guess, SurveyMonkey survey, sorry, and I've integrated that with HubSpot, and it surfaces maybe with the product I've got on the website. So when they use the product,
(10:37):
a survey pops up, which happens to be something different, which is SurveyMonkey. And when they capture the feedback, it goes back into HubSpot, right?
So my data moves around. Now in that case, we are
collecting data through SurveyMonkey.
We're actually collecting it there and we're capturing it in
HubSpot. We might be transforming it in
(10:59):
HubSpot. We may be connecting it with other information from, for example, the website and the product that we're using. And then we might be, say, loading that into reporting tables and out to Power BI, for example. And so we need to think about what the state of the data is in all those different steps.
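To make the ETL idea in that example concrete, here's a minimal Python sketch. The plain dicts and field names stand in for SurveyMonkey (source), HubSpot (capture and transform) and a reporting table feeding Power BI; they are invented for illustration, not real APIs.

```python
# Minimal ETL sketch for the survey example. The lists and dicts stand in
# for SurveyMonkey (source), HubSpot (capture/transform) and a reporting
# table (load target); all field names are made up for illustration.

def extract(surveymonkey_responses):
    """Collect raw survey responses, dropping empty ones at the source."""
    return [r for r in surveymonkey_responses if r.get("answer") is not None]

def transform(raw_responses):
    """Shape the data: normalise the score and tag where it came from."""
    return [
        {"customer_id": r["customer_id"],
         "score": int(r["answer"]),
         "source": "surveymonkey"}
        for r in raw_responses
    ]

def load(rows, reporting_table):
    """Load the transformed rows into the reporting table for Power BI."""
    reporting_table.extend(rows)
    return reporting_table

raw = [{"customer_id": 1, "answer": "4"},
       {"customer_id": 2, "answer": None}]   # incomplete response dropped
table = load(transform(extract(raw)), [])
```

ELT, as mentioned above, simply swaps the order: load the raw rows into the target first and transform them there afterwards.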
(11:20):
Another way of looking at those steps is to look at it in data management life cycle terms, which I prefer, and to think about something called the medallion model, where we kind of classify our data as bronze, silver and gold in terms of its usefulness. And so as it moves through the data
(11:42):
management life cycle and gets closer to consumption, it gets better. And so it's in gold form, OK. And that's also another way that you can look at data in a modern way. So you may hear those terms.
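A minimal sketch of that medallion idea in Python: a record carries a layer tag that gets promoted as it moves through the life cycle. The bronze/silver/gold names come from the model just described; the promotion rules (`cleaned`, `enriched`) are invented purely for illustration.

```python
# Medallion-style layering sketch: a record carries a quality tag that is
# upgraded as it moves through the life cycle. The promotion rules here
# (cleaned -> silver, enriched -> gold) are assumptions for illustration.

def promote(record):
    """Promote a record one layer once it meets that layer's bar."""
    layer = record.get("layer", "bronze")
    if layer == "bronze" and record.get("cleaned"):
        record["layer"] = "silver"   # cleaned and conformed
    elif layer == "silver" and record.get("enriched"):
        record["layer"] = "gold"     # consumption-ready
    return record

record = {"customer_id": 1, "layer": "bronze", "cleaned": True}
promote(record)                      # bronze -> silver
record["enriched"] = True
promote(record)                      # silver -> gold
```

The point is exactly the one made above: unlike a bare ETL flow, every record carries a quality tag you can inspect at any step.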
And that's much better than this kind of ETL process, because it doesn't really allow you to know the quality; it doesn't give a tag of quality along the way, which is the most important thing for
(12:04):
most organizations. #5 is that data quality has levels, as we just talked about. And you can actually look at these, not just in this gold layer model, the medallion model, but through six dimensions. And poor data leads to poor insight. So we need to be really deliberate about that. And so one is accuracy. How accurate is the data?
(12:27):
And the trick to making sure that it is accurate is to focus
on its capture. So making sure you capture it in
an accurate way with validation, OK. And you don't want to build in a whole lot of validation checks
because that might take a long time.
There's completeness. So what data do we need from
different sources to add to the picture to know that our
(12:49):
product, the feedback we've got through SurveyMonkey and the
product itself on the website come together to give us a
complete picture? We need it to be consistent.
So we need to collect it again and again and again through
multiple different time periods, maybe different customer
(13:10):
segments in order to compare it. We also need to factor in
timeliness. So if you've collected data from last year and you're making a decision this year, it's just not good enough. So a lot of the solutions that we use traditionally take a long time and a lot of effort to process. And so we need to use these new, modern techniques to be able to, like we said, real-time it.
(13:32):
But what we mean by that is justget it in a more timely fashion.
So within the period in which you need to make the decision.
So if that's within a day, then you need to get it within a day.
If you need it within a quarter, you need it within the quarter, OK. And that's the whole process of getting it, collecting it and capturing it, you know, accessing it, transforming it, storing it and getting it ready for
(13:55):
consumption. So there's quite a lot going on there. The data needs to be valid, truth if you like. OK, so we need truth to the data. It needs to actually be true.
If, for example, you're involved in statistics, you'll know all about data quality and, you know, surveys and the margin of error there and all the rest of it. We can't make decisions based on
(14:16):
a small data set generally, right?
We can make assumptions. And so that's why we say big
data because we need it to be valid and to be accurate.
And the last bit is we need some uniqueness. So what we mean by that is we don't want duplicate information coming from different sources having different versions of the truth. That's why we talk about single
(14:38):
source of truth, which is the most used word in IT ever. What we mean is a single view of the truth, not source, OK, because you will have multiple sources. So you can kill that term whenever you hear it and say that term is old school. There are multiple sources, and sources are good, by the way. What we need to make sure of is one view of the truth, right?
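As a rough sketch of how those quality dimensions can become concrete checks, here's some illustrative Python covering four of them: completeness, accuracy/validity, timeliness and uniqueness. The field names, the 1-to-5 score range and the 90-day window are all assumptions; real profiling tools go much further.

```python
from datetime import date, timedelta

# Illustrative checks for four of the quality dimensions discussed:
# completeness, accuracy/validity, timeliness and uniqueness. Field
# names, the 1-5 score range and the 90-day window are assumptions.

def check_quality(rows, required_fields, max_age_days=90, today=None):
    today = today or date.today()
    issues, seen_ids = [], set()
    for row in rows:
        # completeness: every required field present and non-empty
        for field in required_fields:
            if not row.get(field):
                issues.append((row.get("id"), f"missing {field}"))
        # validity: the survey score must sit in its allowed 1-5 range
        if not 1 <= row.get("score", 0) <= 5:
            issues.append((row.get("id"), "score out of range"))
        # timeliness: data older than the decision window is flagged
        if (today - row["collected"]).days > max_age_days:
            issues.append((row.get("id"), "stale"))
        # uniqueness: duplicate ids mean competing versions of the truth
        if row.get("id") in seen_ids:
            issues.append((row.get("id"), "duplicate"))
        seen_ids.add(row.get("id"))
    return issues

sample = [
    {"id": 1, "score": 4, "collected": date.today()},
    {"id": 1, "score": 9, "collected": date.today() - timedelta(days=400)},
]
issues = check_quality(sample, required_fields=["id", "score"])
```

The second sample row trips three checks at once, which is typical: stale, duplicated data is rarely accurate either.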
(14:59):
No doubt our conversation wouldn't be where it needs to be without adding a bit of boringness to the conversation. And that boringness comes in two very important areas, which some people love and I find boring. But do you know what? I'll tell you how I get around not being bored out of my brain when I dip into this. And that is governance and
(15:19):
compliance. So we have regulations, we have
government policies, we have internal policies.
And data governance isn't optional.
OK, you have to do it. BAs need to work with compliance teams. We need to understand privacy acts. We need to ensure that data is handled responsibly, securely and ethically.
(15:39):
Now, if this doesn't blow your trumpet, like for me, then there are so many good models out there. The trick is: don't come up with your own. Look, see what best practice is and adopt it. OK, and then if you need to massage it, you can. But I would assume that every, I don't know, education government department in the world has very
(16:01):
similar governance across it. You need to have internal governance, but not overbaked, 'cause that's where bureaucracy can kill good outcomes. So you need to apply data
governance. You can't ignore it.
And so what I do when I have something that I don't enjoy as much, like some vegetables I don't like, is eat them first, right? Get them done first, and then get
(16:22):
on to the stuff you do enjoy, which might be improving outcomes through the data you've got and insights.
OK, so that's number six. And if we move on to #7 we've
touched on this quite a bit lately, and that is AI and
machine learning. OK?
They're driving insights. But it's so important to
(16:45):
realise that if you do not have great data, your AI and machine learning are a waste of time. So this is a prerequisite for using your own data to make informed decisions.
Data is not just about spreadsheets and dashboards.
And yeah, sometimes they're really good, but AI powered
(17:06):
analytics can uncover patterns that you can't.
It can predict trends, and it can automate decision making.
And BAs need to understand how to interpret those insights, communicate them effectively, and explain why, maybe based on the data that's been input into and consumed by these tools, those insights might be different to what was
(17:31):
expected. And you may have pushed for these tools early, when your maturity is low. Even though you want to use them, you want to get these outcomes, and every CIO and CTO in the world is pushing for these tools, if your data is crappy, your insights are going to be crappy. #8. This is so important.
(17:53):
And this is where we need to make sure that data is not owned by digital or IT, OK, per se. It's not an ivory tower exercise here. #8 is self-service analytics, OK? And it empowers teams.
There is a lot of kit out there in the data space.
There are a lot of tools you could use and they really need
(18:15):
to be selected based on your environment.
You need to choose the right tool for the right job and the
right environment. So business unit users don't want to wait for IT to generate a report anymore. But not only that: your business users might be data analysts, they may require data analyst capabilities, and maybe
(18:36):
there are BAs outside of digital, and they need access to the data, they need access to pipelines, they need access to continually improve, and they need access to run their own jobs. So what tool are you going to use as an interface layer? So they don't have to be data engineers, but you've set it up so they can build on top of that infrastructure.
(18:57):
Again, the data fabrics, you know, pipelines, visibility,
modern analytical tools, right? They offer self-service
capabilities, meaning anyone can access and visualize data.
So that's starting at the analytics end.
The analytics tools are now exposing the data pipeline so
you can see where the data came from and maybe know why you're
(19:20):
getting the insights you're getting.
BAs should help design intuitive interfaces and ensure that stakeholders get the right insights and know why that data point is the way it is. #9: data storytelling is a must-have skill. Facts don't drive decisions; stories actually do.
(19:43):
And BAs need to go above and beyond charts and numbers, and craft compelling narratives around data: the collection, the ecosystem, the application framework, the customer journey, to make insights clear and actionable
(20:04):
for stakeholders. And #10: always align data strategy with business goals. There is no point in having great AI pipelines and massaged data if it's not going to be used.
If we go back to our HubSpot example, if we're surveying
customers on the features they enjoyed about our product
(20:25):
through SurveyMonkey, but we're never going to use that to
actually make a change to our product because our product
strategy doesn't incorporate enough, I guess, ad hoc customer
feedback from the website. Then don't do it.
What a waste of time. At the end of the day, data
management isn't just about technology, it's about business
(20:47):
value. Every data initiative should go back to strategic goals or investment objectives and a bit of a business case model, and explain, whether that's reducing cost or increasing efficiency or improving customer experience, why we need to invest in this data project.
(21:08):
I've heard horror stories of IT or data teams building data
products, spending millions building data products that no
one wants to use. So what you might find is two things in that case. One, you haven't captured requirements and you're not meeting objectives.
So therefore your user base, internal user base or your
(21:30):
customers are not getting what they asked for or what they
want. And there might be another thing, another insight that I've experienced. Sometimes people want to fish for themselves. So in a modern data analytical world, we need our users to be able to fish on top of these tools in a secure, you know,
(21:51):
pond with fish, with the rod that we give them. But it is no longer IT's job to own data.
I will see you next week.