Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:02):
From Alloy AI.
This is Shelf Life.
Today, we're doing a deep dive on the nuances of data in the consumer goods industry.
Today, brands connect to dozens, if not hundreds, of partners.
(00:24):
How can you look across all of your retail, e-commerce, distributor and supply chain partners to get a complete picture of your business?
I'm your host, Abby Carruthers, Product Manager at Alloy AI.
We'll be back with two of my Alloy colleagues, Manfred Reichi, Subject Matter Expert in CPG Data, and Matthew Nias, Engineering Team Lead, right after this.
(00:46):
As a consumer brand, you connect with dozens of external partners and internal systems to get a complete picture of your
business.
Each one is different.
Alloy AI makes it easy to connect data from retailers, e-commerce, supply chain partners and even your own ERP,
(01:06):
then easily surface insights to drive sales growth.
Every day, brands use Alloy AI to see POS trends, measure promotion performance and make better replenishment decisions
with their retail partners.
That's why we're trusted by BIC, Crayola, Valvoline, Melissa & Doug, Bosch and many more.
Get a demo at alloy.ai today.
All right, Manfred, Matthew, welcome to Shelf Life.
Speaker 2 (01:31):
Great to be here.
Thanks for having me.
Good to see you, Abby.
Speaker 1 (01:39):
Good to see you too.
Okay, to start us off, could you guys share for our listeners just a little bit about your professional backgrounds and how you came to work with consumer goods data?
Speaker 3 (01:45):
So I started my career in consulting, implementing SAP,
these massive projects that involved, you know, configuring software to help companies operate more efficiently and mostly keep track of their finances.
I was kind of lucky at the time.
Many of my projects involvedworking with master data teams.
I learned what it took to configure the software to make it
(02:08):
work right, both for accounting purposes and for actually running a business.
So I did that for a few years.
I actually traveled to like 10 different states rolling out SAP.
And then when I joined Alloy, about six years ago, I stumbled upon the same need: master data
(02:30):
configuration and all these different things.
So, almost a decade of working with data across the board.
And I don't know, I happen to be one of those two people that really likes this topic.
Speaker 1 (02:41):
Love that. A unique trait.
And Matthew, what about you?
Speaker 2 (02:46):
Yeah, I came here straight out of university. I went to UBC right here in Vancouver, worked at some pretty big tech companies, but really liked Alloy being small, being a little scrappy, working on really interesting problems and being on a small team getting a lot of ownership over those.
One of the first big projects that I worked on when I was here
(03:06):
was how to ingest really generic data and have standard validations for that, and then, right after that, working on a new product model that allowed much more transparent product matching to happen, which I think we'll talk about a lot in depth today. I really enjoyed the technical aspect of those problems, getting a lot of ownership, and also seeing the impact of that with customers, working with customers, trying to handle all the edge cases and really feeling a lot of ownership over really interesting problems.
Speaker 1 (03:29):
Well, we're
definitely going to want to dive
into those details.
Okay, so zooming out.
Today we're talking about data normalization.
Could one of you just give us a high-level view of what you understand data normalization to be and why it's something that consumer brands should care about at all?
Speaker 3 (03:44):
I've worked with normalization for almost a decade, and I've never stopped to think about what it actually means.
So I actually asked ChatGPT this morning, as everyone does nowadays, right, and it gave me a super long answer. I'm not going to read the whole answer because I actually only liked one little piece.
It says normalization is the process of trying to make data
(04:06):
more informative, consistent, comparable or standardized.
So that's very generic.
The standardization piece is what I like, because ChatGPT then outlines these four or five different applications of normalization in signal processing, data science and all these statistical things. But it missed what I think is important for retail companies,
(04:28):
which I usually describe as a multi-language translation layer.
So part of what normalization does, when you think about standardizing retail data, is that every retailer speaks their own unique language, and I'm 99% sure it's going to be different than the language a brand usually uses internally. So normalization really means standardizing all these
(04:53):
different languages into a common language that can help companies analyze data in one place, rather than having 17 different Excel sheets in different languages that only apply to one retailer at a time. So yeah, that's a little bit of what I think it means and what it does.
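To make that translation-layer idea concrete, here is a minimal Python sketch. All retailer feeds and identifiers here are invented for illustration; a real layer would cover thousands of products per retailer.

```python
# Hypothetical translation layer: each retailer's product ID maps to one
# internal SKU, so rows from different feeds can be analyzed together.
SKU_TRANSLATIONS = {
    "walmart": {"WM-000123": "SKU123"},
    "target": {"TGT-98765": "SKU123"},
    "amazon": {"B00EXAMPLE": "SKU123"},
}

def normalize_row(retailer, row):
    """Rewrite one sales row from a retailer's language into the internal one."""
    internal_sku = SKU_TRANSLATIONS[retailer].get(row["product_id"])
    if internal_sku is None:
        raise KeyError(f"No translation for {retailer} id {row['product_id']}")
    return {"sku": internal_sku, "units": row["units"], "source": retailer}

# Three feeds, three languages, one normalized result:
feeds = [
    ("walmart", {"product_id": "WM-000123", "units": 40}),
    ("target", {"product_id": "TGT-98765", "units": 25}),
    ("amazon", {"product_id": "B00EXAMPLE", "units": 10}),
]
normalized = [normalize_row(retailer, row) for retailer, row in feeds]
total_units = sum(n["units"] for n in normalized)  # one number across all channels
```

Once every row speaks the same language, summing across retailers becomes a one-liner instead of 17 Excel sheets.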
Speaker 2 (05:09):
And I'll add that it's not just the CPG brands who are our customers, or the retailers. There's also distributors. There are data providers and data harmonizers. There's category data that we get. There's so many different places that our customers are now getting data from and, just like Manfred said, they're all reporting it in different languages, and not just in products or locations.
(05:29):
We'll get to metrics and time. There's all these different dimensions that people are reporting in their own unique language, and to have a global view of what's happening within your organization, you really need to do that normalization across a lot of different dimensions, across many data providers.
Speaker 1 (05:52):
So you've both mentioned language and translating a few times now. What's really the benefit of speaking one language? I know, Manfred, you said, you know, in one place, but for these companies, what are the insights and the outcomes that they can unlock from having that data be in one consistent language?
Speaker 3 (06:05):
So if you think about running a business, right, when you sell consumer packaged goods, you are going to want to sell in as many retailers as possible, right? So from a business standpoint, you don't actually care about where you're selling. You're trying to sell as much as you can across the board. You can sell at Target, you can sell at Walmart, you can sell at Best Buy, at Amazon, and you're really trying to open up channels every
(06:28):
year and open up many sales streams.
When you then consider data language, right, like, all these systems are integrated within their own walls to speak their own unique language. So I was blown away a few years ago when I found out that most brands make decisions using only their own data.
(06:50):
What did they send into these retailers? You actually have no idea what the consumer ended up doing at the shelf at Target, or on the website with Amazon, and the reason brands were not analyzing their business based on the consumer is because they didn't know how to translate between all these data sources. Right, like, there was no, let's call it, Duolingo, right?
(07:13):
No translation layer between Walmart language, Target language, Amazon language, into your own. So I think that the analysis that they're now able to do by understanding the consumer, and not just looking at a very segmented world of one retailer at a time but knowing what are
(07:34):
people doing across the whole US, across these 20,000-plus points of sale, helps them really monitor what's actually going on at the store and react, versus having a siloed team where maybe your Target team talks to the Walmart team and maybe they can kind of align. But if you have people running on different Excel sheets that
(07:56):
can't speak that same language, it's like doing business with, you know, Asia and you don't speak Mandarin. You're not going to be able to have a conversation.
Speaker 1 (08:05):
Awesome.
And then, Matthew, can you tell us, from a technical perspective, why is this translation something that's hard to do?
Speaker 2 (08:13):
I think there's multiple levels of it.
From the technical data warehouse perspective, we obviously have to ingest the data into a common language just to be able to store it. You don't want a different database schema for every single one of your different data providers, and so there's some form of transformation that has to happen just in order to ingest the data.
But then, on top of that, I think really what Manfred is
(08:33):
talking about is how we actually use that data afterwards. Everything is being reported under different identifiers, maybe in different time granularities. People are saying that they have different identifiers for what's really the same product, different descriptions of locations that are really the same location, and if I'm trying to view any of that, I need a common language,
(08:53):
just like Manfred said, and we'll get into some of the details later about how we do that on a more technical basis.
But really this challenge is first transforming the data so that we can physically store it, but then also making all these relationships between common products, between common locations, having roll-up and disaggregation so you can see things at different time granularities, converting so that all of our calculations are
(09:14):
consistent for metrics. Those are all the things that we're thinking about in order to view data within this common language.
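A rough sketch of that ingest-time transformation: each provider's columns are remapped into one canonical schema before the rows ever land in a table. The provider names and column names below are invented; real feeds also differ in format, encoding and granularity.

```python
# Hypothetical per-provider column mappings into one canonical schema,
# illustrating the "transform on ingest" step (not any vendor's actual schema).
COLUMN_MAPS = {
    "retailer_a": {"item_nbr": "product_id", "store_nbr": "location_id", "qty": "units"},
    "retailer_b": {"sku": "product_id", "site": "location_id", "sold": "units"},
}

def ingest(provider, raw_row):
    """Remap one raw row into the canonical schema for storage."""
    mapping = COLUMN_MAPS[provider]
    return {canonical: raw_row[source] for source, canonical in mapping.items()}

row_a = ingest("retailer_a", {"item_nbr": "123", "store_nbr": "S1", "qty": 7})
row_b = ingest("retailer_b", {"sku": "123", "site": "S2", "sold": 3})
# Both rows now share one schema and can land in the same table.
```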
Speaker 1 (09:19):
On the technical side
, so you mentioned a few
different dimensions there.
I've heard you both talk aboutproducts, locations, time, and
there was a fourth.
You're going to have to remindme what was it Metrics.
(09:40):
And I know, Manny, that evenmaster data management on the
product side is something thatyou and I have had many passion
discussions about over the years.
Can you share a few examples about what makes product normalization, or product master data management, so difficult for consumer brands?
Speaker 3 (09:54):
So it's actually a combination of three things.
Matthew's going to talk all about the technology, because I think his team has done an awesome job on the tech. So I'm going to focus on the people and the process side of master data management.
Because let's say that, Abby, you work for Willy Wonka's
(10:17):
Chocolate Factory and you want to sell a brand new bar of chocolate. Right, you worked super hard with Willy Wonka himself. You've designed this delicious flavor of something. You want to sell it, you want to produce it, you want to package it, you want to get it out the door. Well, you have to configure this product within your walls and create what they call a product master. Right, like, it's going to get assigned a SKU number. For simplicity, let's call this bar of chocolate SKU 123, which
(10:41):
is Abby's secret flavor.
Now you want to start selling this at Target, at Amazon, at Walmart. Your sales team has to go to these customers, and you have this new master data item, the SKU. You have to go convince Walmart and these retailers to place an order on your behalf.
(11:02):
They're going to create their own SKU number internally to place this order for 1,000 bars of chocolate from you. So there are all these steps involved. For every item you want to sell, you need a product master, you need to understand the recipe, you need to understand how you're going to sell this, and typically that's just what
(11:25):
people refer to as master data management.
You, in the Willy Wonka world, have a team dedicated to making sure that every item is uniquely identified so you can run your business. As soon as you go outside your walls and you start talking to all these other retailers, it gets very difficult. You bring in more people. It's no longer just your team who manages your master data.
(11:45):
You have to talk to the Walmart team, the Target team. You have to be very clear in understanding what they are going to call your item, keeping a cross-reference sheet between these two items. So there are many technical components that are complicated, but even just the fact that you have to loop in a bunch of people, you need very defined processes to create the SKU, to make sure
(12:06):
you can sell it under the right number. It's pretty involved, the number of steps you have to go through to properly run a business and configure the software to even make that available for you to do.
Speaker 1 (12:18):
And can you share any examples or particular stories you've come across that have made product master data management challenging to solve?
Speaker 3 (12:28):
So, like everything
with software, the hard part
comes with all the exceptionsyou might encounter, right?
So I'll keep telling the storyabout we're launching Abby's new
secret recipe of Willy Wonkachocolate and you've now
convinced Walmart, target andAmazon to place orders from you.
Your supply chain team is goingto sell cases of 12 bars at a
(12:50):
time, so when they order, peopleare going to order in cases,
turns out.
You know Target just wants toorder differently.
They prefer ordering in eachbecause they want more
flexibility and you want yourproduct at the store.
So although you have your entiresystem configured to know that,
like, when one word places anorder, it represents a case of
12 units, it turns out that nowTarget wants to be treated
(13:13):
differently.
So you have to configure an exception somewhere in your system to remind you that when Target orders 10 of your chocolate bars, it doesn't mean 120 bars in 10 cases, it actually means 10 bars. And like, if you don't keep track of this, you can imagine how you can just ship significantly more product. You can affect all your margins if you don't charge it
(13:34):
accordingly. And now you're stuffing a channel because you haven't properly managed the master data associated with this one.
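The Target exception in the story can be pictured as a per-retailer unit-of-measure config. The retailer names and factors follow the story above; the code itself is only an illustrative sketch, not Alloy's implementation.

```python
# Hypothetical unit-of-measure config: most retailers in the story order in
# cases of 12, but Target orders in eaches.
UNITS_PER_ORDER_LINE = {"walmart": 12, "amazon": 12, "target": 1}

def order_line_to_bars(retailer, quantity):
    """Convert a retailer's order quantity into individual chocolate bars."""
    return quantity * UNITS_PER_ORDER_LINE[retailer]

walmart_bars = order_line_to_bars("walmart", 10)  # 10 cases  -> 120 bars
target_bars = order_line_to_bars("target", 10)    # 10 eaches -> 10 bars
```

Without the exception entry, a Target order of 10 would silently be shipped as 120 bars, which is exactly the channel-stuffing risk Manfred describes.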
Speaker 1 (13:43):
You're actually reminding me of a recent example where we were working with one brand where they called an each a pack, and a case an each. And it's just another great example of not only the technical challenge but the language challenge on top of that.
Speaker 3 (13:58):
We have a term for
that here.
Like we always language policeourselves across the board,
right?
Like to your point, what is aneach?
You know I will assume thateach represents one individual
unit.
One bar of chocolate, Somepeople in supply chain, and each
means a case.
Some people refer to it as apalette, right?
So even the words in English weuse can trick us into confusing
(14:22):
ourselves on, like what weactually mean.
So we try to always languagepolice ourselves and make sure
that we are speaking in numbers,not in words, and confirming
okay, 12 bars or one case of 12bars.
It's tricky.
Speaker 1 (14:37):
Absolutely. Numbers never lie. So, Matthew, what does it mean when we add unit conversion as another layer on top of this technical challenge of mapping products?
Speaker 2 (14:47):
Yeah, so the way we think about product matching is that you're basically drawing links between products that in the real world are the same product. So Walmart might have their own identifier for a product that's really that same bar of chocolate, and you have your own Wonka identifier for that same bar of chocolate. And with those links we are copying over attributes. Walmart
(15:09):
has all of these colorful descriptions and store information and selling information about the chocolate under their own identifier, and then you might provide your own description, your category, your subcategory, some information about Abby's secret flavor on your own side, and whichever way you're viewing the data, we want you to see all of that rich information.
(15:29):
If you're viewing it in terms of Walmart sales data or your own internal SKU, we want you to see all of those descriptions. With unit conversion, that becomes more complicated, because some of those attributes are going to be shared and some aren't. You can imagine that a description is going to be relevant whether it's an eaches or a case of chocolate, but maybe a conversion factor, that there are 12 bars in a case,
(15:50):
that's only going to be relevant on the case and, in the inverse, 1 over 12 will only be relevant on the each. So we have to start distinguishing attributes and their purpose within our system: what is relevant across unit conversion, and what's specific to a specific type of unit conversion. And we code this all in, so we have logic which is saying what kind of attribute is this, what is the nature of this match?
(16:11):
As we're doing all of this attribute association, we make sure that you're still viewing relevant, accurate data in your dashboards.
Speaker 1 (16:22):
Sounds like unit of measure conversion is a fun one, then. What else? Any other examples of fun edge cases that make this a tricky problem to solve?
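One way to picture that shared-versus-specific attribute logic is a small sketch like the one below. The attribute names and the scope labels are invented for illustration; the point is just that shared attributes cross a unit-conversion match while unit-specific ones do not.

```python
# Hypothetical attribute scopes: "shared" attributes copy across a
# unit-conversion match; "per_unit" ones stay with their own unit.
ATTRIBUTE_SCOPE = {
    "description": "shared",
    "category": "shared",
    "units_per_case": "per_unit",
}

def propagate(attrs, same_unit):
    """Return the attributes visible on the matched product."""
    return {
        name: value
        for name, value in attrs.items()
        if ATTRIBUTE_SCOPE[name] == "shared" or same_unit
    }

case_attrs = {"description": "Abby's secret flavor", "category": "Chocolate", "units_per_case": 12}
seen_on_each = propagate(case_attrs, same_unit=False)  # conversion factor dropped
seen_on_case = propagate(case_attrs, same_unit=True)   # everything kept
```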
Speaker 3 (16:28):
I have so many. So let's talk about product rollovers. Okay, so we'll keep telling the story. You just introduced your new recipe, SKU 123, and it's actually done pretty well. But someone told you that if the packaging was red instead of blue, you'd actually sell more. People tend to buy red chocolate bars more than blue.
(16:48):
So you go back to Willy Wonka. You convince him to change the packaging. Well, it's the same chocolate bar, so are you going to change your internal number for it, or are you going to keep manufacturing SKU 123?
It becomes a decision, right? We go back to your process. What does your rulebook say about when you introduce a new SKU and when you don't? To keep it simple, we'll say your team says no, no, no, it's
(17:11):
the same recipe. Packaging is just an afterthought. We're going to keep it SKU 123. We go back to your retailers. We go back to Target, Walmart, Amazon. Do you think they're going to follow your best practice and your process? You know, we talked about how Target was the one with the tricky unit of measure. Let's say they actually accept and they keep the same number,
(17:32):
right?
This time they're going to be easy on you. But Walmart doesn't like it. They want their internal system to differentiate a blue SKU from a red SKU, even if it's the same chocolate bar. So now you're dealing with Walmart having two internal identifiers for your product. What are you supposed to do, right? Like, it's still the same
(17:57):
bar of chocolate. In your system, it represents one identifier of a master data item. Walmart orders are now reported on two different ones. How do you handle that?
Speaker 1 (18:03):
How do you handle that?
Speaker 2 (18:06):
Yeah. So this is another example of what I think of as the long tail of product matches. When you're thinking about matching products, you can think, okay, Walmart reports a UPC, I report a UPC, that's all great. Maybe Walmart reports it as a different thing, Walmart UPC, and you have UPC. That's also pretty easy. You just remap the column names. All these other examples, unit conversion, these rotating SKUs,
(18:30):
are all these edge cases that we've encountered and that we've built systems to handle. In Manfred's example, if you have a different granularity that you're reporting on, you have a different SKU for the red versus blue packaging and Walmart doesn't, then you need a many-to-one match, where many of your products map to one Walmart product. And when you're viewing the Walmart data, you have to be able to disaggregate the data into your SKUs, or have
(18:52):
a clean roll-up, so you're still seeing the sales data accurately within some higher category-level attribute from your own perspective.
Similarly, we've seen retailers that actually rotate SKUs. Maybe it's seasonal, maybe there are other reasons, and so Target may be reusing SKUs, maybe every quarter, maybe every
(19:16):
year, and those all have to map to the same internal SKU of yours. You have many Target products that are matching one of your own vendor products, and so you have many-to-one in both directions. And in all these cases, even as these product masters are changing, you have to have consistency. You have to be able to map historically as well as next quarter's sales, so that you can look back at two years of history and still see your data consistent and accurate.
Some other examples that we have are manual overrides.
(19:37):
Even with all of this handling that we have, the reality is that data is messy. We see this all over the place, and we haven't designed a system that is supposed to handle 100% of it automatically, because that's just not realistic. The way that you really prevent this is that you build really good systems to have users come in and fix mistakes that happen, and even as you're receiving consistently incorrect values
(19:59):
from retailers, your fixes still persist. So we have ways to make sure that when you're manually matching products that Walmart consistently insists is some other product, and they're wrong, your values are the ones that hold, so that all of your analysis is accurate in your own system. And the tail continues. Obviously, there's more and more edge cases as you go down this, and there's lots of things to think about.
Speaker 1 (20:22):
Edge cases are what make it fun, right?
Exactly.
So I think it's interesting that you talk there about, you know, the roll-up of the data at different levels, because that's starting to get into how we use it.
Speaker 3 (20:45):
It depends on who you are. I'll use the red and blue, that's a good example. I think Matthew reminded me of where this is actually very common, which is with seasonal SKUs. If you imagine going into the chocolate world, you're going to sell very well on Valentine's, you're going to sell very well in October for Halloween, and then usually around Christmas.
(21:05):
Every year you might have a slightly different packaging, right? So let's say that last year was blue, this year is red. If you are a sales analyst who wants to understand, right, I need to be able to predict how much I'm going to sell next season versus what I did last season, you actually don't care about the red and blue.
(21:26):
It actually gets in your way of doing analysis, because of what Matthew mentioned, the inability to track historical data. If you know you're going to sell red this year and you're looking for sales on the red SKU, there's nothing there, because you sold it on the blue SKU last year. So that's where the model that Matthew designed, where he can support multiple Walmart IDs and roll it up to one common SKU
(21:48):
for the sales guy, is exactly what he needed.
What did we sell last year of this chocolate recipe, no matter the color, to do this analysis and forecast for next year? If you're a supply chain person, you're on the hook for producing the right product. You actually do care whether the orders are coming in on the red or the blue SKUs.
(22:08):
So what's neat about our product model, and I don't think Matthew has hyped it up enough, is we are designed to support all these cases. At the end of the day, we operate a database, right? So databases are pretty rigid. But Matthew's team has been super creative at designing these databases to be custom-built for consumer packaged
(22:29):
goods to easily handle these scenarios, so that the sales guy can look at it with the aggregated number, the supply chain guy can look at it, you know, depending on what he wants, with a different identifier, without actually having to ever write a SQL query. Like, it just works out of the box because of how we designed it and how we read the data in the first place.
Speaker 2 (22:49):
And one more of those things to throw in: maybe the marketer is trying to figure out, did my red or my blue packaging do better? And they don't actually care about whether it was the single individually packaged thing or it was a whole bouquet display of your chocolates. They really care, out of all of my red sales versus all of my blue sales, which one did better. And we can also do that. Like, it's completely flexible to
(23:10):
granularity, so you can be rolling up across different product SKUs and just care about the one feature, what's the color of my packaging, and look at that year-over-year comparison.
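That attribute-level roll-up might look something like this minimal sketch, where the SKUs, colors and unit counts are all invented: sales are grouped by a single attribute, packaging color, regardless of SKU or form factor.

```python
# Hypothetical attribute-level roll-up: compare packaging colors across
# SKUs and form factors (single bar vs case), ignoring everything else.
sales = [
    {"sku": "SKU123-EA", "color": "red", "units": 500},
    {"sku": "SKU123-CS", "color": "red", "units": 1200},
    {"sku": "SKU124-EA", "color": "blue", "units": 300},
]

by_color = {}
for row in sales:
    by_color[row["color"]] = by_color.get(row["color"], 0) + row["units"]
# Grouping by one normalized attribute answers the marketer's question
# ("which packaging did better?") without touching individual SKUs.
```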
Speaker 1 (23:19):
I love that you mentioned displays there, Matthew, and the theme of a long tail of edge cases. I know that's another fun one: displays that are shipped in as a display and sold as individual units. We don't necessarily have to go into that right now. Any other examples you want to share on products before we move on to locations?
Speaker 3 (23:37):
Not necessarily an example, but I'll keep trying to praise Matthew's team and how we've designed this, because a lot of what we've talked about has focused on just the architecture design to support many-to-one mappings, right, the red and the blue versus one roll-up. We support that out of the box. We can also support the fact that Target and Walmart are
(23:58):
going to report a different number, and maybe they don't do the red and the blue. We support that, right? So our database does it automatically. But I think he very quickly mentioned something I wanted to double down on, which is the fact that we actually provide automatic product normalization for many of our big retailer data feeds. Right.
(24:18):
So in my mind it's two steps. Is the system designed to support all the different levels of granularity, right? Can I speak all the languages? Check. And then, does a human have to go define the thesaurus, the Duolingo? We tried, because we've done this for so many years. We know that Walmart and Target and Amazon provide their
(24:38):
translation buried deep in their portals. So we actually designed an automatic process so that, out of the box, you, Abby, in your Willy Wonka world, can get an automatically normalized report. And he mentioned exceptions. So, you know, our translation layer is only as good as what we
(24:59):
find in there, and people are going to point to the wrong thing.
So it was like a year-long project, how we learned to be able to override these exceptions, and it's so powerful. When I go back to the people and the process: most companies do not have the right people to be building translation layers
(25:19):
across all their retail languages. It's insane. And if you imagine, the process requires you to go do this for every product, every retailer, right? Like, the amount of work grows exponentially. So the fact that we can automatically do this, and then the only direction from you is, when we know there's an issue, you override an exception, we've cut your workload down by, I don't know, like 95%, right?
(25:45):
Like, we can already do most of this, and you can have the analysis for your marketing team, your business team, your supply chain team all in one place. It's kind of fun to think back on all that we built over the last six years, because I used to be the human doing a lot of this by hand, until these guys built the tech to do this, and it's awesome that we can offer it out of the box.
Speaker 1 (26:06):
Absolutely, and I think those escape hatches that you mentioned, Matthew, are also what's really key there. I know we've all struggled with when data isn't correct, right? Data has problems, data has messiness, and making sure you have the correct tools in place to be able to work around those as well, that's really key. All right, so we've spent a lot of time talking about products. I want to move us on to the second dimension. Well, there's a lot
(26:28):
to get through here. We talked about location normalization as well, so help me understand. What does that mean relative to product normalization? How is it similar? How is it different? What are the challenges involved there?
Speaker 2 (26:42):
From a technical side, it's quite similar. You still have the same concept, which is different data providers are reporting locations under different identifiers. They are reporting them in different ways. Maybe one is reporting Walmart as the entire retailer, and they say you have a bunch of sales or shipments into Walmart and they
(27:03):
can't be more specific than that. And somebody else is saying, actually, this specific Walmart location with this ID is selling a lot of product, and you really care about that. And then someone else is saying, actually, this Walmart location, we don't know what ID it is, at this address, is selling a lot. And maybe the latter two are talking about the same place. And for the first case, you have to be disaggregating your
(27:24):
data. You have to be breaking down those sales into where the sales are actually happening in different Walmart locations. And so it's the same core problem, which is they're speaking different languages about the same real-world thing, in this case a location rather than a product, and you want to be viewing the data consistently. I don't want to have to spin up a dashboard and be in my head summing different rows because the location identifiers are
(27:44):
slightly different from different data providers.
Speaker 1 (27:47):
And so, Manny, what are the business problems that users are looking to analyze when they're trying to use data about a store or a warehouse or some other real-world location? What are they trying to look into when they're using data that's coming from different sources, that might be reported in these different languages?
Speaker 3 (28:04):
So if we go back to how I was defining normalization, which is standardizing data sets so they can be combined together, you can imagine how trying to analyze your e-commerce business is very different than trying to analyze your retail business, because, you know, Target has 3,000 stores, while Amazon can actually ship to unlimited postal areas.
(28:24):
So how do you know where you're doing well, right? But the fact that we can understand the geographical location of all those stores, as well as understand all the zip codes of where you're sending your Amazon products, allows a business team to analyze how they're doing in Texas versus California. So we established that normalized language so that you
(28:45):
can start combining these data sets and abstracting from individual stores to regions and different areas that you might want to analyze.
You can even tag. You know, some people have their sales teams divided up by regions, right? So let's say you have a West Coast team, an East Coast team and maybe some in the South, so three field reps or, like, you know, representatives.
(29:08):
You can tag all your master data by state with the field rep, and now, in the same dashboard, you can just analyze how they're all performing versus each other by region. So it's all about making it seamless to report the data in a standard language in one place.
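A toy version of that tagging-and-roll-up, with invented store IDs, states and regions: each location, whether a physical store or an e-commerce zip-code area, carries a state tag, and states roll up to a rep's region for one combined view.

```python
# Hypothetical location master tags: every location (store or e-commerce
# zip area) carries a state, and each state rolls up to a sales region.
STORE_STATE = {"T-1001": "TX", "T-1002": "CA", "A-ZIP-73301": "TX"}
STATE_REGION = {"TX": "South", "CA": "West Coast"}

sales = [("T-1001", 40), ("T-1002", 55), ("A-ZIP-73301", 25)]

region_sales = {}
for location, units in sales:
    region = STATE_REGION[STORE_STATE[location]]
    region_sales[region] = region_sales.get(region, 0) + units
# Retail and e-commerce sales now combine into one regional comparison.
```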
Speaker 1 (29:25):
And so you talked there about retail stores and e-commerce consumer postal codes. What about higher up the supply chain? What are the challenges you face when it comes to normalizing data around distribution centers, warehouses, production facilities?
Speaker 3 (29:45):
So it's pretty straightforward to understand what's happening at the store, because retailers usually report down to that level. They tell you unit sales, they tell you inventory; that allows you to analyze performance and in-stock rates, all of that. When you start going up the supply chain and you're trying to take action to correct issues, okay, you need to know who services that subset of stores,
(30:09):
because you don't necessarily place an order at all thousands of Walmart locations. There are actually between 30 and 50 Walmart DCs that you need to service.
So as a first step, we start building what Alloy calls a supply chain graph, so we have the ability to connect all the stores to where they're being serviced from within the
(30:31):
retailer. And then you're going all the way upstream to your warehouses. Some people have one warehouse; they have a very simple distribution network. Others have four, six, ten, right? So if you want to start combining, you know, before you take action on Walmart, do you have enough inventory to respond to something? It's important to translate all these insights into the
(30:51):
language that your team speaks, because usually your internal team is going to be aligned by your internal warehouses, not by, you know, Walmart DCs, not by Walmart stores. So it's important to be able to roll these up to, you know, the right language to take action.
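[Editor's note: a toy version of the supply chain graph described here, stores pointing to the retailer DC that services them and DCs pointing to the brand's own warehouse. Node names are invented; this is a sketch of the idea, not Alloy's implementation.]

```python
# Edges point one hop upstream: store -> retailer DC -> brand warehouse.
service_edges = {
    "walmart-store-1": "walmart-dc-east",
    "walmart-store-2": "walmart-dc-east",
    "walmart-store-3": "walmart-dc-west",
    "walmart-dc-east": "brand-warehouse-nj",
    "walmart-dc-west": "brand-warehouse-ca",
}

def upstream_chain(node, edges):
    """Walk from a store all the way upstream to the warehouse serving it."""
    chain = [node]
    while node in edges:
        node = edges[node]
        chain.append(node)
    return chain

def stores_served_by(warehouse, edges):
    """Roll store-level issues up into the brand's own warehouse 'language'."""
    return sorted(s for s in edges
                  if s.startswith("walmart-store")
                  and upstream_chain(s, edges)[-1] == warehouse)

chain = upstream_chain("walmart-store-1", service_edges)
served = stores_served_by("brand-warehouse-nj", service_edges)
# Before acting on a Walmart store issue, you can now see which of your own
# warehouses would have to supply the fix.
```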
Speaker 1 (31:09):
So, Manfred, you mentioned Amazon and Amazon postal codes. I know one of the data quirks that we've had challenges with in the past is the fact that these days, with e-commerce, what is a store, what is an e-commerce sale? Right, you've got tons of different ways that an e-commerce sale can be fulfilled. It can be buy online, pick up in store. It can be shipped from store, shipped from a DC.
(31:30):
So when you're thinking about location normalization, how does that world of fulfillment methods come into that?
Speaker 3 (31:38):
Yeah, you're bringing up a tricky subject, because when I started at Alloy, it was very easy to determine a brick-and-mortar sale versus an e-commerce sale. Usually e-commerce sales were associated to specific e-commerce warehouses in the retailer world, right? So if you placed an order on target.com, they would fulfill it
(31:58):
from a specific e-commerce warehouse. Super easy. You know, through the COVID push and through all this, you know, modernization of e-commerce, like you say, you can now place an order online and pick it up at a store. So even though the good is leaving a physical store, it should actually be recognized as an e-commerce sale, not a store
(32:19):
sale.
So it's about how we modernize our database to keep up with the industry. This is actively something we are exploring, how to tag things. It's no longer enough to just know the store number where the sale happened. You need to know the fulfillment type of that sale so that you can properly separate your e-commerce channels from
(32:41):
your store channels.
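[Editor's note: the fulfillment-type tagging described here can be sketched as below. The fulfillment type names are hypothetical; the point is that the channel is decided by fulfillment type, not by the location.]

```python
# A physical store can now contribute to two channels, so the fulfillment
# type on each sale record, not the store number, determines the channel.

E_COMMERCE_FULFILLMENT = {
    "ship_from_dc",
    "ship_from_store",
    "buy_online_pickup_in_store",
}

def channel(sale):
    """Classify a sale as e-commerce or brick-and-mortar by fulfillment type."""
    if sale["fulfillment"] in E_COMMERCE_FULFILLMENT:
        return "e-commerce"
    return "brick-and-mortar"

sales = [
    {"location": "store-88", "fulfillment": "walk_in", "units": 5},
    {"location": "store-88", "fulfillment": "buy_online_pickup_in_store", "units": 3},
    {"location": "ecom-dc-1", "fulfillment": "ship_from_dc", "units": 7},
]

by_channel = {}
for s in sales:
    by_channel[channel(s)] = by_channel.get(channel(s), 0) + s["units"]
# The same store (store-88) contributes units to both channels.
```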
Speaker 1 (32:43):
Absolutely. And then again, it's even more complicated when you're looking at inventory, right? Because where the inventory is deducted from depends on the type of sale as well.
Speaker 3 (32:51):
Yeah, so this goes back to our concept of language policing. What used to be a super easy designation of e-commerce sales on the location dimension is now hard for us to know. A given location can actually contribute to two types of sales now: a brick-and-mortar sale and an e-commerce sale. Now that I'm talking about language policing: we've talked about products.
(33:11):
We've talked about locations. There's another level of harmonization that I think people often don't think about, which is metric names. So let me ask you this: what is net sales?
Speaker 1 (33:23):
Putting me on the spot here, Manny. Net sales, I would say... well, firstly, assuming we're talking about point of sale, so the consumer sale, I'd say it's the total volume consumers bought in a certain day, not including returns. So net of returns, basically, bringing that product back.
Speaker 3 (33:45):
I love it.
You went with the Alloy definition, which is what I call the supply chain definition.
You are tracking the flow of goods. What left a store versus what was returned back into the store gives you the net outflow of goods. But talk to a business team: when you hear the word net, a lot of people usually think about, you know, finance terms, what it brought in, in money,
(34:07):
versus what it costs, right?
So people will think about a different definition of net. And you can imagine, right, we've talked about all these portals speaking different languages. When you see the words net sales at each one of these portals, we have to make sure we know what we're looking at. Our team is actually trained, as we design our feeds, to confirm whether a sales
(34:28):
number includes returns. Does it include tax? Does it not include tax? There are all these questions of things that we're trying to get into the supply chain definition of sales, because we're trying to track the flow of goods first, then the correct flow of money. We would call it margin when you want to track the money you made versus what it cost you, right? So there's a third dimension of normalization around metric
(34:53):
names that we often don't think about. But you have to be really careful so that you're comparing apples to apples across all these data sources, not just what you think is the same number, or you'd be completely under-reporting somewhere.
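[Editor's note: one way to picture the feed-level confirmation described here, a per-portal config recording whether the reported "net sales" already includes tax or already deducts returns, then converting every feed to one supply chain definition. Portal names, configs and numbers are all invented.]

```python
# Normalize each portal's "net sales" into one definition:
# sales excluding tax and net of returns.

feed_config = {
    "portal_a": {"includes_tax": True,  "returns_deducted": True},
    "portal_b": {"includes_tax": False, "returns_deducted": False},
}

def normalized_net_sales(portal, reported, tax=0.0, returns=0.0):
    """Convert a portal's reported figure into the supply chain definition."""
    cfg = feed_config[portal]
    value = reported
    if cfg["includes_tax"]:
        value -= tax          # strip tax if the portal bundles it in
    if not cfg["returns_deducted"]:
        value -= returns      # deduct returns if the portal has not
    return value

a = normalized_net_sales("portal_a", 1080.0, tax=80.0)
b = normalized_net_sales("portal_b", 1050.0, returns=50.0)
# Two portals that looked like they disagreed now report the same
# comparable figure, so you're comparing apples to apples.
```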
Speaker 1 (35:06):
And so what are some of the things we can do if there's one retailer that reports a certain metric and another retailer doesn't report that metric, but it's a key KPI that we want to analyze? What can we do to make sure that we have that data from both of those retailers?
Speaker 2 (35:21):
It's a great question. One of the main approaches we use to solve that is to derive metrics. So we have a full list of hundreds of metrics that we can ingest, and they can be ingested raw, which means that the retailer directly reports it, and this is what Manfred's talking about. Maybe the retailer's definition lines up with our
(35:41):
internal definition of that metric; we'll import it raw as that metric. The other option is to derive it from underlying raw or derived metrics, and you can imagine it like a graph where you're building up a series of metrics. You start with whatever raw metrics the retailer provides and then, based on satisfying the requirements, the dependencies, of any potentially derived metrics, you derive
(36:03):
those.
A really simple example would be in-stock versus out-of-stock percentage. Maybe one retailer provides in-stock, another provides out-of-stock. Well, they're just the inverse of each other, one minus the other value, and so whichever one the retailer reports, we derive its flipped version. Maybe some retailers don't provide in-stock or out-of-stock
(36:24):
at all. Instead they provide their stock values, how many in-stock units they have, and a minimum that's required: they need at least 10 units to be treated as in stock. Well, we can calculate whether that store is in stock or not, that's just a Boolean value, and then we can take a percentage across all of their locations that are supposed to supply that product, and we can calculate in-stock percentage from that, and we can calculate out-of-stock percentage. So you can see these get more and more complicated, from just
(36:45):
one minus a value to having some per-location value that needs to roll up across tracked items and locations across lots of different destinations, and from that you build this graph of derived metrics. That is a really rich, consistent, normalized view of both the metrics that retailers provide, as well as all these
(37:06):
other ones that we've added in.
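[Editor's note: the two derivations Matthew walks through can be sketched directly. The 10-unit minimum comes from his example; the store names and unit counts are invented.]

```python
# Derived metrics: whichever of in-stock / out-of-stock a retailer reports,
# derive the other; or derive both from per-location units and a minimum.

def in_stock_from_out_of_stock(out_of_stock_pct):
    """The two metrics are inverses: one minus the other value."""
    return 1.0 - out_of_stock_pct

def in_stock_pct_from_units(units_by_store, min_units=10):
    """Per-location Boolean (units >= minimum) rolled up to a percentage
    across all locations supposed to supply the product."""
    flags = [units >= min_units for units in units_by_store.values()]
    return sum(flags) / len(flags)

derived = in_stock_from_out_of_stock(0.25)

units = {"store-1": 12, "store-2": 3, "store-3": 40, "store-4": 0}
pct = in_stock_pct_from_units(units)
# 2 of 4 stores meet the 10-unit minimum, so in-stock percentage is 0.5,
# even though this retailer never reported an in-stock metric at all.
```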
Speaker 1 (37:11):
So you stop needing to rely on the retailer necessarily reporting the data to get that particular insight?
Speaker 2 (37:15):
Yeah, I mean, at a certain point we have to rely on the retailer to give us some values, because that's what we derive values from. But we don't need them to kind of duplicate themselves. They only need to provide one of the values, and we'll do the math on the back end to fill in that picture.
Speaker 1 (37:29):
That's great. So, any customer stories you can share, Manny, where people have benefited from that capability?
Speaker 3 (37:36):
So, like you're saying, retailers have different levels of maturity in how they report data. You mentioned one scenario: they might not even report in-stock percent. It's a very important metric to track, beyond just your inventory; knowing your historical trend of in-stock is really important. So now in Alloy you get a computation that works out of
(37:58):
the box. All we need is inventory numbers over a given time range and we can actually compute it. We know whether or not a store should have inventory, and we give you an in-stock percent that you can track. Now you can have a conversation with a retailer that didn't have the capability in their portal to provide this insight. So when you go to them and you say, hey, I'm out of stock
(38:19):
on this item, they might be like, well, we just ran out. And you can say, actually, we know you've been out for four weeks, six weeks, right? And it behooves them to order more.
Speaker 1 (38:33):
So it's all about bringing the right insight. I've definitely seen that coming to your retail partners with that insight, with that data, before they bring it to you, is the way to make for happy retailers.
Speaker 3 (38:42):
They're overwhelmed. They have to manage many, many, many brands in many locations, right? And if there's anything that Alloy can do to provide the right KPI, like you mentioned, in-stock percent is a good one, it's important to be able to give them that insight so they take action.
Speaker 1 (38:55):
Absolutely. All right, so I'll move us on to our fourth and final dimension that we were talking about normalizing, which is time. So what does it even mean to normalize across time?
Speaker 3 (39:05):
Fiscal calendars, and how they're different from your regular calendar. Every industry will have a different calendar. Some of them decide, like consumer electronics might
(39:28):
decide, to start the year on a certain month, and sweets decide to start the year on a different month. Every retailer usually lines up with their own fiscal calendar. So what we found is that if you at Wonka, right, want to run a report that says what did I sell last year, the first question should be: what does last year mean? What is your fiscal calendar? We will define it for you, according to whether you're, you
(39:51):
know, a 4-4-5 or 5-5-4, whatever you want. We will define your calendar so we can speak your internal language.
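[Editor's note: a minimal sketch of the 4-4-5 style calendar mentioned here, where each quarter is split into periods of 4, 4 and 5 weeks so "month" boundaries fall on week boundaries. The year-start date is an invented assumption, and real calendars also handle 53-week years.]

```python
from datetime import date, timedelta

def fiscal_periods(year_start, pattern=(4, 4, 5)):
    """Return (period_number, first_day, last_day) for twelve fiscal periods,
    built as four quarters of the given week pattern."""
    day = year_start
    periods = []
    for _quarter in range(4):
        for weeks in pattern:
            first = day
            last = day + timedelta(weeks=weeks) - timedelta(days=1)
            periods.append((len(periods) + 1, first, last))
            day = last + timedelta(days=1)
    return periods

# Assumed year start: a Monday, purely for illustration.
periods = fiscal_periods(date(2024, 1, 29))
p1, p3 = periods[0], periods[2]
# Period 1 is 4 weeks (28 days); period 3 is 5 weeks (35 days), which is why
# "last month" can mean very different day counts in different calendars.
```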
Speaker 1 (39:58):
And then all the raw data sources, no matter the fiscal calendar they were reported in, we bring in so that you can look at them in your correct language. And so tell me how that works technically. What does it mean to actually shift that data into a different fiscal calendar?
Speaker 2 (40:12):
Yeah, on the back end we just store metric values on particular days. So when you look at the data kind of raw in our data warehouse, it looks pretty simple, but there's a lot going on, both to get the data into that form and then also to view it in a helpful way in the UI and dashboards and analysis tools. So when we're ingesting data, there are two things you have to
(40:34):
think about: the granularity that the data is reported in, and also the cadence that it comes in. So you could have daily data that has a data point for every single day, but it only comes in once a month or once a week, and maybe they are restating data historically. So maybe they say, hey, actually we told you that you sold 10 chocolate bars last Wednesday; actually, some were returned and we don't track returns, so
(40:54):
we're just going to tell you you sold eight last Wednesday. So that daily data has to be updated. It has to be the freshest data that we have, and we'll store it there.
They might also report weekly data. Hey, you sold 100 chocolate bars last week. We don't know whether it was a really good Monday or a really good Saturday, but it was somewhere in that week, so we store it there. We have different ways to disaggregate that once you're
(41:17):
reporting in the UI, but we natively support daily and weekly data. We just store that in our database, and then, when you're querying it, depending on the granularity that you're requesting the data in, we'll roll that up. So if you're looking for monthly data, well, based on the fiscal calendar, we will group the days that line up with those months and just aggregate the data, whether it's a sum of sales
(41:37):
or some other more complicated metric, across all those data points that line up with those dates. If it's weekly data, similar thing. If it's daily data, maybe we have to disaggregate. We'll make a guess: hey, probably one-seventh of the sales are happening on each day. And with those fiscal calendars, the really flexible thing about our backend is that we can support any arbitrary date selection.
(41:57):
We don't limit you to, hey, you can look at the last day or the last week or the last four weeks. You can say, hey, I want March 19th in 2020 to April 17th, I'm just making up days, in 2022. You can give us any days that you want and we'll happily aggregate the data up and report it to you. And that's what's really powering this: these fiscal calendars are
(42:17):
automatically converting what last month means into specific days, based on the partners that you have, based on the fiscal calendar, and then reporting that data and aggregating it up for you.
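[Editor's note: the storage and aggregation behavior described here, metrics stored per day, summed over any arbitrary date range, and a weekly figure naively split one-seventh per day, can be sketched as below with invented data.]

```python
from datetime import date, timedelta

# Metric values stored on particular days: 10 units every day of March 2024.
daily_sales = {date(2024, 3, 1) + timedelta(days=i): 10 for i in range(31)}

def aggregate(daily, start, end):
    """Sum a daily metric over any inclusive date range the user asks for."""
    return sum(v for d, v in daily.items() if start <= d <= end)

def disaggregate_week(week_total):
    """Naive split of a weekly figure into daily values: one-seventh each."""
    return [week_total / 7] * 7

march = aggregate(daily_sales, date(2024, 3, 1), date(2024, 3, 31))
first_week = aggregate(daily_sales, date(2024, 3, 1), date(2024, 3, 7))
per_day = disaggregate_week(70)
# Any date selection works, from one week to an arbitrary 31-day "month",
# because everything rolls up from the daily level.
```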
Speaker 1 (42:28):
So one of the questions I hear a lot is: how did our performance grow year on year? Curious to hear from both of you, when we say year on year, how would you think about defining that? Does that mean the same Walmart week as last year, week 42 compared to week 42? Does it mean the same calendar period? Does it depend on when consumers were shopping last
(42:50):
year? Was it the week of Thanksgiving or the week after Thanksgiving? Curious to hear how you think about that. Is there one particular way, or do we need to support all of those as different methods?
Speaker 3 (43:00):
So I will default to the fiscal definition, right? Like, let's say that last week was week 30 in your fiscal calendar. We should be comparing to week 30 in your fiscal calendar last year, right? That's how most of your accounting software is usually going to be tracking year over year. So we're going to try to line up: if your fiscal calendar defines a week starting on Monday, ending on Sunday, we will shift all the raw data Monday to Sunday,
(43:24):
find the 30th week of the year and give you values from that same period a year ago. So it doesn't matter if it was a leap year or something crazy; it's not necessarily 365 days ago or 363. We will line up to the fiscal calendar. We have the ability to toggle to a different one. If you're having a conversation with a retailer who has a different definition, and they want a slightly different
(43:44):
definition every year, you can actually configure that too. So, going back to all the languages we're able to speak, we're going to try to default to your language first, but you will always have the ability to speak a retailer's language to have that conversation.
Speaker 1 (43:58):
So tell us why all of that matters from a business perspective.
Speaker 3 (44:02):
So if you are a brand who is starting to utilize retail data for the first time, you are probably operating your analysis with humans. You have analysts who are downloading data into Excel. Small brands, or actually most people, tend to make decisions at the monthly level. You don't have enough time in a week to operate on weekly or daily level insights, so you're going to be downloading reports
(44:23):
from each portal, usually rolled up to the month. What can get tricky is if your Walmart portal, when you ask for a month, defines it as the four fiscal weeks that preceded your period, four fiscal weeks, 28 days. Versus a different portal, I'll name Dollar General, a random one. Let's say that when they define last month, they actually mean
(44:44):
the last calendar month, which happened to have 31 days. If you are marrying the two different Excel reports at the monthly level, you're now comparing 31 days in one report with 28 days in another. Your analysis is going to be off. And people stick to the monthly level because Excel has a maximum row count; you max out at a little bit
(45:05):
more than a million rows.
The beauty of what Matthew mentioned earlier about the importance of daily data is that once you're using our database designed for this, we will actively search for all your data at the daily level. We want the lowest level possible, because it's quite easy to roll up from day to week to month, and I can even define what week and month mean differently, right? But people shy away from that just because they don't
(45:28):
have the tools. When you come into Alloy and you bring data into our database, you're now unlocking the same monthly analysis powered by daily data points, and you don't have the problem of comparing 31 days versus 28. We will make sure that you're comparing whatever definition of the month you actually wanted, seamlessly.
(45:50):
So it's pretty important for being able to make the right decisions. And again, time is one of those dimensions people don't think about, because we all assume time is the same, and I've seen people make big mistakes by misconstruing what one month is.
Speaker 1 (46:05):
Absolutely. Those three days can make a big difference, right? It seems like there's a lot more than meets the eye when it comes to normalization. Wrapping up: if you were speaking to a newer consumer brand company just getting started with all this, just trying to wrap their arms around it, what would your piece of advice be on where to start?
Speaker 3 (46:22):
Ask for help, right? So actually, don't assume... like, don't run away from this challenge because it's hard, right? Most people shy away from it because for decades it's been impossible to normalize all these different data sources. People are usually bound to analysis within their ERP walls
(46:43):
because they're comfortable with their ERP walls, because they have IT teams that'll give them those insights. But then you are way too removed from the end consumer and all the retailers you're working with. So ideally, don't be scared. Go one at a time, build these translation layers we're talking about. You now know the four dimensions that really are key to doing this. And, you know, ask for help.
(47:10):
We've done it. It works out of the box. So there are better ways to get this insight than trying to rebuild all of this from scratch.
Speaker 1 (47:14):
Ask for help, solid advice there. And, Matthew, any final words of wisdom to share?
Speaker 2 (47:18):
From an IT side, I would say: be really careful with the assumptions that you're making. We split this out into four different types of normalization because we've been doing this for a relatively long time. We have a lot of experience with different sizes of companies that we're working with, and hundreds and hundreds of different retailers and how they represent data, and we've
(47:38):
come to this idea of product, location, time and metrics, and then, within that, the representation decisions we've made, because of all that experience. And even while we've been doing these projects, we've had to revisit the technical decisions we've made. What granularity are we representing things at? What assumptions can we make about the data? And you always have to make some assumptions, otherwise it's impossible to represent anything. But we've had to kind of revisit ideas where we made too
(48:04):
strong an assumption, because we didn't understand the shape of the data that we'd be seeing in one month or one year. So be really careful with the limitations that you're putting on yourself, because there are so many edge cases. The tail is really long, and at some point you're going to be surprised at what you have to model.
Speaker 1 (48:18):
The tail is really long. I like that as a little takeaway as well. That's all we have time for today. Thank you, Manfred, thank you, Matthew, for sharing your wisdom with us, and we'll see you next time on Shelf Life.