Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Matt Gershoff (00:00):
I think the biggest issue and the hardest part of decision making is being intentional; it's being explicit; it's being very clear about what the problem is and then having a principled way of learning about the problem and then taking action. I think most of our value, alongside with the technology,
(00:22):
is in that guidance that we give our clients and that we help them be forthright and go into this with their eyes wide open, as opposed to promising them a magic bullet.
Debra J Farber (00:32):
Hello, I am Debra J Farber. Welcome to The Shifting Privacy Left Podcast, where we talk about embedding privacy by design and default into the engineering function to prevent privacy harms to humans and to prevent dystopia. Each week, we'll bring you unique discussions with global privacy technologists and innovators working at the
(00:52):
bleeding edge of privacy research and emerging technologies, standards, business models and ecosystems.
Welcome everyone to Shifting Privacy Left. I'm your host and resident privacy guru, Debra J Farber. Today, I'm delighted to welcome my next guest, Matthew Gershoff. He's Co-founder and CEO at Conductrics, a software company
(01:17):
that offers integrated A/B testing, multi-armed bandit, and customer research and survey software. While having been exposed to various advanced approaches during his master's degrees in both resource economics and artificial intelligence, Matt's bias is to try to squeeze as much value out of the simplest approach possible and to always
(01:39):
be intentional about the marginal cost of complexity. Conductrics has been at the center of many of the most recognized brands' optimization strategies by delivering decision-making at scale, reducing the technical debt of legacy technology and bringing data-informed intelligence and refinement across the enterprise. Marketers, product managers and e-commerce professionals
(02:02):
responsible for customer experiences can then continuously optimize the customer journey, while IT departments benefit from the platform's simplicity and ease of integration with the existing technology stack.
So, I first met Matt at the PEPR conference last year and I thought he had a fascinating niche of expertise. I knew that I had much to learn from him.
(02:24):
Today we're going to talk about privacy in the context of A/B testing, experimentation, optimization, machine learning, personalization and multivariate testing.
Welcome, Matt.
Matt Gershoff (02:41):
Great to be here,
Debra.
Thanks for having me.
Really excited.
Debra J Farber (02:42):
Yeah, absolutely, and I'm getting excited because PEPR is actually in a few days. It's Friday, May 31st as we're recording this and it's coming up. It's on June 3rd and 4th. I will see you there.
But, before we begin our fun discussion today, I did want to address the audience on a personal note. Thank you all for your patience. I haven't published a podcast episode in about a month because
(03:05):
I've been a little busy planning my wedding. After nine and a half years together, an attempt at getting married in March of 2020 in Mexico, only to be canceled five days before the event due to a brand new COVID pandemic, and a slew of other roadblocks, I finally got to marry my best friend, Mack Staples, on May 24th. That was just seven days ago.
(03:25):
It was a magical, joyful, overcast day in the PNW, amongst family and friends and a couple of alpacas, because why not? It didn't rain, despite a forecast calling for 50% chance of showers, and I'm still glowing from the pure, unbridled joy of the day. So, Matt, thank you so much for being my first guest after a
(03:47):
short break.
Matt Gershoff (03:48):
Thanks for having
me.
Congratulations and mazel tov.
Debra J Farber (03:51):
Thank you so
much.
Thank you so much.
It was definitely a blend of my Jewish background and my husband's Celtic background. So, we even had cups that say "mazel tov" on one side and "sláinte" on the other. So, but, thank you, I appreciate it.
All right. So, Matt, you have such an interesting background to me, probably because it's so foreign from the work that I do on a
(04:12):
daily basis. Why don't you tell us how you came to focus on A/B testing and experimentation and all the good work you're doing at Conductrics?
Matt Gershoff (04:21):
Well, so academically, back when I first went to school it was, as you said, resource economics; and there's a heavy focus on econometrics. And what econometrics is is really about trying to answer causal inference questions when you can't really run experiments. And so, the main objective, or the main hope, is that one can
(04:43):
find a natural experiment so that you can answer "if something were to have happened, then what would be the outcome?" As opposed to just finding correlations, one is looking for causations.
Then, I worked in advertising and database marketing for about five years at a global agency, both in New York and in Paris.
(05:04):
There we were using a lot of methods to try to figure out, if we had some sort of promotional campaign or if we had some sort of ad, would that affect buying behavior? And so that included TV campaigns as well as direct mail. This was back in the day. This is actually right at the cusp of the internet. I'm a bit older than probably many of your listeners, and so
(05:27):
this is around '95 to 2000, right when the internet was starting. But even back then a lot of the modes of analytic marketing kind of come from that database marketing world, and so we were able to run experiments. And so with an experiment, often called a randomized controlled trial, and what in sort of the parlance of the day now
(05:50):
is often called an A/B test, it's really the same thing. We would run experiments to try to see if someone got a particular offer in a mailing, would they be more likely to purchase the product versus someone who got a mailing without the offer, versus someone who didn't get the mailing at all?
And so the main idea is that we would randomly assign people
(06:13):
into different experiences and then we would see what the average effect was. And so, you know, I had a lot of applied experience as well as academic experience in that.
And then, I was working in a software company during dot-com 1 in Manhattan, and then we were there during September 11th.
(06:34):
After that, there was sort of a "Hey, you know, I've always been kind of interested in artificial intelligence," and so I had an early midlife crisis, I think, probably because of just what happened after in New York. You're like, hey, you know, life is short. I went back to graduate school for a Master's at the University of Edinburgh and I studied artificial intelligence and
(06:55):
there it was really fascinating where I learned about reinforcement learning. Remember, this was back when I finally went back, in like 2005. So, this was really before what I call "the inflationary period" of artificial intelligence. It was really sort of a small community. In fact, most of the folks I would speak to, when they would
(07:15):
ask me what I was going to be studying and I said AI, were like, "Well, that's interesting, but what are you going to do with it?" Because at the time it was, like you know, not really a thing at all.
So, there I had learned about reinforcement learning and I was really fascinated by that, because what it was is a lot of
(07:36):
the theory that we had learned in economics, known as optimal control, which is really, when you have a set of sequential decisions, how do you make the optimal decision across a sequence. And, that's kind of a similar problem to reinforcement learning. But, what was nice is that reinforcement learning gave a format to actually
(07:56):
...it was like a nice framework for applying some of these marketing ideas.
So, that was what gave me the idea for Conductrics back in the day, which was, I had thought (and this was naive and actually is the reason, I think, that we really should try to use the simplest thing possible), because I was, at the time, taking an alternative approach. I wanted to use these methods from reinforcement learning and have this for people to use to solve these
(08:28):
marketing problems, and it turned out it was just too complicated and it was just too much cognitive load for users to use this type of thing. It was sort of too much to ask.
So, we really started to focus more on the A/B testing capabilities of the software and the more simple multi-armed
(08:49):
bandit, which is similar to the more general reinforcement learning problem; it's really very similar to an A/B test, except where you want to adaptively change the weights of the different experiences so that you can try to figure out what works in the shortest amount of time.
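As a rough illustration of that adaptive weighting, here is a minimal epsilon-greedy sketch in Python. It is not Conductrics' implementation; the arm names, the 10% exploration rate, and the simulated conversion rates are assumptions for illustration only.

```python
import random

# Aggregate stats per experience (arm): visitors seen and conversions observed.
arms = {"A": {"n": 0, "conversions": 0},
        "B": {"n": 0, "conversions": 0}}

EPSILON = 0.10  # fraction of traffic reserved for exploration (assumed value)

def choose_arm():
    """Epsilon-greedy: usually exploit the best-looking arm, sometimes explore."""
    if random.random() < EPSILON or all(a["n"] == 0 for a in arms.values()):
        return random.choice(list(arms))
    # Exploit: pick the arm with the highest observed conversion rate so far.
    return max(arms, key=lambda k: arms[k]["conversions"] / max(arms[k]["n"], 1))

def record(arm, converted):
    """Update the aggregate counts after observing the outcome."""
    arms[arm]["n"] += 1
    arms[arm]["conversions"] += int(converted)

# Simulated traffic: arm B converts a bit better, so it gradually gets more weight.
for _ in range(10_000):
    arm = choose_arm()
    true_rate = 0.05 if arm == "A" else 0.06
    record(arm, random.random() < true_rate)

print({k: (v["n"], round(v["conversions"] / max(v["n"], 1), 4)) for k, v in arms.items()})
```

The only state the policy needs is a count and a conversion total per arm, which foreshadows the aggregate-only storage discussed later in the episode.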
Debra J Farber (09:09):
That's
fascinating.
What are some of the major challenges that you're seeing when it comes to companies running experiments? What were the main challenges that you were trying to solve for? And then maybe tell us a little bit about Conductrics and how you're solving for them.
Matt Gershoff (09:19):
Yeah.
So I would say, you know, to step back: I think, while we're software as a service and, you know, we are a technology company, I think one of the biggest impediments is magical thinking and a belief that just using technology is going to solve whatever problems you have.
(09:40):
And I think really what it is, is that with A/B testing or experimentation programs, the main value, and I think this is a little bit different than maybe how others might look at it, but I really do believe that the main value is in that it provides a principled framework for organizations to act and learn intentionally.
(10:01):
And so by intentional I mean, like, to take actions with awareness, to do it deliberately, to do it with conscious purpose.
And I think one of the risks is that companies sometimes, you know, sort of engage in a bit of magical thinking and just thinking, well, if we do this thing, it almost becomes like ritualized behavior.
(10:22):
And, you know, if we do this thing that these other companies do that seem to be successful, like the Metas or, you know, the companies in the Valley, if we do similar actions that they have taken, then we will also get those same types of results, which, of course, is the antithesis of science, because it's
(10:43):
really about the application of the scientific method to decision making. And good science, even if you go as far back as George Box, of "all models are wrong, but some are useful" fame, I believe it's his '76 "Science and Statistics" paper, it's actually a great article, I recommend anyone to read that. Especially the first few pages are very accessible. It's really this idea that, you know, being a good scientist is
(11:12):
to do the simplest thing really possible that solves the problem. A mediocre scientist is one who does things for complexity's sake, who over-parameterizes and makes things more complex.
So I think the biggest issue and the hardest part of decision making is being intentional, is being explicit, is being very clear about what the problem is and then having a principled way
(11:35):
of learning about the problem and then taking action. And so I think most of our value, alongside with the technology, is in that guidance that we give our clients and that we help them be forthright and go into this with their eyes wide open, as opposed to promising them a magic bullet. I think magical thinking is really sort of the main
(11:58):
difficulty at a high level. And then of course there's, like, how to actually implement. You know, there's the technical side of things, which is, you know, kind of detailed, and, you know, I'm not sure how interesting that'll be for your listeners. But I think the main thing is that we provide a host of different ways of using our software in conjunction with
(12:19):
companies' different tech stacks. So, you know, there can be client side issues or it could be server side. There's a bunch of ways of doing implementation. But I do think the main thing is that inference, statistical inference, is hard, and by inference I mean, you know, trying to learn based upon observation, which is really what analytics and experimentation is about. It's a
(12:39):
difficult problem, and it's important for organizations to be cognizant of that and they have to do the hard work. And then we're just there to help them implement their ideas.
Debra J Farber (12:48):
That's
fascinating.
Let's define first, like, what exactly is A/B testing? I know we're talking at a high level and that was, like, really helpful as an intro to the conversation, but if we could dive a little deeper in before we even get to, you know, how is privacy relevant? Let's first define, like, what is...
Matt Gershoff (13:05):
A/B testing. Yeah, so there's lots of complicated ways we could talk about it, but I just think, at the bare bones, the most simple thing is there are really just two ideas to keep in mind when thinking about A/B testing. And A/B testing is just a form of experimentation, and you can think of it as a clinical trial, as an A/B test, right. And so, in your mind's eye, if you're thinking about, say, a
(13:26):
vaccine and you want to learn the efficacy, the marginal efficacy, of a particular vaccine versus, say, placebo, there are two main bits to the A/B test, right. One is that it needs to have a process for assignment, so we're going to assign different users or different people different
(13:50):
treatments in such a way that it blocks what's known as confounding, and all that really means is that the user, so the person entering into the experiment, should have no say whatsoever in what treatment they get, and that could be explicit, like not being able to ask for the vaccine versus
(14:11):
the placebo. In the case of digital systems, it might be that you need to make sure that, let's say, someone's running an old version of the browser and it is failing to execute one of the treatments properly, so the telemetry does not come back, and so there's this missing data that, sort of implicitly, is
(14:31):
similar to those types of users opting out of one of the treatments, and so you potentially get skewed results. It's really a system that is very robust in ensuring that the end user, the people who have entered the experiment, do not have any say in what treatment they get, and so essentially that's this idea of randomization.
(14:53):
So we're going to randomize, and so the allocation mechanism is going to be random, and that randomization procedure blocks the self-selection bias. So what we don't want is, say, older folks to be taking the vaccine: if the people who are taking the vaccine tend to be older folks and the people who are not
(15:13):
taking the vaccine tend to be younger folks, that might skew the results, because you're going to confound the results of the vaccine, what's known as the treatment effect, with the fact that the older folks might have overall worse conditions, regardless if they take the vaccine or not, and so that's what you're really trying to prevent.
So that's the first part. And if you're able to block
(15:36):
confounding, so there's no confounding, then we can use statistical methods to try to make inferences, right. And so we need to have a way, since we're doing randomization, so our groups are randomized, we need to not just see which one performs better, naively. So you could just look and say, OK, the vaccine has
(15:59):
lowered the incidence rate by some amount, but you also need to take into account the random variation, so what we call sort of the standard error of the randomization, sort of the noise of the experiment. And so there's these two things: one is to block confounding, and two is to have some principled way of removing sort of the background noise so that we can unearth the signal and see
(16:21):
whether there's a signal there. And so that's really it. There's really these two things. Now there's a lot of work around what the appropriate statistical method should be, and folks tend to focus a lot on that. But really at its core, that's really all you need: just some process to block confounding and then some process to evaluate the treatment effect.
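To make those two ingredients concrete, the randomized allocation the visitor cannot influence and the noise-aware evaluation, here is a minimal Python sketch; the hash-based split, the function names, and the example totals are illustrative assumptions, not any particular vendor's method.

```python
import hashlib
import math

def assign(user_id: str) -> str:
    """Deterministic, effectively random split that the visitor has no say in."""
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 2
    return "A" if bucket == 0 else "B"

def z_test(conv_a, n_a, conv_b, n_b):
    """Difference in conversion rates with its standard error and z statistic."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    effect = p_b - p_a                       # estimated treatment effect
    se = math.sqrt(p_a * (1 - p_a) / n_a +   # background noise of the randomization
                   p_b * (1 - p_b) / n_b)
    return effect, se, effect / se

print(assign("visitor-42"), assign("visitor-42"))  # same visitor, same bucket every time

# Illustrative totals: 10,000 visitors per arm, 500 vs 560 conversions.
effect, se, z = z_test(500, 10_000, 560, 10_000)
print(f"lift={effect:.4f}  se={se:.4f}  z={z:.2f}")
```

With those made-up totals, the estimated lift is 0.6 percentage points with a standard error of roughly 0.32 points, so the z statistic is about 1.9: a signal sitting right around the usual noise threshold.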
Debra J Farber (16:42):
Awesome.
Okay, so that's real helpful foundational info for me and, I'm hoping, for a lot of my listeners as well. And you mentioned a little before the value that companies can realize from experimentation, but I was wondering if you had some examples at a higher level. Like, if you have experimentation, how does the company overall benefit,
(17:02):
rather than, you know, in each individual use case? Like, obviously decisions are being made and information is being collected about how decisions are being made and thus the company could take action. But is there, like, a higher-level value realized?
Matt Gershoff (17:16):
Well, yeah, I think the higher value is what I think I mentioned before, which is that an organization has a principled way for making decisions, such that they are explicit about what the questions are. So it's like, we want to see whether or not these search engines, which search engine algorithm, has greater efficacy, and what is that difference that we care about, like, how
(17:38):
much is a meaningful improvement? And just by speccing that out, just by having a process that that question needs to go through, forces the organization to have to be sort of explicit at the start. And it's like, well, you know, if they feel that this other search engine algorithm isn't going to perform, if no one
(17:59):
believes this can perform above a certain threshold, and that certain threshold is not really going to be that valuable to the organization, then it's a way of almost filtering out wasteful activities. So, on the one hand, at a high level, it's a great way of forcing the organization to be explicit, and explicit about trade-offs, and on the other side, I think it's a way of managing
(18:20):
risk. And so, like, if I'm a product manager and, for example, we have a media company and, going back to the search engine algorithm, they want to improve the search on their site. It's a media company. They use us to run an experiment versus the existing search engine, versus alternative search engine algorithms, and then see, based upon certain performance metrics,
(18:47):
whether or not the alternative, the new search engines, seem to be performing better, certainly not worse, and if so, if they are above a certain threshold, then they can more safely, with lower risk, switch over to the new search engine algorithm and then provide a better experience for their customers. So the whole idea is to have a procedure that makes folks
(19:10):
intentional and minimizes the risk of pushing poor experiences to their customers.
Debra J Farber (19:18):
Oh, that makes a
lot of sense.
So now I'd love to turn to, you know, how does being intentional about A/B testing and experimentation affect privacy?
Matt Gershoff (19:29):
Yeah, and so I think we did a major rebuild of the software in, I think it was, 2015-ish. We got, you know, we started reading up on GDPR. This was before; I think GDPR was 2018, 2019, I'm not sure, but we had some awareness, especially of Article 25. And then we had read through the privacy by design principles,
(19:52):
specifically Principle 2, which we built into Conductrics, especially that idea of default, and so we don't want to be paternalistic and tell our customers how they need to use experimentation, but we did want to have this default option and we wanted our
(20:13):
software to have it, so that if you used it, its default condition would be to follow Principle 2, which is essentially, you know, to minimize identifiable information that you're collecting, and this is about embedding this in the technology as part of a design and engineering principle, that it should be the default behavior, right? And so that means that
(20:36):
the customer should gesture or be explicit when they have use cases or situations where they would need to collect more information, which, again, is fine, because it's ultimately their call, it isn't our call. And then also to keep the linking at a minimum, so to minimize the linking of personally identifiable information, and that is interesting because it's sort of
(20:58):
in direct opposition to what I usually think of as the just-in-case mindset about data collection.
So when you, if you're in, like, the data science world or the analytics world, there's this view that you always want to collect more information. It's a maximalist approach, which is interesting, which is, like, you want to maximize, you want to have this 360 view of
(21:21):
the individual, so you want to maximize identifiable information. You want to maximize linking that information across the customer so that you can associate events that they've done or metadata about them all across, and so that's the default behavior. So, in a way, there's this idea of what we think of as sort of
(21:41):
this just-in-case, which is to collect everything, collect all the data at the finest level of granularity and to collect as much data as possible, as opposed to that, the sort of the just-enough, which is the privacy by design. It turns out that what's nice is that if you do follow the
(22:02):
privacy by design, or the Principle 2 data minimization, it does relate as well to this intentionality, because you have to think through, well, what is sort of the value of collecting it at this finer level of granularity? You really have to stop and think about it, as opposed to a lot of the just-in-case approach of thinking about data collection. And even in the
(22:31):
Privacy by Design, the original article, they talk about how richer data tends to be more valuable. So, if you have richer data, that's preferable to data that has less information or less granularity in it, and I think, in a way, that's the wrong way to look at it. While that is true, because it gives you greater degrees of freedom, it gives you optionality, and so really, what tends to happen is that this collect everything, because why
(22:54):
not, is really about an implicit objective, which is about maximizing optionality. Right, so I want to have the option, and that's driven by a couple of things. That's driven by fear. Right, you know, I want to minimize my regret. What if we didn't collect this bit of information? And that was what the boss is going to ask me about?
(23:15):
I don't know if the boss is going to ask me about it, but just in case, right. So there's, like, a kind of a cover-your-behind type of mentality there, and then there's the magical thinking, which is, like, it's that next bit of information that we don't have is where there's going to be some sort of huge payoff. We live in some sort of fat-tails world. There's big payoffs that are out there in the shadows, and if
(23:37):
we just add more information, if we were able to link some more additional information or we had more granularity about the individual, then we're going to have some huge payoff. And that, I think, is magical thinking. Unless you're explicit, like your goal is to maximize optionality, you really probably shouldn't be doing it.
(23:58):
And what I like about it especially is because there's a cost. Now there's this shadow price of privacy. You are in opposition to privacy by design, because the default behavior should be data minimization, and that is in these privacy guidelines, which you know much better than I do, but certainly in, I think, Article 25 from GDPR as well as Article 5.
(24:19):
There is a cost to doing it, and so we should be cognizant of why we're collecting data, and so I think what is nice is that they dovetail. Data minimization dovetails nicely with this intentional way of thinking about it, because you need to state up front what
(24:39):
information we want to collect and at what level of granularity, and it works well with experimentation in particular.
And I first want to say I don't mean to be paternalistic, in that there are many cases where it might make sense to try to collect, you know... if you have, you know, some sort of sense of why you might need it and you can make a good rational argument for it, that's totally
(25:00):
fine. But what's interesting is, at least for experimentation, we have what I call just-in-time problems. Normally in the analytics space, or a lot of times in the analytics space, we're collecting data for future questions. I'm not exactly sure what the question is going to be, but I want to have this information so that we can do some exploratory analysis, or there might be some sort of additional question
(25:21):
that we might have in the future. I'm not sure. Again, that's related to this idea of optionality, whereas experimentation is the opposite. We have the question first, right, we have a hypothesis: is this new search engine better than this other one? Or is this new product better than what we currently have? Or is this new marketing campaign better than not doing
(25:44):
anything at all? Or the vaccine example, right? So we have the question and then we need to collect the data for this question. So that's what I mean. It's sort of just in time. We're collecting the data for this question. So we have an explicit task, and so that means that we can have data at the task level as opposed to at the individual level. And so that's the way our software is built, is on this
(26:10):
idea of tasks, and so each experiment is its own task, and so then the data is collected about the task as opposed to the individual, and so we don't link the individual across as much as possible; we just keep aggregate data. You can think of it as equivalence classes, and if equivalence classes doesn't resonate, it's just sort of like a pivot table. So the data is stored in aggregate and so we
(26:32):
just keep combinations of the treatments, of someone got an A or someone got a B. That's just how it's implemented in Conductrics. So we just have a small little table which just has a column which says treatment, and there's an A for one row and there's a B for another row, and we just increment the counts.
(26:52):
And then we just increment the conversion counts, or the sums of the conversion counts, and, just as a technical detail, if it's a numeric conversion event like sales, we just aggregate or increment the squares of the value, and it just turns out that with those three bits of data, counts, sums and the sums of squares,
(27:13):
and so that's just a little technical bit, you can do A/B testing, and so you don't need to collect the data at the individual level. You only need to store it at this aggregate level, and that seems to us to be consistent with this idea of privacy by default. It should be what folks are doing if they're following
(27:35):
privacy by design, unless they have some other reason for not doing it. But for the task of running an experiment, running an A/B test, that's really all you need, and hence that's probably all you should be collecting. If you need it for some other reason, fine, but for an experiment, you probably don't need to be collecting anything else.
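A minimal sketch of that aggregate-only storage in Python: one row per treatment holding just a count, a sum, and a sum of squares, from which the mean, variance, and a Welch-style test statistic can be recovered. The schema and function names are illustrative assumptions, not the actual Conductrics table.

```python
import math
from collections import defaultdict

# One row per treatment: no individual-level records, just three running aggregates.
table = defaultdict(lambda: {"n": 0, "sum": 0.0, "sum_sq": 0.0})

def record(treatment: str, value: float):
    """Increment the aggregates; value could be an order amount, or 1/0 for a conversion."""
    row = table[treatment]
    row["n"] += 1
    row["sum"] += value
    row["sum_sq"] += value * value

def stats(treatment: str):
    """Mean and sample variance recovered from count, sum, and sum of squares."""
    row = table[treatment]
    n, s, ss = row["n"], row["sum"], row["sum_sq"]
    mean = s / n
    var = (ss - n * mean * mean) / (n - 1)
    return n, mean, var

def welch_t():
    """Treatment effect (B minus A) and a t-style statistic from the two rows."""
    n_a, m_a, v_a = stats("A")
    n_b, m_b, v_b = stats("B")
    se = math.sqrt(v_a / n_a + v_b / n_b)
    return m_b - m_a, (m_b - m_a) / se

# Example: stream a few observations, then analyze the two-row table.
for value in [0, 0, 1, 0, 1]: record("A", value)
for value in [1, 0, 1, 1, 0]: record("B", value)
print(dict(table))   # the entire stored state: two rows, three numbers each
print(welch_t())
```

Note that the full stored state stays at two rows of three numbers each, no matter how many visitors flow through the experiment.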
Debra J Farber (27:55):
That is a tremendous amount of information that has been enlightening for me personally. Just hearing you talk about that, it also makes me think, you know, thinking from an operational perspective, that if you have, like, a limited set of data you're collecting, because you're being intentional, because you already know the questions you're asking and all that, then
(28:16):
as you are addressing privacy risk in an organization, you could better audit what has been collected. Rather than, how do you audit for "it could be anything and we're collecting everything," you know?
Matt Gershoff (28:31):
That's right, and so each table is scoped, and so in a way it makes it quite easy to look at, you know, each of the data structures that are there and what's being stored. So that's from an auditing side, but it also has advantages on the computational side, because, you know, as you've collected the data, the data is already in its mostly summarized form, and so there's less computation that is
(28:55):
required to do the analysis. And now, for a simple A/B test like I'm talking about, that's not really such a big deal, but really the same idea can be extended to the case of regression. So really, underneath the hood, behind a lot of these statistical approaches, is really a regression problem, and so you can, using the same type of aggregation
(29:19):
approach, calculate what's known as ordinary least squares regression, and so that's a technical bit. What's nice is the fact that you can do this in a very efficient way. It means that we can now do what's known as multivariate testing, which, for the folks who are more statistical, is more like just a factorial ANOVA that can be done.
(29:41):
And we can do other things, like evaluate whether or not two A/B tests are interacting with one another, like maybe one A/B test might be interfering with the results of another. The fact that we can do regression in this way means we can answer that; we can have tools that can alert our
(30:02):
customers whether or not one A/B test might be interfering with another A/B test.
We can do things like, if you are passing us side information, so the data structure need not just be the A and the B, but maybe you're passing us categorical information. That's one of our designs, is that you could pass us additional information about the user, but those additional bits
(30:26):
of information are limited in that, basically, they're already categorized. You have to send us categorical information, like segment information. So rather than being able to send us numeric information of arbitrary precision, which is at a very fine level of granularity, you don't want to do that in a sort of data minimization approach; you want to think about what the optimal level of
(30:47):
granularity is, so that you don't wind up with implicitly unique identifiers, right, and we don't want to do that accidentally. You want to manage sort of the, you know, sort of the entropy of the data that we're collecting, and so we allow our
(31:08):
customers to pass along additional information. So you might have something like loyalty status, and that might have five different values, five different levels, or maybe there's a tenure, maybe that's 10 levels, and so that's capped at a certain level of cardinality. This idea of cardinality is important, which is how many unique elements are there in each potential data field, and so, by default, we limit it to 10 unique elements, and, again,
(31:31):
that helps constrain how much data that you're collecting. But it turns out, though, that you can still do regression on this data that's aggregated by treatment, by, say, tenure, by loyalty status, let's say, and so we can constrain and know how
(31:53):
large the data structure is, just because we already know how data is going into it, unlike when you just collect data at the individual level. The size of that data structure will be the number of users who enter the experiment, so maybe it's a million or two million rows, whereas here it'll be bounded by the combination of the
(32:14):
number of treatments by the cardinality, the joint cardinality, of each of the data fields. So if you had a loyalty status of three and a login status of logged in or logged out, that's six, and you had two treatments, that's just a total of 12 rows that you would need to store the data.
So it's nice, and what's also nice is that we can also take a
(32:36):
look at the count for each row, and then we can report back on what's known as sort of the k of k-anonymous data, and so we implicitly store the data at a k-anonymous level. And while, you know, we're not claiming that k-anonymity is like the end-all
(32:56):
and be-all for data privacy, and there was a lot of discussion about that, it is nice, though, that we can use this notion of k-anonymous data as almost like a reporting for our clients, so that they can inspect, back to your notion of auditing, and kind of inspect and see what's the finest level of granularity that we actually
(33:16):
have across all of the experiments, and that information can easily be surfaced so it can be managed.
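To sketch the regression-on-aggregates idea, the following Python snippet runs ordinary least squares over grouped cells (treatment by a low-cardinality segment), weighting each cell by its count, and reports the smallest cell count, which corresponds to the k in k-anonymous data. The column names, cell counts, and conversion rates are made-up illustrations, and it assumes numpy and pandas are available; it is not Conductrics' actual implementation.

```python
import numpy as np
import pandas as pd

# Aggregated cells: one row per (treatment, loyalty tier) combination, no user-level data.
cells = pd.DataFrame({
    "treatment": ["A", "A", "B", "B"],
    "loyalty":   ["basic", "gold", "basic", "gold"],
    "n":         [4800, 1200, 4750, 1250],        # visitors in each cell
    "conv_rate": [0.048, 0.072, 0.055, 0.080],    # cell-level mean outcome
})

# Design matrix: intercept + treatment dummy + loyalty dummy.
X = np.column_stack([
    np.ones(len(cells)),
    (cells["treatment"] == "B").astype(float),
    (cells["loyalty"] == "gold").astype(float),
])
y = cells["conv_rate"].to_numpy()
w = cells["n"].to_numpy()

# Weighted least squares on cell means, with cell counts as weights.
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(dict(zip(["intercept", "treatment_B", "loyalty_gold"], beta.round(4))))

# The k in "k-anonymous": the smallest cell count across all stored combinations.
print("k =", int(cells["n"].min()))
```

Because the regressors are constant within each cell, weighting each cell mean by its count reproduces the coefficients an OLS fit would give on the underlying individual-level rows, so nothing is lost by never storing those rows.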
Debra J Farber (33:24):
Yeah, that is
really helpful.
I'm struck by the fact that making the constraint through experimentation and collecting only certain data, not all the data, you know, actually helps make the process all along, like the regression, the testing, all of that, more efficient, it sounds like. But also with k-anonymity, while there is discussion about
(33:48):
how maybe it's not the gold standard of anonymity anymore in terms of creating some level of assurance of anonymity, it does sound like, from what you're saying, that, just to define it, k-anonymity is a property of a data set that indicates the re-identifiability of its records, that at least being able to have k-anonymous data, like you said, helps to
(34:10):
indicate to the team what level of re-identifiability there might still be that they need to account for, you know, if they're taking privacy into account, which I hope they are.
Matt Gershoff (34:19):
Or if it's a
particularly sensitive case.
Yeah, and it's also, like, a lot of the times this data is not particularly sensitive anyway, and so that's usually why you might get some pushback, like, "I don't really get it," especially in the US companies. A lot of the companies in Europe, or companies in the US like the financial institutions, like the banks, the healthcare companies that we have, this is like a great relief, because then it's easier for
(34:41):
their compliance and it's easier to, like, know exactly what's happening, and so it can be extremely useful in those contexts. That's right.
Debra J Farber (34:49):
Well, thank you
for that analysis.
I think that was really helpful, and I kind of want to turn our conversation to the PEPR '24 conference, which is where we met last year.
Matt Gershoff (35:01):
I'm so excited
for it.
First of all, it's too bad that this isn't coming out beforehand, because I just want to totally promote PEPR. I had such a great time. First of all, I got to meet you last year, but also it's just so impressive, the speakers and just the attendees. And not only were they just such high intellectual capacity
(35:21):
and creativity, but it's just such a welcoming vibe. And I'm not really from the privacy space. Again, we've added a lot of these capabilities because we thought it would be good design, and by doing so it opens up and makes a lot of the system better, and it is consistent with this idea of just being explicit about why we're doing things. But no, I really fell in love with the whole community from
(35:42):
last year and I'm really excited to be going again.
Debra J Farber (35:45):
Yeah, I think you underscored why, even like nine days after my wedding, I'm like, I've got to go to PEPR. Sorry, love, I'll see you. People are like, why are you here right now and not on a honeymoon? It's like, well, I couldn't miss PEPR, because I had the same experience. It was more of, like, all this, a very welcoming... I mean, most people don't come into privacy engineering because
(36:05):
they were into privacy first. They usually got pulled into it from somewhere else, and yours just happened to be through your expertise of, you know, experimentation, optimization and all of that fun A/B testing stuff. And, like, a lot of people will just kind of come in and have that one viewpoint or several viewpoints of how they are attacking a problem, but they aren't
(36:26):
necessarily following the entire space, right? That is very unique, and as someone who follows the entire space, I could tell you there's a handful of people that I've come across that are really that knee-deep. So it was a great opportunity to bring people that are working on privacy in a technical capacity, maybe in one little area, or it is still very siloed, especially in the research
(36:48):
space, right? You know, your homomorphic encryption folks are not exactly talking to your differential privacy data scientists. It's a different world. So what is so lovely about PEPR is the thought-provoking talks. They're short, but they have to be impactful. This year I was really excited. I had the opportunity to sit on the program committee and help with talk selection, and then, relatively last minute, I got
(37:11):
asked to sub in to be a moderator of a panel. So I'll be moderating "Privacy Design Patterns for AI Systems: Threats and Protections," and I'm also going to be serving as a room captain for the last three talks that are focused on threats and engineering challenges. But the reason that I keep coming back is it is a small conference. I'd say maybe there were like 200 people last year.
(37:32):
I would hope there are more this year, because just everybody who went was talking to their friends and colleagues about how awesome it was. But it enabled and facilitated these hallway conversations, because you didn't need to be in the hallway; it's small enough to really have those conversations kind of wherever you are. And, yes, people are super welcoming.
(37:52):
There aren't really any vendors. So it really just feels like focusing on the practice of privacy engineering, not the theoretical, which exists at other conferences. And so, yeah, I agree. I mean, I've been talking it up on my show all year, but I agree it would be nice if we were able to record this in time to get people excited about PEPR.
(38:14):
But I know you're going to be giving a talk, under the privacy-preserving analytics section of talks, called "Being Intentional: A/B Testing and Data Minimization," and I know we've talked about a lot of that today. So I want to, you know, at least pose the question to you of what are you hoping that privacy engineers take away from your talk?
Matt Gershoff (38:35):
Oh well, you know, really I do feel a little bit like an outsider to the community, and so I'm really just trying to give an application: like, here is actually a company out in the world really just trying to provide a good product for our customers and help them solve their problems, and here's an example of where we proactively reviewed privacy by
(38:57):
design principles, and here's an example of a company that, you know, tried to actually implement some of them into our software. And the fact that there are these happy, I don't know, coincidences, but there's maybe some sort of deeper benefit. So there's these additional benefits that you get from
(39:18):
privacy, where it's not just a constraint, and seeing it as a blocker, but that it also can offer some major advantages when it's appropriate. Now again, everything is a trade-off, and so there may be a context where it's not, but it does have these very nice extra benefits of being more computationally efficient for certain types of cases, as well as, as you say, being more
(39:40):
parsable or interpretable, so that you can actually see what the data actually is, and it's easier to manage that type of thing. So really, we're just going in there just to be thought-provoking, coming at it not exactly from, like, privacy, but it's like, hey, it turns out that this privacy thinking, sort of privacy by design, helps with thinking intentionally, which we believe is really the main objective of experimentation, and
(40:04):
so really just giving a different way of looking at it, and maybe being thought-provoking for some folks to be thinking about, oh yeah, I had been thinking in terms of this just-in-case approach to collecting data in my organization, and maybe we can apply some of these approaches within our orgs, even if we're just acting as the secure curator and then we're going to release data in this way.
(40:25):
Maybe I hadn't realized that we can do all of the stats, or most of the stats, I should say, for many of the cases for A/B testing; we could do it in this sort of k-anonymous form.
Debra J Farber (40:36):
That's awesome, and I definitely think that people will come away with that. I'm really looking forward to your talk. Are there any other talks at PEPR that you are excited about, or topics that you're hoping to learn more about?
Matt Gershoff (40:48):
You mean other than your panel? I mean, so, definitely your panel.
Debra J Farber (40:52):
I'm going to be
helping to feature the other
great speakers as the moderator.
But, yes, other than that panel.
Matt Gershoff (40:59):
Yeah, I mean, mostly I'm just excited to be around folks. As many of your listeners know, sometimes in the privacy space it's about what you can't do, and there's, like, that procedural approach that we kind of chatted about before we got on, which is just people looking at the check boxes and trying to, like, prevent folks from doing things. What I love about PEPR is that it's really people trying
(41:27):
to be cognizant and respectful and trying to solve the problem and trying to get good outcomes, and so that's the main thing. There were, you know, a couple of interesting talks, other than your sessions, such as the learning and unlearning your data in federated settings talk by, I think it's, Tamara Bonacci; I apologize to them if I mispronounce their name. But that looks really interesting, with some machine learning and trying to figure out how to unpack data that it may have
(41:50):
learned on, as I interpret it, that maybe you don't want it to. Because we're interested in that, since we do some of that work with machine learning, it's important to know how one might be able to unlearn that data.
Debra J Farber (42:01):
Awesome.
Now, before we close, do you have any words of wisdom for the audience today?
Matt Gershoff (42:10):
Just the main thing is just to be aware, and it's really important to be, again... our whole thing is to be thinking about, like, our world is focused on hospitality. So to have empathy, to have empathy for the folks that you're serving, and to do it respectfully, and to do it rationally. And part of being rational is to be mindful and to
(42:31):
be cognizant of the trade-offs as well as unintended consequences, and so I think experimentation, in conjunction with data minimization principles, helps you do that.
Debra J Farber (42:46):
Well, thank you
very much, Matt, for joining us
today on the Shifting Privacy Left podcast.
Matt Gershoff (42:51):
It's my pleasure
and thank you so much for having
me.
It's quite an honor.
Debra J Farber (42:55):
We'll have you back another time. Until next Tuesday, everyone, when we'll be back with engaging content and another great guest. Thanks for joining us this week on Shifting Privacy Left. Make sure to visit our website, shiftingprivacyleft.com, where you can subscribe to updates so you'll never miss a show. While you're at it,
(43:15):
if you found this episode valuable, go ahead and share it with a friend.
And if you're an engineer who cares passionately about privacy, check out Privado, the developer-friendly privacy platform and sponsor of this show. To learn more, go to privado.ai. Be sure to tune in next Tuesday for a new episode. Bye for now.