Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Steve Hickman (00:01):
You can infer, for example, somebody's race just based on their zip code. If you're making decisions about who can get credit, the federal law on credit is that you can't use race as a criterion. But you might end up, completely incidentally, because of the ML, using things that are proxies for that; and so, if we have tools that can identify these
(00:25):
risks in the models, now we can start to see, "Can we develop workable laws that actually achieve our goal of privacy, as opposed to what we're doing now?" Because what we're doing now, particularly with the advances in ML, it doesn't work anymore. It's not achieving its goal.
Debra J Farber (00:50):
Welcome everyone to Shifting Privacy Left. I'm your host and resident privacy guru, Debra J Farber. Today I'm delighted to welcome my next guest, Steve Hickman. He's the founder of Epistimis, a privacy-first process design tooling company, and Epistimis provides tools that can check your process design as the privacy rules change, so you
(01:11):
know if you need to fix your process before the rule changes go into effect. Today, we're going to talk about the need for having a "privacy ontology," in addition to a privacy taxonomy, and the types of privacy modeling tools that can solve current privacy scalability problems. Welcome, Steve.
Steve Hickman (01:31):
Thank you for
having me.
Debra J Farber (01:33):
I am so glad you're here. I know we've had previous conversations, and I've really been fascinated with your approach to thinking about the problems with scaling privacy today in large organizations. I know, most recently, you worked at Meta, and you also left to start Epistimis to scale privacy. Why don't we start off with a little bio from you?
(01:56):
You have such an interesting background as an engineer. Just give us an overview of your background and how you came to focus on privacy process design rules at Epistimis today.
Steve Hickman (02:10):
Okay, sure. I have a confession to make. I've done so many different things, I have to read my own resume to remember them. It turns out that that matters, because I worked for quite a while as a consultant in a lot of different industries; and one of the things that comes out of that is
(02:31):
this: you begin to see the same problem in many different ways. You know, besides aerospace, I worked in retail, helping Target set up their first website, and I worked in telecom for a number of years, and pollution control, and many other things.
(02:51):
What happens is, when you see that many different things, you start to see the abstractions behind them; and this is where ontologies come in, because ontologies are about conceptual abstractions. I did a lot of different things in a lot of different industries. I ended up - actually, just before Meta, I ended up in
(03:13):
aerospace. I was working at Honeywell and I was the Technical Lead on a project for the U.S. Military; and that actually ended up providing a lot of the technical foundation for what we're doing at Epistimis. Now, what does that mean? Well, two things. One, your tax dollars at work, so that's good, I guess; and,
(03:36):
the other one is it's "military grade," if that matters to you.
So after that, then I went to Meta. It was kind of interesting that I was being recruited by them and I did not have a Facebook account. Just so you know, I'm kind of a private person to begin with. I did not actually have a Facebook account until I had to
(03:57):
get one to accept my job. But I was being recruited, and it so happened that the person that was talking to me initially was the director of the privacy organization. I was just very frank with them. I said, "Here's where I'm at. I'm not really interested in working at Facebook unless it's
(04:18):
on privacy, because privacy matters a lot to me." And part of that, in my career, my prior life, is just... I guess it's part of being an introvert. I'd also gone to law school while I was at Honeywell, and my focus there was on intellectual property. I'm not a practicing lawyer. I had become aware of things from a legal point of view then,
(04:40):
so when I was being recruited, I thought: if I'm going to work here, this is the organization I'm going to work in. So, I went and started working in their privacy organization, and all of the background that we have informs each step that we take in life. So, as I was looking at what was being done in the privacy
(05:02):
organization, the problems that they were attempting to solve, all of this background of working in many different industries and dealing in abstractions and working then on this military project - which also had quite a bit of abstraction - and the tooling to support them, I looked and I
(05:25):
said, "If you're going to solve this privacy problem, you really need all of this stuff." And so I spent some time there and tried to help them as best I could. Meta is a very large organization. I worked with a lot of people who are very smart. They have a lot of momentum. That momentum was not going in the direction that I thought was
(05:48):
necessary to solve the problems that they needed to solve, and so I thought, "Well, I'll just go solve this on my own." And that's where Epistimis came from.
Debra J Farber (06:00):
Thank you. That really does help connect the dots, because we weren't doing privacy for many years. Right? But, bringing your entire, varied background to the privacy problem, it's amazing how you were able to do pattern recognition to understand what are the problems you're continuously seeing and then apply it to privacy. I know we're talking in generalities because we haven't
(06:21):
gotten to the meat of the conversation yet, which is privacy ontologies. But let's see - before we get to the exciting solutions that you're working on related to privacy tooling, privacy by design, and engineering, let's unpack some of the major challenges that you've seen when it comes to scaling privacy. So, in your opinion, why has it been such a challenge for
(06:42):
companies to get privacy right at scale?
Steve Hickman (06:46):
Fundamentally, I think the challenge comes from the fact that software developers like to write code first, and privacy (if it was thought of at all in the past) was an afterthought. As a result, there's also this other tendency that most software developers have, and that is that they don't bother
(07:08):
to document their code. What happens then is that you lose all of the semantics, all of the thought process that was going on when the code was being written. A lot of that information gets lost once the code's written. That developer walks away. They're doing something else.
(07:29):
It's not well-documented. When you look at that code and you're trying to figure out "What does this mean?", that becomes difficult. What you see - and this is what I saw at Meta, and you see the same thing with tools out there; Privado is one, and there are other companies that do this data mapping - is that companies
(07:49):
will try to reverse engineer. They'll do code scans and try to reverse engineer and identify: "Here's what we think the semantics are." In Meta's particular case, it was a real challenge because they have multiple technologies. The front end's written in JavaScript. The back end's written in Python. There's some C++.
(08:10):
Then, they bought Instagram, and all of Instagram's stuff is written in Rust. And there's a little Java. When you're trying to do this analysis, not only do you have to scan the code and try to figure out what it means in one particular language, then you have to try to connect the dots between "Okay, this function is written in JavaScript, and then
(08:34):
stuff gets written into a data store and it's read out in Python..." and what concepts? When you're trying to track a concept through this entire data flow model, it's difficult because of all these language and technology changes. They were putting a tremendous amount of work into this, trying to account for this while the code's being written.
(08:59):
Accounting for it while the code's being written is better than trying to backtrack, but the conclusion that I came to was that it's really better just to do it right the first time. A phrase that I like to use with people is: "If you don't have time to do it right, how will you ever have time to do it
(09:19):
over?"
Debra J Farber (09:21):
It makes a lot of sense, whether you're coming to market and you're using VC funding or you want to build it right. You don't want to just build it for product-market fit and get out there, and then find out that you have to re-architect everything, because you didn't plan that into your product roadmap necessarily, and it just becomes
(09:41):
compounded technical debt that someone eventually has to address. Or, it's an uphill battle because everyone wants to work on revenue generation rather than cleaning up problems in the architecture that you created because you didn't think about things early on. So, I totally agree.
Steve Hickman (09:58):
Yes, and privacy has a unique set of problems in terms of technical debt, because there's not just standard technical debt (if there is such a thing), where you create a problem, you know the problem exists, and you know you need to fix it. The issue with privacy is, sometimes the problems are created not by you, but because the law changes, and something
(10:20):
that used to be okay is no longer okay. So, you can go through this whole process of doing the code scans and figuring it out and saying, "Okay, I know what this data means, I've figured out the semantics, I've analyzed how this is flowing through my process, I know what the functions are, I know what the purpose is." You figure all that out, but then, if you change
(10:43):
jurisdictions, or the law itself changes, you've got to do it again - assuming you even notice. So, there's this additional dimension that is beyond the kind of debt you get just because you wrote sloppy code.
Debra J Farber (11:01):
Right, and sometimes it's like you don't even know where to start to clean it up. Right? I mean, just cataloging where the problems are is a big effort in and of itself. Then, you have to actually fix it; and so, one of the reasons that I am so excited to have you on the show is that, in our previous personal conversations, I've heard you speak about privacy in a way that I had not heard anyone else.
(11:22):
It has given me a great 'aha' moment - kind of like forgetting about the technical stuff for a moment and just going back to more of a socio-technical "How do we approach problems?" kind of thing. And we had discussed why having a taxonomy for privacy in an organization is important, but insufficient for managing
(11:44):
privacy at scale. You've stated that, first and foremost, an organization really needs to have a known and understood privacy ontology, and so this is where I really want to expound - really where I want to focus our conversation for most of the episode. We'll start with why. Why is a privacy ontology necessary for scaling privacy
(12:05):
processes? What is an ontology?
Steve Hickman (12:07):
Let's see, I'll start with the definition, because that's important. So, a taxonomy - most people are aware that it's a hierarchy of concepts; and, if you took biology in high school, it's like genus and species, that kind of thing: general, specific, and you can have as many layers as you want. And so, that's a good starting point. What you get with an ontology is more than that, because you
(12:30):
get not just general and specific, but you also get relationships - part-whole relationships, or the role that a particular piece of information plays in a given context. So, you're able to work with these other kinds of relationships when you're dealing with various concepts,
(12:55):
beyond just the general and specific that you get with a taxonomy. So, that's really key because, if you think about how software development has evolved: in the 1970s, we figured out that having data structures was a good idea. Prior to that, it was "here's a variable. I've got this one variable, I've got this other variable, I've got
(13:16):
this other variable," and then maybe we said, "Okay, well, we want to group these variables together," and so we started giving them similar names, like, you know, first name and last name or something like that. But they were still not grouped. And then eventually we figured out we need something better. We need actual structures of data.
(13:40):
So, an ontology is basically that. It's something that enables you to create these data structures and the relationships between them. The key distinction that you have with an ontology versus what you have in source code is that in an ontology, I don't care about the physical storage that I'm using.
(14:05):
I don't care if - let's pick money, the concept of money - I don't care if I'm storing that in a floating point or a string or an integer. What I care about is that it's money. In many cases, I don't even care if it's dollars or euros or pounds or whatever. I don't care about units; I just care about the concept. So, it's that conceptual level where the ontology lives; and
(14:30):
you can have that not just for individual concepts, but entire structures of concepts. So, if you think about an address, we might say, "Okay, it's city, state, zip," if you're in the United States. Well, again, I don't necessarily care about the storage, the details; I just
(14:51):
care about these underlying concepts. And by defining these at this conceptual level, now we've got something that gives us flexibility.
It gives us a couple of things. One is, first of all, that's where the rules live. If you think about how the law is written, the law doesn't care
(15:12):
about those details either...
[Debra (15:14): Well, it depends on the law.]
Yes, I mean, it does depend on the law. Some do. Yes, I agree; but in the cases where it doesn't, then you're not dragging along detail that's unnecessary - maybe I should word it that way. [Debra: That makes sense.] And so, there's that; and then, you have the ability later
(15:35):
on, if you want to associate this - if you want to do code generation, for example - you can add that detail in later if you want to. But, if you work with things at a conceptual level, then you're not tied to specific technologies, so that in the case of a company the size of Meta, where you're using many
(15:56):
different technologies, you don't run into this impedance mismatch that occurs because you're trying to keep track of things, and you're switching technologies, and you can't figure out, "Okay, this field in this data structure written in JavaScript - does it match this other field that's in a Python structure?"
(16:17):
So you end up with a common language that can be used across all of these; and it also turns out that that becomes very helpful, again, going back to the rules - that you're not forcing your lawyers to learn how to code, because they don't care about those code-level details either. What they care about is whether or not you're following the
(16:38):
rules.
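To make the conceptual level concrete, here is a minimal sketch of how an ontology fragment like the address example might be represented: concepts plus typed relationships (generalization, part-whole, role), with no storage types anywhere. The class names and relationship kinds below are illustrative assumptions, not Epistimis' actual model.

```python
# A minimal, hypothetical sketch of a conceptual ontology fragment.
# None of these names come from Epistimis; they only illustrate
# "concepts plus relationships, independent of physical storage."

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Concept:
    """A named concept, e.g. 'Money' or 'ZipCode' - no storage type attached."""
    name: str

@dataclass
class Relationship:
    """A typed link between concepts: generalization, part-whole, or role."""
    kind: str          # e.g. "is_a", "part_of", "plays_role"
    source: Concept
    target: Concept

@dataclass
class Ontology:
    concepts: dict[str, Concept] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)

    def add(self, name: str) -> Concept:
        return self.concepts.setdefault(name, Concept(name))

    def relate(self, kind: str, source: str, target: str) -> None:
        self.relationships.append(
            Relationship(kind, self.add(source), self.add(target)))

# The 'Address' example: city, state, and zip are parts of an address,
# and an address can play the role of a home location for a person.
onto = Ontology()
for part in ("City", "State", "ZipCode"):
    onto.relate("part_of", part, "Address")
onto.relate("plays_role", "Address", "HomeLocation")
onto.relate("is_a", "ZipCode", "PostalCode")  # generalization, still no storage detail
```

The point of the sketch is only that nothing in it says float, string, JavaScript, or Python - which is what lets lawyers and engineers talk about the same structure.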
Debra J Farber (16:40):
So, by having an ontology, it's kind of bringing the business together so that all the various stakeholders have a common language, it sounds like, to talk about privacy at a more abstracted level, before you get into the more technical applications of data and approaches and architecture and all that. Is that what you're saying? Is that how it's
(17:00):
...?
Steve Hickman (17:00):
Yes, exactly; and having a common language is really, really important. Communication is difficult. I remember, years ago, I was in a conversation in some meetings for some project, and there were, I think, eight people in the room. Four of us were software developers and the other four
(17:22):
were mechanical engineers. This was for a manufacturing tool that we were trying to develop, and I came out after four hours trying to figure out if we'd actually communicated with each other. They were all engineers; it's just some of them were software and some were mechanical. You can certainly see that the further people get apart in
(17:43):
terms of their backgrounds, the harder and harder communication is; and so, having a common language becomes really, really valuable. It has the highest level of endorsement, I'll put it that way. If you remember the story of the Tower of Babel, this is the whole issue that God gets irritated at the Babylonians
(18:05):
about, saying that if they speak the same language, nothing will be impossible for them; and so, I can't think of a more ringing endorsement.
Debra J Farber (18:14):
It's a good point, and it just shows how, even back then - taking the religious stuff out of it - just societally, how important it is to have a common ontology for so many things, for understanding and communication between different groups of people; and how, if all of a sudden everybody spoke different languages, like in the Babel story, how hard it would be to scale
(18:37):
things across society or societies. For sure. So, let's get into a little more - I don't want to say specifics, but some kind of use cases that help crystallize what we're talking about here. Talk to us about how Epistimis is using semantic modeling for rule checking and how this helps with scaling privacy as part of
(18:57):
this ontology approach.
Steve Hickman (19:00):
Okay. There are two fundamental things that we're doing. One is the ontology: identifying these semantic concepts for the data that's being used. And then the second part of that is, once you've identified this, defining your process - what the data flow is through
(19:21):
your process at this conceptual level, using these concepts that we've already agreed on. If you think about it, in a process, basically you have two things going on. You have some kind of function that has a purpose, and then you have some kind of intermediate storage for information, where
(19:41):
data is going to be at rest. So, you may receive data in; it may go directly into storage, or it may go into a function that has a purpose; and then eventually it gets stored; and then it may get read out again and processed for some different purpose; and it gets written out somewhere else; and then you can continue that read-write
(20:02):
process cycle however many times you need. If you do this at a conceptual level, then all you care about is: conceptually, here's the data - the data structures, the relationships that I care about - that's going into a function. Here's the purpose of this function and here's the data
(20:24):
that comes out. You don't really care about anything else, because then what you can do is take the rules - whether they're legislated or regulatory rules, whether they come out of your privacy policy, or maybe they're contractual, like you've gotten data from a third-party broker
(20:45):
and so you've got some contractual limitations on your use of data. Wherever the rules come from, now you can take those rules, you can encode them in the tool, and actually evaluate this process design that you've got against those rules; and you can see: am I breaking the rules? Am I good? Am I not good?
(21:06):
You may end up with some situations where "it depends." Of course, that's the area that lawyers love.
Debra J Farber (21:14):
Well, I don't
know if they love it, but it's
the area they live in, for sure.
Steve Hickman (21:21):
It's certainly the area they live in. What happens is, if you're breaking the rules, that's an engineering problem. If it's an absolute rule, and you're doing something in your process design that breaks it, then the engineer just needs to go fix it. If it's one of these "it depends" areas, that's where you need to bring in the Legal team and say, "Okay, what do we need to do? Is this really a problem?
(21:41):
Maybe the law is just a little bit vague, or how do we address it?" The whole idea there, then, is that because the tooling allows you to evaluate these rules against this process design - and the process design is all independent of specific code
(22:01):
implementation - if the rules change, you can just update the rules and re-evaluate as part of your standard build process. If you go into new jurisdictions and there are new sets of rules, you just add those in. You can be constantly up to date. In fact, you can do this evaluation before the rules
(22:24):
actually take effect. If, for some reason, the rules have changed and what used to be okay is not, you're going to know that, and you have time to fix your process before it becomes an issue. Does that make sense?
Debra J Farber (22:37):
It does. It's definitely part of the "shift left" mindset: de-risk earlier on, fix the problems in engineering to reduce your compliance burden later on. Absolutely - makes sense to me. Maybe talk about how this would fit into the agile development
(22:58):
process. How would an organization... Is Epistimis good for any sized org? How does an organization fit this into their current agile processes?
Steve Hickman (23:07):
Okay, there are a couple of things. Yes, potentially, any sized organization could use this tooling. Now, having said that, very small organizations - I have one customer, for example; it's a one-person shop. She's a healthcare concierge. She's not a developer. She's never going to be a developer. For people like her, she's not going to learn how to use the
(23:31):
tooling. Instead, I've worked with her. I'm doing the design work, basically on a consulting basis, based on her input, and saying, "Okay, here's your design, let's evaluate it." In her case, it's HIPAA: "Hey, which HIPAA rules are you compliant with?" and that kind of thing, or is anything being violated?
(23:52):
In very small business scenarios, we fully anticipate that we'll work with consulting partners. It could be large companies like Ernst & Young or PwC, or small... there's a ton of privacy consulting companies. You go to the IAPP tech vendor list, there's 400 companies.
(24:13):
Many, many, many of those are consulting companies. I've been in conversations with a few, and continue to do more of that, to work with them as consulting partners for very small businesses. Larger businesses - if you've got your own in-house developers, you're probably going to want to learn how to use the tooling. Sure, we can train you how to do it; we can get you
(24:37):
kickstarted and let you take over. We can help you as much as you need and let you go on your own as much as you want. That's up to you. Very large companies like Meta could use this tooling. They're probably going to develop stuff in-house. I'm a realist.
Debra J Farber (24:56):
Yeah, the really large companies - almost all of them take the perspective that "we are so unique, so big and unique in what we do, that there's going to be nothing off the shelf that we can just come in and customize appropriately. We'll need to just build our own." I'm not surprised you feel that way. I would advise any company that it might be hard to ever sell to an
(25:18):
enterprise like them. What advice do you have for designers, architects, and developers when it comes to creating and implementing a privacy ontology, taxonomy, and semantic model to get started in their orgs?
Steve Hickman (25:31):
Well, one of the things that... going back to the whole "common language" point, which I think is very important... I actually think that IOPD would be a great organization to do this. We should have...
Debra J Farber (25:46):
[inaudible]
Steve Hickman (25:48):
Yes, correct. It would be a great organization to create and manage a privacy ontology. Now, the W3C has some stuff - there have been some attempts in this direction in the past - but really, we should have a common language. It really makes sense that a standards organization should be leading that, so that not just Epistimis, but Epistimis,
(26:14):
Privado, Privacy Code, BigID, whoever these companies are - we should all be speaking the same language, because part of what that ends up enabling is tool interoperability. Also, the end user doesn't have to keep switching
(26:34):
their mental model each time they're going from one tool to the next, because that would result in a lot of mental gear grinding. That's the first thing. Now, having said that - for example, the Epistimis modeling tool, EMT (and the pun is intended, by the way), comes
(26:58):
supplied with a base-level ontology that's pre-defined. The rules in all of the GDPR and CCPA and things like that - those rules are defined against that base ontology. So, that's already available, so that end users don't have to go and try and figure that out.
(27:21):
One of the things... okay, this is going to probably get a little bit technical. When defining an ontology, there are a couple of things you need to be able to do in order to make sure that it's usable.
That is (27:38):
1) You need to accommodate the reality that sometimes people are going to call the same thing by different names. So, you need to support aliasing. That's just a practical thing. Whatever tool you have has to be able to do that. EMT can do that. Since I knew that was an issue, I just built it in.
(27:59):
That's one thing. 2) Another thing is that the ontology will be the union of all the different things that you need to represent, but not every user will need everything. You need to have a way for people to use only slices or
(28:24):
subsets of the ontology for their particular application. That's also built in. Now, for example, in EMT, the way we do that is - if you're familiar with SQL for relational databases, there are select statements in there. You can say, "Select this field, this field, this field off of this table."
(28:45):
The idea there, in databases, is that you get back actual data. You're querying a database; you get back actual data. We actually use the same syntax, but what we're doing is saying, "Here's the subset of the concepts from this
(29:05):
conceptual data structure that I want to use in this particular function." You're not querying a database; you're just taking a slice off of your ontology. Does that make sense? [Debra: It does.] Okay. You need stuff like that because, like I said, the ontology has to be the union of all these concepts.
(29:28):
You don't want to force people to actually use everything, because in many cases they don't need to.
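The conversation doesn't show EMT's actual syntax, so the following is only an illustration of the slicing idea: a select-style operation that returns a subset of concepts from a conceptual structure, rather than rows of data from a table. All names are made up.

```python
# Illustrative only: a select-like "slice" over a conceptual structure,
# in the spirit of SQL's SELECT but returning concepts, not data.

ADDRESS = {"Street", "City", "State", "ZipCode", "Country"}

def select(concepts: set[str], structure: set[str]) -> set[str]:
    """Take only the named concepts from a conceptual structure (no data is queried)."""
    missing = concepts - structure
    if missing:
        raise ValueError(f"Unknown concepts: {missing}")
    return concepts

# A hypothetical shipping-label function only needs part of the Address structure:
shipping_slice = select({"City", "State", "ZipCode"}, ADDRESS)
print(shipping_slice)
```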
Debra J Farber (29:37):
So, it's also about making it simpler to make choices. So, you're only surfacing, at the higher level, the important choices that you need to make. Is that correct?
Steve Hickman (29:50):
Well... right. Yes, the fundamental challenge with tooling, particularly tooling at this abstract level, is ease of use. It's a mindset shift. Developers are used to writing code in JavaScript, Java, whatever; and now you're saying, "Okay, this is similar, but
(30:11):
it's not identical." And so you want to make the tooling use familiar paradigms where possible, but also identify all of the pain points, all of the places where the differences
(30:32):
could cause them to stumble or be less productive than they could be. And that's actually true for any tool. I mean, this is not true just for this, but it's certainly true for this, because you're asking people to think at a more abstract level. Now, the payoff here - one of the nice things about generative
(30:54):
AI is that we are not going to write code in the future like we have in the past. I don't know if you've looked at things like GitHub Copilot. There are a lot of different tools out there now where all you have to do is go into your favorite code editor and write a comment that says, "This function does X," and then GitHub
(31:17):
Copilot will just generate the code for you that does what you just described. [Debra (31:21): That's pretty cool.] Yeah, it is pretty cool. But what that means is, as developers, we're not going to write code the way we used to in the past, so adopting tools that work at an abstract level, that work at a semantic level, is really where we're going to end up.
(31:42):
So, EMT is developed with that in mind, so that you can work at this abstract level; and then, when it gets to the point of actual code, a lot of that code is just going to get generated.
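As a small illustration of that workflow (the assistant's real output varies; the function body below is a hand-written stand-in for what a tool like GitHub Copilot might suggest), the developer supplies only the descriptive comment:

```python
# Workflow illustration: the developer writes only the comment below;
# an assistant then drafts the function. This body is a hypothetical example
# of the kind of code such a tool might produce.

# This function masks an email address, keeping the first character of the
# local part and the full domain, e.g. "jdoe@example.com" -> "j***@example.com".
def mask_email(email: str) -> str:
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}" if domain else "***"

print(mask_email("jdoe@example.com"))  # j***@example.com
```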
Debra J Farber (31:55):
So, you don't have to focus on that part because it'll just be auto-generated from the prompt.
Steve Hickman (32:01):
Right. Exactly, because that won't be where the fun is anymore.
Debra J Farber (32:06):
Makes sense. I really like use cases that help crystallize a larger idea, and so I'm going to mention some other privacy tech companies right now that are in the privacy engineering space, because I want to really get clear on where Epistimis' privacy-first tooling can fit in with other tools on the
(32:28):
market, rather than replace them. What you're suggesting is not, you know, "buy Epistimis and then you don't need these other tools"; but it's also, I think, at first glance, not clear, as we have all these new privacy engineering platforms coming to market that do different things. I think it would help if we go through this exercise, where I will mention a particular type of tool or platform out there and what it
(32:49):
does, and how you could wrap Epistimis' privacy-first tooling around these other tools. Does that sound good?
Steve Hickman (32:58):
Makes perfect
sense.
Debra J Farber (32:59):
Okay. I think it'll make sense to the audience, too, as it helps to make clear how you can help with privacy and scaling it. First, I'd like to start with data discovery and mapping platforms, like a BigID or a Secuvy. How does Epistimis' process design tooling work with discovery and mapping platforms?
Steve Hickman (33:19):
Okay, so for companies like that in particular - those two companies have a lot more than just that, so I want to talk about some of the other things that they do.
Debra J Farber (33:29):
Sure, absolutely. They do more than just data discovery and mapping. But that's kind of how we're branding them, because once you do the discovery part and the mapping, you could do so many other privacy, security, and governance things.
Steve Hickman (33:42):
Right, right. So, those tools - or those companies, the tool suites from those companies - can be both input to... and it can be a symbiotic cycle. So, for example, if you're doing data mapping using a data mapping tool from one of those companies, then that can be input into building your basic conceptual model in
(34:07):
EMT, because any time you do this kind of mapping, at some point human beings have got to look at that output and say, "Yeah, that's correct," or "No, I need to adjust this," or whatever. And when you're doing that, you are effectively building your conceptual data model, which is the key piece that EMT uses.
(34:28):
Okay, so in that sense, those tools can provide input into EMT. Now, the other side of this is, if you look at those companies, they do other things like RoPAs and PIAs and stuff like that. The output from EMT can then be fed into those tools, so
(34:49):
that if you need evidence for a PIA, then the results that you get from running the rules in EMT can be the evidence that you use for those PIAs. Or, take the case where you've got a process to handle consent management or RoPAs or something like that. EMT, because it's a
(35:11):
process design tool, one of the things that it will do is detect: "Oh, you need to have the ability to process RoPAs; you need to have consent management. I don't see it in your process anywhere." Well then, what you can do - we don't have it yet, but the idea is that you'll have a graphical editor to draw out
(35:32):
your process design, and you'll be able to just drop widgets in: "Okay, I'm using BigID, so I'm going to use their Consent Manager, I'm going to use their RoPA tool," and you're just going to be able to drop that into your design and say, "Okay, here's my RoPA tool, it's BigID, we're going to wire it all up. Here's my Consent Manager," you wire that up, and then what can
(35:52):
happen is that EMT, as part of its output, can actually generate all the configuration information you need to configure those tools so that they are doing what they're supposed to do. Does that make sense?
Debra J Farber (36:07):
It does make sense, yeah. So, what about platforms like Privado - who is also our show sponsor - which scans a company's product code and then surfaces privacy risks in the code, and also detects processing activity for creating dynamic data maps? How does Epistimis' process design tooling work along with a
(36:28):
platform like that?
Steve Hickman (36:30):
Well, Privado, I think, is a great example of what I like to think of as insurance; and there's a very positive relationship between that kind of... what you're doing with that code scanning and what EMT does, because one of the things that we know about human beings is that they don't always play by the rules.
(36:52):
You can have a design tool. You might even generate code from your design tool. But you really should have a way to check the actual code that's out there, to make sure it's actually following the rules that you thought you were following. For example, if you're just scanning your data
(37:19):
stores, you can use something like Privado and a code scan to do an initial pass to create a semantic model and your basic process design. Okay, that would be used in EMT. So that part could be input into EMT, and then you can do all the rule evaluation in EMT.
(37:40):
Then, as you update things, you might want to standardize; or if you decide to switch technologies - you know, this was written in JavaScript and we're going to switch it over to Kotlin, or we're going to do this in TypeScript now instead of JavaScript, or whatever - you could switch technologies. If you want, you regenerate code in a different technology, that
(38:00):
goes back out to your code base, and then you can have something like Privado there doing those checks on the actual code base again, to say, "Okay, well, we verified this at the design level, but what did we actually get, and does it actually match the design?" Does that make sense?
Debra J Farber (38:17):
It does make sense. Thank you. And then, lastly, what about a company like Privacy Code, Michelle Dennedy's company, which transforms written policies into consumable tasks for developers and business teams? So, for instance, they have a library of privacy objects; implementations for agile privacy, like success criteria
(38:40):
and sample code; and then they deliver meaningful metrics on how the privacy engineering process is going. So how would Epistimis' process design tooling work alongside a platform like Privacy Code?
Steve Hickman (38:53):
Okay. So, Privacy Code scans your privacy policy, generates all these tasks, and puts them out in JIRA: here's what you need to do and, basically, here are the rules you need to follow, based on your privacy policy. So, what we want to do is actually turn those into executable rules at Epistimis, so that we can verify that you've
(39:21):
actually done what Privacy Code told you you were supposed to do. So then, as you do your design, we've got those converted. In that sense, Privacy Code becomes input into EMT, and so then we can evaluate those rules against this design you've
(39:42):
built. And we can even use, for example, the sample code that they have - I mentioned earlier, with BigID, the drag and drop. We can do a similar kind of thing with these privacy objects that Privacy Code has, so that you can insert those into your process wherever it's appropriate.
(40:03):
And then, when or if you generate code, or if you're just evaluating against the model, you can say, "Okay, yes, this is what the rule was. Is the model up to snuff? Is it actually following these rules now?" So we can take that from just being something in JIRA to actually verifying that you've done what you were
(40:28):
supposed to do.
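A hypothetical sketch of that hand-off: neither Privacy Code's export format nor EMT's rule language is shown in the conversation, so every name below is invented, but the idea is that a policy-derived task becomes an executable check over the same kind of conceptual process model sketched earlier.

```python
# Hypothetical only: compile a policy-derived task (like one exported to JIRA)
# into an executable rule that runs against a conceptual process model.

from dataclasses import dataclass

@dataclass
class Step:
    name: str
    purpose: str
    inputs: set[str]

task = {
    "id": "PRIV-142",
    "requirement": "Obtain consent before processing Email for marketing",
}

def rule_from_task(task):
    """Turn the task into a check over process steps (illustrative logic only)."""
    def rule(steps):
        for s in steps:
            if s.purpose == "marketing" and "Email" in s.inputs and "Consent" not in s.inputs:
                yield f'{task["id"]} violated in step {s.name}'
    return rule

design = [Step("send_newsletter", "marketing", {"Email"})]
print(list(rule_from_task(task)(design)))
# -> ['PRIV-142 violated in step send_newsletter']
```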
Debra J Farber (40:29):
Got it, and that's so helpful, too, for teams. So, where are you in the current product development process? Are you seeking collaborators? Are you looking for POCs? Are you at the point where you're selling the product? You have the audience - you have the floor right now. Who do you want to collaborate with?
Steve Hickman (40:51):
We're very much interested in finding pilot customers - people interested in doing a proof of concept. You know, right now the initial focus is GDPR. Here we are close to the end of September, the 23rd. The goal is, by the end of October, to have a rough cut on
(41:11):
Articles 1 through 45 of GDPR, which are really the only ones that are addressable with this approach. Then, once that's done, we'll start looking at U.S. state law. In terms of the kinds of rulesets that we support, the basic modeling is all there, and, you know, we need to improve the documentation and improve usability.
(41:31):
I'm very much interested in identifying people who want to give us feedback right now. The feedback is important. It's free to use; I just want your feedback. And we're interested in identifying potential consulting partners - people who are in the privacy consulting business and are looking for ways to help their
(41:55):
clients. Because, if you're a privacy lawyer or you're a consulting company, you want to make sure that you're thorough. You don't want to drop the ball and somehow miss something, but you also want to focus your energy on the areas where your expertise is super valuable. So, the idea is that EMT can make sure that you're thorough.
(42:18):
It can evaluate all the rules, make sure that you're not missing anything, and then you can focus on the areas that really matter. So, we're looking for people who are interested in those kinds of partnerships. I'm not going to lie - it's still rough. I mean, this is very early days, so it's sneak-peek time; but I'm very much interested in collaborations with people on
(42:41):
multiple levels.
Debra J Farber (42:43):
Excellent. Well, I hope that you get multiple people ringing you up to talk about how we could make a privacy ontology - you know, eventually standardized, too. Right? I would think that at some point, whether you're leading that initiative or you're working with others in some organization that leads it, we can get an industry standard around a privacy ontology to really bridge that
(43:07):
gap between privacy engineering and Legal, GRC - you know, just the business in general. Tell us more. I know you've got some interesting privacy tools that you plan to ship in the future. Tell us a little bit about what's on the roadmap before we close for today.
Steve Hickman (43:23):
Okay. Well, the one thing that I'm excited about right now - I call it 'war gaming.' For those people who've read Daniel Solove's piece, "Data Is What Data Does" - I think it's just out in preprint right now - he really hits on a very important point there, and that is that the current approach to privacy law is fundamentally
(43:47):
flawed.
Debra J Farber (43:49):
And he wrote the book on it. I took privacy law in 2004. It was the book that he wrote along with my professor, Paul Schwartz, and to this day it is the same - updated, of course - law book that is being taught in most law schools.
(44:09):
So, just putting that out there to everybody who's listening here. Proceed with Dan Solove.
Steve Hickman (44:15):
Yes, I did ping him earlier today, because I want to get his feedback on this, but we need a different approach; and so the foundation that EMT provides is not just about implementing what's currently the law. It's a generalized foundation where we should then be able to war-game
(44:38):
new rules, new approaches to rules, and see what happens. In a different one of these papers, he talks about inference - he mentions the inference economy and whatnot - and the challenge that it's not just what you know about someone, but what you can infer about them because of advances in ML.
(45:01):
And so, part of the issue here is: can we detect the risk to people in the models? Because we see, well, you've got all these different pieces of data that are all flowing to the same place. We have statistical evidence that says, for example,
(45:22):
statistically, we know in the United States, if you know someone's birthday, zip code, and gender, you can identify the specific person 85% of the time with just those three pieces of information. If we can look at our models and say, "Okay, we've got these different pieces of data floating around, they all flow
(45:42):
together, there's a risk occurring here that information will be inferred that was never consented to but could end up mattering" - and that could be something like: you can infer, for example, somebody's race just based on their zip code. If you're making decisions about who can get credit, the
(46:02):
federal law on credit is that you can't use race as a criterion; but you might end up, completely incidentally, because of the ML, using things that are proxies for that. So, if we have tools that can identify these risks in the models, now we can start to see:
(46:20):
can we develop workable laws that actually achieve our goal of privacy, as opposed to what we're doing now? Because what we're doing now, particularly with the advances in ML, it doesn't work anymore. It's not achieving its goal.
(46:40):
So the idea there is that EMT is being built with a fundamental foundation that will enable us to start doing this kind of war gaming and start detecting these kinds of patterns, and I'm very excited to see where that leads, because we really need to, if at all possible,
(47:04):
get out in front of this. The technology right now... technology is stripping away our privacy extremely rapidly, and we need to figure out a way to catch up.
Debra J Farber (47:16):
Yeah, I think that makes a lot of sense, and it's akin to finding vulnerabilities in code. It's red teaming - basically another type of red teaming; but, in the AI space, it's about trying to test and make sure that your rules will not be broken. So I think that's really exciting. As many listeners here know, because I bring it up often
(47:39):
whenever it's relevant, my fiancé is a hacker. So, this is a constant conversation in our household - talking about addressing risks in code. We need to do it in AI for bias and for fairness, for figuring out how you make a trusted product. So, if you want people to trust your products in the future,
(48:01):
and not believe that they're eroding your privacy, being able to say that you're war-gaming new rules, or at least running a type of test to make sure they meet certain criteria before you ever ship, I think is essential. So, kudos to you for looking forward and seeing where technology is going and what we need to get in place
(48:23):
quickly in order to be able to even make decisions across a business that involve Legal, Risk, and Engineering. Right? It's really hard to get those things matched when, in Legal, by design, a lot of things are left as generalities when they're defined. Like 'reasonable security' - what's that?
(48:44):
How do you design for reasonable security? Right? What does that testing criteria look like? Being able to have a discussion where everyone's understanding and on the same page at a high level, I think, is essential. So, Steve, thank you for joining us today to talk about privacy ontologies, privacy process tooling, and the exciting work you're doing at Epistimis.
Steve Hickman (49:06):
Thank you.
It's always fun.
Debra J Farber (49:08):
Yeah, indeed! Until next Tuesday, everyone, when we'll be back with engaging content and another great guest, or guests. Thanks for joining us this week on Shifting Privacy Left. Make sure to visit our website, shiftingprivacyleft.com, where you can subscribe to updates so you'll never miss a
(49:32):
show. While you're at it, if you found this episode valuable, go ahead and share it with a friend. And, if you're an engineer who cares passionately about privacy, check out Privado, the developer-friendly privacy platform and sponsor of the show. To learn more, go to Privado.ai. Be sure to tune in next Tuesday for a new episode. Bye for now.