Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Engin Bozdag (00:01):
This means that it's not just doing a PIA with a software engineer and communicating policy requirements (like delete data according to policy), but actually understanding technical limitations, capabilities, and maybe opportunities.
When you see common patterns of issues across different products, you should be able to recommend different technical
(00:22):
solutions that feel more efficient for different teams.
How do you gather those skills?
Maybe join engineering discussions as a privacy SME and learn more about these software components, or you build software yourself to understand how these systems work in harmony.
Debra J Farber (00:48):
Welcome everyone
to Shifting Privacy Left.
I'm your host and resident privacy guru, Debra J Farber.
Today, I'm delighted to welcome my next two guests: Engin Bozdag, Senior Staff Privacy Architect at Uber (he's based in Austin, Texas), and Stefano Bennati, Privacy Engineer at HERE Technologies (that's H-E-R-E), which is based in Zurich,
(01:12):
Switzerland.
Engin is Uber's Principal Privacy Architect and the Team Lead of Uber's Privacy Architecture team.
He holds a PhD in AI Ethics and authored one of the first works on algorithmic bias.
He also helped create ISO 31700, the world's first standard on
(01:32):
privacy-by-design, which I contributed to as well, here from the United States.
Engin has gained extensive experience in diverse organizational settings, cultivating a privacy-focused career that has evolved over the course of a decade.
Throughout his journey, he has assumed multi-faceted roles, encompassing legal expertise, privacy engineering, engineering
(01:55):
management, research, and consultancy in the realm of privacy.
Stefano is a Privacy Engineer at HERE Technologies.
He holds a PhD in privacy algorithms; and, at HERE, Stefano worked on the technology behind Anonymizer (HERE's first privacy-focused product) and authored a number of patents and
(02:17):
scientific publications.
Today, we're going to be talking about location data and anonymization, what it means to be a Privacy Architect, new privacy architectures, and how to get privacy issues fixed within scaling companies.
This is my very first time welcoming two guests on the show at the same time.
I'm really excited to dive in, so let's see how this goes.
(02:40):
Welcome, Engin and Stefano.
Engin Bozdag (02:45):
Thank you, Debra.
Thank you for the invite.
Great to be here.
Stefano Bennati (02:48):
Thank you very
much.
Great to be here.
Debra J Farber (02:50):
Great, okay.
So, I originally reached out to you both because I saw that you were presenting the talk, "Can location data truly be anonymized (02:58):
a risk-based approach to location data anonymization" at the 2023 International Workshop on Privacy Engineering, and I wanted to dig deeper on that topic.
I think my audience has a lot of thirst for that.
Let's get started.
What are some of the technical and business challenges in obtaining anonymization?
Engin first, that's to you.
Engin Bozdag (03:21):
Yeah, I think before we even think about different anonymization techniques, we have to understand the business use case.
What does the use case need in terms of data and which technique can be used to reduce the risk?
So we can't just start with anonymization as an end goal.
We need to understand the purpose.
So, if I can give an example, imagine you have an app where
(03:44):
the user might buy T-shirts.
They might click in the app.
They might type keywords - like the size, design, or specific images belonging to different searches.
So, usage data, and anonymizing of such usage data, is very different than anonymization of location data because re-identification risk for individuals will be much higher.
(04:07):
I mean, think about medical images; it brings a totally different complexity.
Going back to the location data, this is what our talk focused on.
We looked into several features of location data that impact anonymity.
So, you can think about aggregation - does the data belong to a single user or is it aggregated among a larger crowd?
(04:28):
Or, how long the data is collected as a unit.
Do we need entire trips for location or do we need a subset of trips?
How frequent is the data?
Do we collect it every second, every month?
And, how precise and accurate is it?
In an ideal world, you would reduce all these features and arrive at anonymous data, but there will be very different use cases.
(04:51):
Think about finding the best charging station for an EV.
Or, you ask your AI assistant to give you some recommendations on trips based on your habits, or fraud detection.
So you cannot reduce all those features.
So that's kind of the challenge.
(05:13):
And again, I notice in some threat modeling frameworks that one wants full anonymity, but it might or might not be possible depending on the use case.
And, you also need to think about additional controls on top of anonymization.
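To make those levers concrete, here is a minimal Python sketch of coarsening a single GPS trace by trimming the trip endpoints, downsampling in time, and reducing coordinate precision. It is purely illustrative: the grid size, time gap, and trim values are made-up parameters, not Uber's or HERE's actual pipeline.

```python
# Illustrative only: a toy version of the "features" discussed above
# (precision, frequency, trip extent), not any company's real pipeline.
from datetime import timedelta

def coarsen_trace(points, grid_deg=0.01, min_gap=timedelta(minutes=5), trim=3):
    """points: list of (timestamp, lat, lon) tuples for one trip.
    - grid_deg: snap coordinates to a coarse grid (~1 km at 0.01 degrees)
    - min_gap: keep at most one point per time window (downsampling)
    - trim: drop the first/last points, which often reveal home or work
    """
    kept, last_ts = [], None
    for ts, lat, lon in sorted(points)[trim:len(points) - trim]:
        if last_ts is None or ts - last_ts >= min_gap:
            kept.append((ts,
                         round(lat / grid_deg) * grid_deg,
                         round(lon / grid_deg) * grid_deg))
            last_ts = ts
    return kept
```

Whether any of those reductions are acceptable depends, as Engin says, on what the use case actually needs from the data.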
Debra J Farber (05:24):
Thanks.
That is a lot of good framing of the issue.
So, Stefano, what are your thoughts on some of the technical and business challenges in attaining anonymization?
Stefano Bennati (05:35):
Thanks, Debra.
So, let me start with the business challenge, which is striking a balance between different ways of anonymizing data and the value that anonymized data brings to product development.
For me, the best trade-off is when the data retains enough information to produce a high-quality product, while simultaneously it does not retain enough information to
(05:55):
compromise the privacy of data subjects.
By data subjects I mean those individuals responsible for creating or generating the data.
Each product has specific requirements on data quality and system design, which directly impact what trade-offs are achievable.
For example, I sometimes receive questions regarding
(06:16):
collecting disaggregated personal data to compute statistics, which could as well be computed from aggregated data - the reason being that disaggregated data might become useful in future products.
When the business reason is so uncertain and also the requirements are not yet defined, it is hard to justify the privacy cost.
So, in these situations, I follow the principle of 'data
(06:38):
minimization.'
Another interesting example is sharing the location of a vehicle that is involved in a traffic accident.
In this instance, the benefits of providing assistance to individuals involved in the accident and ensuring the safety of other drivers can indeed outweigh the privacy concerns.
Under these circumstances, a product could be developed based
(06:58):
on the 'vital interest' legal basis (under GDPR).
I invite you all to check out our presentation, which we gave in our latest talk.
You will find an overview of common use cases around location data and the requirements.
Now, for the technical challenge, it is about obtaining the best trade-off given the constraints.
So, I could ask myself which privacy-enhancing techniques are
(07:21):
compatible with my use case.
Homomorphic encryption could be ideal to process this data, but it won't scale to the quantity and throughput of data.
Differential privacy, I cannot use because the data needs to be disaggregated.
Then maybe local differential privacy could be an option, and so on.
Say I choose the technique: what is the best parameter setting?
To determine this, I need tools to measure and compare the
(07:45):
privacy risk and the data quality in different settings.
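As a rough illustration of what measuring and comparing different settings can look like, the following toy Python sweep perturbs synthetic points at a few noise scales and reports a simple data-quality proxy (mean displacement) next to a crude privacy-risk proxy (how often a perturbed point can still be re-linked to its source). The data, metrics, and scales are invented for illustration; real tooling would use far richer threat models.

```python
import math
import random

random.seed(0)
# synthetic "original" location points (e.g., around Zurich); illustration only
original = [(random.uniform(47.3, 47.4), random.uniform(8.5, 8.6)) for _ in range(200)]

def perturb(points, scale_deg):
    """Add simple Gaussian noise (in degrees) to every point."""
    return [(lat + random.gauss(0, scale_deg), lon + random.gauss(0, scale_deg))
            for lat, lon in points]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

for scale_deg in (0.0005, 0.002, 0.01):
    noisy = perturb(original, scale_deg)
    # data-quality proxy: average displacement introduced by the perturbation
    quality_loss = sum(dist(o, n) for o, n in zip(original, noisy)) / len(original)
    # privacy-risk proxy: how often a noisy point is still closest to its own source
    relinked = sum(
        1 for i, n in enumerate(noisy)
        if min(range(len(original)), key=lambda j: dist(original[j], n)) == i
    ) / len(original)
    print(f"scale={scale_deg:.4f}  mean displacement={quality_loss:.5f} deg  re-linked={relinked:.0%}")
```

More noise lowers the re-linking rate but raises the displacement, which is exactly the trade-off Stefano describes tuning per use case.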
Debra J Farber (07:49):
That makes a lot
of sense.
Maybe we'll delve into tools later.
So, for now, I want to know how you deal with the fact that different jurisdictions around the world have different definitions and guidelines for anonymization.
Stefano, tell us about HERE Technologies.
Stefano Bennati (08:06):
Yeah, sure.
At HERE we keep a global privacy stance and adhere to all local regulations.
This means that we always apply the strictest privacy rules from around the world.
At times, we fall back to applying local regulations.
This only happens, though, with products that are contractually limited to one jurisdiction and where we are confident they will not be
(08:28):
exported to other jurisdictions in the future.
We keep this policy because we want to avoid costly retrofitting of products to be compliant with different sets of requirements.
Debra J Farber (08:36):
That makes sense
to do.
And, Engin.
Engin Bozdag (08:39):
I want to mention
a recent case in Europe that
tests these different definitions of anonymization.
So, in this particular case, Company A was sending data to Company B, and the data at Company B was controlled in terms of re-identification.
So, the question was (08:57):
is the data at Company B anonymized because the source data remains at Company A?
There are different opinions.
Some argue it's not, because there is always a risk to re-identify the persons; whereas others say it depends on the controls and how likely this threat will occur.
(09:22):
I think we don't have a clear threshold yet.
So, companies will have their own policies on what anonymization is.
The assessment is done on a case-by-case basis.
You also have specific regulations, such as HIPAA, which has a method for de-identifying health data by removing identifiers.
So, as a result, you know, similar to what Stefano has
(09:45):
stated, I've observed that organizations choose a global approach unless there's a specific rule from a jurisdiction.
That approach will depend on a company's risk approach as well.
So, they will define their anonymization policy accordingly.
Debra J Farber (10:02):
Yeah, you know
so much is risk-based when it comes to setting the policy.
So, I imagine every company's got its own set of risk levers and risk tolerance that they need to consider.
So, that makes sense.
What roadblocks do engineers run into when it comes to certain anonymization techniques like generalization, aggregation, density, noise addition, downsampling?
(10:25):
Are those challenges?
And then, how did you overcome them in order to achieve anonymization?
Stefano, why don't we start with you?
Stefano Bennati (10:34):
Sure, from what I've seen, engineers typically get stuck when improving existing products - for example, when they apply a new anonymization technique to the product or adapt a product to a new requirement, such as management of data subject requests (DSARs).
A common example of those is right-to-be-forgotten type requests, with data deletion and so on.
(10:54):
Products are bound by previous commitments with customers, which might be challenged by the proposed improvements.
For example, changing the anonymization algorithm might reduce the quality of data below a certain target that has been agreed with the customer.
Another example is data lineage.
Implementing this functionality can enable automatic processing
(11:16):
of data subject rights requests, but it might also require a rearchitecting of the product that can impact response times or uptime targets.
In these situations, we try the technical approach, which requires working closely together with developers and product people to deeply understand the implementation and the commitments.
(11:37):
This is a win-win situation, as privacy engineers learn real-world challenges, which effectively fuel privacy technology innovation.
Once the product is well understood, anonymization parameters can be tuned appropriately.
This might have the downside of reducing the privacy protection in other use cases, though.
Then, the business approach is to talk with our sales team and
(11:59):
with the customers to discuss these commitments.
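The data lineage point above is easiest to see with a small sketch: if you track which datasets are derived from which, a deletion request can be fanned out automatically. The dataset names and graph below are hypothetical stand-ins for a real lineage catalog.

```python
# Toy lineage graph: dataset -> datasets derived from it (names are invented)
from collections import deque

LINEAGE = {
    "raw_trips": ["trip_features", "billing_events"],
    "trip_features": ["demand_model_training"],
    "billing_events": [],
    "demand_model_training": [],
}

def datasets_to_purge(root: str) -> list[str]:
    """Breadth-first walk of the lineage graph from the dataset holding the
    user's raw records, returning every dataset that may need a purge."""
    seen, queue = {root}, deque([root])
    while queue:
        current = queue.popleft()
        for downstream in LINEAGE.get(current, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return sorted(seen)

print(datasets_to_purge("raw_trips"))
# ['billing_events', 'demand_model_training', 'raw_trips', 'trip_features']
```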
Debra J Farber (12:02):
Oh, that's a
really good point.
Rarely do we ever think about circling back with the sales team and customers.
That's a great point that you should think about more often, because that's kind of the initial point of contact - with customers, on data sharing.
What about you, Engin?
Engin Bozdag (12:17):
I think in smaller companies, it's not so much on the techniques, but I still observe that the knowledge around anonymization is a bit immature.
So, I see that they just remove some PII and conclude that data is anonymized or deleted, which might or might not work
(12:38):
depending on the jurisdiction you're operating in.
I think for larger companies, there's definitely the challenge of getting those techniques implemented.
The first one is the lack of proper tooling.
So, in large organizations you might have custom database technologies that no one outside your company is using, and data scattered across multiple teams.
(13:00):
So, getting a tool externally and getting it to work with your tech stack is already challenging.
And then, asking every engineer to aggregate, mask, or generalize their own data manually is not scalable, so they need platforms to help them with that.
So, the lack of proper tooling is one challenge.
Second is dependency within data sets.
(13:22):
So, a team might be okay with de-identifying or anonymizing their own data table, but there might be downstream services reusing this data.
So, this is called 'the problem of many hands.'
Engineers might not even know why the data is even there in the first place.
What is it doing there and what is it used for?
So, before even starting with anonymization, you need proper
(13:46):
data classification and an up-to-date data inventory so that you can determine if it might be anonymized.
And the last challenge I observe in larger organizations is data sets with too many quasi-identifiers.
If you look at, for example, the 2020 U.S. Census, it has nine questions, and there is also the U.S. Census American Community Survey, which has 20 questions.
(14:06):
But in large companies, you might have tables with thousands of fields and there might be hundreds of quasi-identifiers.
So, how would you apply a technique such as l-diversity in this case, considering this table will get updated continuously?
So, to summarize, we need tools to help engineers so that they
(14:32):
can deploy anonymization techniques without so much hassle; and, we also have to understand the limits of anonymization and determine if we need additional controls when it may not be feasible.
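For readers who want to see what such a check looks like in practice, here is a minimal pandas sketch that measures k-anonymity and l-diversity over a chosen set of quasi-identifiers. The column names and values are invented for illustration; real tables with hundreds of quasi-identifiers are exactly where this gets hard.

```python
import pandas as pd

# Hypothetical table: two quasi-identifiers plus one "sensitive" column
df = pd.DataFrame({
    "zip3":      ["787", "787", "787", "941", "941", "941"],
    "age_band":  ["30-39", "30-39", "30-39", "20-29", "20-29", "20-29"],
    "trip_type": ["airport", "commute", "airport", "commute", "commute", "leisure"],
})
quasi_identifiers = ["zip3", "age_band"]

# k-anonymity: size of the smallest group sharing the same quasi-identifier values
k = df.groupby(quasi_identifiers).size().min()

# l-diversity: fewest distinct sensitive values ("trip_type" here) in any group
l = df.groupby(quasi_identifiers)["trip_type"].nunique().min()

print(f"k-anonymity = {k}, l-diversity = {l}")  # k = 3, l = 2 for this toy table
```

Every new column that counts as a quasi-identifier shrinks the groups, which is why tables with thousands of fields resist this kind of guarantee.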
Debra J Farber (14:48):
That's really
interesting, just like listening to both of you speak about your challenges.
It's enlightening for me, so thank you for that.
It's pretty exciting.
What have you encountered on the business side in terms of business or policy challenges in getting anonymized data?
How do you get budgets and what are the challenges with scaling
(15:09):
and regulatory requirements?
How could you overcome them?
Engin, let's start with you first.
Engin Bozdag (15:16):
Yeah, I think, going back to my example of the U.S. Census Bureau: the agency used differential privacy on the census data, which improves the confidentiality of individual responses; but that led several researchers to protest, and they asked the agency to abandon using
(15:37):
differential privacy because it delayed the release of the data.
So, it took some time before it could be released to the public and to these agencies, and it wasn't accurate enough for their needs.
This is definitely an issue in organizations as well because once you say we have to anonymize the data, there will be some delay and an expert will often need to be involved,
(16:00):
which means extra costs.
In other circumstances, like with government officials, they might require precise information.
For example, the government of Egypt used to demand access to sensitive location data and they did not want the data to be anonymized.
These are some of the challenges, not so much on technical limitations, but more on business and regulatory asks.
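For context on the Census example, the mechanism at issue is roughly this: calibrated noise is added to published statistics, which protects individual responses but costs accuracy. The toy sketch below applies a Laplace mechanism to a single made-up count; the numbers and epsilon values are illustrative only, not the Census Bureau's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(42)
true_count = 12_345     # e.g., people in some hypothetical census block group
sensitivity = 1         # one person changes a count by at most 1

for epsilon in (0.1, 1.0, 10.0):
    # Laplace mechanism: smaller epsilon -> stronger privacy -> more noise
    noisy = true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    print(f"epsilon={epsilon:>4}: published count ~ {noisy:,.1f} "
          f"(error {abs(noisy - true_count):,.1f})")
```

The researchers' complaint maps directly onto the small-epsilon rows: strong protection, but counts that may no longer be accurate enough for their needs.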
Debra J Farber (16:26):
It's definitely
not.
Stefano Bennati (16:28):
I encountered many of those challenges, and I can tell you about this.
HERE acquires data from thousands of sources, such as automotive manufacturers, commercial fleets, and so on.
The data flows between these entities, so it must comply with the anonymization criteria set by HERE, by the data providers, and by the regulatory bodies.
So, you can imagine that the standards typically do not look
(16:51):
alike.
They can differ a lot between entities.
For this reason, HERE developed the Anonymizer, a solution to measure the privacy risk of location data and anonymize it to align it with the predetermined level of privacy risk tolerance.
This allows data providers to select their preferred anonymization level before sending their data to HERE.
Debra J Farber (17:13):
That's great.
Thank you for a peek inside your org.
Now, I'm going to ask you a question I ask quite a few people who come on the show.
What does it mean to you in today's day and age to be a 'privacy engineer'?
Why don't we start with you, Engin?
Engin Bozdag (17:29):
Yeah, I think IAPP recently made an attempt to define privacy engineering.
I think it's very interesting for a discussion, and they define a broad range of jobs that might fall under privacy engineering.
So that, for example, would include: software engineers building privacy tools; UX designers minimizing dark
(17:50):
patterns; IT infra people like DevOps who could configure systems for better privacy; but even professionals handling physical architecture, like making privacy-friendly choices in restrooms, patient rooms, et cetera.
It's very broadly defined.
So, I think it depends on the organization, on how they define
(18:15):
privacy engineering and what they want from those privacy engineers, skills-wise.
So, at Uber, our privacy engineers focus on, first, privacy threat analysis of new engineering designs and architecture.
We take a look, and we ask questions, and we recommend architectural changes so privacy-by-design is actually implemented, and the outcome would be technical controls like
(18:38):
onboarding to a deletion platform or reducing unneeded data in request headers.
We also do technical audits to reduce technical privacy debt.
So, these are controls that are missing in legacy systems.
We support our engineers with ad hoc questions - how do you delete data from this particular database?
We support our legal colleagues and give feedback on the
(19:01):
policies.
We also give specs and requirements to our software engineers on what to build and which new features we need in existing tooling.
So to summarize, we connect product engineering to legal and other security teams and ensure that engineers get practical privacy advice.
That might be very different in another organization.
(19:22):
Maybe in another organization privacy engineers are the software engineers building privacy tools.
So, you should understand what the organization wants from a privacy engineer and not just focus on the title.
That's advice I can give.
Debra J Farber (19:38):
Yeah, I think
that makes sense.
And in the episode that came out, let's see, two weeks ago - that would be two episodes ago, for anyone who's listening to this - with George Ratcliffe from Stott and May Recruiting, he really goes into kind of what companies are looking for,
(19:59):
based on their needs and from the perspective of a recruiter across multiple organizations.
So, I encourage folks to take a look at that episode as well.
But, Stefano, I would love to understand your perspective on privacy engineering and what it means to be a privacy engineer.
Stefano Bennati (20:14):
Yeah, I agree with what Engin said.
So, privacy engineering is an incredibly broad discipline.
I like the description of privacy engineering specialties by Lea Kissner and Lorrie Cranor, which was published in their paper, "Privacy Engineering Superheroes."
Also, the title is pretty good.
Yeah, it is.
It resembles the IAPP classification.
(20:35):
There are a few more roles beyond what Engin mentioned.
So, we have analysis, consulting, privacy products, math and theory, privacy policy, incident and vulnerability response.
So, it's very broad and each specialty has a unique skill set.
But I think that all privacy engineers must have a broad knowledge of the portfolio and also how competitors handle the
(20:58):
same privacy issues in their similar products.
In my day-to-day job, I am the point of contact for different stakeholders for a variety of privacy topics.
This requires (21:06):
deep technical knowledge to compile data-driven reports that support stakeholders in making an informed decision; business knowledge to understand the perspective of each stakeholder; as well as communication skills to clearly explain the trade-offs to the stakeholders.
I also want to stress that the key here is to work with developers and product managers as opposed to throwing them
(21:29):
blockers and some impossible requirements from the top of our ivory tower.
Debra J Farber (21:35):
Thank you.
So next, I want to understand how companies should think about data deletion capabilities, especially when it comes to integrating third-party tools.
How can these tools be basically integrated into a company's architecture?
Engin, let's start with you.
Engin Bozdag (21:54):
I'm going to get a little bit technical and I promise I'll keep it short.
So the complexity arises when you have, again, a large company with a distributed architecture.
In distributed architectures you have so-called 'microservices.'
So, think of them as these small software blocks that do one thing very well but rely on others to function.
So, you kind of separate the business logic into different
(22:17):
software blocks.
So, you might have a microservice for payments, another microservice for user sign-up, et cetera, and these microservices will have their own database and they will store their own data.
So, going back to your question, when we get an erasure request, a right-to-erasure request, from a user and it enters the company, typically a dispatcher service will take it and
(22:39):
forward it to these different microservices and expect them to do their own deletion.
There are different ways you can get this done.
You can use an asynchronous model where you put messages in a queue and wait for the services to consume them.
Or, you can do an API integration, so the dispatcher's API endpoint will be implemented and these services will be listening for these incoming messages.
(23:01):
Both approaches have pros and cons.
When third-party data sharing gets into play, it gets more complicated.
So again, going back to my example of the company selling T-shirts.
Let's assume the company uses another vendor for actual delivery of the T-shirts, and assume that a microservice at
(23:24):
Company A is doing the integration and sending this data to Company B.
So, who would actually send this erasure signal to the third party?
You can have a centralized approach where one service handles all third-party integrations, or you can have different data owners integrating with different third
(23:46):
parties.
You should also think about auditing and logging of a request, because once data leaves your network, those external requests may be lost.
Do we keep on retrying?
How many times?
What do we need to log?
Do we need to log that the third party received the request?
Did they execute the request or did they just put it into their backlog?
(24:06):
And, maybe we don't use APIs because the third party doesn't have support.
Are we comfortable sending CSV files?
So, as you can see, there are many engineering decisions to be made here.
The legal requirements will just start a discussion, like forwarding requests to third parties, but that's only the start of the engineering and the architecture process.
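A highly simplified sketch of the dispatch pattern described above, using in-memory stand-ins for queues, the audit log, and the third-party call; the service and vendor names are hypothetical, and a real system would rely on a message broker, idempotency keys, and durable audit storage.

```python
from collections import defaultdict

internal_queues = defaultdict(list)   # stand-in for per-service message queues
audit_log = []                         # stand-in for durable audit records
MAX_RETRIES = 3

def dispatch_erasure(user_id: str, services: list[str]) -> None:
    """Asynchronous model: publish one deletion message per owning service."""
    for service in services:
        internal_queues[service].append({"type": "erase_user", "user_id": user_id})
        audit_log.append(f"queued erasure of {user_id} for {service}")

def forward_to_third_party(user_id: str, vendor: str, send) -> bool:
    """Retry a third-party erasure call a bounded number of times and record
    the outcome, since the request may be lost once it leaves our network."""
    for attempt in range(1, MAX_RETRIES + 1):
        if send(vendor, user_id):      # send() stands in for the real integration call
            audit_log.append(f"{vendor} acknowledged erasure of {user_id} (attempt {attempt})")
            return True
        audit_log.append(f"{vendor} erasure of {user_id} failed (attempt {attempt})")
    return False

# Example usage with a fake vendor call that succeeds on the second attempt
attempts = iter([False, True])
dispatch_erasure("user-123", ["payments", "signup", "trips"])
forward_to_third_party("user-123", "shipping-vendor", lambda vendor, user: next(attempts))
print(audit_log)
```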
Debra J Farber (24:29):
That's pretty
enlightening.
It just really shows you how much more complex the actioning upon the legal requirement is than the requirement itself.
So, I'm curious if there are any attempts at standardizing requests.
Or, is there an IEEE working group or something on third-party requests for data deletion, or is that a good opportunity for someone to create one, so that it standardizes
(24:51):
these as best practices across organizations?
Engin Bozdag (24:55):
I'm not aware of any work in this field, but, as you said, this is definitely a good opportunity.
I'm noticing many companies are, you know, using different approaches; they have different APIs, different requirements on APIs - so, this is definitely an area where we will need more standardization.
Debra J Farber (25:13):
Yeah, that would
probably help with
decision-making and design.
Okay, so, Stefano, what are some good career paths or skill sets for becoming a good privacy engineer, and where do you suggest people learn more?
Stefano Bennati (25:28):
Yeah, good
question.
As mentioned before, the field is very broad and so there's always the opportunity to study a particular subfield (for example, user experience) and then get involved in a privacy project later on.
Once in the privacy space, I can suggest some general skills that can help any privacy engineer.
Risk management is the top one for me.
(25:49):
It is very important to understand risk and talk risk, because black-and-white privacy thinking doesn't really help.
Then communication - a privacy engineer needs to communicate clearly to stakeholders with different backgrounds.
Learning the basics of other backgrounds, for example policy, business and so on, helps a lot in understanding the point of view of the stakeholders and communicating more convincingly.
(26:11):
Then we have statistics.
It helps with estimating privacy risks and talking about those, which sort of grounds your reasoning.
Then, of course, software architecture - very useful for giving realistic improvement suggestions to product teams.
Debra J Farber (26:27):
Thank you,
that's super helpful! What about
you, Engin?
What are some good career paths or skill sets for becoming a good privacy engineer and where can people learn more?
Engin Bozdag (26:37):
I think it depends on the type of privacy engineer you want to be, but for our privacy engineers (at Uber) we typically have two paths.
The first one, which I want to call 'Architecture to Compliance,' is we have folks with extensive experience in building software systems, and they typically gain more on the privacy compliance side.
(26:57):
I sometimes hear from other privacy professionals that compliance knowledge is not necessary or should not be the main focus, because we should go beyond compliance anyway.
But, if you do not grasp concepts such as secondary use, pseudonymization, lawful basis, your understanding of privacy threats will be limited.
(27:18):
I met software engineers building privacy tools thinking you always need user consent or that consent is sufficient to satisfy user privacy.
The second path, which I call the 'Compliance to Architecture' path, is when you have a seasoned privacy pro learning more on the software components or systems.
You don't need to actually code, but you need to speak the same
(27:41):
language as the engineers.
The advice you give is granular and actionable.
This means that it's not just doing a PIA with a software engineer and communicating policy requirements like delete data according to policy, but actually understanding technical limitations, capabilities, and maybe opportunities.
When you see common patterns of issues across different
(28:04):
products, you should be able to recommend different technical solutions that will be more efficient for different teams.
How do you gather those skills?
Maybe join engineering discussions as a privacy SME and learn more about these software components, or you build software yourself to understand how these systems work in
(28:25):
harmony.
The second path is more difficult because there is a steep learning curve.
Many privacy pros have given up due to this learning curve.
But the first path does not deliver many privacy engineers.
Often, software engineers don't want to narrow down their scope.
They want to have a larger scope in software engineering.
(28:46):
Typically, only those that are really passionate about privacy stick around.
Debra J Farber (28:52):
That's a really
fascinating insight.
Thank you for that.
That really is, because every company has its own way of doing things, and it's nice to know that there is an opportunity for privacy pros that want to get more technical, like applied technical, and work with engineering teams, but aren't necessarily coders.
But, it's also fascinating to understand why it's harder to
(29:14):
get privacy engineers pumped out if they're not coming out of a Carnegie Mellon Privacy Engineering school with a master's and a PhD in the topic.
Like, where do you go find them, right?
Why is it so hard to find them?
I think you gave some good perspective on why they might not be drawn to a hyper-focused engineering path like privacy
(29:35):
engineering.
But I'm with you.
I think it's freaking fascinating, obviously, and I just can't get enough of talking about privacy and data protection, so I'm definitely in that second bucket.
Let's broaden the discussion a little bit to something that's so hot right now (29:49):
AI - especially for generative AI and LLMs (large language models).
What should privacy engineers understand about vendor privacy requirements for LLMs if they're bringing that into an organization?
Let's start with you, Stefano.
Stefano Bennati (30:07):
Yeah, I think even before starting to evaluate vendors, privacy engineers should verify under what circumstances or preconditions they can apply artificial intelligence to their use case.
Some AI-related regulations and guidelines to review are the EU AI Act and the Ethics Guidelines for Trustworthy AI.
Once you verify that, then you can think about which vendors
(30:30):
you want to choose.
An approach that I take when evaluating vendors is to put myself in the shoes of their privacy engineers.
I collect as much information as possible about their business model and product design, then think about how I would build privacy into their product.
Given these constraints, could a privacy engineer justify to their management adding privacy-enhancing technologies
(30:52):
and minimizing data collection, or does it have too big of an impact on their business needs?
Another complementary approach is to ask the vendor to invite a privacy engineer to the product demonstration and evaluation meeting.
That is a quick way to get a better understanding of the privacy stance of that product and also helps your fellow privacy engineer to prove to their management that privacy is
(31:14):
an important feature of their product offering.
Debra J Farber (31:17):
I like that
approach.
I think that makes a lot of sense.
What about you, Engin?
Engin Bozdag (31:21):
I think this is an interesting field that is moving very fast.
I observe companies going down different paths.
Some of them are using open-source, on-prem models, which reduces the risk of data breaches, but it requires quite a hefty investment in computing and personnel resources.
Also, I'm hearing the performance might not always be
(31:43):
optimal.
Commercial models provide much better performance and there are no upfront costs, but there are questions around their security posture.
So, as a company, I think before using an LLM, the first question you should ask is: do you really need LLM or generative AI capabilities for your purpose?
Maybe you can use traditional machine learning for your use
(32:05):
case.
Maybe you want to be able to explain the decisions in detail.
So, the first step is to understand whether you will really benefit from these LLMs and whether it is worth the risk.
Second, after deciding whether to use an on-prem or a commercial offering, you need to do due diligence, thinking about the security environment of the vendor; but also, is the vendor
(32:30):
storing your input and output for follow-up analysis?
Are they using such data to retrain their own models?
There are serious concerns around hallucinations - for example, the model creating syntactically and semantically accurate output that is not factual.
What about bias in the dataset of the vendors?
(32:52):
It depends on what type of project you have.
If you do a general market analysis, maybe you can tolerate some inaccuracy; but, if you're trying to automate hiring and sourcing, for example, where you feed in a job description and an LLM helps you find the right candidates, you have much higher risk.
Another thing to consider is data deletion.
(33:13):
If your data is used by the vendor, how will they delete the data?
And is deleting data from the training dataset sufficient for the model to forget that particular user's data?
There are all these considerations and many more you have to make, and you always have to be transparent with your end users if you decide to use one of those LLMs in your
(33:36):
product offerings.
So, you probably have a risk governance framework already in place, but you might need to update it based on these new threats.
Debra J Farber (33:46):
Yeah, yeah.
And then, with all those questions, it'll be interesting to see what regulatory bodies like the FTC and others in the EU, what kind of potential fines and requirements they'll place on an organization.
If they want you to delete the data set, they might actually not just make you delete the training data, but delete the model that was trained on it.
(34:06):
In the U.S., we call that "disgorgement."
It'll be interesting to see what the FTC's stance is on companies that they feel unethically trained their models with personal data or other risks.
So we'll stay tuned.
Engin Bozdag (34:21):
To add, I think
the FTC has this interesting report on OpenAI.
I recommend all the listeners read it.
You can understand what they are expecting from a generative AI offering.
It offers an interesting risk framework and that will be valuable for understanding expectations.
Debra J Farber (34:41):
Great, I'll look
that up and I'll try to put
that in the show notes for other people to access.
Okay, so what other trends are you seeing in the world of privacy engineering?
Stefano?
Stefano Bennati (34:52):
My impression is that the industry is slowly reaching a more mature state where tools to minimize risky data are available and easy to use.
Now, the focus of the privacy engineer is shifting from product-specific controls towards company-level controls which govern privacy risks across products and data types (for example, HR data, product data, and so on).
(35:14):
Examples of privacy governance solutions are data and product catalogs, which collect privacy-related labels and information on how data is processed in the company.
The most advanced solutions are able to dynamically update this information by monitoring changes in the data or in the source code.
I also see this shift in the privacy vendors, with the market
(35:37):
moving from early solutions focused on providing anonymization algorithms for specific products or use cases towards company-wide governance of privacy risks.
One example is Privado, which we use at HERE to monitor the compliance of our products and data lineage.
Debra J Farber (35:54):
Awesome.
Privado is also the sponsor of our show and I sit on their Advisory Board, so I'm a big fan.
Engin, I think you also have a relationship with Privado that we want to disclose.
And, why don't you also answer the question: what trends are you seeing in the world of privacy engineering?
Engin Bozdag (36:12):
Yeah, I also
provide a consulting service to
Privado.
Privado provides, or is working on, practical privacy solutions.
I think vendors should listen to privacy practitioners to better understand what the industry needs.
I noticed some tool offerings, but that's a problem we had a couple of years ago.
We already have in-house deployments, and there are other
(36:36):
areas where we really need a vendor to step in, but that's not addressed.
So, typically some vendors see it as an undesired cost, but building something that doesn't provide much added value for practitioners will lead to costly redesign.
So, I'd recommend: let's build solutions that fix our biggest
(36:56):
problems first.
Debra J Farber (36:57):
And so, just so
that we explain to people why
we're even mentioning Privado, if you could just describe their tool and how that helps you with things like anonymization or location data, or how do you use it and why are you bringing
it up?
Engin Bozdag (37:12):
We don't use
Privado at Uber, but based on
what I've seen of the tool, it can detect new assets being added to the inventory, new API endpoints being added.
It also does code scanning, so some privacy threats in code can be identified and managed.
What they're also working on, long-term, is something to help
(37:34):
with the design reviews and understanding privacy risks that might come up early on, and then tying that back to the code, where you can actually validate some of the requirements you set during the design review.
I think that's very, very useful.
Many companies do design reviews or PIAs, but that ends
(37:56):
up as some tickets in a specific tool and validation becomes more difficult.
Tying the code analysis back to the original design requirements will be very, very useful.
Debra J Farber (38:08):
Yeah, that makes
a lot of sense and really key
to DevPrivOps, as privacy is being added to the DevOps process.
Awesome.
Okay, so moving on, how do you guys deal with conflicts between engineering, legal, and operations teams?
I know that you are very much the interface between all of them, so, yeah, what's your approach to dealing with conflicts?
(38:30):
Let's start with you, Engin.
Engin Bozdag (38:33):
I think when a
conflict arises, typically it's
because accountability is not very well defined.
I'll give an example.
In the early days of GDPR, I've seen an organization where the Legal team owned privacy, but they did not want to define operational or technical requirements for engineers because they didn't feel like they had the right expertise.
(38:55):
But, those legal requirements did not work for the engineering teams, because they needed something much more tangible and actionable, but they lacked the knowledge of privacy laws and regulations.
So, they couldn't themselves create those requirements.
And, things changed after they hired their first Privacy Architect.
I won't tell you who this person is.
(39:16):
You can make your guess.
In an ideal setting, I think, the Legal team defines policies, works with the risk (accepted risk), and informs the rest of the company on new developments.
Privacy engineers and more operational folks will help translate those requirements but also give feedback to the Legal
(39:38):
team, so that we update policies based on technical capabilities, but we also build new tools and features based on policy changes.
So again, define accountability properly and the teams, I think, will work in harmony.
Debra J Farber (39:54):
That's great - a
great perspective.
What about you, Stefano?
Stefano Bennati (39:59):
I think
conflicts arise because each of
these teams brings a different perspective to the table.
For example, Legal teams tend to have a better understanding of compliance risks than Product teams, while Product teams tend to have a better understanding of business requirements.
The role of the Privacy Engineer is to evaluate these perspectives in an objective and data-driven way, then guide the
(40:21):
stakeholders to reaching an agreement.
Of great help for successfully dealing with conflicts, I find, is a strong privacy culture in the company.
Privacy is not necessarily a broadly recognized value like revenue, efficiency, and others, and so stakeholders are not necessarily ready to compromise for it.
From my experience, the most efficient way to build a privacy
(40:43):
culture is to educate upper management and have them advocate for privacy in their teams.
And who's responsible for educating upper management? The Privacy Engineer.
Debra J Farber (40:54):
Yeah, so it's a
lot of responsibility besides
just engineering.
I know a lot of responsibility falls on Privacy Engineers to get stuff done, even at a higher level.
Stefano (41:04):
You never get bored.
Debra (41:05):
You never get bored!
Okay, so what's the best way to
get privacy issues fixed in an organization?
Engin?
Engin Bozdag (41:16):
I think the challenge in getting privacy issues fixed is not about technical disagreements or choosing the most efficient solution.
It's more about scaling.
In large organizations, privacy engineers cannot fix everything or even have the bandwidth to track all privacy issues across the board.
The first thing to do would be identifying key engineering
(41:41):
leaders (those who have authority) and getting them to nominate some 'Privacy Champions' to get those issues fixed.
Typically, privacy champions will be senior engineers.
They know the specifics of their products' architecture.
They know who covers what; who owns which technical feature; and they will also have some authority to get some resources
(42:03):
to get things fixed.
They will also send you a signal if they discover privacy vulnerabilities at the system level.
After securing resources and getting some champions, you should talk with the team, train them, and together identify a plan to address high-risk issues and dependencies, and then have
(42:23):
some estimated time to fix these issues.
You won't be able to fix everything immediately, so think of privacy debt.
Another thing to consider is being transparent about your efforts to the higher-ups.
Show them what is pending, what is fixed, and what the blockers are; and later, use these learnings and then train your
(42:46):
engineers better so they actually address these issues early in design - so that you don't have to do this all the time.
So, focus on privacy-by-design, not on privacy-by-remediation.
This will be a continuous exercise.
Debra J Farber (43:02):
That's really,
really great advice.
Stefano, what is your perspective?
Stefano Bennati (43:06):
Yeah, it's very
aligned.
I really like the Privacy Champions approach, and I want to add that it doesn't have to be only engineers.
It can also be Product Managers and other kinds of figures.
I say this because privacy issues are ideally resolved proactively, even before they surface, so even at the product design stage, before development starts.
So, privacy engineers need to be involved in product design
(43:28):
from the start, while there is still room to adjust the product architecture, and before the requirements are finalized.
So, in this context, the most effective approach to addressing privacy concerns is to establish a comprehensive Privacy Engineering program and shift privacy left.
So, as Engin said, privacy-by-design.
Debra J Farber (43:45):
And we all know
that I'm for 'shifting privacy
left,' even naming my show exactly that.
So, we are definitely aligned.
Okay, before we close, do you guys have any last words of wisdom for the audience as we close the conversation, starting
with you, Stefano?
Stefano Bennati (44:01):
Yeah, I have a
message for my fellow Privacy
Engineers (44:04):
do not overlook the
less technical aspects of the
work, as they constitute the foundation of privacy-by-design.
Your objective should be to establish a robust privacy process deeply integrated into the product and development lifecycle.
You need to collaborate closely with product and engineering teams, obtain their support through privacy education, and incorporate their feedback into the process.
(44:25):
This approach ensures that the process aligns with these teams rather than working against them.
A strong privacy-by-design process also enables the early detection and mitigation of privacy concerns, starting from the conceptualization and design of the product.
This facilitates the seamless integration of privacy-enhancing technologies into the product later on, when development
(44:46):
starts.
Moreover, product and engineering teams can become active participants in the privacy-by-design process by identifying and reporting privacy issues themselves.
Debra J Farber (44:56):
Great advice.
Great advice! Engin, what's your great advice?
Engin Bozdag (45:02):
I think it's
simple.
You should really have some passion for privacy in order to sustain yourself in this field.
The field itself is not 100% mature, the way security is, and it is ambiguous.
There is often no one right solution.
People often ask me, "How do I become a Privacy Engineer?
I just got my CIPT or my Master's in Cybersecurity."
(45:23):
There are different paths, as we discussed, but again, the passion will help because there will be a lot of things to consume.
There are new articles being published on the AI Act, generative AI architectures such as transformer models, homomorphic encryption, generative AI governance.
You need to keep up-to-date with this material.
That means you'll have to do a lot of reading, and to do such
(45:46):
reading, you really need to love this field.
I know people who've been here for over a decade and they still ping me at night with a new article published on privacy threats.
You need to have this passion.
I've seen some folks who entered the field because it was a hot topic and there was a lot of hiring, but they left in a
(46:09):
couple of years because learning just got too tiring for those folks.
That learning should not be a burden for you.
It should be exciting.
If you like it, you should carry on.
If you don't, you have to reflect on what you want to do.
Maybe you want to stay in the privacy domain, but not as a Privacy Engineer.
Maybe you want to do something else, but passion and love for privacy is the key.
Debra J Farber (46:26):
Yeah, I think
that makes a lot of sense.
I obviously have the passion, but I think we're constantly drinking from a fire hose.
Even though people, even though privacy engineers, might love their work, I do want to say that it can be exhausting to continue to drink from an ever-changing fire hose of news and
(46:46):
changing requirements and case law, and just having to put it all together.
I hear what you're saying.
I also get a lot of satisfaction out of putting all those pieces together and then almost trying to solve it, as if it's a puzzle.
How do you get to the right outcome for an organization based on its business needs and then the needs of effective privacy, and putting those controls in place?
So, I think that that makes a lot of sense.
(47:07):
I do think that people can get really burnt out if they don't want to continue to constantly learn new things.
This was really, really a great conversation.
I think that my first interview having two guests has been a success, so thank you for being my guinea pigs.
I think this episode is going to be really well-received
(47:30):
because it's a lot of meaty info from the front lines of being a Privacy Architect and a Privacy Engineer in actual job roles, and not just talking about privacy engineering abstractly.
I just want to thank you so much for joining us today on
Shifting Privacy Left.
Stefano Bennati (47:42):
Thank you,
Debra.
It was a lot of fun.
Engin Bozdag (47:44):
Thank you, Debra.
Debra J Farber (47:46):
Excellent.
Okay, until next Tuesday, everyone, when we'll be back with engaging content and another great guest (or guests).
Thanks for joining us this week on Shifting Privacy Left.
Make sure to visit our website, shiftingprivacyleft.com, where you can subscribe to updates so you'll never miss a show.
While you're at it, if you found this episode valuable, go
(48:10):
ahead and share it with a friend.
And, if you're an engineer who cares passionately about privacy, check out Privado (48:15):
the developer-friendly privacy platform and sponsor of this show.
To learn more, go to privado.ai.
Be sure to tune in next Tuesday for a new episode.
Bye for now.