Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Vaibhav Antil (00:01):
Our goal is that if we do all of this work well, you start with the place where you already have the data map. Once you have the data map, you can actually see the risk really quickly, and then you can do the good work: you can go take that risk to the Engineering Lead or the Developer and say, "Hey, there is a risk and we want to reduce it.
(00:22):
There are four ways we can do it, and let's work on that." So you save all of the work, all of the time you're spending in manually building these data maps, and you get all of that time free. And, you spend all of that time in actually reducing the risk and actually putting a privacy ticket on the Jira board, which gets shipped. So, the risk is less. Your users actually see it.
Debra J Farber (00:46):
Hello, I am Debra J Farber. Welcome to The Shifting Privacy Left Podcast, where we talk about embedding privacy by design and default into the engineering function to prevent privacy harms to humans, and to prevent dystopia. Each week, we'll bring you unique discussions with global privacy technologists and innovators working at the
(01:06):
bleeding edge of privacy research and emerging technologies, standards, business models, and ecosystems. Welcome everyone to Shifting Privacy Left. I'm your Host and resident privacy guru, Debra J Farber. Today, I'm delighted to welcome my next guest, Vaibhav Antil (also known as Vee), the CEO and Co-Founder of Privado, the
(01:30):
privacy tech platform that's bridging the privacy engineering gap, and sponsor of this podcast. Welcome, Vee!
Vaibhav Antil (01:37):
Hi Debra.
Hi everyone.
Debra J Farber (01:38):
Before we start, I want to say a few things to you, Vee. I've had the privilege of serving as a member of Privado's Advisory Board, along with Nishant Bajaria, for the past two and a half years, advising on go-to-market messaging, thought leadership, and Shifting Left evangelism; and it's been such a wonderful ride. I even remember when I was on vacation in Belgium a little
(02:00):
over a year ago and received an email from you, Vee, asking me whether I'd consider launching a podcast that's focused on privacy engineering because Privado wants to sponsor it. I could not have been more excited. That really kicked off and started my creative journey with the Shifting Privacy Left podcast, and it's enabled me to
(02:22):
bring this content to a growing niche audience, and I just want to take this time to publicly thank you for believing in me early on and supporting this really important educational initiative.
Vaibhav Antil (02:34):
I'm glad we could work this out and be sponsoring the podcast, and it's been amazing seeing you create such great content. I especially love a couple of episodes, with so many diverse opinions and experiences. So, I learned a lot from this podcast, and we've been, you know, super excited sponsoring it and will continue to do so as well.
Debra J Farber (02:58):
Thank you so much. It really means the world to me, and I could coo and talk about how much I love Privado and you and the entire team, but I think people probably want to hear about the topic of the day; and that's several things. We're going to be talking about privacy code scanning; what 'shifting privacy left' means, especially into DevPrivOps;
(03:18):
and then, the importance of bridging the privacy engineering gap. So, we'll just kick things off by me asking you a little bit about Privado and how you decided to focus on code scanning for privacy and solving privacy engineering challenges.
Vaibhav Antil (03:34):
For the audience who might not be aware of Privado, Privado is a platform where privacy and engineering teams collaborate. They come together to ensure the products the companies are shipping are built with privacy from the get-go. I would imagine the same role in a company is played by a privacy engineer or another privacy professional who's acting like this connective tissue or who's bridging this
(03:57):
gap between privacy and engineering teams. I would like to think Privado is the co-pilot for this person, this Privacy Engineer or this privacy professional who's working with engineering teams, whose core job is to work with engineering teams. Essentially, my journey into privacy has been different. I mean, at least this is probably the most interesting question I ask everyone
(04:17):
how did you end up in privacy?
And, everyone has different answers and different paths. So, my path came in from a product management role. I was a product manager for an online music streaming company, and I had seen the world where business goals as a product manager
(04:35):
revenue, active users, engagement were super critical. I had this deep empathy for engineers - developers who were building these products - because they were kind of the centerpiece of shipping these features fast, which could move the revenue numbers, the engagement numbers. They could literally do magic by building these features.
(04:56):
While I had this experience, I ended up getting involved in a privacy project with the legal team there and kind of saw the other world - which was the privacy world. I felt that was pretty exciting as well, because they were trying to protect the user data, do right by the user, which, as a product manager, whatever you're building, you want to do. You want to make sure the promises that you are making
(05:18):
are kind of consistent and the end users' privacy is protected. That's kind of how I got introduced to privacy.
When I started Privado... actually, the first thing I did was I tried to be like a privacy professional. I said, "The first thing I'm going to do is do a data mapping project for an e-commerce company." And I still remember that. That was a sizable company, a large e-commerce company, and
(05:40):
I probably interviewed about 80 people. That was a mix of Product Managers, Engineers, Engineering Leaders, Data Managers, and Data Engineers, and the project was this crazy project to map in an Excel sheet the microservices, the data pipelines, all the products they have, and how data comes into the company and where all it goes.
(06:01):
I still remember, I did this huge exercise - probably took like two and a half, three months - and I went to my CTO and said, "Hey, look at this sheet; this is basically around which we have to build the product." At that point of time, I was still thinking more assessments; that was kind of the way privacy teams used to get this data, which I got via interviews.
(06:23):
But, when my CTO looked at it, he kind of had a different idea; and he basically said to me that there was probably 30%-40% of this information we could get by scanning the code, and the rest - something we might still have to do manually - at least it'll take you that far. That led me to a realization; that was kind of our spark or
(06:44):
our Eureka! moment, where we're like, "Hey, nobody is kind of doing this at the moment. If we attack the problem from code scanning, it's a very good experience for the privacy team because they get to this information super fast. For the engineer, it's a really superior experience because, when you're talking to them, the biggest problem was not understanding what a lawyer is saying.
(07:06):
So, they get to see the same things. They get to see data in a format they understand." And, we were early to market. So, that was exciting to me as well. So that's how privacy code scanning got started, with this huge Excel RoPA data mapping exercise; and now, of course, the vision has evolved.
(07:28):
It started with doing RoPA, then data mapping (very, very small), and as we got some customers and started working with them, now, of course, our vision is to actually be the single platform where privacy teams and engineering teams collaborate - building a platform that engineers love and not hate, which actually gets adopted, and the privacy risk
(07:49):
goes down. You could get more privacy issues, reviews; you can get some privacy functionalities as part of a Jira board. So, kind of enabling the privacy function, and at the same time making sure developers actually like the experience and they're engaged, as well. So, that's a little bit about how we started with code scanning and where we are in our journey.
Debra J Farber (08:11):
Thank you for that. That's really helpful. I know that the mantra for Privado is 'bridging the privacy engineering gap.' Can you talk a little bit more about this gap? What do you mean by it?
Vaibhav Antil (08:24):
I think this is something, at least as we started, or as the initial concept of doing a code scan happened: we started speaking to privacy professionals in Europe and the U.S. I remember that we used to have slides where we were trying to put down some challenges that they face, and one of the things we had was 'Is engineering a black box?' Engineering is not a black box.
(08:46):
We still know things, but there is a huge gap in our understanding. What we're trying to ask, the engineers don't understand; what they tell us, sometimes we don't understand. And that's how the full concept of this privacy engineering gap came in. I also, at that point of time, read a blog post - I think it was on Adobe - where it was something like 'Privacy Principles Are from Venus;
(09:09):
Engineering Requirements Are from Mars.' Something similar. So, a couple of things came together while we were thinking about this.
But broadly, if you look at the reason why there is a gap, I think there are three big reasons. 1) The first big reason I've seen is there is this kind of difference in the approach of how these two functions work. So, kind of at the stakeholder level, right at the top, there's
(09:30):
a misalignment. If you look at privacy, privacy is a very central function. Right? There's a central team who's trying to make sure use of data is consistent with the privacy promises made; or data actually gets deleted; or you're collecting less data. Right? All the nice, great privacy work - but that's done by a central team, and they have the central view of things.
(09:52):
So, look at engineering; it basically evolved from the day when it was central to super decentralized, distributed teams. Right? You have a small team of a Product Manager and a couple of Developers and one Designer - sometimes just engineers who are experimenting and pushing features out - which means on one side, you want the central view of data and privacy; and on the
(10:13):
other hand, you have this decentralization, where they are making decisions on data, architecture, and tooling on their own. So, that's the one big gap that we see: just the way these two functions work is different.
2) I think the second, related gap here is also the speed. Engineers are kind of shipping things on a weekly, sometimes
(10:35):
daily, basis; but for any privacy team, to catch up here becomes a big problem, and the tooling hasn't caught up. So, that's another conversation. So, I would say the first big problem I see is, overall, how these functions are structured is slightly different. I would imagine it's the same thing with security as well, but they have better tooling at this point to tackle all of these challenges. The second thing I see, which becomes a problem, is lack of
(10:58):
shared vocabulary.
So, it's kind of like: the problem is that none of these laws is prescriptive; everything is written in a more principle-based way. But when you go to the engineering team and say, "Hey, you have to delete data," it means something. Or actually, let's take a non-privacy example. As a Product Manager, I would go and say, "Hey, our website
(11:19):
has to be fast." It doesn't mean anything right now, but for an engineer it will mean something when you talk in seconds. "Okay, what does it mean? How fast should the page load? What is fast? Is it 10 seconds, one second, a microsecond?" Or you can say the same thing about a specific API that, as a product manager, I own. Like, you know, the response time has to be X.
(11:40):
It has to be available for Y time. You cannot go and say "our API has to be available"; there has to be some number which makes sense for the engineer. So now, once you look through this lens and go back to the privacy laws, they are not prescriptive in that sense; and hence, it becomes very difficult to operationalize privacy.
3) So, you have this gap where they're not speaking the same
(12:03):
language. The law itself is not prescriptive; and hence, it leads to friction between the two teams. And a combination of these, I think, finally leads to a point where you might start with a very engaged privacy and engineering team; but as engineering teams look at it - they did a bunch of assessments and questionnaires and did not get any privacy requests to implement in the product -
(12:27):
it leads to a point where it looks, to Engineering Teams, like privacy has become a blocker. And engineers are smart, so they try to find ways around your DPIA or PIA process, which means you now have even more risk.
So, I feel like these are three kind of structural things because of which there is this gap between privacy and
(12:48):
engineering teams. It's 100% solvable; that's what I think any privacy person working closely with engineers, or a typical privacy engineer, is trying to do as well. That's what we, as Privado, are trying to solve, as well.
Debra J Farber (13:01):
That makes a lot of sense to me. One other area that I think is just very, very different between engineers and the privacy function that traditionally comes out of Legal / GRC is that lawyers don't want risk documented where there are unknowns and you don't know how to solve it. There was a time, for instance, that I was working on a major...
(13:21):
a major breach had happened at a company, a retail company, and I was one of the subcontractors working on that, and we needed to document the 'as-is' state for privacy and then come up with the 'to-be' state for privacy. So, that involved lots and lots of interviews with various types of people, including engineers; and the attorneys on
(13:41):
that case didn't want anything put in discovery... they didn't want anything that could be like "You did things wrong, and now it's at trial and could be part of the discovery process," so they did not want us to document any of the risk. So, as a result, they put all of these attorneys in our meetings to make sure that those meetings were covered by privilege, right?
I mean, how different is that from engineering, where it's
(14:03):
like, "Let's get to the meat of this problem. Let's root cause it. Let's document and make sure that we address it and fix it. Right?" And so those two different perspectives have been a real friction point, and I definitely think something like Privado makes it so that the Legal and GRC folks can have insights derived from all the great work that
(14:23):
code scanning does, and the data mapping and stuff that you do, but also gives the engineers the granular data points so that they can take action to address privacy risk. I just wanted to tell that little story because I definitely think that that's a wide gap between Engineering and Legal.
Vaibhav Antil (14:38):
However, what I've been seeing in the industry for the last couple of years is... one of the good things that all these regulations have done - and also, I think, a general change in user behavior and how much they care about privacy - is that companies have realized that this work is not going anywhere. So, while maybe the earlier approach might have been to document
(15:02):
less, I think companies slowly are realizing that they have to do the work. Once you have to do the work, everyone, I think, reaches the same conclusion, actually: that we will need to finally take these legal policies and translate them into something actionable for the engineers, where things happen right.
(15:23):
That's the reason why more and more companies now have a Privacy Center, and more are being launched. So, they have an open privacy center. A very simple example is to say, "Hey, send an email to this person for your DSR to kick-start." So I think, as companies have realized that there's no chance that DSRs are going down, and there's no way that they can get away with features that have less privacy, they're now making
(15:45):
it more formal. A consequence of that is you will see more Privacy Engineers being hired in new companies. Companies with existing Privacy Engineers are staffing up more, as well. So, I do think that's a big trend shift which has happened in industry: companies have come to that conclusion, and they are now staffing up, building processes, and buying tooling for the exact same reasons.
Debra J Farber (16:08):
Awesome. So, how can engineers provide visibility into the personal data that applications are collecting and using?
Vaibhav Antil (16:14):
Unfortunately, this is kind of a very tough problem. Again, let me take a step back. If you talk in generalities - nothing related to personal data - I think one kind of documentation that people have been trying to create is network diagrams and architecture diagrams, which a lot of engineers create
(16:37):
as part of their requirements. Maybe it's part of their company's policy, or they have to do it because it's a sales-facing document. It's very difficult to even get that updated - what high motivation do the engineering teams have to maintain an accurate version of it, right? You'll see all of that documentation get outdated as well. The same is true for their service names. Right?
Nobody has a clear idea of "Okay, I have built this product
(17:01):
which is basically this backend microservice," and then, over time, a lot of other components in engineering start to interact - like your website dashboard and some other backend systems - and then someone makes a change in this specific component and 10 things break on each side. Essentially, if I take a step back and look at it, documentation of any kind is tough when things are
(17:24):
decentralized and changing fast. That's where the engineers operate. So they have a product or an application which was built - let's say, if you start to do it today - by some engineers who are no longer in the company. It probably has hundreds of thousands to millions of lines of code, which means there's no single person who can actually tell you what is in there.
(17:45):
What they can do is at least give you a good starting point. I think engineers can give you a good starting point. The biggest gap I see there is that what a privacy person means by 'personal data' is not the same definition for engineers. They might be thinking PII, like email address; and as you start to have those conversations, they will give you more information, like, "Oh yeah, we do have clickstream
(18:07):
data that we collect. Oh yeah, we do collect audit data. Is that even personal data?" So I think it is super challenging. But current engineers can give you at least a starting point on what data they are collecting and using. Directionally, you will get the right answer, but an accurate representation depends on what stage you are at...
(18:27):
if you are at a decent stage, which means your company has been there for a while, which means there are multiple people who built the product, and which means it'll be super complex.
At this stage, I would recommend more automation. One type of automation is doing code scanning. Another automation - or actually, even if you don't want to automate - you can just look at your product: go to the website, try to use it, and say, "Oh, I see one data element which is
(18:49):
missing; I see another." Do it in a session, and you will actually uncover even more data elements. So, a session where you can go through the product and then ask questions will give you more data elements than just sending an Excel sheet or assessment; and ideally, what you want to do is generate these documentations from automated tools. I believe privacy code scanning - of course I'm biased - is one
(19:11):
of the better ways to do that, because it's generating the document from the source of truth, which is the code; and it actually enables engineers.
In fact, to that point, the Google Play Store came up with this requirement to generate a Play Store Data Safety Report. We actually open sourced, and made available, a free tool - it is still available - called Play Store Data Safety
(19:32):
Generator; and what it does... (and it got downloaded, I think, thousands of times) is Android engineers came in, downloaded the app, ran a scan on their Android application, and it prefilled nearly 80% of the Play Store Data Safety Report. At the end of the day, they filled out the rest of the stuff, downloaded it, and put it there. So, that's another example of a code scanning approach, where
(19:55):
you're using some automation to generate these documents, which actually helps. So, again - it's a long answer, but to summarize: use automated tools. Your engineers will thank you for that. If not, do a session with them. That works as well. I did that when I was trying to do it the first time.
Debra J Farber (20:11):
That's definitely easier to do when you just have maybe one product or team. But if you're, like, a privacy person that comes in as a consultant, and it's "Okay, we've never done this before. Go and get data maps of all our products. Have all of these interviews" - I mean, something I've done before - it is a massive, massive effort. So, I definitely think automation comes in handy that
(20:32):
way.
Vaibhav Antil (20:33):
Just to add one thing - one of our core motivations is, at the end of the day, as privacy professionals, as privacy engineers, you really want to go in and build something in the product, or remove certain flows which should not be there in the product. So, you want to reduce privacy risk, and you want to ship privacy features; but the majority of the work is spent in
(20:54):
collecting data, in asking people questions, in building this data map - which I feel should not be the case. Our goal is that, if we do all of this work well, you start with the place where you already have the data map; and once you have the data map, you actually can see the risk really quickly, and you can then do the good work, which is, you can go take that risk to the Engineering Lead or the Developer and say,
(21:17):
"Hey, there is a risk and we want to reduce it. There are four things how we can do it, and let's work on that." But you save all of the work, all of the time you are spending in manually building these data maps, and you get all of that time free. You spend all of that time in actually reducing the risk and actually putting a privacy ticket on the Jira board, which gets shipped.
(21:37):
So, you know, the risk is less. Your users actually see it. I would say that's the strongest argument I have personally seen for automation: you free up time which can be used to do better things, for things which can impact the user's privacy immediately.
Debra J Farber (21:53):
Exactly, I totally agree. In fact, I was going to ask you why shifting privacy left into DevOps is important to you, but I think you kind of summed up some of the reasons why. I want to give you the opportunity to add anything... to answer the question holistically: why is shifting left important? You could shift left into design, shift left into architecture.
(22:14):
Privado focuses on DevOps, so I wanted to highlight that and get an understanding from you.
Vaibhav Antil (22:19):
Yeah, sure. I think one of the great things that has happened in engineering is people have moved to agile development. A part of it is also continuous software delivery, which means you are continuously shipping features out. You're fixing bugs super fast. For the end user, it's amazing because, let's say, with our own customer, we find a bug today; we can literally fix the bug
(22:42):
and, in the next 10 minutes, it gets deployed. So, companies have built this amazing tooling and infrastructure for engineers where they can build features, release them, and they go live at the customer level super fast. Because of cloud and best DevOps practices, it's possible. So, a lot of investment has gone into this.
How can we actually build fast and ship fast so our end users can
(23:06):
get to see these features fast? Once you wear this hat and say, "Okay, my company is going to change things super fast; they're going to change things every week; things are going to get shipped every second," then you can't have a privacy mindset which is thinking in weeks or months, which is typically a privacy impact assessment or a data protection impact assessment. At the design
(23:28):
stage, you do want to spend time, because the feature has not started to be developed. So, you can spend that time and do technical privacy reviews, etc.
But, once all of that is done, there is huge work which doesn't come into that regular design cycle. It goes through this continuous software delivery model, wherein engineers are writing code, they're pushing code, and things don't remain the same from the design to the implementation
(23:49):
phase; and that's basically a missing gap in current privacy programs. You don't really have any oversight there, and that's what shifting privacy into the DevOps process means. It means, as a privacy professional, you have a privacy check as part of this DevOps cycle, which ensures the new code changes which are happening, the configuration changes
(24:10):
which are happening, a new database being created, a new third party being added, a new pixel being added to your website, all go through this privacy check. And then, this privacy check is looking at it and saying, "Okay, this is a high-risk privacy issue; we have to block it. It needs to be fixed before it goes live. Or it's low risk; we can let it go live, but it creates a
(24:31):
ticket for the engineer to fix later."
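The block-or-ticket decision described here could be sketched as a tiny CI gate. Everything below - the issue fields, the "high" severity cutoff, and the ticketing step - is an illustrative assumption, not Privado's actual implementation:

```python
# Hypothetical sketch of a privacy check running in a CI/CD pipeline.
# High-severity findings block the merge; the rest become tracked tickets.

def privacy_gate(issues):
    """Fail the build on high-severity privacy findings; ticket the rest."""
    blockers = [i for i in issues if i["severity"] == "high"]
    tickets = [i for i in issues if i["severity"] != "high"]
    for issue in tickets:
        pass  # in practice: create a Jira ticket for the owning engineer
    return {"block_merge": bool(blockers), "ticketed": len(tickets)}

issues = [
    {"id": "P-101", "severity": "high", "desc": "new third party receives email"},
    {"id": "P-102", "severity": "low", "desc": "new first-party cookie added"},
]
result = privacy_gate(issues)  # blocks the merge and tickets the low-risk issue
```

The point of the sketch is only the shape of the policy: one severity threshold decides between stopping the deploy and letting it ship with a follow-up task.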
Shifting privacy left, or this full DevPrivOps, is kind of a methodology where privacy is going and saying, "Hey, we have this; we shifted extremely left to the design phase. We're right at the inception of a new feature. We are doing privacy reviews and thinking about new products from a privacy perspective, both on the legal side and on the
(24:53):
technical privacy side." So they're going from that to saying, "Okay, but we also need to solve things at scale. We also need to acknowledge the fact that our engineering workflows are at a daily, weekly speed, and we need to have something on the privacy side which matches that as well."
Another way that I can articulate it is how a Privacy Engineering Leader described it to me: the
(25:17):
way they like to think about this DevPrivOps, or shifting left, is as lightweight, consistent governance; wherein, instead of looking at 90 things, you look at 6 things, and the remaining 84 things are automatically checked. So, it's lightweight; it's not adding a lot of time or a lot of commitment for the developers. And it's consistent; it checks all the time, so you are assured that whatever
(25:40):
is going out to production, to your end user, has less privacy risk. That's the philosophy of DevPrivOps, or DevOps. Yeah, shifting left just means go earlier, find things earlier, so there are fewer things that you're finding in your products which have gone live.
Ideally... the most common one I've seen with our customers, which I
(26:00):
never thought would be, is they find these cookies on their website. You would imagine that it would be solved; but if they have a large front-end team, they'll find cookies on their website that are first-party cookies, and it's not easy to find who actually added them. Right? Because there could be 20-30 front-end engineers working on
(26:23):
the same website or same dashboard. So, third-party cookies are easy to track - it's generally the marketing team or analytics team - but first-party cookies could be from some part of the engineering team. So, that could mean the moment an engineer decides to add a cookie, it creates an issue and an alert for the engineer to say, "You have to onboard it to our CMP" - whatever consent
(26:43):
management platform you use - and then it gets shipped. So, you're not chasing the entire 40-member team when you discover it later in your cookie scan, when it actually hits production. That's shifting left in the entire DevOps world.
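As a toy illustration of that kind of cookie check - the regex, the diff format, and the CMP registry below are all hypothetical, not how any particular scanner or CMP works - a pre-merge hook might grep added lines for new first-party cookie writes:

```python
# Hypothetical sketch: flag first-party cookie writes added in a code diff,
# so the engineer is prompted to onboard the cookie to the CMP before merge.
import re

# Matches JS assignments like: document.cookie = "name=value; path=/";
COOKIE_WRITE = re.compile(r"""document\.cookie\s*=\s*["'](?P<name>[^=;"']+)=""")

def new_cookies_in_diff(diff_lines, known_to_cmp):
    """Return cookie names written on added lines that the CMP doesn't know."""
    found = set()
    for line in diff_lines:
        if not line.startswith("+"):
            continue  # only inspect lines the change adds
        m = COOKIE_WRITE.search(line)
        if m:
            found.add(m.group("name"))
    return sorted(found - known_to_cmp)

diff = [
    '+document.cookie = "ab_test=variant-b; path=/";',  # newly added cookie
    ' document.cookie = "sid=" + sid;',                 # unchanged line, ignored
]
unregistered = new_cookies_in_diff(diff, known_to_cmp={"sid"})
```

Run against a pull request's diff, anything returned would open the "onboard it to our CMP" issue Vee describes, instead of waiting for a production cookie scan.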
Debra J Farber (26:57):
Awesome. Before code is ever shipped to production. So, 'fix it before it's in production' is a great mantra, in my opinion. Let's talk a little bit about privacy code scanning, since that's the crux of what Privado provides. Let's start with: what is a privacy code scanner, and how does it differ from traditional static code analysis tools in
(27:17):
security? And maybe it's also a good time to bring up the difference between static code analysis and dynamic code analysis, to differentiate from what you don't do.
Vaibhav Antil (27:26):
Yeah, sure. Think of privacy code scanning as... the technology itself of scanning the code to look at patterns is super old. It started with saying, "Okay, is our developer writing code in a way which is consistent with our coding practices?", to then, you know, looking for license compliance issues in open source
(27:47):
packages, to the security scanning tools, which are looking at, "Hey, do we have code written that a hacker can attack? You know, can they use this weakness in our application code to attack and get some data out?" And so the concept is the same, where... I mean, the technology is looking at the code, scanning it, and then giving you out some information.
(28:08):
In that way, the approach - that technology - is old; we just applied it for privacy.
What it really means is... essentially, a privacy code scanner is going to scan the code of anything your engineer has built and build a data flow graph out of it. It has to basically give you a couple of data points. It has to tell you what data is coming into this application.
(28:29):
What's coming in? What happens with that once it comes in? Where does it flow? Where is it going out - which means it could go out to a database, it could go out to a cookie, it could go out to a third party, it could go to your own internal infrastructure, your own internal backend service, wherever it is going. The crux of a privacy code scanner is to build a data flow
(28:49):
graph, or a data flow diagram, of the specific application that you're scanning, with the focus on the whole life cycle: collection, storage, sharing, logging, transformation, processing - all of it.
Once you have this basic graph done, then on top of it you can do a couple of interesting things.
(29:10):
Once you have that, you can generate a RoPA report. You can build a data flow diagram for your threat modeling. Once you have all these flows, then you can leverage something like a policy or a rule engine to say, "Show me all applications where sensitive data is being shared with third parties," or "Show me all applications where we have a
(29:30):
pixel," or "Show me all applications where there are cookies" - any of these things. You can do so in the simplest way. Privacy code scanning - think of it as something which will scan the code of the application and build a data flow graph, or a diagram, out of it, focusing on the specific life cycle of data coming in, which is collection, and data going out, which could be sharing, storage, logging, or processing in general.
(29:53):
Once you have that, it should enable you to do all the core privacy use cases: RoPA documentation, privacy by design, finding privacy issues, and so forth.
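Those rule-engine queries can be pictured as simple predicates over the flow records a scanner emits. The `Flow` shape and the example flows below are made up for illustration; a real scanner's data model is far richer:

```python
# Illustrative sketch of querying a data flow graph with rule-style predicates.
from dataclasses import dataclass

@dataclass
class Flow:
    app: str           # application the flow was found in
    data_element: str  # e.g. "email", "session_id"
    sensitive: bool    # per the organization's data classification
    sink_type: str     # "database", "third_party", "cookie", "log", ...
    sink_name: str     # e.g. "PostgreSQL", "Mixpanel"

def apps_matching(flows, predicate):
    """Return the set of applications with at least one flow matching a rule."""
    return {f.app for f in flows if predicate(f)}

flows = [
    Flow("checkout", "email", True, "third_party", "Mixpanel"),
    Flow("checkout", "cart_id", False, "database", "PostgreSQL"),
    Flow("web", "session_id", False, "cookie", "sid"),
]

# "Show me all applications where sensitive data is shared with third parties"
risky = apps_matching(flows, lambda f: f.sensitive and f.sink_type == "third_party")

# "Show me all applications where cookies are set"
cookie_apps = apps_matching(flows, lambda f: f.sink_type == "cookie")
```

Each policy is just a predicate over flows, which is what makes the same graph reusable for RoPA generation, threat-model diagrams, and issue detection.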
The main difference is that asecurity code scanner will not
care about all of this.
They don't care if theapplication is going to process
(30:14):
personal data.
If it has a securityvulnerability, it has a security
vulnerability.
On the other hand, we do careabout that; we only care about
that.
We don't care about if it has asecurity vulnerability.
We only care about whether weaccurately build this data flow
graph.
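The data-flow-graph-plus-rules idea Vee describes can be sketched in a few lines of Python. This is a hypothetical toy model of a scanner's output and a rule query, not Privado's actual implementation; all names and data here are illustrative:

```python
# Toy model of a privacy code scanner's output: each scanned app yields
# data flows from a collected element to a sink (database, cookie, third party).
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    app: str
    element: str      # e.g. "email", "precise_location"
    sink_type: str    # "database" | "cookie" | "third_party" | "logs"
    sink_name: str

SENSITIVE = {"precise_location", "health_record", "phone_number"}

flows = [
    Flow("checkout", "email", "database", "orders_db"),
    Flow("checkout", "phone_number", "third_party", "analytics.example"),
    Flow("mobile-app", "precise_location", "logs", "app_logs"),
]

def apps_matching(flows, rule):
    """Rule-engine style query: return apps with at least one matching flow."""
    return sorted({f.app for f in flows if rule(f)})

# "Show me all applications where sensitive data is shared with third parties."
risky = apps_matching(
    flows,
    lambda f: f.element in SENSITIVE and f.sink_type == "third_party",
)
print(risky)  # ['checkout']
```

Once an app's flows are in a structure like this, each query ("where do we have pixels?", "where are cookies set?") is just another predicate over the same graph.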
Debra J Farber (30:28):
That is really helpful. I think it would be even more helpful to the audience if you could differentiate how the privacy code scanning and data mapping capabilities of Privado are completely different (because I know that they are) from the traditional tools out there that do personal data discovery, correlation, and data mapping - companies like BigID,
(30:51):
Secuvy.ai, Securiti.ai, companies like that.
Vaibhav Antil (30:55):
Yeah. I would say, if you look at data discovery in general, it's basically discovering data at rest. What you're trying to say is, "Hey, as a company, we've grown; we continuously collected data, and we kept on storing it, and now we don't have a good handle on where all of that is stored." From a security perspective, that's important in order to secure the data. From a privacy perspective, it's important because you need
(31:17):
to delete the data, handle retention, et cetera. So, a data discovery tool focuses on how quickly, at scale, and with good performance it can go in and tell you where personal data is - in which data store, in which file, in which SaaS app. So, that's looking at data at rest.
(31:39):
Now, the reason you need code scanning, even if you have a data discovery tool (especially if you have a data discovery tool), is that a data discovery tool only gives you the picture of what data you have at rest. But privacy is all about data use. Did you collect more data than you were supposed to? How did this data get used?
(32:00):
A famous example: a company was fined not because its phone numbers were breached or leaked, but because it had collected phone numbers for authentication purposes and was using them for advertising. That picture of how data gets used is actually in the code. It's in how you build a product, how you build an application. It's as simple as that.
(32:20):
When you're doing a PIA, you're doing it not on the database; you're doing a PIA on a processing activity, a product function, a product, or an application. We do it that way because privacy is about the full use of data - the collection point, the sharing point - and storage is only one part of it. So, that's why you need to scan the code: to get an automated understanding of how your data gets used.
(32:42):
Okay, I know the data is in this database, but do you know which two products are connected to this one database? Do you know how that data eventually gets shared with a third party? All of that information is written in code by engineers, version controlled and maintained, and that's why you need a code scanning solution. That's one answer, which is more about looking at the
(33:04):
lifecycle of data: a data discovery solution only captures storage, whereas a code scanning solution captures the complete lifecycle, especially if you're looking at engineering and products.
The second argument on where these tools differ is that data discovery tools are looking at things once they've already entered your system. You've already collected the data.
(33:24):
It's already in your system. That's when you're discovering, "Okay, now we have a new database with these many data elements," or "We have this new file." A code scanning solution sits super early in the lifecycle - we must shift left - at the point when engineers have just written the code to collect phone numbers, or precise location, for the first
(33:46):
time. You have not yet pushed this to production, which means it's not live. No customer data has ever flowed through that code. You can scan and discover all of these things at that time, which means that, if you want to get out of the cycle of technical privacy debt, as you move left, code scanning lets you prevent some of these things.
(34:06):
You can basically ensure that, if you were not supposed to collect precise location, you're not collecting it, which means you have to do less scanning on the right side, where data discovery tools sit. You can also enforce better privacy standards, privacy engineering standards. When someone is creating a database and attaching it to your application code, if that database table has sensitive
(34:28):
data alongside general identifiers, you can tell them not to do that. If they are going to share data with third parties, that's something you can catch early and fix as well. I would say these are the two main differences. One is coverage - you want a clear line of sight into data sharing and data use, which is only available in the code. It's not available at the data storage layer.
(34:50):
And second is moving left to fix and prevent things - to find, fix, and prevent rather than fire-fighting on the data side only. That's how a privacy code scanning approach differs from a classic data discovery approach.
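The "find, fix, and prevent" idea can be sketched as a CI gate that compares the flows found on a pull request against those on the main branch and fails when a disallowed flow appears. The flow shapes and policy entries below are hypothetical; a real scanner emits much richer findings:

```python
# Sketch of a shift-left CI gate: flag the build when a pull request
# introduces a data flow that policy disallows, before any user data flows.
# Each flow is a (data_element, sink_type) pair from a code scan.
BLOCKED = {("precise_location", "third_party"), ("health_record", "logs")}

def new_flows(base: set, pr: set) -> set:
    """Flows present in the PR scan but not in the main-branch scan."""
    return pr - base

def gate(base: set, pr: set) -> list:
    """Return policy violations among newly introduced flows."""
    return sorted(f for f in new_flows(base, pr) if f in BLOCKED)

base_scan = {("email", "database")}
pr_scan = {("email", "database"), ("precise_location", "third_party")}

violations = gate(base_scan, pr_scan)
print(violations)  # [('precise_location', 'third_party')]
```

Because the diff is computed against code rather than production data, the violation is caught before any customer data ever flows through the new path.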
Debra J Farber (35:07):
I also want to highlight something. I am an Advisor to both Privado and Secuvy, because I definitely think they are separate products that address separate issues. Companies today are saying, "We need to know where our personal data is. This wasn't something we needed to do before, and in the big data era
(35:28):
of 'collect everything, figure out a use for it later,' it's sprawled across our environment - structured, unstructured, and semi-structured data stores. Where is it?" Because you can't protect any personal data if you don't know where it is. You can't delete it if you don't know where it is. You can't provide it to somebody, and show them what you've collected about them, if you don't know where it is. And so, I think it is absolutely important for
(35:51):
companies in that space to use a data discovery, correlation, classification, and mapping tool. But, as you said, I want to see a world where everyone is using a privacy code scanner like Privado's, where you get engineering best practices put in place. You embed privacy into the software development lifecycle, which is not what the BigID, Secuvy, and Securiti.ai
(36:13):
tools are doing. They're looking more at the personal data lifecycle rather than embedding into the software development lifecycle. They are very different tools, and I definitely think companies can use both, obviously depending on your maturity and whether or not you have the right resources to manage these tools.
How does privacy code scanning enable engineering teams to stay
(36:34):
compliant with new laws? I know you mentioned that before, but let's take, for instance, my state - Washington State's My Health My Data Act, which grants consumers the right to access, delete, and withdraw consent from the collection, sharing, and processing of their health data.
Vaibhav Antil (36:49):
Yeah, absolutely. In fact, the My Health My Data Act is a classic example of where you do need to move to code to actually comply with the law, because it focuses on a couple of things. One of the main things it focuses on is the use of data - the purpose for which you've collected it and how you're using it. So that is one. Sharing of health data is the most critical part there, and
(37:13):
both of these things are in the code. So, take a health app. The first thing the privacy or engineering team will be asked, in order to comply, is, "What health data do you collect, who do you share it with, and how do you use it?" With Privado, you can basically connect in five minutes. We'll run a scan on all your code repositories and give you
(37:33):
one single view of all the health data you're collecting, where that health data is flowing, and which products are using it - which means it solves all three use cases of My Health My Data. You can confidently say to your consumers, "This is the health data that we collect; this is who we share it with; these are the service providers; if we are sharing it for any advertising reason, we
(37:56):
have to take authorization; and these are all the purposes we are using the health data for." So that's number one: you get to compliance faster. As I said, you don't have to spend months figuring out this information. You can fix things - or rather, remove certain health data flows.
I think the second most important thing with Washington's My Health My Data Act is that, if I remember correctly,
(38:21):
it opens companies up to class action lawsuits, which means there is a very high onus on companies to prevent privacy issues from going to production. This means you need a policy in your DevOps system, in your SDLC, looking for privacy issues. Something needs to check and say, "Hey, I pushed this code. I was using Google Analytics, but then I,
(38:43):
incidentally or accidentally, actually started sending health data to Google Analytics, because the page name has it, or the search query had it, or I was intending to send only the user ID but sent the full object, which has the health details as well." That's the second part. So, one is the initial compliance - you get your data
(39:03):
maps faster; you get the entire picture of health data collection, sharing, and use, faster. And second is guardrails - having these automated privacy checks for My Health My Data on behalf of privacy teams, checks that are not only looking for issues but also providing guidance to engineers on what to do right when they make a change.
(39:24):
This will help you scale your program and not only comply, but also remain in compliance with these laws.
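The accidental-oversharing scenario Vee describes (sending a full object to analytics when only a user ID was intended) can also be guarded in application code with a small allow-list filter. The field names here are illustrative assumptions, not from any real analytics SDK:

```python
# Guardrail sketch: keep only explicitly allow-listed fields in an
# analytics payload, so health details never leave the app by accident.
ALLOWED_ANALYTICS_FIELDS = {"user_id", "page_name", "event"}

def safe_payload(payload: dict) -> dict:
    """Drop every field that is not on the analytics allow-list."""
    return {k: v for k, v in payload.items() if k in ALLOWED_ANALYTICS_FIELDS}

event = {
    "user_id": "u-123",
    "event": "page_view",
    "condition": "hypertension",          # health detail - must not be shared
    "search_query": "blood pressure meds",  # may embed health data
}
print(safe_payload(event))  # {'user_id': 'u-123', 'event': 'page_view'}
```

An allow-list (rather than a block-list) is the safer default here: a new sensitive field added later is dropped automatically instead of leaking until someone remembers to block it.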
Debra J Farber (39:33):
Awesome.
Thank you for that.
That's illustrative.
All right, I want to talk a little bit about the future of Privado, and also about privacy engineering education. One of the things that really impresses me about Privado is the founders' focus on educating the next wave of privacy engineers, which is so important.
(39:53):
Besides sponsoring this podcast and Robert Bateman's 'Privacy Corner', and, Vee, your own webinar series on privacy engineering, Privado recently made available a free Technical Privacy Masterclass hosted by Nishant Bhajaria, Privado's Advisor and author of the seminal work 'Data Privacy: A
(40:15):
Runbook for Engineers.' Can you tell us a little bit about this masterclass? Who was it designed for? What topics does it cover? How long does it take to complete? That kind of thing?
Vaibhav Antil (40:24):
That was such a fun project - building this course with Nishant on technical privacy. Again, the motivation for us was also an inward-facing one. As we've been speaking and working with our customers, we end up working with a lot of technical privacy leaders and privacy engineers. Some have been in the
(40:45):
industry for a while, and some are just entering, and I wondered: is there some space where technical privacy leaders or privacy engineers can come together to learn, talk, and share? That led, one by one, to multiple initiatives: starting with sponsoring a podcast; then, as we went through that feedback cycle, working on this
(41:09):
course, launching a community, and doing our own webinars around this as well. It's been a great learning experience, because I personally get to learn a lot - not just from the masterclass, but from looking at the questions in each webinar, and from listening to your podcast as well. Specifically, the masterclass is for anyone who's working in a technology company.
(41:30):
We built it in such a way that it talks about why engineering is different. If you are in a tech company that has a lot of engineers, it starts by explaining how engineering has evolved and why it is different - kind of the same thing we were talking about initially. Then, it goes into how you can build a privacy program that is more aligned with how your engineering practices work.
(41:52):
So, it's more proactive; it actually works. Then, we also went a little deeper into how you can build privacy tooling: how you can build your own DSAR system; how you can buy or build a data discovery solution; and then, finally, it goes into KPIs. It also has a nice bonus episode on Nishant's journey into privacy engineering. Our core motivation was to help people who are entering the field, or
(42:15):
who are already, I would say, 'Unofficial Privacy Engineers' - people who are already doing the work - so they have a resource to begin with, and they can learn and enter the field. That was our motivation. It's been a great experience launching this, because we've gotten such amazing feedback, both positive and critical, on
(42:35):
where we can improve. We have a lot of data points on the next modules we want to add and on how version 2 could change as well. But yeah, it was such a fun project to work on.
Debra J Farber (42:49):
I've been in privacy for over 18 years, and even I learned new things from Nishant's perspective on how to approach implementing some of these solutions. I absolutely think this is the best technical privacy class I've ever come across. I think it's excellent. I'm not raving about it simply because I'm a Privado Advisor.
(43:10):
I think anyone in privacy can take it, though it's certainly aimed at technical folks. It would be great for engineering teams that are upskilling on privacy. It would be great for current Privacy Engineering Managers to take with their teams - maybe something they do as they talk about increasing their skills over the
(43:31):
year, and being able to show the certificate once they've completed it. How long is it? I know I took it, but I wasn't sure. Is it around 3 hours long?
Vaibhav Antil (43:40):
Yeah, roughly anywhere from 2 to 3 hours of content.
Debra J Farber (43:45):
It's somewhere between 2 and 3 hours.
Vaibhav Antil (43:47):
It's not a lot. As you said, probably about one third of the people who attend our privacy engineering webinars are trying to enter the field. For them, this is a good primer. Some of them are actually software engineers who have literally built a DSAR system in their company from the
(44:09):
ground up - the full infrastructure - and now they can learn the other parts of privacy: not just the tooling, but the wires in the house, and how to think about a privacy program, KPIs, and things like that. So, that is the kind of audience we have; and, at two to three hours, it's not a lot of effort, but a lot of learning for sure.
Debra J Farber (44:27):
Oh, so much impact! So much wisdom dispensed by Nishant in that amount of time. So, it really does feel like a masterclass. He could be talking to experts in it; he's clearly talking in a way where experts can step back and get a more holistic perspective on their approach, while newbies to the field can also pick up those
(44:48):
best practices immediately. So, it really is good. I'm going to put a link to it in the show notes so that anyone can check it out; and remember, it's free, so there's really no downside. If you have the will, then Privado has provided the resource. I do want to know if you have numbers. How long has it been out now - a month or two? How many people have completed the course so far?
Vaibhav Antil (45:08):
About that - we are inching toward 1,000 enrollments in the course. So, yeah, it has grown a lot organically. Again, I'm super thankful to the people who took the course, posted on LinkedIn, and gave good feedback, because that encourages more people to take the course, and then more people can learn about it.
(45:29):
Because it's been successful, the next version - at least for the people who have taken it - is going to go deeper. One area is Technical Privacy Review, something I'm really looking forward to creating with different people, along with Nishant - something very specific that goes into: okay,
(45:50):
how do you do a review? How should you design it? What should the KPIs be? So, once we have this base where people have taken it, I think the next ones are going to go deeper there. AI governance is another one we are thinking about as well.
Debra J Farber (46:03):
Oh yeah, that makes sense. There's a lot of demand for knowledge there. So, I know that you've got some really exciting features on Privado's roadmap. Would you mind sharing a few of those with us?
Vaibhav Antil (46:14):
For sure. Broadly, as I laid out, our vision is to be the platform where privacy and engineering teams can come together and collaborate, so they can confidently ship features fast with privacy built in from the get-go. Once you have this vision - "Okay, this is what we want to do: we want to enable speed for the developer, a good
(46:35):
experience with less privacy risk, and a better privacy experience for the end user" - it really forces you to solve certain things. The first is the visibility problem. As a privacy professional working in a technology company, one question you want to answer is: "I have this product, Product A. Show me its data flow diagram."
(46:58):
It's so hard to answer that question, even with all the work that has been done - even if you bought a privacy program management tool, did the data mapping, and bought a data discovery tool. You still can't answer it, because the tooling was not built to answer this question. So our motivation, our vision, with all the features that are coming up, is: "Hey, we've already probably solved the problem to a
(47:21):
good extent, but it has to be super simple. Someone should be able to answer this question really, really fast." Then, the next level: for example, we could already automate documentation like RoPA by 40-50%, and now we've taken that up to even 70+%, thanks to our language model. So, how can you use code - the source of truth - to build
(47:42):
documentation that is evergreen, that comes from the code, and that has a lot of context as well? That's the first big puzzle we are solving.
The second big puzzle we are trying to solve is making privacy by design programmatic. A lot of the effort in the entire privacy-by-design process goes into privacy
(48:02):
reviews, which are super important to do at the design stage for large features. But you can also go down a rabbit hole where you just continue to do them. You can scale them up to the point where you're doing so many privacy reviews that you have zero rest. And then what? That reduces development speed. It literally halts product innovation; and, in today's
(48:24):
environment, if you are not innovating, your company suffers. So, how can you get to a point where you're reviewing the right stuff and a code scanning solution is taking care of everything else? Part of that is the product. A larger part of it is also open sourcing the privacy rules that we have, and a lot of work is happening there as well.
(48:46):
I'm super excited about that. Then, I would say the third thing is going deeper into this full collaboration space between privacy and engineers, and these technical reviews - again, helping them scale up. We have an interesting product there as well, wherein we look at things like PRDs and ERDs and try to
(49:07):
help people streamline or triage and say, "Hey, these are the five requests that you should definitely review, and these are the ones that you can safely ignore and let the privacy code scanner take care of." But again, the overall vision is to have this one platform where these two personas can work together; and, for that, we have
(49:27):
to solve for visibility, governance, and collaboration.
Debra J Farber (49:31):
Awesome.
Thank you so much for that.
Privacy engineering is a small but growing field, for sure; and you definitely have a front-row seat to its development. You're working with customers from a variety of industries and of different sizes. Can you tell us what trends you're seeing in this space, and how you see the field shaping up over the next two years or so?
Vaibhav Antil (49:50):
Yeah, sure. I think the basic trend I'm seeing is that it's becoming very popular. Companies have decided, and they're hiring privacy engineers. We run a privacy community where we post new jobs on our job board, and there are new jobs almost every week. It is getting popular. So, that's a net positive trend, I would say.
(50:12):
The second thing, at least from what I've seen, is that privacy engineers are getting successful at their jobs. They are becoming successful within a company at taking the amazing work that privacy professionals - the Chief Privacy Officer, the Chief Legal Officer - have done on the policy side (creating internal policies, best practices, record keeping) and translating it into engineering
(50:34):
requirements and tools that they can use to scan, build, or buy - all of that work. One of the simplest trends I'm seeing is that existing privacy engineers are succeeding in reducing privacy risk and in building privacy features and tooling; and hence there's a nice cycle wherein, because they're getting successful, they're going to newer companies, and newer
(50:56):
companies are hiring more privacy engineers. That's an amazing cycle. So one thing I'm seeing is that more privacy engineers are coming in. It's still a small community, where everyone is super helpful, sharing information and networking with each other to share best practices. It is growing. I think that's one trend I'm seeing in the space.
Debra J Farber (51:19):
Awesome.
How do you think it's going to shape up over the next two years or so?
Vaibhav Antil (51:21):
Again, I think it's just mainstreaming. Mainstreaming would mean that we have better KPIs for different things. Nishant's book, for example, talks about KPIs for data discovery. I do think there will be KPIs for Technical Privacy Reviews, or Privacy Engineering Assessments, which will become popular. I do think privacy code scanning will become mainstream,
(51:45):
and that's what we've seen with our customers - the privacy engineers taking the lead, because they're the ones who will say, "Hey, we have this nice automated check as part of the SDLC, which is looking for things, so we can do higher-level, more important work in the company as well." I'm not sure about this, but with the impact of AI
(52:09):
governance, privacy engineers might also play a big role there; it's so early that I don't really know for sure. They'll definitely have a big role to play, but I don't know how much of that piece they will end up owning. It will be a shared responsibility, but I don't know the exact answer. At least, those are a couple of trends, or a couple of ways, I can see the field shaping up.
Debra J Farber (52:29):
Thank you so
much.
Do you have any words of wisdom to leave the audience with before we close?
Vaibhav Antil (52:35):
No, I mean, I just wanted to come here on the podcast and share our journey and what we've been doing with Privado. Again, we as a company are super grateful to the privacy community. I think privacy people, in general, are the most open, most accepting, most diverse people. Everyone is trying to solve the user's problem, which is super exciting because you're looking at things from the user's
(52:58):
angle. Obviously, you have the business metrics, so you're also looking at things from the business angle. So, yeah, we're super appreciative of, and thankful to, the entire privacy community for accepting us, working with us, and giving us good feedback - critical feedback as well. And we really look forward to working with everyone who's
(53:18):
listening to the podcast in the future as well.
Debra J Farber (53:21):
Yes, everyone, go check out Privado. I'll put the link in the show notes. Well, Vee, thank you so much for joining us today on The Shifting Privacy Left Podcast to talk about privacy code scanning, Privado, and the rise of privacy engineering.
[Vaibhav (53:36):
Thanks, Debra, for having me here.]
Until next Tuesday, everyone, when we'll be back with engaging content and another great guest, or guests. Thanks for joining us this week on Shifting Privacy Left. Make sure to visit our website, shiftingprivacyleft.com, where you can subscribe to updates so you'll never miss a
(53:57):
show. While you're at it, if you found this episode valuable, go ahead and share it with a friend. And, if you're an engineer who cares passionately about privacy, check out Privado (54:06): the developer-friendly privacy platform and sponsor of the show. To learn more, go to Privado.ai. Be sure to tune in next Tuesday for a new episode. Bye for now.