My guest this week is Patricia Thaine, Co-founder and CEO of Private AI, where she leads a team of experts in developing cutting-edge solutions using AI to identify, reduce, and remove Personally Identifiable Information (PII) in 52 languages across text, audio, images, and documents.

In this episode, we hear from Patricia about: her transition from starting a Ph.D. to co-founding an AI company; how Private AI set out to solve fundamental privacy problems to provide control and understanding of data collection; misunderstandings about how best to leverage AI regarding privacy-preserving machine learning; Private AI’s intention when designing their software, plus newly deployed features; and whether global AI regulations can help with current risks around privacy, rogue AI and copyright.

Topics Covered:

  • Patricia’s professional journey from starting a Ph.D. in Acoustic Forensics to co-founding an AI company
  • Why Private AI’s mission is to solve privacy problems and create a platform that developers can modularly and flexibly integrate anywhere in their software pipelines, including at model ingress & egress
  • Patricia’s advice on how companies can avoid mishandling personal information when leveraging AI / machine learning
  • Why keeping track of ever-changing data collection and regulations makes it hard to find personal information
  • Private AI's privacy-enabling architectural approach to finding personal data to prevent it from being used by or stored in an AI model
  • The approach that Private AI took to design their software
  • Private AI's extremely high matching rate, and how they aim for 99%+ accuracy
  • Private AI's roadmap & R&D efforts
  • Debra & Patricia discuss AI Regulation and Patricia's insights from her article 'Thoughts on AI Regulation'
  • A foreshadowing of AI’s copyright risk problem and whether regulations or licenses can help
  • ChatGPT’s popularity, copyright, and the need for embedding privacy, security, and safety by design from the beginning (in the MVP)
  • How to reach out to Patricia to connect, collaborate, or access a demo
  • How thinking about the fundamentals gets you a good way on your way to ensuring privacy & security


Resources Mentioned:

  • Patricia’s article, 'Thoughts on AI Regulation'
  • Private AI’s demo: demo.private-ai.com
  • PrivateGPT: chat.private-ai.com

Guest Info:

  • Connect with Patricia on LinkedIn
  • Check out Private AI’s website


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Patricia Thaine (00:00):
Here's, I think, where regulation is going to make a huge difference, because what we saw with regards to the GDPR is that it is a forward-thinking, wishful regulation where a lot of the technology required to comply with the GDPR doesn't even exist yet. We're working on it at Private AI, but it still does not exist yet.

Debra J Farber (00:23):
Hello, I am Debra J Farber. Welcome to The Shifting Privacy Left Podcast, where we talk about embedding privacy by design and default into the engineering function to prevent privacy harms to humans and to prevent dystopia. Each week, we'll bring you unique discussions with global privacy technologists and innovators working at the bleeding edge of privacy research and emerging technologies, standards, business models, and ecosystems. Welcome everyone to Shifting Privacy Left. I'm your host and resident privacy guru, Debra J Farber. Today, I'm delighted to welcome my next guest, Patricia Thaine, Co-founder and CEO of Private AI, where she leads a team of experts in developing cutting-edge solutions for identifying, reducing, and removing personally identifiable information (or PII) across 52 languages in text, audio, images, and documents using AI. Welcome, Patricia.

[Patricia (01:20): Thank you so much, Debra. It's such a pleasure to be here with you.]

Likewise. AI is definitely the topic of the year, but I'm really excited to find out a little bit about your journey bringing Private AI to life, your career journey; and, if you don't mind, starting by telling us a little bit of your backstory and how you then ultimately decided to co-found an AI company.

Patricia Thaine (01:44):
Yeah, so I started a PhD in order to start a company, actually. I was looking at acoustic forensics. So, who's speaking in a recording? What kind of educational background do they have? If you include these signals in automatic speech recognition systems, it can improve the systems quite a bit. While doing this work, it became really obvious: 1) if you get the data, you have massive amounts of privacy problems; and 2) you often can't get the data anyway because of privacy problems. So, two sides to a coin that really encouraged me to go and try to find solutions to privacy problems. Then, I started looking at homomorphic encryption, did a few publications on that, tried to spin up a company in 2017 that was combining homomorphic encryption with natural language processing, scrapped it because it wasn't going to scale, and then co-founded this one in 2019 with my co-founder and CTO, Peter Luitjens.

Debra J Farber (02:36):
That's been a while now that you've been working on that; and so I've been watching that you've gotten a lot of investment, actually, from several VCs and corporate VCs. What problems did you set out to solve with Private AI that the VCs looked at and went, "Oh my God, yes, this is what we need to invest in; this is what we need to bring to market"?

Patricia Thaine (02:57):
Yeah, so in 2019, it was still quite early, I guess, definitely compared to now, with regards to the fundamental problems of privacy that needed to be solved for unstructured data. What we set out to solve is really... if you're a Natural Language Processing Engineer or if you're an engineer dealing with unstructured data - which ends up being 80% to 90% of the data that is collected by organizations - you're going to need tools in your arsenal that allow you to handle the data in a very flexible way. Oftentimes you have to be able to see the data; it has to be really highly accurate; and, it has to run in your environment a lot of the time as well, or in your customer's environment. What we set out to do is really take privacy problems that had not been solved before, which are very fundamental, and make them really modular, really easy-to-use, and make it so that you can integrate it anywhere you want in your software pipeline (including at data ingress; and ideally, at data egress). So, you have absolute control and understanding of what kind of data you're collecting. One of those fundamental problems is: what is the personal information in this really messy data in the first place? For the most part, folks have been using regular expressions or open source software that is not well-adapted for this particular task, and it turns out it's quite a gargantuan task if you're going to do this right. So, that is what we are focusing on - bits and pieces of privacy problems to create a platform for developers to be able to pull what they need and integrate it into their software pipelines.

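To make the ingress/egress idea concrete, here's a minimal sketch of a redaction step wrapped around both ends of a pipeline stage. This is hypothetical illustration code, not Private AI's API, and the regex "detector" is a stand-in for the context-aware models discussed later:

```python
import re

# Hypothetical detectors: a real system would use a context-aware NER model
# rather than regexes, but the pipeline shape is the same.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace detected entities with typed placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def do_work(text: str) -> str:
    # Stand-in for your actual business logic.
    return f"processed: {text}"

def pipeline_stage(raw_input: str) -> str:
    clean_in = redact(raw_input)   # ingress: strip PII before processing/storage
    result = do_work(clean_in)
    return redact(result)          # egress: catch anything echoed back out

print(pipeline_stage("Contact Jane at jane@example.com or +1 416-555-0199"))
```
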
Debra J Farber (04:40):
Okay, and is this mostly around leveraging AI models? Or, are you also applying this for data discovery for GRC purposes, you know, for finding all the data across an organization and then being able to action upon that data or deliver data subject access requests (DSARs)? Help me put the scope around the problem that you are solving for.

Patricia Thaine (05:05):
Yeah, all of the above. There are so many use cases. It really comes down to those first principles of: 1) you find the personal information; and then 2) you only keep what you need. If you solve those two, those fundamental components are applicable to a number of different tasks, and they should be applicable anywhere in your environment - whether you're running on Azure or AWS or GCP or on-premise. And really, one of my dreams is to run directly on the edge. We created edge deployments of our products, but it was really too, too early for that. But one day that will be the case.

Debra J Farber (05:44):
That's pretty cool.

Patricia Thaine (05:45):
Yeah, essentially you want to limit the amount of personal information you're gathering as soon as possible, so it doesn't get crazy and messy.

Debra J Farber (05:53):
Right, right. I do want to ask, because you do refer on your website a lot to PII, which is Personally Identifiable Information. Is it more encompassing; does it include Personal Data more broadly, or is it specifically identifiable data that you can discover?

Patricia Thaine (06:09):
It's identifiable or quasi-identifiable. We identify over 50 different entity types across what counts as PII, PCI, or PHI; and, we're also working on confidential company information as well. So, it's anything that you really need to capture in order to make sure that you're complying with a number of regulations, or that you are safely using products within your organization.

Debra J Farber (06:35):
Got it. Okay, that makes sense. One of the things that I've noticed in my career is that companies very often are mishandling data, or employees within companies are mishandling data, because they're just maybe ignorant about the appropriate ways to handle it, or whether or not something is 'confidential' or 'personal information' or whatnot. So, how can companies avoid mishandling personal information when leveraging AI / machine learning?

Patricia Thaine (07:05):
There are a few misunderstandings about how to best leverage AI in a privacy-preserving way. One of those fundamental misunderstandings is actually that if you're deploying a model directly in your environment, you're okay, you're good, that's all you need for privacy. That is very, very untrue, because normally what you're gonna be doing is fine-tuning that model on your data; and then, what you have to be careful of is whether the data that you're fine-tuning or training that model on contains personal information or confidential information. That means that that model is gonna have to have the same levels of access controls as that original data. The best thing that you could do is actually remove the information that you don't need to fine-tune those models on. So, you can limit those needs for appropriate levels of access control, which will end up becoming a complete nightmare within organizations if you have to deal with access control for different models.

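A minimal sketch of that advice, with hypothetical helper names (a real deployment would call a proper de-identification service rather than this toy scrubber): strip personal information from records before they ever reach the fine-tuning set, so the resulting model doesn't inherit the source data's access-control requirements.

```python
import json
import re

# Toy scrubber standing in for a real de-identification model.
NAME_HINT = re.compile(r"\b(Mr|Ms|Mrs|Dr)\.\s+\w+")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def deidentify(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    return NAME_HINT.sub("[NAME]", text)

def build_finetune_file(records: list[dict], path: str) -> None:
    """Write a JSONL fine-tuning file containing only de-identified text."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps({
                "prompt": deidentify(rec["prompt"]),
                "completion": deidentify(rec["completion"]),
            }) + "\n")

records = [{"prompt": "Summarize the complaint from Dr. Smith (smith@foo.com).",
            "completion": "Dr. Smith reports a billing error."}]
build_finetune_file(records, "train.jsonl")
```
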
Debra J Farber (08:04):
I could see that being a headache, yeah. It seems like detecting personal information is pretty difficult to do, and there's different purposes for which a company might do this. Maybe data loss prevention purposes, where they wanna identify data that might be exfiltrating from the company and prevent personal data from leaving. Or, it could be data discovery for purposes of finding personal data across your organization so you can action upon it. Or, it could be for other reasons, right? So, why is it that it's so hard to find personal information?

Patricia Thaine (08:40):
Just think of the disfluency of language. Think about optical character recognition errors when you're trying to analyze PDFs or images. Think about the errors that transcription systems make. Think about all of the different languages in the world and how they're used in combination with one another. Think about all the data formats, all the data structures. Think about even how dates are represented differently across English alone, and how spelling mistakes might affect that. So, AI is super necessary in order to do this properly. What was being done prior to AI for this problem was: here's a regular expression which basically says "find 16 digits that are one after another and call that a credit card number," for example; but then think of all the different exceptions. It becomes impossible to account for all of the different exceptions that happen fairly often. AI allows you to look at context and, based on the context, determine what is or is not personal information. However, there is also a misunderstanding of how to use AI models appropriately for different tasks, with people oftentimes thinking that you can just throw a machine learning model, like an LLM, at a task that it wasn't purpose-built for and call it a day. For some very basic examples, that works fine; but, in the real world, you get to corner cases so quickly, and to be able to cover all those corner cases takes years and years of corner case data collection. That keeps changing because language keeps changing, because data formats keep changing, because the way that transcription systems work - their output - keeps changing. So, it's constantly keeping track of that, and then keeping track of all of the different regulations where the definition of PII, or what counts as PII, is changing. For example, if you think of the data protection regulation that came out in India, 'caste' is considered PII under that data protection regulation. Right? It's a sensitive piece of information. That is not something that is in the GDPR, and that's something that you have to then go collect data for, make sure you're capturing the corner cases, make sure you're capturing the corner cases in all of the different languages supported. So, you can imagine it's a 50-by-52 or more scope of different types of entities that you're capturing, because of all of the combinations of languages.

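To illustrate the regex pitfall Patricia describes, here's a small hypothetical example of the naive "sixteen digits in a row" rule and a few common formats it mishandles:

```python
import re

# The naive rule: sixteen consecutive digits = credit card number.
naive = re.compile(r"\d{16}")

samples = [
    "4111111111111111",          # matched: bare 16 digits
    "4111 1111 1111 1111",       # missed: spaces between groups
    "4111-1111-1111-1111",       # missed: dashes
    "378282246310005",           # missed: Amex numbers are 15 digits
    "order id 1234567890123456", # false positive: not a card at all
]

for s in samples:
    print(f"{s!r:35} -> {'MATCH' if naive.search(s) else 'no match'}")

# A context-aware model instead weighs the surrounding text
# ("card number:", "order id") before deciding what the digits are.
```
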
Debra J Farber (11:17):
You definitely, I think, made the case for why it's so hard. So, given that, tell us about Private AI's architectural approach. How did you design it - with a special emphasis on the privacy pieces, of course. It's a technical audience, so feel free to get technical.

Patricia Thaine (11:33):
Yeah, so we have mainly focused on how we can make sure that the data does not leave the environment in which it is being processed. So, we deploy generally on-prem or in private cloud. We do have a cloud API as well, often being used for testing, but also being used by smaller companies or companies who don't have the ability to do the setup; and what we do is make sure that we delete the data right after it's processed, so we don't store or see any of the information. But when we deploy on-prem and private cloud, the only information that gets transferred to us is usage statistics; that's it. We don't see any of the data. What that means is that... well, we often get asked, "What does that mean for your models? How are you improving your models? How do you know if they're working?" And the way that we know they're working is our customers (and sometimes our customers' customers, which is when our customers are the most stringent) end up seeing the results of our system. We get feedback saying "this piece of information was de-identified" or "this piece of information was not properly de-identified. Here's a sample that is de-identified for you to create copies of and then fine-tune your model with." So, over time, we end up creating more and more robust models for everybody, with everybody's feedback, but only when they physically give us the data that they want us to know about.

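That "create copies of a de-identified sample" step can be pictured as template filling. A hedged sketch (hypothetical code and placeholder values, not Private AI's pipeline) of turning one redacted sample into several synthetic training examples:

```python
import random

# A customer-provided, already de-identified sample.
template = "Please bill [NAME] at [EMAIL] for the visit on [DATE]."

# Synthetic fillers; a real pipeline would draw from much larger,
# locale-aware pools to cover corner cases.
NAMES = ["Alex Chen", "Maria Souza", "Fatima al-Rashid"]
EMAILS = ["alex@example.org", "m.souza@example.net"]
DATES = ["2023-05-01", "May 1st, 2023", "01/05/23"]  # varied formats on purpose

def synthesize(n: int) -> list[str]:
    """Generate n labeled training copies from the redacted template."""
    out = []
    for _ in range(n):
        out.append(template.replace("[NAME]", random.choice(NAMES))
                           .replace("[EMAIL]", random.choice(EMAILS))
                           .replace("[DATE]", random.choice(DATES)))
    return out

for line in synthesize(3):
    print(line)
```
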
Debra J Farber (13:00):
That's fascinating. Thank you. How does the accuracy of the data matching of personal data stack up? Tell us a little bit about the accuracy of your matching.

Patricia Thaine (13:12):
Yeah, so it depends on the type of data, and we do focus more on the highly sensitive pieces of information, like names or credit card numbers or phone numbers, things like that. What we do is we aim to get to 99%-plus accuracy with our customers, and it's a process. Right out of the box, it does work very well; and if it's data that's similar to what we've seen in the past, it will work that well right out of the box. However, with anything unstructured-data related, it is constantly something that's being improved upon. What I can say is we legitimately have the best system in the world for this task, and I do not say that lightly. Every time we do a POC, which we do many of per year, we come out on top. The continuous feedback loop from our customers means that we're constantly improving faster and faster. In addition to that - we have to, of course, put in proper resources to do what it takes to get a research paper out of this - but we have over and over observed that our system is more accurate than human-level accuracy for identifying personal information. So, you get higher accuracy than a human, and you get scale that a human can't provide.

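For context on what a matching-accuracy claim like "99%-plus" measures: entity-detection systems are typically scored with span-level precision and recall against hand-labeled data. A minimal sketch with made-up spans:

```python
def evaluate(predicted: set[tuple[int, int]], gold: set[tuple[int, int]]) -> dict[str, float]:
    """Span-level precision/recall/F1 for detected PII entities."""
    tp = len(predicted & gold)  # spans found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Character-offset spans marking where PII sits in a document (hypothetical).
gold = {(0, 10), (25, 41), (60, 72)}
predicted = {(0, 10), (25, 41), (80, 90)}
print(evaluate(predicted, gold))  # 2 of 3 found; one false positive
```
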
Debra J Farber (14:28):
Wow, that's definitely quite the go-to-market message. What's on Private AI's roadmap, or what are you currently researching?

Patricia Thaine (14:38):
This is quite an endless problem. What we are hoping to provide the community is a platform to go to when they're dealing with anything PII - and that's regardless of language, regardless of format - and also the ability to better understand that data. The research that we're doing is very much around: how can we allow our users to better understand the data that they're processing? Also, we've recently deployed a dashboard and a control panel that allows managers to be able to see what kind of information different teams are processing and be able to control directly from the edge. The way you can look at it is, say, a CISO or a product manager is seeing - not as an afterthought, not after the data is collected, but while the data is being collected - "Hey, there's a credit card number that just flowed through this chat and it's about to be stored in your central database. This is alerting you about that. Do you want it to continue?" So that's something that we've now enabled; and essentially, what we're aiming to do is allow these managers to have a full view of what's going on within their organization from a product-by-product and software pipeline perspective, rather than as an afterthought.

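A hedged sketch of that in-stream alerting pattern (hypothetical code, not Private AI's product): check each message for sensitive entities before it reaches the central store, and surface a decision point when something is found:

```python
import re

CREDIT_CARD = re.compile(r"\b(?:\d[ -]?){15,16}\b")

def store(message: str) -> None:
    print("stored:", message)

def on_chat_message(message: str) -> None:
    """Gate storage on a PII check instead of auditing after the fact."""
    if CREDIT_CARD.search(message):
        # In a real system this would raise an alert in a manager dashboard;
        # here we just block and log.
        print("ALERT: credit card detected; storage blocked pending review")
        return
    store(message)

on_chat_message("my card is 4111 1111 1111 1111")
on_chat_message("see you at 3pm")
```
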
Debra J Farber (15:59):
That's pretty cool. Now I want to ask you a little bit about AI regulation. So, I came across your article, literally titled 'Thoughts on AI Regulation,' from about a month ago on LinkedIn, and what struck me is not so much your opinion - because I agree with it and I've been thinking the same - but I just love that, like me, you didn't pull any punches. You write: "I think the call for regulation to prevent rogue AI is so utterly absurd," and then, in bold, "I really can't emphasize enough how absurd either pausing the development of foundation models or regulating them would be for the purpose of preventing existential risk." Totally agree. Would you mind unpacking that a little bit and explaining why you feel so strongly about the existential risk?

Patricia Thaine (16:57):
Partially because I do strongly believe that regulations can help with the current risks, and that's definitely... one of the main focuses, or the main focus, that regulation should have - which I think is where it's leaning. Then, for the existential risk: if you read the documents talking about concerns about existential risk and having rogue players create a rogue AI, for example - if you think about what that means, it means you could have, for example, an AI that is generating code, that is doing so in a way that humans cannot keep up with, and then being able to hack devices and cause all sorts of ruckus. But the key thing is humans can't keep up. And then, if you think about what regulations can do: regulations can make sure that the people who care about regulations are following the regulations. When it comes to these models, if they ever do get to this point - I'm not commenting on whether or not it's a possibility - but, hypothetically speaking, if they get to this point, any nation state is going to be able to train a model like this. It's the amount of compute that really matters. How to do it also matters, but there's theft, there's research, ideas get propagated. I highly doubt that, if it's a nation state or a large enough criminal organization, this would not be possible to do if we get to this stage in AI training. So, if you think about this, the only real possible way to counter this is investing in AI that can counter that potential rogue AI. That's the only thing that can surpass human capabilities in this scenario and actually protect the systems in an active way.

Debra J Farber (18:51):
So, then let's call that anti-rogue AI. So then you have anti-anti-rogue AI, and it just becomes an arms race, really.

Patricia Thaine (18:59):
Cybersecurity is an arms race.

[Debra (19:01): Yeah, that's fair.]

Yeah, it's an eternal arms race. So, what good has legislation done when it comes to stopping criminal activity, when it comes to creating viruses or to fraud and all of that? Very little, if any. So, expecting regulation to do anything on this front - if somebody's willing to create a rogue AI that's going to destroy humanity, I really don't think that they care about regulation.

Debra J Farber (19:29):
Right, right. Criminals are going to criminal and all that. That makes a lot of sense. So, yeah, it is an arms race. I guess the question... everything feels like an arms race already, and we're not even dealing with the actual risks - 'we' being the general discussions you hear out in the media about AI, or the LLM base model companies. They're trying to put all of this attention on the potential future, potential existential risks that maybe one day they might run into, as opposed to the current risks that they're creating with data right now - risks to people, risks to intellectual property, risks to, you know, just risks generally. I think that that's the major challenge, right? I mean, you've got this VC-backed company like OpenAI, and then you've got Microsoft's money, and you've got all the big tech companies that have lots of money and can spend a lot on compute and, you know, have lots of lawyers, kind of racing to market right now with stuff that's half-done and not necessarily thoughtful around privacy (that's aimed at OpenAI, not the other companies necessarily). Where do you think that this will go? I mean, we just had the EU kind of almost pass a law; it seems like they're getting close to that. The U.S., in my opinion, is just going to take a lot longer than we want to pass something that's meaningful. I mean, look how long it took for the EU to do it; the U.S. only just started on this track. So, where do you see this going, when we've got the big tech companies kind of in an arms race against each other to come and grab market share, given the current regulations down the road?

Patricia Thaine (21:13):
I think if we break it down into the problems - I mean, there's bias, there's explainability, a question mark for whether or not hallucination is actually a problem or a feature, there's privacy - [Debra: Both]. Yeah, both works too. There's just, for a lot of these problems, not enough research as to how to even deal with them, explainability especially. Here's, I think, where regulation is going to make a huge difference, because what we saw with regards to the GDPR is that it is a forward-thinking, wishful regulation where a lot of the technology required to comply with the GDPR doesn't even exist yet. We're working on it at Private AI, but it still does not exist yet. If you even think about... [Debra (22:02): data portability] ...data portability, being able to do access-to-information requests on unstructured data in a way that's not going to bog down an organization tremendously if they have enough customers. There are so many aspects of it that you need better technology for, but what we see is that technology being created as a response to that requirement. Because that's when organizations actually started to get their act together with the huge masses of data that they had. All of a sudden, they had to make some sort of sense of it and make it usable from a privacy-understanding perspective. And then, what these AI legislations are going to bring is really: what do we want the future to look like? Where are we going to focus our energy? If we have things like "don't even bother to do public surveillance because that is an unacceptable risk," that's great. That tells us, "Do not focus your energy on that. Let's focus our energy on what the requirements are around explainability. Let's focus our energy around understanding what the risks are around bias and data in insurance, for example, with regards to privacy." A lot of that is already covered by data protection regulations. So, the AI legislations I don't see actually making much of a difference when it comes to what happens with PII, except for unacceptable risks or requiring more oversight around these higher-risk applications.

Debra J Farber (23:36):
Yeah, that makes a lot of sense. Well, I do hope that the standards that people are working on - which I'm not following as closely in the AI space - but I do hope other drivers move forward, putting the focus back on preventing harms, especially to people, with this new technology. A lot of the people coming to market seem to think that, oh, there's no AI laws, so there are no laws, so I can just do all of these experiments on people in real-time or put something in the stream of commerce without thinking about privacy and security. So, I do hope things like standards and new technologies, or privacy enhancing technologies, enable people to easily do some of the right things. So, at least there's an educational piece there, or some driver of them addressing that risk, before they put a product out in front of other people.

Patricia Thaine (24:33):
Yeah, definitely; and you had mentioned copyright risk as well. I think, fundamentally, that's going to be an industry-changing thing, if you think about Napster, or now Spotify, and what it did to CDs and the music industry.

Debra J Farber (24:50):
Yeah, I remember.

Patricia Thaine (24:51):
Totally changed what's done.
Yeah.

Debra J Farber (24:54):
Totally. Part of me is like... I still think that there's going to be a reckoning here. I mean, you can't... if it's determined that people need to be compensated for the data that they provide for training purposes, or if there's infringement because the output of the AI model was obviously maybe over-trained or overfitted to somebody else's IP, that's going to be really expensive real fast. It's going to be really... I'm not even sure it's possible, if you're OpenAI, for instance, to all of a sudden just change your model. You'd have to actually retrain everything. Right?

Patricia Thaine (25:30):
Yeah, and you'd need to understand what the sources of the data are that are actually being output. But, suppose that fundamental question of where did this come from gets answered in some way. Then, all of a sudden, that opens up authors and creators to a Spotify-like commercial possibility when it comes to their works.

Debra J Farber (25:57):
Absolutely. It's actually "don't cut human beings out of the loop here," you know; bring them into the loop. And then, instead of the surveillance capitalism model where we're extracting stuff from people - like companies are extracting data from us - in this case, it's like: enfranchise us. Let us be part of the economic model, and don't take too much of it, company; you know, like the 30% of profits for the app stores. If you're just taking too much, people are not going to want to use your product. So, I think that makes a lot of sense.

[Patricia (26:28): That's where regulation can help.]

I agree. Although, it's game-changing regulation, right? I mean, I personally - having gone to law school and just believing that copyright, specifically, is there to incentivize humans to put in the hard work of bringing something original and putting it out into the public domain - you know, I believe that that is integral to who we are. So, I actually think that eventually this is going to really be an uproar in the industry; but I guess we shouldn't spend too much more time on that topic without any sort of regulation that backs me up on that.

Patricia Thaine (27:11):
I don't know if no regulation is backing you up on that, because think about what it normally takes to develop software on public repositories of code or on public data. Until very recently, the way to do it - and the way we do it at Private AI - is we make sure that any libraries we're using - any GitHub libraries or any code repositories we're using - have the appropriate licenses. Any data we're using, we have to make sure we have the appropriate license to use. Unless it's public domain, we need to have the appropriate legal in place. So, it's actually a little bit odd the way that things are functioning more recently.

Debra J Farber (27:55):
Yeah, I know; it's really odd and it's presumptuous, and it just happened where no one was... only people who were living completely in an AI world, like yourself working in this space, were maybe paying attention six months to a year ago, when OpenAI's ChatGPT came out into everybody's consciousness and it's like, "What is this?" Everyone's talking about it.

Patricia Thaine (28:18):
Yeah, and the way I was thinking about it was: what's the difference between this and Google, which also makes use of people's data in order to profit? But it's really the providing of the source and providing benefit to that source. The benefit they're providing is eyes on pages.

Debra J Farber (28:36):
It's a real good point. I think their argument was that they're linking to, they're indexing, and providing the ability for somebody else to go to the actual site to see it. Right? Maybe that's not the case with a picture, where it might be like Google Images or whatever it is, and you see it on their site before you click on it. But a thumbnail image, I guess, would be kind of like a snippet, not the actual thing. I don't know. I actually don't follow copyright law development, like the case law, so I'm really just pulling from my legal training back in 2004. This was a while ago; but yeah, it does feel like things really came out of nowhere. My true belief is that there was an effort in the industry, led by VCs, to kind of flood the market with this as quickly as possible so that it's everywhere, and so regulators can't regulate, or so that they beat the rest of the market. That's how I feel about how OpenAI came into my consciousness, at least, when I first heard about it, because I'm really drinking from all the fire hoses of privacy, data protection, ethical tech, and now I've added AI to that as well. But, that's only recently, and it was definitely a lot more noise than anything else; like, it trumped everything, all of the news in the space that I am following. I'm looking at the market level and go-to-market messaging, and it really did feel like a play to dominate so quickly, get market share, but also get it to the point where regulators just can't wrap their arms around it in time. But, I don't know, maybe that's just my jaded belief from my experience.

Patricia Thaine (30:15):
To a certain extent. But GPT-2 and GPT-3 had been out for a while, and I think ChatGPT was actually a bit of a fluke; how well it did in the public eye was a bit unexpected. I've experienced this as well in building developer software: you think that, of course, the documentation is super important, how the API works is super important; but a good user interface actually changes people's perspective of it so quickly and so massively - from "I don't get this" to "I get this" - that I don't think necessarily that the developers building ChatGPT knew what they were getting themselves into when they made it public.

Debra J Farber (31:03):
How fascinating is that, given that Sam Altman was the head of Y Combinator, right? He helped how many companies in the past come to market - and to not know that you need to put in all the areas of ethics, security, privacy, safety, right, that those are kind of essential to have?

Patricia Thaine (31:21):
I don't know if they knew what they were getting themselves into in terms of the massive amounts of public attention and the flooding of the market, with regards to privacy and ethics and all that. I'm not sure how much thought necessarily was being put into that, specifically when it comes to ChatGPT. But, I do have a friend who works on ethical applications within OpenAI, and she often tells me how she feels like her voice is heard, how they consider the ethical implications that she brings up quite seriously, and they act on it. But, I think in some cases the problem is so massive, and so much fundamental research has not been done and needs years in order to complete, that you have to make a decision of whether or not to commercialize, or whether or not to continue to pause commercialization after you've spent this much money creating a system, in order to get to these fundamental questions that may or may not have answers in the next five to ten years.

Debra J Farber (32:27):
Yeah, that is definitely a tension. I think it also just underscores the point that you really have to embed privacy and security and safety by design from the beginning in your architecture, or the downstream wicked problems are going to be maybe too hard to surmount. We'll see; we'll find out.

Patricia Thaine (32:47):
Yeah, 100%. If you embed privacy in the very beginning, it'll save you a lot of headaches.

Debra J Farber (32:53):
Absolutely. Well, what's the best way that people can reach out to you to collaborate or request a demo?

Patricia Thaine (32:59):
You can try a demo out on our website at demo.private-ai.com. We also have chat.private-ai.com, which is our PrivateGPT, which provides a privacy layer before you're sending any data out to your large language model of choice; and I'm happy to connect with people on LinkedIn. So, if you listen to this podcast, please let me know in the message on LinkedIn that this is why you're connecting, because otherwise I will not know to connect with you, because I don't know you.

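The privacy-layer idea is straightforward to picture: redact before the prompt leaves your environment, keep the mapping locally, and restore placeholders in the response. A minimal sketch with hypothetical function names (not PrivateGPT's actual API):

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(prompt: str) -> tuple[str, dict[str, str]]:
    """Swap detected PII for placeholders; remember the mapping locally."""
    mapping: dict[str, str] = {}
    def repl(m: re.Match) -> str:
        key = f"[EMAIL_{len(mapping)}]"
        mapping[key] = m.group(0)
        return key
    return EMAIL.sub(repl, prompt), mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text

def call_llm(prompt: str) -> str:
    # Stand-in for the external LLM call; it only ever sees placeholders.
    return f"Draft reply to {prompt.split()[-1]}"

safe_prompt, mapping = redact("Write a reply to jane@example.com")
print(restore(call_llm(safe_prompt), mapping))
```
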
Debra J Farber (33:30):
That makes a lot of sense and I will put all of
the links and the article youreferenced about AI regulation
and all that in the Show Notesso that everyone could access
them easily.
Do you have any words of wisdomyou want to leave the audience
with before we close?

Patricia Thaine (33:45):
Thinking about the fundamentals gets you a good way on your way to ensuring privacy and security. There's this Microsoft report that came out recently that was talking about how, in cybersecurity, if you think about fundamentals like doing two-factor authentication, you are already 95% of the way there and can stop so many attacks. I'd say it's very similar with privacy. Think about the fundamentals, and only really keep the data that you need. That's going to save you a whole lot of headache down the line.

Debra J Farber (34:22):
Well, thank you so much, Patricia, for joining us today on The Shifting Privacy Left Podcast. Until next Tuesday, everyone, when we'll be back with engaging content and another great guest. Thanks for joining us this week on Shifting Privacy Left. Make sure to visit our website, shiftingprivacyleft.com, where you can subscribe to updates so you'll never miss a show. While you're at it, if you found this episode valuable, go ahead and share it with a friend. And, if you're an engineer who cares passionately about privacy, check out Privado: the developer-friendly privacy platform and sponsor of this show. To learn more, go to privado.ai. Be sure to tune in next Tuesday for a new episode. Bye for now.
