Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:15):
Pushkin. Earlier this year, an employee working in Hong Kong
for an international company got a weird message from one
of his colleagues. He was supposed to make a secret
transfer of millions of dollars. It seems sketchy. It obviously
seems sketchy, so he got on a video call with
(00:36):
a bunch of people, including the company's CFO, the chief
financial officer. The CFO said the request was legit, so
the employee did what he was told. He transferred roughly
twenty five million dollars to several bank accounts. As it
turned out, the CFO on the video call was not
really the CFO. It was a deep fake, an AI
(00:58):
generated twin created from publicly available audio and video of
the real CFO. By the time the company figured out
what was going on, it was too late, the money
was gone. I'm Jacob Goldstein and this is What's Your Problem,
the show where I talk to people who are trying
(01:20):
to make technological progress. My guest today is Ali Shahieri.
He's the co founder and chief technology officer at the
audaciously named Reality Defender. Ali's problem is this, how can
you use AI to protect the world from AI? More specifically,
how do you build a set of models to spot
(01:41):
the difference between reality and AI generated deep fakes. How'd
you get into the defending reality business?
Speaker 2 (01:51):
Yeah, so when I started, it was around actually generating
videos and deep fakes.
Speaker 1 (02:00):
So you were attacking reality before you were defending it.
Speaker 2 (02:04):
I wouldn't say we were attacking anything, but we were
definitely looking into this technology. And this is way
back before all this stuff kind of went crazy. This
is back in like twenty nineteen, around that time. So
we were building digital twins and we were looking at, how
do you make it so that it looks realistic? Is
it a cartoon looking thing? Is it like a Unity
(02:27):
three D thing? And then that's when we started to
see like these early research papers where they were taking
like someone's face and putting it on a video and
blending it in and it looked really good, and we
were like, oh, maybe we can do the digital twins
that way. And while we were like in that business,
we were like, you know, probably in a few years
(02:49):
someone can download an app and just make anything very easily.
And that's kind of the origins of how we started.
We're very mission driven. What we're trying to do here
is really protect the world and people from the dangers
of AI, but in a way where you know, we
want people not to abuse the technology. We
(03:12):
love AI, we just don't want it to be abused.
Speaker 1 (03:16):
So let's talk about this sort of deep fake detection,
gen AI detection market more generally. Like, who's selling
deep fake detection right now, and who's buying? What's
the sort of market landscape look like?
Speaker 2 (03:34):
The type of clients that we have right now are banks.
For example, we are currently live with one of the
largest banks in the world. When you call that bank,
the audio goes through our deepfake detection models and we're
able to tell the call center, this person might be
a deepfake. And part of why is that's actually happened.
(03:57):
Someone's called the bank and had money transferred out, and
actually this goes back to twenty nineteen, the
first incident of deepfake fraud, back in.
Speaker 1 (04:09):
Twenty nineteen, that we're aware of, right.
So what happened in twenty nineteen?
Speaker 2 (04:16):
Yeah, so this is back when this was early and
nobody really knew about this, and there was a CEO
that called a smaller company, the parent
company calling the child company, the CEO calling the other
CEO, and he wanted some money transferred out, and
it sounded like him, and the guy transferred, I think
(04:36):
it was in the UK, about two to three hundred
thousand dollars out. And that was like one
of the first ones that we
Speaker 1 (04:40):
Know of, and they got away with it, I believe.
Speaker 2 (04:44):
So. Yeah.
Speaker 1 (04:44):
And there was an instance earlier this year right where
I think it was in Hong Kong and some employee
was on a zoom call with the company's CFO and
the CFO was like, you know, wire twenty five
million dollars or something to some bank account. And then
the employee did it and it turned out the CFO
on the call was a deep fake, right.
Speaker 2 (05:01):
Yeah. So, fast... Were they your client? They were not
our clients, unfortunately. But this shows how quickly
the technology is evolving. You know, twenty nineteen, audio; fast
forward a few years, now you've got a Zoom call,
there's a bunch of people on it, and they all
look like people you know, and they're all deepfakes.
Speaker 1 (05:21):
So you were starting to mention. Banks are some of
your main clients. Who are some of your other main.
Speaker 2 (05:25):
Clients. Media companies, think of some of the
big ones; they use our product this year, especially with
the election. You know, back in twenty twenty we thought it
would be a problem. It wasn't; I think we were early.
This year we think it's a big problem, for sure.
It's already happening everywhere, even this year.
(05:45):
This year is the largest election year in the world.
More than fifty percent of the people are voting, and
we already have documented cases of election issues with deepfakes.
Speaker 1 (05:55):
Okay, media companies, banks, any other kind of big categories
of clients.
Speaker 2 (06:00):
Yeah, so other ones are government agencies. But in the end,
we believe everyone needs this product. It
shouldn't be up to people to decide or
figure out if something's a deepfake. If you're on a
social media platform, you shouldn't have to figure out, hey,
is this person real or not. It should just be
(06:21):
built in and anyone should be able to use it.
Speaker 1 (06:24):
Well, are social media companies either buying or building deep
fake detection tools or do they want to like stay
out of that business and be like no, we don't
want to be in the business of saying yes, this
is real, no, this isn't real.
Speaker 2 (06:37):
I can tell you we've been in contact and have
talked to some social media platforms. I think one issue
is they don't have to flag these things. It's up
to them, right, there's not a lot of regulation, so
I know they're thinking about it. We've chatted with some,
but that's the extent of it.
Speaker 1 (06:58):
So okay, So let's talk about how it works. And
there's two ways that I want to talk about how
it works. So one is from the point of view
of the user, whoever that may be, and then the
other is sort of what's going on under the hood. Right,
So let's start with the point of view of the user.
If I'm a whatever, a bank, university, a media company
who is paying for your service, how does it work
for me?
Speaker 2 (07:19):
It depends on the user and the use case. Let's
say it's a media company; they're looking at
maybe filtering through a lot of content, so content moderation.
Actually, that would be more like a social media company. They're
looking at content moderation; they're looking at
millions of assets and they want to quickly flag those
(07:39):
things, if they were in that business. The bank,
for the example I gave, the issue is someone could
call and biometrics fail. By the way, if you call
a bank, some banks say, repeat after me: my
voice is my passport. That can actually fail now. Think
about that. So a bank wants to make sure the
person calling in is actually that person. This is more
(08:00):
relevant to private banking, where there's actually a
one on one relationship between the client and the bank.
Speaker 1 (08:07):
And so in that case, So let's take that case.
So in that case, someone calls in and talks to
their banker. They're a rich person who has a private banker.
Basically it's what you're talking about, right, So this rich
person calls in and talks to their private banker, and
is the system just always running in the background
in that case? And like, how does it work from
(08:28):
the point of view of the private banker?
Speaker 2 (08:30):
Sure, and I have to be careful what I say here,
but the high level is the models are listening, and
if they detect a potential deep fake, they will alert the
call center. That person will get a notification; it's
integrated into their existing workflow. They'll get a notification that says, hey, this
Speaker 1 (08:48):
Person, do they get like a text or a Slack or something
they're using, saying you're talking to a deep fake?
Speaker 2 (08:54):
No, they're using the bank's software; they're
still using software and there's a dashboard. In that scenario,
they escalate, so they might say, let me
ask you some more questions, or let me call you back.
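(A rough sketch of what that kind of integration could look like, purely for illustration: the score_chunk function, threshold, and alert text below are invented stand-ins, not Reality Defender's actual API. The idea is simply that call audio streams through a detector and the agent's dashboard surfaces an escalation prompt when something looks synthetic.)

```python
# Illustrative only: score_chunk() stands in for a real deepfake-audio model,
# and the threshold and alert wording are invented for this sketch.
import random
from dataclasses import dataclass

@dataclass
class Alert:
    call_id: str
    score: float
    suggestion: str

def score_chunk(pcm_chunk: bytes) -> float:
    """Placeholder returning P(synthetic voice) for one chunk of call audio."""
    return random.random()  # a real system would run model inference here

def monitor_call(call_id, chunks, threshold=0.8):
    """Stream chunks through the detector; yield an Alert for the agent's
    dashboard whenever a chunk scores above the threshold."""
    for chunk in chunks:
        score = score_chunk(chunk)
        if score >= threshold:
            yield Alert(call_id, score,
                        "Possible synthetic voice: ask extra questions or offer to call back.")

if __name__ == "__main__":
    dummy_chunks = [b"\x00" * 16000 for _ in range(5)]  # fake 1-second chunks
    for alert in monitor_call("call-123", dummy_chunks):
        print(alert)   # in practice this would surface in the banker's software
        break          # escalate once, then hand the decision to the agent
```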
Speaker 1 (09:05):
Huh. Let me call you back is a super safe one,
right because if they have a relationship, probably they know
the number. They just call them back. Yeah, absolutely, okay,
And then how does it work? How does it work
for like when you say, like I presume by the
way that you can't name your clients. You said a
media company and a bank. It's secret who they are.
Speaker 2 (09:24):
Yeah, we're not allowed to say, okay.
Speaker 1 (09:26):
So let's say a media company. How's it work for
a media company?
Speaker 2 (09:29):
Their use case is slightly different, especially right now,
as I mentioned, around the election. There might
be something that's starting to go viral in the
news and they want to check, hey, is this
real or not? I would say, though, that
usually when something goes viral, the damage
is already done.
Speaker 1 (09:48):
Yes, although if you're, whatever, the New York
Times or the Wall Street Journal, you don't want to
repeat the viral lie. Part of your business model is
people are paying to subscribe to you because you are
more reliable.
Speaker 2 (10:00):
Right, exactly. So that's why they come to us. They
upload the assets and our web app returns the
results. I see.
Speaker 1 (10:06):
So it's just like, you just go to whatever, Reality
Defender dot whatever, and you upload the viral video and
your machine says it's a fake.
Speaker 2 (10:16):
Yeah. So we give results as probabilities; we don't
have the ground truth, so we give a probability. There are
several different models running; we use an ensemble of models.
We have different models looking at different things, and we
give an overall score averaging those. In the case of
a video, we actually highlight the areas of a deepfake.
If the person is speaking and they're a fake, there'll
(10:38):
be a red box around them. If they're real, there'll
be a green box around them.
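(For illustration, here is roughly what averaging an ensemble of per-model scores into one overall probability might look like; the model names, scores, and the unweighted mean are assumptions for the example, not the company's actual weighting.)

```python
# Sketch of an "ensemble of models" score, assuming a simple unweighted mean.
# The individual model names and scores below are made up for illustration.
from statistics import mean

def ensemble_score(model_scores):
    """Combine per-model P(fake) values into one overall probability."""
    return mean(model_scores.values())

scores = {
    "face_artifacts": 0.91,   # visual model
    "audio_spoof":    0.74,   # voice model
    "lip_sync":       0.88,   # audio/video sync model
}
overall = ensemble_score(scores)
print(f"Overall probability of a deepfake: {overall:.0%}")  # about 84%
```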
Speaker 1 (10:41):
And well, that latter part sounds more binary as opposed
to probabilistic.
Speaker 2 (10:47):
We give both. So yeah, there's a probability
score and there's also the visual.
Speaker 1 (10:52):
And so the probabilistic score is basically according to our model,
there's a seventy percent chance that this is fake something
of that nature.
Speaker 2 (10:59):
According to our ensemble of models.
Speaker 1 (11:01):
Yes, yeah, our model of models, our fund of funds
of models, exactly. So, okay, you're actually leading
us toward what's under the hood, right? I'm interested in
discussing this on a few levels. There is the
sort of broad level, beyond Reality Defender: you know, what are
the basic ways that the technology works, Like how does
(11:23):
deepfake detection gen AI detection work? In a broad way?
Like can you talk me through that?
Speaker 2 (11:27):
Absolutely, yeah. There are currently two ways people are looking at
this problem. Number one is provenance. For example, you watermark
media that you create, maybe you watermark it
or you digitally sign it, maybe you put it on a
blockchain somewhere or something like that. But basically there's a
source of truth that this video is real, and
there's a watermark. That's number one.
Speaker 1 (11:50):
But we're concerned. We're concerned with instances where that is
not the case. Right. Our world is full of videos
today that are not clearly watermarked, blockchained, whatever, for provenance.
So we have this problem. What are the ways people
are solving it?
Speaker 2 (12:03):
Yeah. The second way is how we're solving it, which
is basically, we use AI to detect AI, which
we call inference. So we train AI models, as
I mentioned, a bunch of them, to look at
various aspects of, let's say, a video.
Speaker 1 (12:20):
So, like, is a sort of generative adversarial network
the right term? I mean, it seems like, if I were
making up how to do this, I'd be like, well, I'm
gonna have one model that's cranking out really good deep fakes,
but I'll know which ones are the deep fakes, and then I'm gonna
feed the deep fakes and the real ones to my
other model, and I'll score it on how well it does,
(12:41):
and it'll get really good at figuring out the difference.
Speaker 2 (12:43):
Yeah, that's actually exactly how a lot of these work.
For example, there's a website you can
go to where it just generates a person every time you
go to it, right, and that's actually using a GAN
to generate that person. So, the way we detect, and
I can give a little bit more detail here:
for example, one of our models, which we actually removed,
was looking at blood flow. So yeah, imagine, actually,
(13:07):
in this video, if lighting and conditions are right, we can
actually detect the heartbeat and the blood flow in
the veins, the way we're looking at each other.
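(What he's describing sounds like remote photoplethysmography: under the right lighting, the average skin color in a face region fluctuates slightly with each heartbeat. A toy sketch of that core signal-processing step, using a synthetic signal in place of real video; the frame rate, pulse, and frequency band are assumptions.)

```python
# Rough rPPG-style sketch: estimate a pulse rate from the mean green-channel
# value of a face region across video frames. The signal here is synthetic,
# standing in for real cropped face images; 30 fps and a 72 bpm pulse are
# assumptions for the demo.
import numpy as np

fps = 30.0
t = np.arange(0, 10, 1 / fps)                    # 10 seconds of "video"
true_bpm = 72.0
# Mean green intensity per frame: tiny periodic component plus noise.
green_mean = (0.5 + 0.01 * np.sin(2 * np.pi * (true_bpm / 60) * t)
              + 0.005 * np.random.randn(t.size))

signal = green_mean - green_mean.mean()          # remove the DC component
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1 / fps)

# Restrict to plausible human heart rates (0.7 to 4 Hz, roughly 42 to 240 bpm).
band = (freqs >= 0.7) & (freqs <= 4.0)
peak_hz = freqs[band][np.argmax(spectrum[band])]
print(f"Estimated pulse: {peak_hz * 60:.0f} bpm")  # close to 72 in this toy case

# A face generated frame by frame typically lacks this coherent periodic
# signal, which is the cue such a detector would look for.
```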
Speaker 1 (13:16):
As I'm looking at myself weirdly today, maybe because it's
hot or because of the light here, I can actually see
a vein bulging on my forehead. So, like you're saying,
an AI could like measure my pulse from that or something.
Speaker 2 (13:27):
In the right conditions. Now, that model has a lot
of limitations, and you need to have the right conditions.
It basically has a lot of bias. Right, So we
tossed that.
Speaker 1 (13:40):
Wait, you're saying it didn't work. You're saying it didn't work.
Speaker 2 (13:42):
It worked in the right conditions and with the right skin tone;
otherwise it was biased. This
was experimental and we tossed it.
Speaker 1 (13:52):
A lot of things don't work. So you tried it
it and in a broad way it didn't work. It
worked in narrow conditions, but you need things that work
more broadly. What's another thing you tried that didn't work?
Speaker 2 (14:02):
Well, I can tell you every month we may be
throwing away models.
Speaker 1 (14:06):
Well, presumably there's things that work for a while and
then they don't. Right, It's kind of like antibiotics versus bacteria, right,
like your adversaries are getting better every day.
Speaker 2 (14:16):
Basically, we like to say we're like an antivirus company.
Every month there's a new generative technique;
maybe we detect it, but maybe it's something we
don't anticipate and we don't detect, and so we have
to make sure we quickly update our models. And
a model that worked last year might be completely irrelevant now.
Speaker 1 (14:37):
So what else, like, what else is happening technologically on
the reality defense side, on the detection side.
Speaker 2 (14:43):
Okay, so we have a few different products.
One is, as I mentioned, real time audio scanning
and listening for telephone calls. The other one is a
place where a journalist or any user can go and
upload not just videos; we also detect images, we
also detect audio, we also detect text, like ChatGPT,
(15:03):
and these tools also explain to a user why something
is a deepfake. We don't just give a score.
For an image, we might put up a heat map
and say, these are the areas that set the model off.
For text, we might highlight areas and say, these are the
areas that appear to be generated.
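(A toy sketch of the highlight-the-generated-areas idea for text: the per-sentence probabilities below are invented; a real detector would supply them.)

```python
# Toy sketch of highlighting the "generated-looking" areas of a text.
# The per-sentence scores are invented; a real detector would produce them.
def highlight(sentences_with_scores, threshold=0.7):
    """Wrap sentences whose P(generated) exceeds the threshold in >> <<."""
    out = []
    for sentence, p_generated in sentences_with_scores:
        if p_generated >= threshold:
            out.append(f">>{sentence}<< ({p_generated:.0%})")
        else:
            out.append(sentence)
    return " ".join(out)

sample = [
    ("The experiment was run twice.", 0.22),
    ("In conclusion, the multifaceted ramifications underscore a paradigm shift.", 0.91),
]
print(highlight(sample))
```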
Speaker 1 (15:20):
There's a case study you have about a university that
is a client of yours that, among other things, uses
your service to tell when students are turning in
papers written by ChatGPT, basically, as I read it. Right, like,
I just assume that everybody writes papers with
ChatGPT now and there's nothing anybody can do about it.
(15:41):
But is that not true? Like, if I have
ChatGPT write my paper and then I change a
few words, does that sort of let me
sail past your defenses?
Speaker 2 (15:52):
It depends how much you change. Yeah, if
you change, like, over fifty percent, maybe it would. So
it depends.
Speaker 1 (16:00):
Over fifty percent is more than a few words. And
so can you talk? I mean, I know you can't
name the university, but do you know how they
use it in practice? So, you know, some professor runs the students' papers
through your software and it says there's
a, whatever, sixty percent chance that this was created using
a large language model. Do you know, in practice?
(16:22):
Obviously the professor could do whatever they want or the
university could have whatever policy, but do you know in practice,
what do they do with this information? That's
in a way a harder one to figure out than
the banker who's like, oh, it might be a
deep fake on the phone, I'll call you right back
for security. Like, I don't have a banker,
but if I had a banker and they did that,
I'd be like, oh, that's cool. I'm glad my bank
(16:43):
is doing this thing. Whereas with like the professor and
the student, that's a much more sort of fraught situation, right,
and harder to think of how to deal with, again,
given the probabilistic nature of the output of the model.
Speaker 2 (16:59):
Yes. A couple more things here. First of all,
I think even universities are trying to figure out this problem,
how do you solve it, you know. But the second
thing to note: most of our users are not interested
in a text detector. That seems to be a much
smaller market. The biggest one is actually audio. It's becoming,
(17:20):
imagine you get a call from a loved one saying, send
me money, and you send money, and then you realize
it's not who you thought, it was a deepfake, right. That's actually
a much more widely used system.
Speaker 1 (17:31):
That's the big one in terms of the business. It's interesting.
I mean, I wonder if that's partly relative,
we think about the video more, but is it partly because
deep fake audio is now quite good and there are
lots of instances where people will transfer lots of money
based solely on audio?
Speaker 2 (17:47):
Deepfake audio is the best and it's getting better,
right. It used to be that to make your voice,
maybe I needed a minute of audio. Now I need just a
few seconds and I can make your voice. It's getting
exponentially better. All of them are, but audio is definitely
top of the list right now.
Speaker 1 (18:01):
Huh And how are you keeping up?
Speaker 2 (18:06):
Yeah. I mean, when we detect audio, it's tricky.
There are a lot of factors to think about. A person's accent,
right: is the model biased, does it not understand, or
is there an issue where it detects one
person with a certain type of accent always as a deepfake?
There are also issues of noise; when there's
a lot of background noise, the model could be impacted.
(18:28):
When there's crosstalk, multiple people speaking at the same time,
that could impact the model. So there's a variety of factors.
And the other thing to think about is our models
support multiple languages, so we don't just
do English, and all of these kind of make
it very complicated. So when we detect something, there's what's called
preprocessing, a whole bunch of steps applied to the
(18:50):
audio before it actually goes to our AI models, where
we have to clean up the audio and do certain types
of transformations before we push it to the models.
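(A generic sketch of the kind of preprocessing he's describing, not Reality Defender's actual pipeline: peak-normalize the audio, trim silence at the edges, and cut it into fixed-length windows a model could consume. The sample rate, thresholds, and window length are assumptions.)

```python
# Generic audio preprocessing sketch (not the company's actual pipeline):
# peak-normalize, trim leading/trailing silence, and split into fixed-length
# windows. 16 kHz, -40 dB silence threshold, and 2-second windows are
# assumptions for the example.
import numpy as np

def preprocess(audio, sr=16000, silence_db=-40.0, window_s=2.0):
    # 1. Peak normalization.
    peak = np.max(np.abs(audio)) or 1.0
    audio = audio / peak

    # 2. Trim samples quieter than the silence threshold at the edges.
    amp_thresh = 10 ** (silence_db / 20)
    voiced = np.where(np.abs(audio) > amp_thresh)[0]
    if voiced.size:
        audio = audio[voiced[0]:voiced[-1] + 1]

    # 3. Split into fixed-length windows (dropping a short tail).
    win = int(window_s * sr)
    n_windows = len(audio) // win
    return audio[: n_windows * win].reshape(n_windows, win)

# Example: ~5 seconds of synthetic speech-like noise padded by silence.
sr = 16000
clip = np.concatenate([np.zeros(sr), 0.3 * np.random.randn(5 * sr), np.zeros(sr)])
windows = preprocess(clip, sr)
print(windows.shape)  # (2, 32000): two 2-second windows survive the trimming
```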
Speaker 1 (18:58):
And is that happening in real time with these companies?
Huh. And what is the frontier of preprocessing?
Like, is it an efficiency and speed problem, because you're trying to do it in
and speed problem because you're trying to do it in
real time and so you're just trying to kind of
make the sort of algorithmic part of it as fast
and efficient as possible.
Speaker 2 (19:19):
Yeah, I mean, this is a challenge. There's a lot
to be done, so that's ongoing research: how do
we continue to speed up not just the preprocessing, but
the inference. And there's a variety of approaches. One thing is
what's called a foundation model. I'm not sure if you've heard
what those are, but these are extremely large pre-trained
models; GPT is a foundation model, a pre-trained model.
(19:40):
And so these models can be useful in some parts
of the preprocessing, where they can quickly extract certain features
for us, and then we can use those further down
the pipeline.
Speaker 1 (19:54):
Still to come on the show. The problems that Ali
is trying to solve. Now, how good are you at
detecting deepfakes? Can you quantify how good you are?
Speaker 2 (20:13):
So the way you usually do this is you look
at benchmarks, right. There are public data sets which we can
take and run, and we're in the nineties.
But, you know, that's not the real world.
Speaker 1 (20:25):
When you say you're in the nineties, you mean,
in a binary sense, you guess correctly ninety percent of
the time?
Speaker 2 (20:35):
Yeah. So on a public benchmark, we're in the nineties.
There's accuracy, precision, and recall. Accuracy is how accurate are
we. Let's say there's a sample set of one hundred,
maybe fifty are fake, fifty are real, right.
The accuracy is, okay, how many of those
did you get right, real or fake, divided by the total. Right,
(20:55):
that's the accuracy. The problem with that is
an unbalanced data set; maybe only two are fake
and the other ninety eight are real. In
that case, if we had just said, okay,
everything is real, the accuracy would be ninety eight percent. Right?
That's not very useful, because you missed the deepfakes. So
(21:17):
that's why precision and recall come in. They look specifically at
how you did on a specific class, like the fakes
or the reals. So there's more than just accuracy. There
are other factors to look at.
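(To make that ninety-eight-to-two example concrete, here is the arithmetic as a small sketch: a lazy detector that calls everything real scores ninety-eight percent accuracy but zero recall on the fakes, which is exactly the problem he's pointing at.)

```python
# Worked version of the imbalanced-benchmark example: 98 real clips, 2 fakes,
# and a lazy detector that predicts "real" for everything.
labels      = ["real"] * 98 + ["fake"] * 2
predictions = ["real"] * 100

tp = sum(1 for y, p in zip(labels, predictions) if y == "fake" and p == "fake")
fp = sum(1 for y, p in zip(labels, predictions) if y == "real" and p == "fake")
fn = sum(1 for y, p in zip(labels, predictions) if y == "fake" and p == "real")
tn = sum(1 for y, p in zip(labels, predictions) if y == "real" and p == "real")

accuracy  = (tp + tn) / len(labels)              # 0.98, looks great
precision = tp / (tp + fp) if tp + fp else 0.0   # 0.0: it never flags anything
recall    = tp / (tp + fn) if tp + fn else 0.0   # 0.0: both deepfakes were missed

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```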
Speaker 1 (21:30):
So it's kind of like the false
positive, false negative challenge with medical tests, right. You want
a test that both says you have the disease
when you have the disease, and also
says you don't have the disease when you don't
have the disease. And that actually ends up being a
really complicated problem given the nature of base rates, right, like
(21:54):
in your universe, certainly in the universe of people calling
their banker. Almost everybody calling their banker is a real person, right,
but there are these very high stakes, presumably very rare
cases where it is a deepfake, and so that's like
a complicated problem.
Speaker 2 (22:10):
It actually is, it absolutely is, and as
we work with each customer, we have to tweak those.
Some want higher false positives, some want higher false negatives. It depends
on each use case. In the case of a bank,
they want to be a bit more cautious, but that
could cause a lot of pain depending on the volume.
Speaker 1 (22:29):
Right, because if with every client it's like, oh sorry, I
got to call you back to make sure you're not
a deep fake, Like that's not great.
Speaker 2 (22:36):
Yeah, And if you have thousands of calls a day
and even one percent is a false positive or negative,
that creates a lot of work, yeah, because it
adds up.
Speaker 1 (22:46):
How do you solve that? What do you do about that?
Speaker 2 (22:49):
So the way it works is all about adjusting. You
can think of thresholds, right. We can adjust a variety of
parameters on the output of a model, not just the
model itself. For example, in audio, as
we speak, you know, we could look at, okay, how
(23:11):
long do you want to listen before you give an answer?
You know, and the longer you listen, the
more confident you are.
Speaker 1 (23:21):
That makes sense, right, because it's essentially more data for
the model exactly. Yeah, what are you trying to figure
out now? Like what is the frontier?
Speaker 2 (23:32):
What's really the latest now? It's just amazing how quickly
it's going. It's videos. So the videos that we detect
are like a face swap: you're sitting there speaking
and another person's face is on there. That's a face swap.
But now you can generate an entire video completely from scratch,
and you just type in the description and the video
(23:52):
comes out. I can take a few seconds of your
voice, and I can then have you say anything I want,
and you can clearly see a bad
person can misuse these tools.
So the latest is these things are getting really good.
Speaker 1 (24:06):
Over time, like with those videos, how is
your reliability and accuracy changing? Are you getting better or worse
or staying the same as the technology to create the
deepfakes improves?
Speaker 2 (24:17):
So what's interesting is it has slowed down in terms
of like the signatures, Like we don't need as much
data as we used to. So of course there's still
a lot of work and we're never going to stop,
but it is stabilizing a little bit.
Speaker 1 (24:32):
When you say it's stabilizing, what is stabilizing a little bit?
So, like, the...
Speaker 2 (24:37):
Deepfake signatures are stabilizing.
Speaker 1 (24:40):
The signatures, meaning the giveaways, the things that I can't see,
but that your models can see. Exactly.
Speaker 2 (24:47):
So our models, going back to give a bit more detail:
they're looking at different attributes of a piece of media,
and they pull out those attributes and then they send
those to our in-house neural networks that study those attributes.
Speaker 1 (25:01):
Like, one that you have mentioned, that the company has
mentioned publicly, is the sync of audio and video, right? Yes.
Maybe that's one where it's gotten better and it doesn't
matter anymore, but from what I understand, from
what I've read, there was at least a time when
the sync of the audio and video tended to be
off in deep fake videos, right? Is that an example
(25:25):
of a signature.
Speaker 2 (25:27):
So the way that works is we train the model.
We say, hey, here's a bunch of people speaking, here's
what they look like, look at the sync. Here's a
bunch of people that are deepfakes, look
at the sync. And we tune the model so we can
tell the difference. That's also happening in video. By
the way, if you look at Sora and some of
these new models, where someone's walking, for example, their
(25:49):
legs are not, you know, they're not really smooth,
or they don't look right, so you can look at
that as well. We call that temporal dynamics.
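(A crude sketch of the temporal-dynamics idea: measure how smoothly pixels change from frame to frame, so a sudden glitch in motion stands out. Real systems learn this with trained temporal models; the synthetic clips and the jerkiness metric here are just for illustration.)

```python
# Crude sketch of a "temporal dynamics" cue: compare how evenly pixels change
# between consecutive frames. A smoothly moving synthetic clip is contrasted
# with one containing a sudden glitch.
import numpy as np

def jerkiness(frames):
    """Mean absolute change between consecutive frames, then its spikiness."""
    diffs = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))
    return float(diffs.std() / (diffs.mean() + 1e-8))  # high means uneven motion

def make_clip(glitch):
    frames = np.zeros((30, 64, 64))
    for i in range(30):
        x = 2 * i if not (glitch and i == 15) else 50   # object jumps on frame 15
        frames[i, 20:30, x:x + 10] = 1.0                # a moving bright square
    return frames

print("smooth clip :", round(jerkiness(make_clip(glitch=False)), 2))
print("glitchy clip:", round(jerkiness(make_clip(glitch=True)), 2))
```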
Speaker 1 (25:59):
Like, temporal dynamics is basically, are things proceeding in
time in a natural.
Speaker 2 (26:04):
Way exactly how things change over time.
Speaker 1 (26:09):
So yeah, all of these seem like things that
are going to be fleeting, right. Like,
my baseline assumption is it'll all get solved.
How long do you think you'll be able to defend
reality for?
Speaker 2 (26:24):
You know, this question comes up all the time. There
is always a giveaway, or there is always a
new way to look at the problem. We're not always
just looking at the raw pixels, right; we could look
at different aspects. We could look at the frequency, for example.
If you look at an image, you can actually break
it down into frequencies.
Speaker 1 (26:41):
When you say frequency, what do you mean when you
say you can look at the frequency? What does that mean?
Speaker 2 (26:46):
So, for example, okay, let's go with audio. You
know, you can use what's called a Fourier transform
to actually break up audio into individual wavelengths, sines
and cosines. You can do
the same with an image, for example; you can
break that
Speaker 1 (26:59):
Up like like the analogy of a wave form of audio.
Speaker 2 (27:03):
Yeah, it can be translated into a bunch
of waves. So there are multiple things that we look at,
and in the AI there's always a giveaway. And
again, we're also thinking outside the box, right,
like the blood flow, for example. But there are other
kinds of similar things we could think about.
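(A small sketch of that frequency idea: a Fourier transform decomposes a signal into sines and cosines, and the same trick extends to images with a two-dimensional transform. The "audio" below is just two synthetic tones plus noise.)

```python
# Sketch of the frequency-domain idea: decompose a signal into sines and
# cosines with a Fourier transform. The synthetic "audio" is two tones plus
# noise; the same idea extends to images via a 2-D FFT.
import numpy as np

sr = 8000                                   # samples per second
t = np.arange(0, 1, 1 / sr)
audio = (np.sin(2 * np.pi * 220 * t)        # 220 Hz tone
         + 0.5 * np.sin(2 * np.pi * 1000 * t)
         + 0.05 * np.random.randn(t.size))

spectrum = np.abs(np.fft.rfft(audio))
freqs = np.fft.rfftfreq(audio.size, d=1 / sr)

# The two strongest frequency components recovered from the waveform:
top = freqs[np.argsort(spectrum)[-2:]]
print(sorted(round(f) for f in top))        # [220, 1000]

# For an image, np.fft.fft2 gives an analogous 2-D frequency decomposition,
# where upsampling or generation artifacts can leave characteristic patterns.
```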
Speaker 1 (27:22):
I mean, presumably, you know, Renaissance Technologies,
the James Simons fund, is one of the first quant hedge funds,
and they made tons of money for a long time.
They wildly outperformed the market. Clearly they had a technological advantage.
And the thing Simons said, the founder, this math
(27:44):
guy, about that company: one of the things he
said was like, we actually don't want to hire
finance people who have some story about why a stock
is going to outperform, because if there's a story about
it, then somebody else is going to know
it already, right. Their thing was just like, we just
give the model all the data and let the model
(28:06):
find these weird ass patterns that no human even understands.
But they work more often than they don't work, and
we make tons of money. And I would think that
would be the case for you to some extent that
if you could think of a thing like monitoring blood
flow or whatever, then the bad guys, the
people who want to make realistic gen AI, would also
(28:29):
think of it. And the real kind of secret sauce
would be in weird correlations that the model finds that
we wouldn't even understand.
Speaker 2 (28:40):
Exactly. I mean, that is oftentimes what the model is
trained on, and the way it determines if something is fake,
it's looking at certain features, it's something that
we don't even tell it, right. Yeah, it determines it on
its own.
Speaker 1 (28:56):
Like that's the beauty of this kind of new era
of whatever, neural networks, machine learning. Right, it's just you
throw everything at it and let the machine figure it out.
Speaker 2 (29:07):
We like to say we throw the kitchen sink at it sometimes.
Speaker 1 (29:09):
Yes, yes. And so when you were talking
before about explainability, right, about sort of saying in your output,
here's why we think it's fake: I feel like that
kind of throw everything at it and let the machine
figure it out makes it hard. Like, sometimes you
don't know, right; it's just like, well, the machine is
very smart and it says this is probably fake. Like, is
(29:31):
that a tension? That can happen.
Speaker 2 (29:33):
So you'll look at it. We'll show you an image
and it'll say the model was looking at certain areas.
And by the way, this also helps us with debugging
and bias. Right, maybe it was for some reason
looking at an area of the face where we couldn't tell
why that would set off the model. And so in
those scenarios we also investigate, like, why was this
(29:54):
area flagged? And it could be one hundred percent correct;
it's just we do have to examine it further.
Speaker 1 (30:02):
Could you create a deep fake that would fool your
deep fake detector? Yes. Haha. Well, if you could do it,
somebody else could do it, don't you think? I could do it
Speaker 2 (30:13):
Because I have access to a lot more knowledge, right. Like,
you know, if I was running an antivirus
company, I could probably write a virus if I
knew exactly what it detects. We're constantly actually trying to do that.
Speaker 1 (30:27):
By the way, yeah, I mean in a sense, that's
the whole adversarial network thing, right, Like I guess you
have to do that for your detection models or your
suite of models to get better, right.
Speaker 2 (30:39):
Yeah. So we have what's called red teaming, both black
box and with understanding of the code. We're trying to
break the models. That's part of what we do.
Speaker 1 (30:47):
Uh huh. And so are there like evil geniuses at
your company who can make killer deep fakes?
Speaker 2 (30:53):
We definitely have geniuses one hundred percent, but we're in
the business of detection, right, we don't. We don't try
to generate too much other than just for training the models.
Speaker 1 (31:03):
I mean, I have to think, like, there are many
people in the world who want to make
deep fakes for many reasons, and they're at different levels
of technological sophistication. Naively not knowing much about this, I
would think you can catch most of them. But if
(31:25):
you have people who can beat your models, I would
imagine that, say, state actors, countries throwing billions of dollars
at this probably also have people who could defeat your models.
Speaker 2 (31:36):
Yeah, I mean, that's always the case with any cybersecurity company.
We are a cybersecurity company. Every cybersecurity company does
its best to defend, right, but we do not promise
one hundred percent. Our models always give a probability.
Speaker 1 (31:54):
Who's the best at making deep fakes, that you're
aware of?
Speaker 2 (31:58):
There's a few, right, there's like Sora from OpenAI. There's Runway,
there's Synthesia, there's you.
Speaker 1 (32:03):
Better be able to catch, right, anything I've heard of.
You better be really good at the technique. Presumably it's,
like, some, you know, Russian genius squad or, I
don't know, the North Koreans or something. I would
imagine it is some state funded actor, but.
Speaker 2 (32:17):
I would actually say, you know, we're in
a place where this problem is getting bigger,
but a lot of the
deepfakes coming out are actually for entertainment and they're not,
like, used for evil. You know, you've seen the famous
Tom Cruise ones, or other actors, running around doing things,
and those are deepfakes, right. Those are actually pretty good.
(32:38):
We detect them, but they're actually very good.
Speaker 1 (32:40):
What are you thinking about in the context of the
election in the US this year, and do you have
you have particular clients who are especially focused on election
related deep fakes.
Speaker 2 (32:51):
Yeah, the media companies are the main ones, and we're ready.
We detect the best deepfakes, right; everything that's
coming out, we detect. So we're ready, and we want
to make sure we're there as one avenue for people
verifying content. I believe late last year there was an
(33:13):
election in Slovakia where there was an audio of one
of the candidates saying he's going to double the price of beer. Yeah,
and that actually was a deepfake. It was caught, but
it caused some damage. So it's starting to
happen now.
Speaker 1 (33:29):
It's an awesomely stupid deep fake. I mean, to me,
the real risk of deep fakes is not people believing
something that's false. It's people ceasing to believe anything, right.
It's just saying, oh, that's probably just a deep fake,
right. That, actually, to me seems like the bigger
risk: nothing is true anymore. Nobody cares about the
(33:49):
truth anymore.
Speaker 2 (33:50):
That's definitely a problem as well. Now we're seeing people saying, oh,
this is a deepfake, when that's actually what happened. There have been a few.
I believe it was a Kate Middleton video, if I'm correct,
earlier this year, where everyone thought it was
a deepfake and it wasn't. So this kind of problem
is happening.
Speaker 1 (34:08):
Like, that's because people want to believe things that
are consistent with their prior beliefs, and they don't want
to believe things that call their prior beliefs into question, right,
and so deep fakes in a way are an easy
out where if you see something you like, you assume
it's true. If you see something you don't like, you
assume it's not true, or you assume everything's just kind
of bullshit. That, to me, seems like a big
(34:29):
kind of societal level risk of deep fakes.
Speaker 2 (34:32):
We'll never fix that. That's something we will never solve. Yeah,
people have their own beliefs. You can show them anything,
the facts, the math; that's not going to fix it at all. Yeah.
Speaker 1 (34:44):
No, I guess that's a human nature problem, if not
an AI problem. We'll be back in a minute with
the lightning round. Okay, let's close with a lightning round.
Speaker 2 (35:09):
Okay.
Speaker 1 (35:10):
How often do people applying to work at Reality Defender
use generative AI to write cover letters?
Speaker 2 (35:16):
Oh, that's a good one. Not a lot, but
we've seen it for sure. I would say maybe about
three percent.
Speaker 1 (35:22):
Okay. If I want to use generative AI to write
a cover letter to apply to work at Reality Defender,
but I don't want to get caught, what should I do.
Speaker 2 (35:33):
Change about seventy five percent.
Speaker 1 (35:34):
Of the words. Okay, who is Gabe Reagan?
Speaker 2 (35:41):
Gabe was, I think, our VP of public
relations or something like that. He's a deepfake. We
created him as kind of a fun joke.
But obviously we tell everyone.
Speaker 1 (35:54):
Tell me, tell me a little bit more about that.
Speaker 2 (35:57):
If you go on certain websites where you put your
photo and maybe your job experience, there's quite a large
number of deepfake profiles on these websites, like LinkedIn.
Speaker 1 (36:11):
Yes, huh. Why would people be doing that?
Speaker 2 (36:17):
Scammers.
Speaker 1 (36:18):
I'm trying to think, how do you get money out
of people by having a fake LinkedIn account?
Speaker 2 (36:22):
Oh, I can tell you. The most popular ones
that I'm aware of are, like, cryptocurrency.
Maybe you create a coin and you're like, here's the
CEO and here's this person, and they have these great
LinkedIn profiles, here's their photo, and they're not real, but
it sells a story, right.
Speaker 1 (36:41):
Is it right that you founded a clothing company?
Speaker 2 (36:46):
I did?
Speaker 1 (36:46):
Yes, what's one thing you learned about fashion from doing that?
Speaker 2 (36:50):
It's much different than software development.
Speaker 1 (36:54):
Sure, I don't think you needed to start a company
to learn that. I mean, the marginal cost is not
zero for one thing.
Speaker 2 (37:02):
Yeah, with software, you write some... it's not
easy at all, but what I mean is, you're writing
some code and you ship it. Versus in fashion, you've
got to source the fabric, you've got to design it,
you've got to make the patterns,
you've got to cut it, sew it, make sure it fits.
It's a lot more work.
Speaker 1 (37:22):
What are the chances that we exist in a simulation?
Speaker 2 (37:25):
You know, I used to think this is kind of
a joke, but I don't know. I'm seeing every
month it seems to get higher, from my perspective.
Speaker 1 (37:35):
Why do you say that.
Speaker 2 (37:37):
I'm seeing what's happening with tech and what we're building,
and you can see, there was one paper
where they took a bunch of agents and gave
them all a job, and they started to do it
and they just started to, like, create their own kind
of workflows. Right, I don't know, we could
be getting there.
Speaker 1 (37:53):
So it's like, well, if we can create a
simulation that seems like reality, maybe someone created a simulation
that is our reality. Exactly. Yeah, what do you wish
more people understood about AI?
Speaker 2 (38:07):
I mean, it's a tool, and I don't think people
should be afraid of it. They should embrace it. And
you know, there are people just running away from it.
It's fantastic, it's great, embrace it. Just be careful. One
thing I like to tell my friends and family,
especially with the deepfake audio: have a safe word.
If somebody calls you and you're like, that's weird, you know,
(38:29):
call them back or ask for the safe word.
Speaker 1 (38:31):
What do you wish more people understood about reality?
Speaker 2 (38:39):
I would say, just be aware that you exist, and
every day's a gift, so you should be excited that
you're here. The chances of you existing, it's like
you've won the lottery a million times. So every day's
a gift.
Speaker 1 (38:56):
Ali Shakiyari is the co founder and CTO at Reality Defender.
Today's show was produced by Gabriel Hunter Chang. It was
edited by Lyddy Jean Kott and engineered by Sarah Bruguer.
You can email us at problem at Pushkin dot fm.
I'm Jacob Goldstein and we'll be back next week with
another episode of What's Your Problem