Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Andreas Welsch (00:00):
Today, we'll
talk about red teaming and
(00:01):
safeguarding for large languagemodels and who better to talk
about it than somebody who's notonly actively working on that,
but who's actually written thebook on it.
Steve Wilson.
Hey, Steve.
Thank you so much for joining.
Hey, Andreas.
Steve Wilson (00:13):
Thanks for having
me today.
Always happy to be here.
Andreas Welsch (00:16):
That's awesome.
Hey this is the third timeyou're actually on the show.
You're the first guest to be onthree times.
We talked about large languagemodels last year, about two
weeks ago for the launch of mybook, the AI Leadership
Handbook.
You shared a lot of great advicein a short segment on security,
but we said, Hey, we want tospend a little more time on this
(00:36):
today because I also feel thatit needs a lot more attention
now that Large language modelsare becoming more or less
ubiquitous and embedded in somany more applications.
But before I keep stealing yourthunder, maybe for those of you
in the audience who don't knowSteve, maybe Steve, you can
introduce yourself and share abit about who you are and what
(00:56):
you do.
Steve Wilson (00:57):
Great.
I'm the Chief Product Officer atExabeam, which is a cyber
security company thatspecializes in using AI to find
cyber security threats.
Last year, I got involved increating an open source product.
Basically an open source projectcalled the OWASP Top 10 for
(01:18):
large language models.
And OWASP is a large foundationwith a few hundred thousand
members that's dedicated tobuilding secure code.
And we did a bunch of researchinto what are the security
vulnerabilities I'm not reallygood at reading languages, but I
do have a couple of books inlarge language models because
this isn't something that I'vegotten a lot of research, and
(01:38):
that got a lot of attention.
Last year, after that gotrolling, I actually, I got an
email out of the cold from aneditor at O'Reilly, and they
asked if I wanted to write abook.
And I've spent the interveningyear writing this book, so I was
super pleased to be on yourlaunch stream, and today is the
launch of my book, So I'mhijacking your stream to give it
(02:00):
back.
So this is my O'Reilly book.
It's called the DevelopersPlaybook for Large Language
Model Security, Building SecureAI Apps.
And, it's been fun to do thatbecause it lets me get much
deeper into a lot of theseissues than you can do in a top
10 quick hit list.
Andreas Welsch (02:20):
That's awesome.
Again, I'm so glad to have youon the show and folks also check
out Steve's book here on largelanguage model security.
First of all, alsocongratulations, right?
That's a big day for anybodythat's that spent a year writing
something and pouring your heartand soul into it and going
through edits and revisions andgetting some, first excitement.
(02:41):
Now it's the big day.
So excited that we can allcelebrate that together with
you.
Awesome.
Perfect.
Hey folks, if you're justjoining the stream, drop a
comment in the chat where you'rejoining us from.
I'm always curious how globalour audience is.
And also don't forget tosubscribe to my newsletter, the
AI memo and the intelligencebriefing.
So you can stay up to date onhow you can run AI projects
(03:02):
successfully.
And again, this one, AILeadership Handbook, is also
available now on Amazon.
Now, why don't we play a littlegame to kick things off, right?
We've been doing this for awhile on the show.
So this game is called In YourOwn Words.
(03:22):
And when I hit the buzzer, thewheels will start spinning, and
when they stop, you'll see asentence.
And I'd like for you to answerwith the first thing that comes
to mind, and why, in your ownwords.
Okay.
Let's see.
Are you ready?
Steve Wilson (03:37):
Let's do it.
Andreas Welsch (03:38):
If AI were a
fruit, what would it be?
Steve Wilson (03:42):
Oh man, the thing
that just jumps to mind is
strawberry.
Did you load that question?
Andreas Welsch (03:47):
Oh, that's a
good one.
Steve Wilson (03:49):
But for folks who
haven't been following that the
big news the last few months hasbeen OpenAI secretly developing
some new supermodel and it wascode named Strawberry and they
did release an early version ofit, which is actually pretty
fascinating.
And I've been playing with thata lot the last few weeks.
Andreas Welsch (04:09):
That's an
awesome answer.
Didn't expect that one though.
So good, perfect.
So folks we obviously skippedyour answer this time, but if
you have one, please feel freeto put them in the chat as well.
But now, why don't we jump tothe main part of our
conversation, and we alreadyhave a couple questions prepped
(04:30):
to guide our conversation.
you mentioned where we're abouta year after the release of the
OWASP top 10 for large languagemodel apps, and that's how we
first got in contact with eachother.
But I'm curious where are we?
What are you seeing?
Are these threats that youdiscovered and that you
(04:52):
described last year.
Are they still somewhat far awayin the distant future?
Are they already here?
Are all of them here?
What are you seeing?
How real are these things?
Steve Wilson (05:03):
It's interesting.
When we go back more than a yearnow to when we published the
first version of the list, Iwouldn't say these things were
hypothetical.
Many of them there were realexploits that we published and
discussed, but I'd say the theconsequences for the people who
fell victim to them were mostlyembarrassment, sometimes severe
(05:26):
embarrassment, but it was mostlyembarrassment.
In fact Martin Mikos, the CEOover at HackerOne, which is a
really well known securitycompany, put out a list that was
a variant of the top 10 that wasthe top 10 ways to embarrass
yourself with Gen AI.
What's been interesting to watchfirst is the consequences are
clearly much more severe nowwe've gone from a car dealership
(05:51):
in Central, California puttingup a chat bot that gets tricked
into Offering to sell peoplecars for a dollar.
I don't think anybody actuallygot a car for a dollar So they
got embarrassed but noconsequences On the other hand
now, we're to the point where wesee the AI agentry being a added
to really major applicationslike Microsoft Copilot, and
(06:13):
Slack being vulnerable to thethings at the top 10 list, like
indirect prompt injection.
And both of those have come outrecently with being vulnerable
to that, where we see the majorcompanies like Apple putting out
these announcements about howthey're going to make their Gen
AI environment secure.
(06:34):
But often I feel like they'redoubling down on their
traditional security.
Apple's I'm building a superhardened data center to put your
data in, and I'm going to runthe LLM locally on your device.
And both of those are goodthings to do.
There's no downside to that.
But if you've got the LLMlocally on your device, and that
LLM is susceptible to some ofthese vulnerabilities, and you
(06:57):
haven't dealt with that, and ithas access to your data,
Somebody's going to trick itinto doing something bad with
your data.
And we can see that coming.
And we see those early resultsfrom the Microsofts and the
slacks and a lot of otherpeople, it's not limited to
them.
The other thing that we see outthere are some of the
vulnerabilities from the top 10list let's call it changing
(07:18):
rankings.
We actually did.
Some re voting and the expertgroup continues to grow and we
shipped the first version.
It was 400 people.
Now it's 1400 people.
Andreas Welsch (07:29):
Wow.
Steve Wilson (07:29):
So people jump in
and vote and participate.
And we did some re rankingrecently to decide where we
wanted to focus.
And I'd say there are two thingsthat dramatically went up the
list in terms of what people areinterested in.
One of them is what we callsupply chain security.
And this was on the list, but itwas near the bottom.
It was a hypothetical worry thatbad actors could get into your
(07:53):
supply chain for where am Igetting my models and my
training data and my weights andmanipulate that somehow.
and make it bad for me.
Well known general cybersecurity problem, hypothetical
not hypothetical anymore.
When we look at these placeslike Hugging Face, which has
become the de facto source toget so much stuff, and Hugging
(08:17):
Face is an awesome site andcommunity, but there have been a
lot of researchers demonstratingthere are thousands of poisoned
AI models on that site, andwe've seen examples where major
organizations like Intel andMeta have lost control of their
accounts and their keys and sothat's going way up to the top
(08:39):
of the list next time we publishit is understanding those supply
chain concerns.
The other one I don't think willsurprise people, but agents.
We had one vulnerability classfor agents last time, what we
called excessive agency, whichis basically, are you giving the
app more, more rights or moreability to execute autonomously
(09:04):
than it should?
really should have for itscapabilities.
If you look out in the AI sphereright now for the last couple
months, everybody's talkingabout agents.
It's suddenly pivoted to thattime where people want to
unleash a lot more autonomy.
But I think we're still in avery dangerous place where I
don't think that's wellunderstood what the security
(09:26):
implications are going to be ofunleashing that and ultimately
what the safety implicationsare.
Andreas Welsch (09:32):
I was just gonna
say, especially that part, right
on, on one hand you have thevulnerability of the LLM and to
some extent the gullibility ofit doing things that you don't
intend it to do, but somebodyelse is getting it to do or
tricking it into doing.
And then certainly putting it ininto workflows that are a little
(09:53):
more critical.
That are a little more highrisk, that are a little more
customer facing, certainly canquickly go beyond embarrassing.
Steve Wilson (10:03):
And look, people
are not moving in small steps
here.
We see all sorts of exampleswhere people are jumping
straight from sales chat bot tomedical application and
financial trader.
We're getting to the heart ofthe matter now.
Andreas Welsch (10:17):
Now that makes
me curious.
Especially as you're seeing andas we are seeing in the
industry, that shift towardsagents towards more LLM based
applications of all differentsorts and kinds in all different
kinds of industries.
How do you build thesesafeguards into the apps in
around your LLMs?
And doesn't that drive up thecost for checking and
(10:40):
regenerating output if it'ssomething that's harmful or
something that that it's notsupposed to generate?
How do people deal with that?
Steve Wilson (10:48):
So one of the
things that I have in the book
is, ultimately when you get tothe last chapter.
There's a checklist of thingsyou should be doing.
The top 10 list was great forhere's 10 things you need to
understand that can go wrong.
And I do cover all of those inthe book, but the second half of
the book is, what do you doabout it, right?
Just giving me a list ofproblems is not helping anymore.
(11:12):
So I would say one of the thingsis people naturally jump to this
idea that my defense If I'mworried about prompt injection,
my defense is going to beputting a filter in front of the
LLM that's going to look forprompt injection.
And and that can be one part ofthe solution, and I do talk
(11:33):
about some ways to do that andscreen for things.
But that's an incredibly hardproblem, and I'd say, for the
most part, something I don'tencourage people to solve by
themselves.
The first thing to think aboutis for that part of the
approach, you want to thinkabout going and getting a
guardrails framework.
And there are commercial ones,there's a whole bunch of cool,
(11:55):
virgining companies.
I could name a few and then I'llget in trouble for the ones I
don't, but let's try anyway.
You've got PromptSecurity andLassoSecurity and YLabs.
All these people are doing greatstuff.
You've got open source projectsfrom companies as big as Nvidia
and Meta.
That are attacking parts ofthese problems.
(12:15):
I suggest people check those outbecause that's, going to give
you the state of the art forthat layer of defense.
The other things that I say,though, the first thing on the
checklist is actually more aboutwhat I will call product
management than engineering.
And that's about carefullyselecting What you are going to
allow the thing to do and notdo.
(12:37):
And actually really closing downyour, field of view.
If you're trying to buildChatGPT in your open AI, there's
a tremendous amount of things.
And we all, trip over ChatGPTguardrails all the time.
Oh, I can't do that.
But then people trick it intodoing it because it's a very
large space to defend.
It's actually much easier tocreate what we call an allow
(12:59):
list than a deny list.
If, as a product manager, Idecide this bot is for helping
with tax calculations, it's mucheasier for me to put some
guidance in there.
build it into that bot that saysdon't answer questions that are
not directly about taxes, thenlist all of the things that
should not be.
(13:21):
And when you think about it thatway, you unlock some of these
things where that's not a hugecomputational cost.
That's just a smart decision.
Andreas Welsch (13:30):
That makes a lot
of sense.
I think putting it that way andgiving it that persona of This
is what you do, this is whatyou're not supposed to do, or
avoid or ignore anything else,right?
And again, comes back to thepoint that I think we talked
about last year.
Almost like a zero trust, zero,yeah, zero trust boundary.
Steve Wilson (13:50):
What I tell
everybody is you need to treat
your LLM as something in betweena confused deputy And an enemy
sleeper agent.
And really we're used to talkingabout trust boundaries in our
software as the software ishere, it's my software.
I trust it.
And what I'm trying to do iscontrol what's coming in from
(14:12):
the outside.
The thing is at this point,almost by the nature of any of
these applications, they'redesigned to take untrusted input
from the outside.
And by proxy, I have to stoptrusting the LLM.
And that means actually, I put alot more effort into filtering
and checking what comes out thanI do what comes in.
(14:34):
It turns out to actually be,again, a much easier problem to
solve and screen for.
I can watch for, is it spittingout code?
If I didn't intend it to spitout code, I can watch if it's
spitting out credit card numbersor social security numbers.
Those are easy problems to solveversus, is this an attempt to
hijack my LLM?
Andreas Welsch (14:54):
That's great.
I think especially the examplesthat you give of how to do that
or where to do it, right?
It's easier, like I said, to dothat at the beginning than at
the tail end of it.
Which triggers another thoughtor another question here.
And I haven't been really ableto find good expertise on this
(15:16):
for people who can talk aboutthis in depth and in breadth.
Which is how do you actually redteam your LLM applications?
Meaning you have somethingthat's untrusted, but how do you
go through all these scenarios?
What are things you should betesting for?
And how, can your own securityteam maybe get more familiar
with these techniques and,upscale now that LLMs are here?
Steve Wilson (15:40):
It's, been a, it's
been a really interesting
journey and, I've been exploringthis from a few different
angles.
Personally my day job being atExabeam, we just added a bunch
of Gen AI capabilities to ourcyber security software that we
sell to enterprises.
All of a sudden, I wasn't justthe author writing about this
(16:01):
stuff.
I was on the hot seat andresponsible for it.
And, when we started puttingthis out and started putting it
out in early access, and franklygot some really great responses
from users, I went back to theengineering team and started
asking a lot of questions like,Hey, what kind of guardrails are
we putting on this?
And what are we doing?
And, there, there weren't a lotof great resources for red
(16:22):
teaming, but it turned out to bevery, or it was very easy to
trip it up with even basic redteaming at the beginning because
These, things are hard, but whatI will say is, oh, and the other
part is from the OWASP side redteaming has become a huge topic
(16:43):
within the expert group.
Again, it was like, hey, whatare all the problems?
Now it's like, all right, redteaming is a big part of the
solution.
How are we going to give peoplegood red teaming guidance?
And we actually, we did a panelon that at RSA that was really
well received.
And it's up on YouTube.
If you go look for it, OWASP top10 LLM red teaming, you can find
(17:04):
it.
But, What I'll say isinteresting, which is, people
coming to this from what I callan AI background or, coming to
this from an AI angle oftendon't have a lot of
cybersecurity experience.
And that's one of the tricks.
So they don't necessarily knowwhere to start.
They actually, the term redteaming has become pervasive.
(17:27):
In the community, broadly, dueto LLMs I was watch I was
watching a clip off cable TVrecently, where Senator Chuck
Schumer from the United StatesSenate was talking about AI red
teaming and LLM, My mind justexploded, so it's in the popular
(17:48):
culture.
And, what we really see is that,there's a tremendous amount of
expertise on how to approachthis in an obscure corner of
your development organization atyour enterprise called the
AppSec team.
And these teams are sometimesfeel unloved, but they're the
(18:08):
ones who are charged withbuilding secure software inside
your enterprise.
And they are coming up alearning curve about AI, but
they know how to secure thingsand they know how to pen test
them.
And for a long time, we calledthat pen testing.
And then red teaming is adifferent variant of that, but
that's a good place to startwith that piece of expertise.
(18:28):
The thing about red teamingthat's different, and I break
this down in the book with atable that actually compares pen
testing and red teaming, but AIred teaming is really, you want
to attack it from a broadperspective.
You're not probing it lookingfor traditional attacks.
You're looking at it from aholistic point of view.
(18:49):
Security, safety.
And it's really interesting towatch these teams that were
viewed very narrowly, like anAppSec team in an enterprise.
Suddenly become much moreimportant where they're getting
asked for questions.
Like how do I test this forsafety?
And sometimes they'reuncomfortable with that.
They're like nobody's asked methis Like I got nobody else to
(19:10):
ask you're the only person herewith expertise on this.
Let's figure it out some peopleget uncomfortable Some people
are really embracing it andthese teams are stepping up and
it's really cool to see themstep up and attack this teaming
if You don't have experiencewith it and you're looking to
just get your feet wet.
(19:31):
There's an awesome resource froma company called Laykara.
They're another one who makesguardrails frameworks.
The thing they got famous fororiginally was, they put out
basically a game on the internetcalled Gandalf.
And it is, a tool to practiceattacking large language models.
And so if you want to just getthere in a way that's not
(19:54):
illegal or immoral, practiceyour skills at hacking LLMs and
develop those red teamingskills.
That's a great place to go getstarted.
Andreas Welsch (20:02):
Cool.
I know what I'll be doing oncewe end the stream.
That's for sure.
I always love learning somethingnew and getting new resources.
Two things, folks, for those ofyou in the audience, if you have
a question for Steve, please putit in the chat.
We'll pick it up in a minute ortwo.
And while you're thinking aboutyour question, Steve, you
mentioned you wrote a book andtoday's launch day.
(20:24):
What's the name of the bookagain?
Steve Wilson (20:25):
Yep, so the book
is called The Developers
Playbook for Large LanguageModel Security Developing Secure
AI Applications.
It's from O'Reilly.
And and again, it's availabletoday.
You can get the electroniccopies on Amazon or wherever you
buy books.
And they promised me the hardcopies are on a truck somewhere.
(20:46):
Coming really soon.
Andreas Welsch (20:47):
That's awesome.
Fantastic.
So folks, definitely check thatout.
So while we're waiting forquestions to come in, I think
another aspect then is, asyou're thinking about red
teaming, it's the usual teamsthat have been doing this kind
of work for a long time.
It's an additional dimension ofhow do we do this for large
(21:10):
language models now.
How do we upskill?
What roles do you see withinthese teams are best suited to
do that?
What skill sets do people havethat they do red teaming that do
develop these safeguards?
Is it the AI scientists, I thinkyou said, that's maybe not the
best angle.
Is it the pure security teams?
(21:32):
Is it the English majors whoknow how to write a crafty
sentence and elicit a goodresponse?
Where do you start?
Steve Wilson (21:42):
Yeah, it's.
It's funny.
Just in general, having beenimmersed in not just the
security, but the development ofGen AI apps.
It's interesting to look at therole of, let's call it, data
scientists and machine learningengineers.
Because at ExitBeam, we have abunch of them who've been
developing.
(22:03):
Deep machine learning models forhigh speed analysis of data for
10 years.
But when we added the co pilotwith, Gen AI, and we're
actually, we're using we'reusing a lot of things people who
watch your show would befamiliar with.
We were using a private versionof Gemini inside Google Cloud,
and it's an LLM, and we do ragand prompt engineering and,
(22:26):
these things that we're alllearning about together.
The role of the data sciencethere.
is very minimal.
If you're a machine learningspecialist in the world of large
language models, you might beworking at somewhere like Google
or OpenAI, working on the nextwave of those frontier models.
(22:47):
And I have a tremendous amountof respect for that.
But when it comes to buildingthese apps, this is really a new
skill set that we're alldeveloping together about.
How do I build effective promptsand how do I do effective RAG?
And there are ways to connectthat up to existing, machine
learning technologies, which hasbeen really cool to do at
(23:07):
Exabeam.
But, I'd say what that means isthat we're all developing these
new skill sets.
I think from a securityperspective, those data
scientists are the furthestaway, from, doing this directly
at your enterprise.
Although there are a lot ofsmart ones at places like NVIDIA
and Meta working on securityguardrails.
(23:27):
And I would just take advantageof that.
I would not try to imply, employdata scientists to build
security guardrails for a newenterprise app.
Andreas Welsch (23:37):
Good point, that
you're making there.
So then what types of skills doyou see if it's not the data
scientists?
But there is some knowledgeneeded that you understand what
are the vulnerabilities?
What's a good skill set that cantest for these things?
Again, is it more language?
Is it more security?
(23:57):
Is it something completelydifferent?
Steve Wilson (24:00):
I'd say that the
term red teaming existed in
cyber security long before largelanguage models.
And the first place is to lookto those security teams that
have experience in pen testingand red teaming.
There are also great things outthere that are cropping up.
(24:22):
Companies like HackerOne.
HackerOne is a really coolcompany with a cool business
model where they basically runpeople's business.
bug bounty programs for largeenterprise.
What they have is a million,literally a million hackers
under contract, and they willcome try to hack your stuff.
And if they hack your stuff, youpay them because they will
(24:42):
disclose to you how they did it.
And in the last six to 12months, they've added AI red
teaming to this.
And I've seen some amazingfeedback on what you're able to
do with that.
So you're able to actually goout, and basically crowdsource
Andreas Welsch (24:59):
That is indeed
an interesting business model as
well.
And I could see, especially forcompanies that don't necessarily
have the bandwidth or alreadythe expertise, we're still
bringing up the expertise intheir own organization to
leverage offerings like that youjust mentioned to augment their
skills or again get someexternal help on this.
(25:19):
Fantastic.
Steve, we're getting close tothe end of the show, and I was
wondering if you can summarizethe key three takeaways for our
audience today before we wrapup.
Steve Wilson (25:28):
Yeah, from the
topics we've covered, I'd say,
this is a fast evolving space.
I would say that if you'retracking this, you want to start
to be aware of some of the newerplaces where the vulnerabilities
are showing up, and these aredefinitely things like supply
chain.
If you're not just usingsoftware as a service model,
(25:52):
like a Google Gemini, but you'reoff trying to run your own LLM
or trying to use training datafrom third parties and things
like that.
Really start to understand thecharacteristics of your supply
chain.
And then as you go out and startto think about adding agents to
the mix letting things executeautonomousy and seek goals,
(26:15):
really think hard about theimplications of that.
And lastly, please check out thebook, developers playbook for
large language model security.
Andreas Welsch (26:26):
Steve, it's been
a pleasure having you on.
Thank you for sharing yourexpertise with us.
Steve Wilson (26:31):
Thanks, Andreas.
Always enjoy it.