
April 2, 2025 10 mins

In this episode, we discuss the new integration between Fiddler Guardrails and NVIDIA NeMo Guardrails, pairing the industry's fastest guardrails with your secure environment. We explore the setup process, practical implications, and the role of the Fiddler Trust Service in providing guardrails, monitoring, and custom metrics. Plus, we highlight the free trial opportunity to experience Fiddler Guardrails firsthand.

Read the article to learn more, or sign up for the Fiddler Guardrails free trial to test the integration for yourself.


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:02):
Welcome back to Safe and Sound AI. You know, last time we really dug into the launch of Fiddler Guardrails and, honestly, I was blown away by those sub-100 millisecond response times. But today we're going even deeper. We've got some fascinating sources on how Fiddler Guardrails is now natively integrated with NVIDIA NeMo Guardrails.
Right.

(00:22):
It's like they took an already impressive tool and just plugged it right into one of the leading frameworks for building with LLMs.
Exactly. And that's got big implications for anyone actually building and deploying these models.
So our mission today is to really break down what this integration means, particularly for those of you working with NeMo. You know, we'll be looking at how it helps you build those safer,

(00:42):
more reliable LLM applications, especially when it comes to those persistent challenges: hallucination, toxicity, even jailbreak attempts.
So, all right, let's jump right in. NVIDIA NeMo Guardrails. For those who may not be familiar, can you give us a quick rundown of what it is and what it does?
Sure. NeMo Guardrails is, at its core, a scalable platform for

(01:03):
managing these AI guardrails. It allows you to implement and manage various guardrails effectively, all in one place. Instead of having different solutions for safety, security, and other constraints, you have a unified framework.
So it's like a central command center for all your LLM safeguards.
Precisely.
So I think what's really cool here is, you know, Fiddler Guardrails isn't just

(01:23):
sort of an add-on or something. It's actually a native part of NVIDIA NeMo Guardrails now.
And that built-in aspect really matters. I think that's key, isn't it? Because what that means is that incredibly fast response time that Fiddler Guardrails has, which we talked about, you know, the sub-100 millisecond speed, that's now operating directly within NVIDIA's secure environment.

(01:44):
So it's not going out to some external server or something.
Nope. Right within their own infrastructure.
So what does that mean practically? No data leaves your deployment. No external calls.
Okay, so we've talked about all the high-level benefits, but let's get practical for a second. I know we've got a lot of engineering folks listening who are probably wondering, okay, this all sounds great, but how hard is

(02:04):
it to actually set this thing up? And the sources talk about a pretty seamless implementation of Fiddler Guardrails within NeMo Guardrails, with minimal setup requirements. What does that actually look like in practice?
The first thing you do is obtain what's called a Fiddler platform key. That's your authentication. Then you set that key as an environment variable in your NeMo environment.

(02:28):
And then finally, you update a file called config.yml, which is a standard configuration file in NeMo Guardrails, with a couple of key pieces of information: the Fiddler Guardrails endpoint and the specific thresholds that you want to set for things like moderation.
Ah. Okay, so like how sensitive you want the system to be.

(02:48):
Exactly. So, how sensitive you want to be to potentially toxic language, for example. And what's great is, you know, there's not a lot of complex coding involved here. You're mostly configuring things, pointing it in the right direction.
And it just slots right in.
And that significantly reduces the time and effort it takes to get started.
That's huge, especially if you're trying to get a pilot up and running quickly.
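The setup just described boils down to an environment variable plus a few lines of YAML. As a rough sketch only — the field names, endpoint URL, and threshold keys below are assumptions for illustration, so check the NVIDIA NeMo Guardrails GitHub repo for the actual schema:

```yaml
# Hypothetical sketch of the config.yml changes for the Fiddler
# integration in NeMo Guardrails. The field names, endpoint URL, and
# threshold keys here are illustrative placeholders, not the official
# schema -- consult the NeMo Guardrails repo for the real configuration.
#
# Beforehand, set your Fiddler platform key as an environment variable,
# e.g.:
#   export FIDDLER_API_KEY="<your-platform-key>"
rails:
  config:
    fiddler:
      fiddler_endpoint: https://your-org.fiddler.ai  # your Fiddler deployment
      safety_threshold: 0.1      # moderation sensitivity (assumed semantics)
      jailbreak_threshold: 0.1
```

The thresholds are the knobs discussed here: tightening or loosening them adjusts how aggressively inputs and outputs get flagged, and they can be tuned per deployment without any code changes.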

(03:08):
Exactly. Now, one thing I find reassuring is that it sounds like even if there are, you know, temporary API issues, Fiddler Guardrails will continue to moderate inputs and outputs. You know, it won't just completely disrupt the service.
Right, you don't want the whole thing to fall apart if there's a hiccup.
Exactly. And you can fine-tune those threshold values.

(03:29):
You know, you can adjust that moderation sensitivity as needed. It gives you that granular control.
So, for those of you listening who really want to get into the weeds of implementation, the NVIDIA NeMo Guardrails GitHub repo is your go-to resource.
All the details you could ever want.
They've got it all there.
All right, so we've talked about the integration, but now I want to shift gears a little bit and talk about the foundational importance

(03:51):
of metrics in all of this. Because you can have all the guardrails in the world, but if you're not tracking what's actually happening, are you really in control?
Without quality metrics, you're essentially flying blind. You need that visibility to understand how your LLM is behaving in the real world, to see where those potential issues might be cropping up.

(04:11):
Exactly. Are you seeing a sudden spike in hallucinations? Are there emerging patterns of misuse? You need to know this stuff.
And without the data, you're just guessing.
Right, you're relying on anecdotes.
So this is where the Fiddler Trust Service comes into play. It's kind of the engine behind all of this.
The brains of the operation.

(04:32):
And it's highlighted as providing 50 out-of-the-box metrics, and then you can even customize your own.
Right, it's very comprehensive.
So you've got this wealth of information. How do those metrics actually translate into the protection that the guardrails and those broader monitoring solutions provide?
Okay, so think of it this way. The Fiddler Trust Service is constantly evaluating the LLM's outputs based

(04:56):
on all these different metrics. Is it being factually accurate? Is the language appropriate?
Okay, so it's looking for all the red flags.
Exactly. And then that data feeds directly into Fiddler.
Okay. So it's informing those real-time decisions about whether to allow a response, flag it for review, or block it entirely.
So it's not just a static set of rules.

(05:17):
No, it's dynamic. It's responding to what the Trust Service is seeing.
Exactly. And then, beyond that immediate guardrail function, you've also got all that data that's being collected that you can use for ongoing monitoring, right?
Absolutely. You can track trends over time to see if there are any anomalies popping up.
So it's not just reactive?
It's proactive, too.

(05:37):
You can potentially catch issues before they become major problems.
That's great. Now, let's talk about how the Fiddler Trust Service actually generates those metrics.
Sure.
Because it uses a couple of really interesting approaches. On the one hand, you've got those proprietary Fiddler Trust Models, which are highly trained for specific tasks. But then you've also got this capability where enterprises can

(05:59):
define their own custom metrics using a hosted Llama 3.1 8B model.
That's pretty powerful.
That's really cool. So that means if you've got some very specific domain-related risks, you can tailor your monitoring to that.
You're not limited to just those pre-built metrics. Let's say you're working in a legal setting. You might want to track how well the LLM is adhering to specific citation formats,

(06:23):
or, if you're in healthcare, there might be very specific terminology that you need to make sure is being used correctly.
And what's great is that this custom metric functionality is provided as a fully managed service.
Right, so you don't have to worry about the infrastructure.
You don't have to set up your own Llama model; you don't have to manage all of that. It's all handled for you.
They take care of all the heavy lifting.

(06:43):
And we're talking about handling a lot of data here. They're talking about hundreds of thousands of daily events.
That's serious scale.
It's impressive.
And then the other approach they mention is this concept of LLM-as-a-judge, where they're using APIs from OpenAI models to score various metrics.
So you've got these two powerful approaches working together: the

(07:07):
efficiency of the specialized Fiddler Trust Models, and then that broader understanding that you get from those more general-purpose LLMs.
And ultimately, it's all about giving you, the user, the insights you need to make informed decisions about your LLM applications.
Transparency and control.
Absolutely. That's what it's all about.
Okay, I want to touch on the partnership itself.

(07:28):
Because it sounds like this integration between Fiddler and NVIDIA wasn't just some random thing that happened overnight. It was a strategic collaboration. They've been working on this since the early days of NeMo Guardrails and NVIDIA Inference Manager.
So, what's the significance of that kind of close partnership for the end user?
It's about alignment.

(07:49):
It means that Fiddler and NVIDIA are both committed to solving these really tough challenges in deploying generative AI and LLMs.
They're in it together.
They're in it together. And that benefits you because you're getting this more cohesive ecosystem, you're getting deeper integration.
These are just going to work better together.
Exactly. And you're likely to see more innovation, more features being rolled out as they

(08:12):
continue to work closely together.
Now, their initial integration focused on capturing a ton of data: prompts, responses, metadata, even details about how the NeMo Guardrails themselves were being executed. And all of that was feeding into the Fiddler platform.
That must have provided some really valuable early insights.
Oh, absolutely.

(08:32):
It gave users a really granular understanding of what was actually happening within their LLM applications.
Like, under the hood.
Yeah. Where were those guardrails being triggered? What types of prompts were causing problems? It helped them pinpoint issues and really refine how they were using the system.
And then more recently, there's been this integration with NVIDIA Inference Manager, NIM, which is all about scaling those secure LLM deployments.

(08:57):
Logging those prompts, routing them to Fiddler for monitoring, making sure that you don't sacrifice security as you scale.
Because, as we've said, security and scalability need to go hand in hand.
Absolutely. You can't have one without the other.
Now, for those of you listening who are really intrigued by all of this and want to kick the tires a bit, there's great news: there's a free trial of Fiddler Guardrails available.

(09:20):
It's a great opportunity to see it in action.
So the free trial gives you 14 days of access and 200 API requests per day, so you can really test things out.
Yeah. Plenty to work with.
You get real-time moderation against all those key risks that we've been talking about, and they've got tons of documentation to guide you through it. They even have a guided walkthrough.
They make it as easy as possible to get started.

(09:42):
That's fantastic. So getting started is simple: grab your API key and you're off to the races.
So for you, the listener, you know, if you're working with NeMo, think about what this level of speed and integrated security could mean for your LLM applications.
It could be a game changer.
It really could.
This podcast is brought to you by Fiddler AI. For more on NVIDIA NeMo, Fiddler Guardrails, and the new free trial,

(10:04):
see the article in the description.