Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Krishna Gade (00:06):
Welcome, and thank you everyone for joining us for today's AI Explained. Today's topic is AI Security and Observability for Agentic Workflows. Everyone touts that this year is going to be the year of AI agents. Let's see how we need to address these issues.
(00:27):
I am your host today. I'm Krishna Gade, one of the Founders and CEO of Fiddler AI. Again, please put your questions in the Q&A box at any time during the fireside chat. Today's session will also be recorded and sent to all the attendees after the session.
Okay.
So without further ado, I want to welcome Karthik Bharathy,
(00:50):
General Manager for AI Ops and Governance for Amazon SageMaker AI at AWS. Karthik, if you could turn on your camera...
Karthik Bharathy (01:02):
Hey Krishna
Krishna Gade (01:02):
Thank you. Welcome to AI Explained. So here's a brief bio of Karthik.
Karthik is a leader with over 20 years of experience driving innovation in AI and ML. As the General Manager for AI Ops and Governance for SageMaker AI, Karthik leads the development of cutting-edge generative AI capabilities in Amazon SageMaker AI.
(01:23):
Karthik, thank you so much for joining us. Maybe let's start with your background: how has your role in AI Ops and Governance at AWS shaped your perspective on monitoring and securing AI workflows in the enterprise?
Karthik Bharathy (01:39):
Yeah, that's a great question, Krishna. If I think about how AI Ops and governance has evolved over the years, a lot of the change has been in tandem with the innovations we've seen in AI/ML over the last few years, starting with traditional ML
(02:02):
systems to, more recently, GenAI and agentic workflows, as you aptly put. And throughout these years, from what I've seen, there are three things that stand out. One is that security and governance are built into ML workflows from the ground up.
(02:24):
It's not an afterthought anymore. What that essentially means is enterprises are thinking about robust data governance techniques, access controls, and how they incorporate audit trails from day one. And effective security isn't just about
(02:46):
protecting your models. It's about creating a comprehensive system that includes automated monitoring, version control, and audit trails.
The second one I'll call out is the need for end-to-end observability,
(03:08):
and this is across both the data and the ML workflows: right from how data is ingested, to lineage all the way from data to ML, to observability during model deployment to look for drift and so on. And finally, the third thing I would call out is that while all this
(03:32):
sophisticated tooling is in place, you want to have the necessary human element to oversee the process. Even while it's automated, there are critical junctures where human oversight is needed, and that helps the decision-making process.
Krishna Gade (03:51):
Awesome. So, being at the helm of SageMaker, you're probably seeing the current state of AI in the enterprise and its adoption. How would you describe it? Could you shed some light for our audience?
Karthik Bharathy (04:05):
Yeah, yeah. I think if you look at the last four or five years, the enterprise landscape is evolving pretty rapidly, right? And you can notice several distinct patterns. For what it's worth, we are in the third year of generative AI, right?
(04:25):
I think the first year was more around, hey, there's this cool thing, what can GenAI do? But last year, based on customer conversations, we saw the conversation move from "Hey, what is GenAI?" to "Hey, is this right for me, and how can I adapt it to have a real impact on my business?"
(04:49):
And this year, we are hearing that customers want to go big with generative AI, both in terms of going wide and going deep, deploying these systems at scale, and also leveraging the promise of agentic AI to create tangible business value.
And as we see more of these AI systems being developed,
(05:16):
there is a need to integrate the different AI systems so you can orchestrate more complex workflows, while at the same time keeping in mind aspects of security and reliability. So that's definitely one trend. The other one I would call out is that as
(05:37):
you bring in these systems to make complex decisions, you want to do so in an automated manner while keeping in mind transparency and accountability, right? So increasingly, customers are looking for ways to have human oversight as they scale their AI operations.
Krishna Gade (05:58):
That's right. Yeah, especially in the regulated industries, which we play in, there is a cautious approach with respect to the usage of generative AI and AI agents, like the whole human-in-the-loop approach.
Krishna Gade (06:10):
So I guess that begs the question, right? What potential are you seeing for these agentic AI systems? How are they going to transform business operations? Any real-life examples would be amazing.
Karthik Bharathy (06:23):
Yeah, yeah, I think there are quite a few, right? Let me first break it down into the different patterns we see,
(06:46):
based on customer conversations at AWS, and then look at examples for each of those. So with agentic AI, the business value it provides falls largely into three categories. The first would be using agentic AI to accelerate workplace productivity. Think of the day-to-day repetitive tasks that employees are doing: they want to automate these and gain the advantage
(07:09):
of using such an agentic system.
Right. A good example is NFL Media. They use business agents today to help their producers and editors accelerate their content production. They have a research tool that allows them to gather insights from
(07:29):
video footage from a specific place. What that essentially provides is: when you're onboarding a new hire, it reduces the training time by up to 67%. And when their employees ask questions about
(07:49):
what's going on, answers can be surfaced in less than 10 minutes, versus what used to take close to 24 hours.
So that's one such example. And closer to the software world, we're all familiar with coding assistants, and many of you may have already used coding assistants in one shape or another.
(08:10):
Largely, they help with building better code, providing documentation, or explaining existing code. But it's not just about the code itself; it's more about automating the entire software development lifecycle, including upgrading software, or modernizing a legacy application, or...
Krishna Gade (08:34):
Migrating to new languages.
Karthik Bharathy (08:35):
Absolutely. Absolutely. So, case in point: within Amazon, we had these agents for transforming our code base from an older version of Java to a newer version. And there were savings of, you know, a mammoth 4,500
(08:55):
developer-years' worth of effort, which roughly translates to $260 million in annual CapEx savings. So that's the first trend I would call out: using agents to accelerate workplace productivity.
The second one would be transforming business workflows and uncovering new insights, right?
(09:17):
What I mean by that is, as enterprises adopt agents, they want to streamline their operations and gain insights from their data. The example that comes to mind is Cognizant. They're using business agents to automate mortgage compliance workflows, and they've
(09:38):
seen improvements of more than 50 percent in reducing errors and rework. Similarly, Moody's is another great example. They've used a multi-agent system for generating credit risk reports. And again, the benefit: what used to take humans about one
(10:02):
week to generate a specific report is now cut down to just one hour, right? So that's the magnitude of impact that customers are seeing.
Finally, the third one I would call out is more in the research area, which is fueling industry transformation and innovation. A good example there is from Genentech.
(10:24):
They've deployed an agentic solution running on AWS, and they're improving their drug research process. Their solution automates roughly five years' worth of research across different therapeutic areas, and it helps them speed
(10:46):
up drug target identification and improve their research efficiency, ultimately leading to faster drug development. So, net net, we're seeing agentic systems deployed broadly in these three categories.
Krishna Gade (11:02):
Absolutely. So it's workplace productivity, business transformations, and then new product innovations. One thing you mentioned in business transformations: you gave a few examples, especially generating credit reports and claims processing, right? These are high-stakes AI use cases, so there is a need for security and transparency into
(11:25):
how the AI is working. What are some of the challenges you think organizations are facing when they're implementing agentic workflows for these use cases, or for other use cases in general?
Karthik Bharathy (11:37):
Yeah, yeah, I think that's a great call-out, right? While you're looking at these systems, there are definitely security and visibility challenges that organizations need to look into. I'll call out a few that we have seen, and by no means is this comprehensive, but it comes down to the stage of
(12:02):
the ML workflow, if you will. At the very beginning, when you're trying to use a specific model, it's quite possible that the data being used, whether to train a model, fine-tune a model, or in a RAG setup, whatever technique you use, is not authentic.
(12:23):
And this might compromise the performance of the model. That's definitely concerning, and at the same time harder to detect until the model is in use and you can observe the interactions going on.
So that's one category. The second would be when the model is being used, and it depends on the model.
(12:44):
In the case of a proprietary model, where the model weights are not exposed, it might be an actor attempting to reverse engineer the model: what specific weights were used, at what level, and so on. And that essentially exposes
(13:05):
the "how" of the model, if you will. The third one, I would think, is when the model is actually in use, and actors can attempt to extract information that the model would otherwise not emit. It might be sensitive information about the training data, or information beyond what the model is intended for, or beyond
(13:27):
the use case being deployed.
So, net net, I think organizations need to protect their model weights, have the necessary controls around access, ensure data privacy, and so on. And more importantly, ensure there's
(13:48):
end-to-end observability, so you have the necessary checks to see how the model is performing. More often than not, you'll have a sandbox environment where you're testing it, and you'll have tooling: Bedrock Guardrails is an excellent tool, so you incorporate that, and Fiddler has an observability tool as well.
(14:08):
These provide sufficient insights into what is going on in the system, be it agentic or an automated workflow, and you take actions based on that.
Krishna Gade (14:17):
Absolutely. So you touched upon a few things, like adversarial attacks on models. And now there's this whole field of AI security and model security coming up. I remember a few weeks ago when DeepSeek launched, everyone was producing benchmarks about how accurate it is, or how close it comes to the closed-source models.
(14:39):
But it was pretty vulnerable to security attacks. People were able to easily make it leak PII content and whatnot in RAG workflows. So what are some of the best practices organizations should think about for AI security, and how do you think about that versus application-level security in general, which has been around for a while?
Karthik Bharathy (15:04):
Yeah, I think, at the end of the day, you need a comprehensive security approach, right? You want to operate at the different levels. You mentioned model-level security, so let's start from there. When you're thinking about the model, like I mentioned, you want
(15:25):
to protect the model weights, right? And in addition to model weights, you want to protect access to the data, ensuring the data is authentic, and so on. To address these, you would encrypt the actual file where the model is stored, or, to your point on adversarial
(15:48):
examples, you would have a test environment where you exercise the model and monitor its output against some of these adversarial examples. And at the end of the day, you need continuous monitoring, right? Not just to look at the input and output patterns, but also to look for drift, drift in the model, drift in the data, and have the
(16:09):
necessary alerts so you can trigger, say, a retraining. So that's at the model level.
At the application level, there are the well-known security practices: you enforce access controls, you have encryption in place, you have logging of the interaction patterns, and so on.
(16:31):
But in addition to that, tooling is often needed, like the Bedrock Guardrails example I mentioned earlier. You want to think about how you audit certain topics, be it at the input level or the output level. What is relevant to your use case? What should not be emitted? Or if certain information is being emitted, like
(16:54):
PII data, how do you redact that information, and so on. So, net net, I think the two layers of model security and application-level security need to integrate seamlessly. In many ways, they are complementary rather than separate constructs.
Krishna Gade (17:13):
Awesome, that's great. So we talked a little bit about some high-stakes use cases, right? When it comes to transparency of AI decisions for regulators or business stakeholders, how do you think this is going to change when agents come about and
(17:33):
organizations employ agentic workflows? What happens to the transparency behind AI?
Karthik Bharathy (17:42):
Yeah, I think fundamentally, enterprises would benefit from having a governance model that's more federated, right? Meaning you have standards and policies in place that dictate
(18:02):
how these systems need to be developed across the organization. But at the same time, you want to provide enough flexibility where each team or business unit can adapt those standards in a way they can implement for their specific use cases. So that's the trade-off.
(18:24):
And it's a good one, in the sense that you want to provide the flexibility of developing these different systems across different units. And there are, again, tools. Purely taking the example of SageMaker here: you have SageMaker Projects, where you can automate the ML workflow, say, how it should be standardized, what
(18:46):
pipelines you need to use, what models, what quality bars, and so on.
Krishna Gade (18:51):
So governance is both a tools problem as well as a people problem, right? Essentially, many companies do not have the governance structures today to ensure that AI is tested, monitored, and securely operated. What are some of the best practices you have seen in terms of customers employing AI governance across
(19:12):
different business units?
Karthik Bharathy (19:14):
I think fundamentally, at the highest level of abstraction, you have business stakeholders, the so-called risk officers, if you will, who understand the domain of what is being developed, and they enforce certain standards on what needs to be
(19:36):
adhered to. And it's important that they work in tandem with the technical team, who are well versed in what's being done with the model, right? For example, a model may have a toxicity score of, say, 0.1. But what that means from a use case perspective, and whether the model can be approved and deployed in an organization, is very specific
(19:57):
to the domain they're operating in. I think successful organizations have a good mix of both, where you have the necessary tooling, where these different levels, for example toxicity, are monitored and documented, either through a
(20:17):
model card or through properties in a model registry, for example. And that translates into visibility for the risk officer, who can effectively say whether the model or the system is approved for deployment or not. So the two working together, I think, is definitely a recipe for success.
Krishna Gade (20:37):
Got it. So are there any specific metrics you recommend organizations track, whether around security or governance of AI, when they're testing or deploying to production?
Karthik Bharathy (20:51):
Yeah. So if you look at the metrics, again, at the technical level you have a set of metrics at the most foundational level. If you have to document what the model is doing in a model card, you would look at the purpose of the model, what data it's trained on,
(21:12):
what the validation rules are, what the quality of the model is, and so on. Going a little bit beyond that, you may want to document how the model is emitting or predicting a response. For example, you may want to look at explainability approaches: you may look at a SHAP score, or you may look at
(21:34):
a LIME score, and these may be documented with the model; those are good metrics to look at. And with GenAI, you can look at additional metrics around toxicity, fairness, and so on. You can test these models; you can have periodic evaluations of the levels of these metrics and test against standardized data
(21:56):
sets that are available today, or you can use custom data sets that are very specific to your use case. And then, at the business level, you want to interpret these, saying: with a combination of these objective metrics, how do the subjective standards and policies play in, and what does that mean from a risk perspective?
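As a concrete illustration of the SHAP scores mentioned above, here is a minimal sketch that computes per-feature SHAP attributions for a toy tabular model, the kind of summary that could be recorded on a model card. It assumes the open-source shap and scikit-learn packages; the dataset and model are stand-ins, not anything specific to SageMaker or Fiddler.

```python
import numpy as np
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Toy tabular model standing in for whatever model the card describes.
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])  # shape: (100, n_features)

# Mean |SHAP| per feature: a compact attribution summary for a model card.
importance = np.abs(shap_values).mean(axis=0)
for name, score in sorted(zip(X.columns, importance), key=lambda t: -t[1])[:5]:
    print(f"{name}: {score:.2f}")
```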
Krishna Gade (22:19):
So there is always this tension within organizations: adopting AI faster versus doing it right, right? How do you make sure you do it properly so that you don't get into trouble? How should organizations think about this balance?
Karthik Bharathy (22:38):
Yeah, I think that's a key one, right? There's no one easy answer, if you will. And the key to balancing the robustness of those security controls with operational efficiency lies in having the right guardrails. Instead of looking at the problem as, "Hey, here's
(22:59):
one way to do it," or a binary "this is risky versus non-risky," you're probably looking at a range of values, if you will, in terms of how to look at risk. A good example would be: let's say you have the model or the system deployed, and you notice that certain changes introduce a higher risk.
(23:23):
It's better to trigger additional approval workflows rather than just waiting on it and saying there's a single way to do it. In contrast, if the same set of changes results in a relatively lower risk, you may want to proceed through standardized approvals instead of
(23:44):
requiring additional approvals.
A good example again would be: let's say there's drift in the model, which is fairly common, and you have an observability solution in place. If the drift is not significant relative to the current state of the model, you may be okay treating that as an alert, staying in the know of
(24:05):
what's happening, and perhaps just triggering a retraining workflow. But on the other hand, if the drift is significant and exceeds the threshold you've defined, you may trigger additional approvals, or in some extreme cases, you might even consider rolling back to the previous version. So those are different options you can consider, and the
(24:28):
key is to keep that configurable, so you can trade off the rigor and robustness of the security controls with the efficiency they enable.
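As a sketch of the tiered, configurable response just described, the snippet below maps a drift score to an action: small drift raises an alert, moderate drift triggers retraining, and severe drift escalates to human approval or rollback. The thresholds and action strings are hypothetical; in practice the score would come from an observability tool.

```python
from dataclasses import dataclass

@dataclass
class DriftPolicy:
    alert_threshold: float = 0.10     # below this, no action
    retrain_threshold: float = 0.25   # moderate drift: alert and review
    rollback_threshold: float = 0.50  # severe drift: approval or rollback

def respond_to_drift(score: float, policy: DriftPolicy) -> str:
    """Map a drift score to a tiered, configurable response."""
    if score < policy.alert_threshold:
        return "ok: no action"
    if score < policy.retrain_threshold:
        return "alert: log and notify for review"
    if score < policy.rollback_threshold:
        return "retrain: trigger the retraining pipeline"
    return "escalate: require human approval, consider rollback"

for s in (0.05, 0.2, 0.4, 0.7):
    print(s, "->", respond_to_drift(s, DriftPolicy()))
```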
Krishna Gade (24:36):
Right. So when it comes to evaluation of AI: in the past, for classical machine learning, you could do things like ROC curves, AUC scores, precision-recall, and maybe even SHAP plots to understand feature importance. But now, with generative AI and agentic workflows, evaluating performance is not straightforward, right? There's no ground truth.
(24:58):
So can you shed some light on how customers are going about this, in the sectors you have been exposed to so far, and what are some of the best practices?
Karthik Bharathy (25:11):
I think the areas where customers are exploring are around evaluating the system end-to-end, right? There's no one unique metric, going back to the example I mentioned earlier. Concretely, you can think of having a pipeline that triggers
(25:33):
either manually or on a periodic basis and evaluates the model on certain dimensions. Evaluation is a broad topic, but if there are certain aspects of the model you care about, let's say fairness or toxicity, for example, you can look at
(25:55):
evaluating the model against a ToxiGen model and seeing: if these inputs were sent to the model, what is the output? And once you know the expected output and the actual output, you can see the difference: okay, the model is working along expected lines; therefore, this is the score you want to assign for that particular category, right?
(26:16):
So it's about developing that comprehensive pipeline workflow and making sure you have observability in each of those places. As a system, you do it first at the model level, and then at the system level when there are multiple models interacting with each other. And then, given the behavior of the system, what is the score you want to assign? In some cases, you can be creative
(26:41):
in creating a composite score. It purely depends on how much weight you assign to each of these individual scores to create the composite score, and how you gauge that composite score with respect to the use case.
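A minimal sketch of the composite-score idea just described: combine per-dimension evaluation scores with use-case weights. The dimension names, scores, and weights below are illustrative only.

```python
def composite_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-dimension evaluation scores, each in [0, 1]."""
    total = sum(weights.values())
    return sum(scores[dim] * weights[dim] for dim in weights) / total

# Per-dimension scores from separate evaluation runs (illustrative values).
scores = {"toxicity": 0.95, "fairness": 0.88, "faithfulness": 0.91}

# A high-stakes use case might weight faithfulness more heavily.
weights = {"toxicity": 1.0, "fairness": 1.0, "faithfulness": 2.0}

print(f"composite: {composite_score(scores, weights):.3f}")
```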
Krishna Gade (26:54):
Especially for agentic workflows, in some cases, when they are automating the decision process in the enterprise space, there is a need to measure whether the decisions are optimal or not. It's a pretty hard problem. Any thoughts on that? For example, take the claims
(27:15):
processing workflow you mentioned, which was probably much more manual in the past and is now automated. How can customers measure whether it's working properly and actually working optimally for the business?
Karthik Bharathy (27:30):
Yeah. While you can have objective metrics, at the end of the day it's about the business use case, right? And I think it would involve humans in the process, seeing the sort of outputs that come from the system. The key is to have the necessary hooks in place.
(27:56):
For example, while on one end you want to enforce controls on what data is being accessed, what output is being generated, or what toxicity score the evaluation model is producing, you also want to make sure there's human insight. Especially in the early phases after the system is
(28:17):
deployed, you want human evaluation of the system output on every decision. More importantly, you also want some sort of a pause switch, if you will, so that if the model deviates from known patterns, there is a way to quickly have humans come in, with this pause switch or even a
(28:37):
kill switch for that matter, to make sure corrective actions can be taken.
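A minimal sketch of the pause-switch idea, under the assumption that each agent output can be scored for deviation from known-good patterns: once the recent average deviation crosses a limit, the agent is paused for human review. The deviation scores and threshold here are stand-ins for a real anomaly or guardrail signal.

```python
import statistics

class PauseSwitch:
    """Pause the agent when recent outputs drift from known-good patterns."""

    def __init__(self, limit: float, window: int = 3):
        self.limit = limit      # mean deviation that triggers a pause
        self.window = window    # how many recent outputs to average
        self.scores: list[float] = []
        self.paused = False

    def record(self, deviation: float) -> None:
        self.scores.append(deviation)
        if statistics.mean(self.scores[-self.window:]) > self.limit:
            self.paused = True  # stop serving; hand off to a human reviewer

switch = PauseSwitch(limit=0.8)
for d in (0.1, 0.2, 0.95, 0.97, 0.99):
    switch.record(d)
    if switch.paused:
        print(f"paused after deviation {d}; routing to human review")
        break
```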
Krishna Gade (28:43):
Yeah. So this might change from industry to industry, right? For example, what you want to measure or control around AI can be different for different domains. Have you seen any insights there, for example, finance versus healthcare versus
(29:07):
other industries? What do they care about in terms of measuring and putting in security controls?
Karthik Bharathy (29:13):
Yeah. More than the industry, I think, like you called out, it also depends on what set of policies and standards they're adhering to. And then, yes, it also goes by the regions in which they operate, like the EU AI Act or ISO 42001, the different regulations that come in.
(29:35):
So there's no one-size-fits-all. But the more effective use cases I've seen, the ones that have been deployed successfully, factor in both the subjectiveness of the standards that require you to adhere to certain things, like where the data is stored, and to answer the different questions related to the standard, along with the objectiveness
(29:59):
of the metrics being tracked. The more successful use cases do vary across healthcare and financial services, and even in retail there are examples where a combination of the two is needed.
Krishna Gade (30:16):
So what are some of the warning signs one can actually see that an agentic system may have security vulnerabilities or monitoring gaps? How can an organization be aware of that?
Karthik Bharathy (30:30):
Yeah, I think the first one to look for is data quality, right? You want to make sure the data input to the model, and what the model is trained on, is secure and robust. That's important.
(30:51):
And once you have those in place, you want an effective testing strategy to ensure you defend against adversarial attacks, so that even if there's a manipulation in the input, the security of the model and the system is taken care of.
(31:12):
Then there's the one we talked about, model drift: continuously monitoring for any degradation in performance and watching those key parameters is important. And from the
(31:33):
system and application standpoint, you want to ensure the API endpoints are secured, data transmission is secure, and so on, and that you have robust controls for both the authentication and authorization pieces.
At the end of the day, I would think of it as an employee, right? An employee badges into the building, and in many organizations
(31:55):
badges out of the building as well. And the next time you come in, you badge in again. So you re-authenticate and make sure that this person is authorized to do this particular job. It's very similar for an agentic system. Another one that comes to mind is
(32:15):
the principle of least privilege, right? You provide access only when it's needed, very similar, again, to the employee example I called out. An employee may not have access to all data, but when it's needed, you ensure that the person who really needs that information has access to it. So those would be some signs to look for when you're designing these systems.
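The badge-in/badge-out analogy can be sketched as a short-lived, least-privilege credential per task: any action outside the granted scope, or after expiry, fails and forces re-authentication. The scope names and TTL below are hypothetical, not a real AWS API.

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class AgentCredential:
    """A short-lived, narrowly scoped credential: badge in, badge out."""
    scopes: frozenset[str]
    ttl_seconds: float
    issued_at: float = field(default_factory=time.monotonic)
    token: str = field(default_factory=lambda: uuid.uuid4().hex)

    def allows(self, action: str) -> bool:
        expired = time.monotonic() - self.issued_at > self.ttl_seconds
        return not expired and action in self.scopes

# Badge in for one task with only the permission it needs.
cred = AgentCredential(scopes=frozenset({"claims:read"}), ttl_seconds=300)
print(cred.allows("claims:read"))   # True while fresh and in scope
print(cred.allows("claims:write"))  # False: outside the granted scope
```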
Krishna Gade (32:37):
Got it. So there's an audience question here: any specific frameworks or tools you're using for agentic workflows to evaluate robustness and accuracy? This is probably a good time to talk about the partnership between SageMaker and Fiddler. Can you share your thoughts on that?
Karthik Bharathy (32:51):
Yeah, no, absolutely. We're thrilled to be working with Fiddler. At the outset, partnership is something that's absolutely critical for AWS, and for SageMaker specifically. As we look at extending the core AI/ML capabilities
(33:11):
to provide specialized solutions for different industry needs, partnering with a company like Fiddler is absolutely paramount. And the intent is really simple: we want to make sure best-in-class solutions are available to our customers. So with Fiddler, we've combined the power of SageMaker AI, where you can train and deploy your models, with Fiddler AI, which brings in observability
(33:36):
to monitor and improve the ML models. So, net net, customers have a one-click way to do observability with SageMaker AI. This experience is available in SageMaker Unified Studio. It provides a seamless experience, and I'm pretty excited about how customers can use these two capabilities together.
Krishna Gade (33:58):
Absolutely. Yeah, we share the same excitement. And for those of you on AWS SageMaker today who are on this call, feel free to use the one-click integration that we built together with AWS for monitoring and evaluation of your AI models. So let's take a few more audience questions here.
(34:18):
There are some questions around different industries. There's a question about code migration; we touched upon it earlier in the call. What are some of the best practices for verifying large code changes or migrating from one language to another, using AI-based code migration?
Karthik Bharathy (34:35):
Yeah. I think the specifics actually depend on the language itself, right? Depending on whether you're looking at a more modern language or a traditional language like COBOL, for example. So, given that,
(34:59):
while the migration is being assisted, you want to look for patterns of translation between the two systems. Sometimes the logic may be inherently complex, so there's a human in the loop, and there's assisted AI that comes into play. You should definitely try out some of the tooling that's already available. With Amazon Q, we recently launched the ability to look at the system workflow end-to-end.
(35:21):
And there are obviously pieces around security that are very specific to the organization as well. In terms of best practices, I believe there's also detailed documentation; we can find a way to share that with you, on what needs to be looked at as you do these migrations.
Krishna Gade (35:45):
And there's another question on a specific industry: could you shed some light on business use cases within financial services or FinOps where AI observability makes sense?
Karthik Bharathy (35:57):
Yeah, I think there are quite a few. The top two or three that come to mind are the automated financial reporting use cases I called out: I mentioned Moody's use case about generating credit reports, and Cognizant's use case about mortgage compliance workflows.
(36:18):
Demand forecasting is another one that's relevant in the context of financial services as well. And more generally, I would say incident management, which applies across different industries, is also relevant as you look at more data and want to uncover insights from that data.
Krishna Gade (36:40):
And then another question from the insurance industry: beyond models, what metrics would you recommend, for instance, for claims processing? Can you explain specific measures you've suggested to clients and share your assessment of the quality improvements in business outcomes?
Karthik Bharathy (36:56):
To be honest, I'm not from the insurance industry, so I won't comment on that. That said, I'm happy to take the question back and follow up if we have the contact information. I don't represent the insurance industry, so I just don't want to give out the wrong answer.
Krishna Gade (37:11):
So, Priya, feel free to reach out to Karthik for further information. Awesome. So I guess, finally, as we get into the last few minutes of the podcast: what does a lifecycle workflow look like when organizations are
(37:33):
thinking about this? Because things have been moving very fast in the last few years: you were talking about ML, and all of a sudden there's generative AI, and now there are AI agents. When an organization is thinking about it, how do they go about implementing these things? What should the priorities be? What should the best practices be?
Karthik Bharathy (37:51):
I think with the playbook, if you will, there are certainly a few common things across these different systems, and I'm sure there will be a lot more coming in the next few years. But fundamentally, what has not changed is starting with data, right? I can't emphasize this enough: the better your data is,
(38:12):
the better pretty much your AI model, your agentic system, all of the goodness that's out there. So have a robust data infrastructure and quality data that feeds into your machine learning processes. And if you're starting off with GenAI and agentic systems, I would start with one high-value use case, prototype it against your business
(38:37):
problem, and demonstrate the value quickly. Then, taking it to the next level, you want to establish the necessary MLOps foundations: how does monitoring play into the system? What does versioning mean? How can I go from one version to the other? These are fundamental as you think about taking a system from a
(38:58):
POC to production. And, building on that, and very relevant to today's topic, is looking at the governance frameworks.
What does it mean to have a simple approval workflow that needs to be set up as you're scaling the system? And a lot of this also requires that you invest in
(39:18):
your own team and train them, so they are aware of the different elements of going live with these different systems. With AWS, there is plenty of training and certification, and I'm sure Fiddler has training and certification available as well; those help you build your internal expertise. And finally, plan for scale, right?
(39:39):
What worked for you when you started off with a small system may not be applicable when you go for 10x or 100x of what you intend to build. But the good news is there are enough enterprise features in AWS, SageMaker, and Fiddler that help you scale as you go through this journey.
(39:59):
Conversely, what you want to avoid is rushing through a system quickly to demonstrate value, not having good data or a data quality approach, not engaging the stakeholders,
(40:19):
and then having very little insight into how you would do maintenance, upgrades, or deployment. That is a recipe for failure. So avoid that, and stay on the fundamentals.
Krishna Gade (40:28):
Or, more colloquially, don't just do vibe checking and vibe testing of your models; actually know what you're doing. That's a great point.
So here's a very related question. Someone is asking: things are moving really fast, even for us in the technology area, right? What type of problems within AI agents, in the next two to three years, will keep you up at night?
(40:48):
What do you foresee?
Karthik Bharathy (40:51):
Yeah, so there's so much I could predict, but that's a question I ask myself every day, right? Fundamentally, I go back to when I joined AWS many, many years ago. There was this interesting quote from Jeff that still resonates with me. It's something around, "Hey, what will not change?" as opposed to, "Hey, what will change?"
(41:14):
The second part, what will change, is something each of us could debate for hours or days together. But what does not change is, fundamentally, customers asking for better value, which translates to something that's more performant, something that's robust, something that's secure. Those fundamentals are not going to change. Or something that's cheaper, right?
(41:36):
Like the way Bezos put it: no one's going to come to you and say, hey, give me something that's more expensive or slower, right? So fundamentally, looking at the system and seeing what value it adds to your business use case, and what it translates to for your customers: I think those will be paramount as you look at these innovations
(41:56):
that are happening in GenAI.
Krishna Gade (41:58):
Yeah, and there's innovation happening across small and big players, right? So there's a question around all the new agentic AI applications that are coming up: how do you think they're playing within the ecosystem of big players who are also building agentic workflows?
(42:19):
Any thoughts on that? How might AWS be encouraging things on the ecosystem side as well?
Karthik Bharathy (42:24):
Yeah, absolutely. And I think one way is definitely through partners; we work closely with companies like Fiddler. The second dimension to that question is AWS providing choice to customers, right? There is not a single model where we say, hey, this is what you need to use. That's something you as a customer can decide,
(42:45):
right from DeepSeek, to the latest Llama models, to our own in-house Amazon Nova models. You have all of those available to experiment and try for your use case. I'm sure a lot of that will be applicable even in the world of tomorrow, where you have the choice of choosing the best of what's applicable for you.
Krishna Gade (43:05):
Awesome, great. I think, with that, we are coming to the end of the podcast. Thank you so much, Karthik, for spending time with us today. One of the things I took away is that quote you mentioned from Jeff: what is not going to change? And I believe what is not going to change with AI, whether it's a simple statistical model, a deep learning
(43:26):
model, generative AI, or AI agents, is that you need to test it properly, you need to monitor it properly, and you need to make sure it's secure and working for your business. That's not going to change, and that's kind of where our partnership with Amazon comes in. So thank you so much for being on the show today, and we
(43:50):
look forward to more conversations in the future.
Karthik Bharathy (43:54):
Thank you for having me, Krishna. This was great chatting with you.
Krishna Gade (43:56):
Awesome.
Thank you.
Thanks everyone.