Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:06):
All right, guys. Thank you for joining us today.
We've got a very good session today; it's going to be jam-packed with a lot of information.
We've got Mr. Mustafa Kadioglu, a lead data scientist at Cisco. He's been at Cisco for six years, and has 18 years in the world of AI.
(00:29):
He's going to talk to us about Cisco's support case management system, which has been built with RAG and is also an agentic AI application. So today you're going to hear from Mustafa. He's going to go through some slides, then he's going to show us a demo of this tool, and Mustafa and I are just
(00:51):
going to nerd out, and I'm going to ask him some very good questions. But for now, Mustafa, can you please introduce yourself to the audience?
Thanks so much, Richmond. Thank you so much for having me today. Hello everyone.
My name is Mustafa Kadioglu. I'm a lead data scientist with Cisco. As Richmond said, I have a six-year background with Cisco, and I work solely in the NLP
(01:13):
domain. I have been involved in almost 30 different projects so far, and I'm so happy that some of them went to production. They are still in production, and we are still trying to push more models to production, especially after we got these GenAI frameworks,
(01:38):
right? So that's the short introduction, Richmond.
Yeah, thank you for that, Mustafa. And this is not the first time you're coming here. I think we did a session like this last year, around this time if I'm not mistaken. So you're a regular at this point. Thanks so much.
(01:59):
Yeah, that's true. OK.
Now, let's not keep the audience waiting. Let's get to the presentation. Mustafa is going to take us through what the Cisco support case management system is, then we'll go through a demo, and then I'll ask some questions that the audience are probably thinking.
Sure.
Thank you so much. So I will try to keep the
(02:20):
presentation side as short as possible so that we can jump into the demo. I hope the demo will work. And I'm going to give you a quick heads-up on the SCM AI capabilities. SCM is the support case management at Cisco: we have a big team here supporting the
(02:41):
support cases, meaning the tickets, from case creation until the case is resolved, right? We have different AI tools, which I'm going to explain in a bit. Some of them are generative, some of them are predictive, right? And within SCM we have a conversational AI platform, which we call the Bot Unification Framework (BUF), that I'm going to touch upon.
(03:03):
And then retrieval-augmented generation solutions: some of them are in production right now, and some are in a staging environment where final testing is ongoing, and then they will go to production as well. And then, how we integrated RAG with BUF as a service, right?
(03:26):
So someone internal, someone from Cisco, can come and build up their own RAG model with low code or no code, and then they can also start interacting with their own documents. And again, this is the new topic, as you know: the agentic framework integration with BUF. We have two different agentic frameworks, agentic AIs, right now, AutoGen and LangGraph,
(03:47):
which are integrated into BUF as agentic-AI-as-a-service. And then I will have a demo, and then the takeaways.
So, at a high level, in support case management we have different models right now. You see that some of them are internal, some of them are
(04:08):
external, some of them are deployed in production, some of them are production ready. And you see that we have another legend which shows that, hey, some of them are classic ML models, from classic up to neural network architectures, and some of them have a GenAI model, right? And then AI/ML and GenAI combined.
(04:31):
And we have the agentic AI as well.
So we have chat bots in the bot platform.
Right now we have more than 50 external chat bots which
supports the customers. And then we have huge amount of
traffic there, right? And then right now the chatbot
is on Rasa platform that we havebeen switching from Rasa to like
(04:53):
the agentic framework in the bot platform.
As I said, we have a RAG-as-a-service platform that's integrated, and we have different RAG models, like learning and certification, which is in production right now, and the Webex Chat Assistant, which is a really big, heavy RAG
(05:15):
implementation that I'm going to explain end to end, right. And then we have the agentic framework as a service. I would like to call out one thing, Richmond: with the new LLMs and the GenAI capabilities, I believe the data science life cycle also got changed drastically.
What I'm trying to say is that normally, in the past, we had
(05:36):
data extraction and EDA, then model training, deploying the model in production, and monitoring, right? And then analyzing the output from the model, like performance analysis, then creating a data set to retrain the
(05:57):
model, evaluating it, and so on and so forth. That life cycle, to me, has drastically changed. If you are talking about classification models: with search-engine and LLM capabilities,
(06:19):
like keyword and semantic search, we can easily handle an imbalanced data set with a hybrid search, right, semantic plus lexical search, and then re-rank the results as a classification model.
If we deploy this model into production, whatever we see in the back end, right, like a misclassified
(06:41):
input, we can easily go and check what's going on in the vector database, right? And then we can easily refine the data with no downtime, and it will be reflected in the UI. In this case the vector database, especially the MongoDB Atlas vector database, is our heavily used platform or tool for building these kinds of classification models,
(07:04):
classifiers, right? With different logic, not only hybrid search; I'm not talking only about the weighted-RRF type of thing, but we have different calculation methods to identify the proper intent from the vector database. So in this case, we don't have
(07:25):
this lifecycle. What we need is: hey, what's the output from the model, right?
Just analyze it. And then: where does this output come from? Whether it's a proper prediction or a wrong one, we can easily identify which data point in that data
(07:46):
set causes that trouble, and then we can fix it. Because, as you know, in academia or in trainings, whatever data set you have is mostly well classified, right? Well refined, a clean data set. But in industry, still the main struggle and challenge is
(08:09):
cleaning the data and making sure that the data is properly labeled, right. We still have these kinds of issues. And, as I said, the vector database is one of the good solutions for us at this stage.
So on the diagnosis side of Cisco support services, as you see, we have many tools, like the
(08:30):
summarization of a case, right? Like creating a knowledge base to identify what might be the best technical troubleshooting steps for a specific technical input from the user.
Say that in the life cycle of a case we got some technical questions from the user, and the engineer, if it is not in his knowledge, should go and
(08:52):
check the historical data, right? In the past, we had search tools that bring up raw data from the historical data set that engineers would have to go through and analyze to figure out what
(09:12):
the issue was. Now, with the new approach, RAG and that type of thing, right, we also recommend what might be the best technical troubleshooting steps to suggest to the engineers; not only RAG, but one level above, right. And in these kinds of use cases, we still leverage a vector database, or a graph
(09:36):
database, right, whichever fits the requirements. So I'm switching this, yeah.
Brilliant so far, Mustafa. One thing I want to get really clear early on, out of the way: within this system that you have at Cisco, you use MongoDB as both the operational and vector database, right?
(09:59):
MongoDB is hosting the vector data and the operational data for the system.
So like vector databases is so famous with the REC
implementation meaning that hey,here is the like you dumped the
data into a vector database of the chunking and splitting,
(10:21):
right, splitting and chunking. And then hey, like bring up the
relevant topics from the from these regular database like the
chunks and then pass the genericto generate the answer.
This is one thing. How about the semi structured
data? Say that you have a tax and then
we have a label loaded tax, right.
And then I need to train a classifier like say I have a
(10:44):
200,000 samples and then let's say 20 different classes, 20
different actions. And this data is imbalanced
right? We have majority classes, we
have minority classes that I need to train a classifier model
like multi classifier. And then whenever I deploy this
model in production, I have no visibility.
(11:05):
I know the output from the modelbut why that model generated
that output based on which data samples right?
With that approach, meanwhile, you need to handle this imbalanced data set, right? You need to make sure that the model is not overfitting on the
(11:27):
majority class, right? If you go with the vector database approach instead, you will see, hey, this is the most relevant historical data point which caused this misprediction, and you can go back and check whether this data
(11:49):
is properly labeled or not, right? And if it is labeled correctly, then why is this data point so close to this one where we had the wrong class, the wrong action? All these things we can also configure in the back end, and they are instantly reflected in production.
And it's not only the RRF type of search-and-rerank approach we have, right; we also apply different
(12:13):
techniques which put more weight on the, let's say, minority class, right? So say we just fetch the first 50 most relevant data points, samples,
(12:34):
based on a query. If we see that there are some minority classes there, where we have fewer samples, then we just promote those ones to generate the final answer.
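As a toy sketch of that idea (the class names and counts below are invented, and in the real system the neighbors come from a vector search), promoting under-represented classes among the retrieved neighbors might look like:

```python
from collections import Counter

def classify_neighbors(neighbor_labels, class_counts, rare_frac=0.05, boost=2.0):
    """Vote over the labels of the top-k retrieved samples, giving extra
    weight to classes that are rare in the overall data set so the
    majority class does not drown them out."""
    total = sum(class_counts.values())
    votes = Counter()
    for label in neighbor_labels:
        weight = boost if class_counts[label] / total < rare_frac else 1.0
        votes[label] += weight
    return votes.most_common(1)[0][0]

# Hypothetical: 'escalate' is a minority action with few historical samples.
counts = {"reset_password": 9_500, "escalate": 300}
top_hits = ["reset_password"] * 3 + ["escalate"] * 2
predicted = classify_neighbors(top_hits, counts)
```

Without the boost, the three majority-class neighbors would win; with it, the rare class surfaces, which mirrors the "promote the minority class" behavior described above.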
And the good thing is, it is not done yet: you can also pass the top N results to the GenAI to get the final result, which will generate a better result than the classification model.
(12:56):
And then you can also tweak the prompt, right? Whatever you do in the back end is instantly reflected in production. Say I have a misclassification and it causes trouble; now I don't need to wait for the next release cycle. Hey, go to the vector database, clean that data, refine it. If a minority class
(13:21):
does not pop up in the top five, create some synthetic samples and index them into the vector database so that it will pop up, right, for similar inputs.
We are good. Now we can see the whole process in the back end and easily take an action, and that will be reflected in
(13:44):
production.
Thank you for that explanation. And one thing: someone is asking, hey, can we ask questions in the chat? Yes, you can ask questions in the chat. If you have any question based on what Mustafa is showing on the screen and saying, feel free to ask, and we'll answer it when we have some time. But yeah, do ask questions in the chat.
(14:08):
Mustafa, can you show us that bot system that you're going to talk about?
Yeah, the Bot Unification Framework. As I said, it's a secure environment which is open and ready for cloud, and data encryption at rest and in motion, and authentication and authorization, are all configured, right. And in that platform we have the
(14:29):
bots; as I said, we have more than 50 bots, and most of them are customer facing. So let me just pause here, right: internal and external are two different things. Let me give you an example.
If I want to spin up an internal AI tool, right, as a data scientist I am, you know, a bit
(14:51):
relaxed: hey, if something happens, it will not harm the company's reputation, there will be no incident at all, right? It will not get escalated. At worst, someone will come to me and say, hey, the prediction is wrong, could you please go and check it? If it is a customer-facing model, then there are lots of parameters we
(15:13):
need to take into account, especially with GenAI, right? We need to consider how the guardrails are properly defined so that there will be no incident in production, right? That's the reason why, at the beginning, we were a bit conservative: hey, the Rasa intent classifier has its guardrails anyway.
(15:34):
But this is a kind of trade-off, right? A mechanical versus a human-like answer from the AI, and how you should configure the guardrails: these are two different things. And then you need to balance these different considerations in a way that works fine for your use case.
(15:54):
The bot platform right now is a hybrid solution. We have Rasa, the classic AI/ML conversational chat, and then we have the GenAI conversational AI, right? And for the conversational AI we also define the guardrails so that they identify prompt injection, gibberish, profanity, and
(16:20):
other, let's say, domains, for instance military, terrorism, economics, human rights, gender equality, all that stuff. And for each guardrail we created, right, we created custom answers; for instance, the RAG integrated into the bot
(16:42):
generates a different answer for input which is identified as terrorism related, or as mental health related. We say, sorry, we understand you're struggling with a mental health problem, and so on. So for each different guardrail, we customize different answers to make sure that the bot shows more empathy, right?
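A minimal sketch of routing a detected guardrail category to a custom, more empathetic canned answer. The categories and wording below are placeholders, not Cisco's actual responses, and a real system would detect the category with a classifier or an LLM-based filter:

```python
# Hypothetical guardrail categories mapped to custom responses.
GUARDRAIL_RESPONSES = {
    "mental_health": "We're sorry you're going through a hard time. "
                     "Please consider reaching out to a support line.",
    "terrorism": "Sorry, we can't assist with that topic.",
    "prompt_injection": "Sorry, that request can't be processed.",
}
DEFAULT_REFUSAL = "Sorry, we can't help with that request."

def guardrail_answer(category: str) -> str:
    """Return the custom answer for a flagged category, or a generic refusal."""
    return GUARDRAIL_RESPONSES.get(category, DEFAULT_REFUSAL)
```

The point of the per-category table is exactly what the speaker describes: one generic refusal for everything would feel mechanical, while tailored responses let sensitive categories get a more empathetic tone.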
(17:05):
And, that said, it's scalable: it's built to handle peak usage, and extensible to handle multiple use cases, leveraging common skills and knowledge. So you will also see that we have knowledge libraries and tool libraries, right? And an action library that you can easily
(17:29):
configure to create your own bot. Anyone in Cisco can come and build a simple use case with no code. If it is more complex, where you have workflows, API calls, multiple API calls in one session, right, then you also need to create all the logic,
(17:53):
the logic gates, in the back end. In Rasa, you need to go through the code base and define all these skills in skill bundles. And in the GenAI, agentic framework, you should also decide whether you will go with a single agent or multiple agents, and what kind of tools you need to integrate with these agents.
(18:15):
As you see, we have different technologies here. AWS is there because we keep some data in AWS; Atlas Vector Search and MongoDB are our main databases for hybrid search. Whenever we have an FAQ, right, most of the time we go with
(18:38):
Atlas Vector Search, MongoDB vector search. And then we have ChatGPT, AutoGen, LangGraph, all the different technologies and tools. So, switching to how this Bot Unification Framework makes an impact, right.
From 2019 till now, and this is from FY24,
(18:58):
roughly we had 1.2 million unique users, product bookings above $500 million, call or chat deflection around 1.4 million, and estimated cost savings around $30 million. So it's a huge external, customer-facing
(19:20):
platform that runs almost unattended.
So this is the BUF bot admin UI, right? You see on the left-hand side we have the LLM knowledge, and then we have the agentic AI, so you can go and build your own RAG as well as your own agentic AI under different,
(19:45):
let's say, frameworks. Whenever you go to the BUF LLM knowledge, you will see three different UIs. The first one is the main information about your RAG: what's the name of your RAG, what's its description, who are the contributors? You can also add more contributors across the
(20:08):
company, across Cisco, and these folks will come and test the model, right? And then there's a toggle you see that you can enable, to query KB articles, knowledge articles, meaning that you can easily start interacting with your documents. And FAQ generation is another thing: you can let the GenAI
(20:28):
create FAQs from your documents, right, 'how to'-type questions. So other than the enabled query capability, on top of that it can also generate FAQ questions for you.
And the other thing is that we have a custom prompt: instead of the default, right,
(20:48):
you can switch to your own custom prompt, which you can go and modify. For the generic configuration, on the right-hand side, BUF also offers LLM API keys for GPT-type models, like GPT-4o mini, right. You can also bring your own API keys if you want to go
(21:12):
with Gemini or other LLM models, right? You can configure that on the generic configuration page. And finally, on the new document page, you can come and upload your documents. You can upload your link, right, the SharePoint link, wherever the content resides. And
(21:35):
if you want to go with a bulk upload of links, we have an Excel sheet that you download, add all your URLs to, right, then upload again and trigger the process. Then all the content behind those links is processed and indexed
(21:57):
into the vector database, and you can start interacting with your bot here, as you see, right?
No code, no code knowledge, article generation, contextual
understanding in drag which is another thing.
So this is not just like the askquestion and get the answer, ask
question, get the answer. You can also like the Latigen AI
(22:18):
or the rack keeps N number of previous interaction with you
and then the AI in memory, right?
So it understands the previous context and then based on the
previous context it also generates the answer.
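Keeping the last N interactions in memory, as just described, can be sketched with a bounded deque. This is a simplification for illustration, not necessarily how BUF implements it:

```python
from collections import deque

class ConversationMemory:
    """Keep only the last `max_turns` user/assistant exchanges so the
    prompt stays bounded while preserving recent context."""

    def __init__(self, max_turns: int = 5):
        self.turns = deque(maxlen=max_turns)

    def add(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))

    def as_context(self) -> str:
        """Render the retained turns as text to prepend to the next prompt."""
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.turns)

memory = ConversationMemory(max_turns=2)
memory.add("What is BUF?", "A bot-building platform.")
memory.add("Does it support RAG?", "Yes.")
memory.add("And agents?", "Yes, via AutoGen and LangGraph.")
```

Because `deque(maxlen=N)` silently drops the oldest turn, the model always sees the most recent context without the prompt growing without bound.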
OK. So one thing I wanted to touch
on is for those people who are just maybe joining us now, what
(22:42):
Miserable is going through is anidentic framework that he built
for internal users within Cisco to be able to build any agent
for any use case. So we're just going through some
of the UI components and Miserable, I have some questions
I want to ask around. Were you the one that decided
what the UI looked like? And how did you come to the
(23:03):
components of? How did you come to your
decision if you were the one that decided what the UI looked
like? So UI so basically, I mean, it
is it's an evolving process right now the the current UI you
have what we did like what we thought that it must be user
friendly. And then anyone who has no like
(23:27):
The Reg or Gen. Gen.
AI knowledge or experience. I'm not the experience, right,
Should come to like the might come to our platform, right?
Buff and spin up a rag model with no code, like with without
like the getting most there, right?
It is really simple. Hey, name it, give me the
(23:49):
definition and then description.What do you want?
Like like chat with your documents, create FAQ or go with
your own custom part. And do you have an API key?
Yes, no, if you have the API key, please provide right And
right now is it one thing I missed here?
(24:10):
But you see these all the document types video process
like the PDFHTML, MP4, PPTX and then like the word documents
that if you have a video that you want like the create a rat
from that video. You can also come to this
(24:31):
platform and up to 300 MB you could upload your video that we
will transcribe it and then index it in a vector database.
This is, to me, a pretty simple UI, where people come, build their own RAG solutions, and start interacting with them.
Yeah, it's very intuitive and I'll probably ask some questions
(24:54):
around the stack you're using. Like, how are you doing speaker diarization on the audio? We can talk about some of the technologies later, but I'll let you carry on with the presentation so we can maybe get to the demo as well.
So sure: BUF RAG-as-a-service in production, as you see, right.
(25:14):
This is the WCA model, the Webex Chat Assistant LLM model, which is a RAG implementation and is right now in production. If you go to help.webex.com and log in, you can utilize this model. And there's another one, the learning and certification model, which is also in production. We constantly monitor what's going
(25:35):
on in this model, and I'm just going to show you the back end, right? This is not just going to the logs in the production space, or going to the logs in the database and checking them, right? Whatever is happening here, once we get a positive feedback, once we get a negative feedback, once we get an
(25:56):
unanswered question, right, the model might not be able to generate the answer, then we also get a notification: hey, for this question we don't have an answer, I couldn't create an answer for that specific question. And on a daily basis, these two models have an auto data ingestion pipeline in which we identify
(26:20):
which source documents are modified, deleted, or newly created, so that these changes are reflected in the vector database. So, sorry for that.
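The daily auto-ingestion step that detects new, modified, and deleted source documents can be sketched as a content-hash diff between two snapshots. This is a simplified stand-in for the real pipeline, with invented document IDs:

```python
import hashlib

def diff_sources(previous: dict, current: dict):
    """Compare two snapshots of {doc_id: content} and report which
    documents were added, modified, or deleted since the last run."""
    digest = lambda text: hashlib.sha256(text.encode("utf-8")).hexdigest()
    prev = {doc_id: digest(text) for doc_id, text in previous.items()}
    curr = {doc_id: digest(text) for doc_id, text in current.items()}
    added = sorted(d for d in curr if d not in prev)
    deleted = sorted(d for d in prev if d not in curr)
    modified = sorted(d for d in curr if d in prev and curr[d] != prev[d])
    return added, modified, deleted

# Hypothetical snapshots of the knowledge base between two daily runs.
yesterday = {"kb1": "old text", "kb2": "stable text"}
today = {"kb1": "new text", "kb2": "stable text", "kb3": "brand new article"}
added, modified, deleted = diff_sources(yesterday, today)
```

Each bucket then drives the corresponding action in the vector database: index the added documents, re-embed the modified ones, and remove the deleted ones.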
I'm just going to give you a really quick technical architecture of the WCA RAG, right? So, three pillars here: indexing, inference, and caching.
(26:41):
By the way, our RAG solutions have caching capability as well. There are two different API calls from the data controller, one for the metadata and the other for the content. And then we combine them, meaning we generate the answer based on the persona, based on the product, based on
(27:02):
the operating system or that specific domain, right. If the end user is an administrator, the answer is different from the answer for the user or partner personas. So all this metadata and the text data are combined and incorporated in a way that goes to the vector
(27:22):
database after getting embedded, right. Then we filter or narrow down the search based on the metadata we get from the user, so that we hit the proper chunks to generate the answer.
And the guardrails are another thing, as I said. Right now we have identified 15 or 16
(27:44):
different guardrail domains that the model identifies and generates different answers for. That's basically it. And caching: I'm just going to explain what caching means and how we do it, right? So these are the feedback spaces in Webex. Whenever we get a negative feedback in production or in the non-prod
(28:06):
environment, right, we directly reflect this feedback in a Webex space, and the SMEs also see it, right? Hey, if you look here, we see a negative feedback; the question was 'how do I generate a meeting usage report for a user', and the GenAI generated an answer. And if you see here, right, we
(28:27):
have the persona 'administrator', the product 'Webex Meetings and Webinars', the application 'device', and a persona flag of true, meaning that, hey, it generated that answer for that specific persona, administrator. We have a fallback persona flag as well.
Say the user came as an administrator and we
(28:48):
couldn't find any answer for them. We directly switch, automatically switch, to the 'user' persona, one level below, to generate that answer. If it is a user and we don't have any answer for that persona, then we just search for administrator and generate the answer.
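The persona fallback just described (answer for the requesting persona, and if nothing matches, automatically retry with the other persona and flag it) can be sketched like this. The `search` function is a stand-in for the metadata-filtered vector search, and the corpus is invented:

```python
def answer_with_fallback(query, persona, search,
                         fallback={"administrator": "user",
                                   "user": "administrator"}):
    """Try the requester's persona first; if no documents match, retry with
    the fallback persona and flag that the answer came from the fallback."""
    hits = search(query, persona=persona)
    if hits:
        return hits, persona, False          # answered as the requested persona
    alt = fallback.get(persona)
    if alt:
        hits = search(query, persona=alt)
        if hits:
            return hits, alt, True           # fallback persona flag set
    return [], None, False

# Hypothetical corpus: this question is only documented for the 'user' persona.
docs = {("usage report", "user"): ["kb_meetings_report"]}
fake_search = lambda q, persona: docs.get((q, persona), [])
hits, used_persona, fell_back = answer_with_fallback(
    "usage report", "administrator", fake_search)
```

Recording the fallback flag alongside the answer is what lets the feedback space later show whether an answer was produced for the requested persona or a substituted one.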
And these unanswered questions for that specific persona are also shown in the Webex space.
(29:10):
And you see here we have two things, right: discard and accept, for positive feedback and negative feedback. This is for the SMEs to justify the feedback; hey, it might be arbitrary feedback in either case. For instance, the answer came from the Gen
(29:33):
AI, and even though the answer was good, the customer didn't like the narrative, right? They gave a negative feedback. In this case the SME also sees this and either justifies it, saying hey, the customer is right and this is not the answer to this question, and accepts it, or discards it because the answer looks good. So there can be arbitrary feedback here, right?
(29:53):
So for instance, say we are talking about a positive feedback and we get a justification from the SME, right? Hey, this positive feedback is good. Then we directly index it into the cache vector database.
comes with similar questions, not identical but the similar
questions based on the similarity between question and
question, question and answer, we do provide this answer from
(30:17):
cash, right? So we get a cash like the output
from cash and then customer didn't like it.
Maybe the answer was not that one or that specific question
and give a negative feedback. In this case, if this negative
feedback feedback is justified by the SME a customer is right,
(30:37):
then we delete this feedback like the question answer paid
from cash innovative. Next time customer comes with
similar questions or identical questions, we are not going to
hit the cash vector database, but we'll go to the black
pipeline. And as you see here right in the
big like the screen in a daily basis, we do run and auto
(30:58):
ingestion pipeline which identifies which articles
deleted from cache vector database, which articles updated
the content wise updated in the cache vector database and which
articles recently added. As of I think yesterday we had
one article which is recently indexed into the cache vector
(31:18):
database. I'm just skipping this slide.
One thing I want to ask, Mustafa, is: what is the time to live of content within the cache?
So, OK, good question. Now I'm coming to that piece. So this is the cache, right? Yeah, I'm just showing a schema of the cache. We have a persona and a product; all this metadata is mapped and
(31:40):
kept in the cache, right? And then, as you see here, we have different queries. This is somewhat synthetic, so do not look at the timeline, but the first one got a positive feedback from the user and the engineer also justified it, right? And we have all these fields: the source document, external IDs, and what the score was.
(32:02):
The feedback was positive, and we create a hash code, a unique hash code. If you go up here, right, you will see that there is a hash code which is unique for that specific input. And this row is identified as the primary variation.
OK, so next time the customer comes up with a similar question: 'how do
(32:26):
I change my virtual background during a meeting?', which is very similar to 'how do I update my virtual background in a meeting?'. The answer came from the cache. Now the customer gives another positive feedback: hey, this answer is good, but under a different question. In this case, with the same metadata, this answer comes
(32:48):
in with the same hash code, because the answer came from the cache vector database, and it is stored as a secondary; a third one also comes in as a secondary, right? Now we have one primary and maybe 10 different variations of that question as secondary rows.
Why is it important? Now I'm coming to your question.
(33:11):
Say this source document is updated, right. What are we going to do with the cache? We have hundreds of questions, right? We focus only on the primary rows whose answer was created based on this document, source document A for instance. Say we have 200 questions;
(33:32):
out of those 200 questions, maybe 50 were created based on this source document, and this source document is updated, right? Or deleted. Now we go only to the primaries and replicate the same scenario: the persona is 'user', the product is
(33:52):
'Webex Meetings and Webinars', here is the question, and now we have a new answer, because the vector database is updated: we don't have that document anymore, or that document's content is updated. Now that the new answer is created, two things happen. The first thing is that we evaluate the model, right.
(34:15):
We evaluate the model based on, say, answer relevancy, context relevancy, and faithfulness. If we have the ground-truth answer, then we have more metrics, let's say, right? If everything is good and the scores are above the threshold, we directly reflect this answer, the new answer, in the cache vector
(34:35):
database, not only for the primary but for the secondaries, right? The whole bunch of 50 questions gets updated with the new answer. If it is not right, if it couldn't pass the evaluator, then it goes to the SME. And in that cache validation Webex space, as I said, the SME
(34:57):
also sees: hey, I have a question from the cache and a new answer, right, because the content got changed, or the content got deleted. Once the SME endorses it, right, this answer is also good, it's directly reflected in the cache vector database. We only
(35:21):
process the primary, but we propagate the answer through to the secondaries. This is how we manage the cache vector database with dynamic data processing.
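The primary/secondary bookkeeping just described can be sketched like this. In the real system the lookup is a similarity search in a cache vector database; here a plain dict keyed by hash code stands in, and the questions and answers are invented:

```python
class SemanticCache:
    """Group all question variations under one hash code: the first
    (primary) row owns the answer and source document; later similar
    questions are appended as secondaries that share the same answer."""

    def __init__(self):
        self.entries = {}  # hash_code -> {answer, source_doc, questions}

    def add_primary(self, hash_code, question, answer, source_doc):
        self.entries[hash_code] = {"answer": answer, "source_doc": source_doc,
                                   "questions": [question]}

    def add_secondary(self, hash_code, question):
        self.entries[hash_code]["questions"].append(question)

    def refresh_source(self, source_doc, new_answer):
        """When a source document changes, regenerate once per primary and
        propagate the new answer to every secondary variation."""
        for entry in self.entries.values():
            if entry["source_doc"] == source_doc:
                entry["answer"] = new_answer

    def delete_source(self, source_doc):
        """Drop every cached entry whose answer came from a removed document."""
        self.entries = {h: e for h, e in self.entries.items()
                        if e["source_doc"] != source_doc}

cache = SemanticCache()
cache.add_primary("h1", "How do I change my virtual background?",
                  "Old answer", "docA")
cache.add_secondary("h1", "How do I update my virtual background in a meeting?")
cache.refresh_source("docA", "New answer")
```

Because all variations share one entry, regenerating the answer once for the primary automatically updates every secondary, which is the efficiency point the speaker is making.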
So if you, if you... Yeah. One thing I wanted to ask, very succinctly, right: can you
(35:42):
summarize the benefit of using the semantic cache within this framework, what the benefit has been? If you can, just mention maybe two benefits.
latency still by the way, to be honest, I will be honest with
you guys. So we have GPT 4 right now for
the WCA, we are switching to theGPT 4 O right, Everything got
(36:07):
changed. I mean like the GPT 4 will be
like the decommissioned or depreciated by the by 6th of
June, right? Then we need to switch from GPT
to GPT 4-O or 4-O mini or Gemina.
So we need to have an extensive testing right to to see how the
model performs like we did the content we have GPT 4 as you
(36:32):
know is a small model, right? We have, we are seeing like the
really high latency. Then whenever we catch it, the
output from vector catch vector database is like subsequent if
it goes through the rack, it takes 15 to 20 seconds to
generate the answer based on thelength of the prompt.
So how many documents we do process for that specific
(36:55):
answer, then it also impacts thelatency.
This is one thing and the other thing as you know, this is the
cost. So say we have admin like the
personas as well, right? If it is administrator, we hit
the vector database, get the data and then hit the Gen.
AI. Gen.
(37:15):
AI says, sorry, I couldn't find the answer; switch to the
user persona, get the data from the vector database, and then hit
the GenAI again, and finally the GenAI
says, here is the answer. So roughly we pass
around 5 to 6,000 tokens as input, and the completion, as you
know, is 3 to 500 tokens; that is, each time we
(37:37):
pass up to 10,000 tokens to the GenAI.
And then if this model is heavily used, then hey, here is
the cache; the answer comes from
the cache. And then what we will do
is just make sure that, based on the new content, right,
the updated living content in the cache is kept up to date.
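The two benefits just named, latency and cost, both come from checking a semantic cache before the full RAG call. Here is a minimal sketch of that lookup, with a word-overlap similarity standing in for real embeddings plus vector search; everything here (names, threshold, the stub RAG pipeline) is a hypothetical illustration, not the WCA code.

```python
# Sketch: hit the semantic cache first; only fall through to the expensive,
# multi-thousand-token RAG call on a miss. Names are hypothetical.

def embed(text):
    # Stand-in "embedding": a bag of words. A real system would use a dense
    # embedding model plus a vector index (e.g. MongoDB Atlas Vector Search).
    return set(text.lower().split())

def similarity(a, b):
    # Jaccard overlap as a toy similarity score in [0, 1].
    return len(a & b) / len(a | b) if a | b else 0.0

def answer(query, cache, rag_pipeline, threshold=0.95):
    q = embed(query)
    best = max(cache, key=lambda e: similarity(q, e["embedding"]), default=None)
    if best and similarity(q, best["embedding"]) >= threshold:
        return best["answer"], "cache"   # sub-second, no LLM tokens spent
    return rag_pipeline(query), "rag"    # the slow, ~10,000-token path

cache = [{"embedding": embed("how to change my virtual background"),
          "answer": "Settings > Video > Change virtual background."}]

def rag_pipeline(query):
    # Stub for the full retrieve-then-generate pipeline.
    return "freshly generated answer"

print(answer("how to change my virtual background", cache, rag_pipeline))
print(answer("how do I export a meeting transcript", cache, rag_pipeline))
```

The first query matches the cached question and returns instantly; the second falls through to the RAG stub, which is where the 15 to 20 second latency and the token cost would be paid in a real deployment.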
(37:59):
So one thing I wanted to also
touch on, just for the sake of time, before we go into the demo:
when we go into the demo, I really want us to talk about how
you view memory within agentic systems, your perspective on it,
and some of the techniques that you're using to implement
and manage memory within an agentic system.
(38:21):
So I'll hand over to you, then we can really go
into that topic. OK, so, well, I was
going to explain the agent, but let me show you a few
examples. OK. And while we're waiting for things to get
(38:50):
started: Pedro, could you please remove
the screen sharing just while we get it set up?
OK, thanks. One thing I wanted to say
while we're waiting, guys: you can ask any questions you have
in the chat. Mustafa,
hey, give me a thumbs up once you're ready to go with the
demo. I'll keep the audience
(39:10):
warm. We've just gone through a bunch
of components and techniques Mustafa's
used to build one of the agentic frameworks within Cisco, which, Mustafa
mentioned, powers over 50 chatbots in production today.
So you can go to help.webex.com and you will
(39:32):
be able to use some of these chatbots that are built with
MongoDB, using MongoDB as the operational and vector database.
So Mustafa, are we good for the demo?
Yeah, right now I'm just going to show a few things, then
we will start with the demo, right?
Excellent. So as I discussed, we
(39:54):
have different notifications in the Webex space to inform our
stakeholders. We have two
different models, deployed in two different environments, and
people go and ask questions, right? That's the
reason why we have two different, let's say, environments.
Under these environments we have different notification spaces,
(40:15):
for instance, this is an unanswered question.
So if the GenAI cannot answer this
question, right, we directly generate a report
and then let our stakeholders know that, for this specific one,
the GenAI couldn't generate an answer.
These are the source documents, and "persona fallback
attempted" means that it tried with the user persona, and then it
(40:39):
couldn't generate the answer, and then it switched to the
administrator. So there is no administrator-related
notification here, meaning that we generated the
answer for the administrator, but the user persona failed, right?
And so this is the UI where we test the model.
So if I come up here, then, as I said, normally in
(41:02):
production we have this knowledge, right?
KYC, know your customer: who the end customer is, whether it's a user,
administrator, or partner, and then we know what kind of
product they want to go with, or operating system, or
devices, right? For the sake of the
demo, let me... as you see, one more thing:
(41:28):
right now it's GPT-4, and then we
have another space that also runs GPT-4o, so the
stakeholders can go and check the performance of the model from both sides,
right? If I say "how to change
my virtual background", what it does is: the
(41:58):
first one is the current model
in production, which is an FAQ model, right, basically a
classification model, and we see what kind of
outputs come from it. Right now it is a non-prod environment, so
you might see some latency here, right?
But what I'm trying to say is that this caching will
(42:21):
eliminate this latency. I mean, if you are in the Gen
AI domain, right, if you create these kinds of pipelines,
it isn't just "send the data to the Gen
AI, generate the answer, and hey, here is the answer",
right? This is not the case.
I mean, this WCA has a huge configuration in the back end,
(42:44):
like everything. Whatever you see here, this is an
end-to-end homegrown solution. Other than the
MongoDB Atlas vector database, we didn't use any third-party
orchestrators like LangChain or
LlamaIndex. The data ingestion pipeline is
end-to-end customized. OK, the embeddings are open source,
(43:10):
I accept that, and the vector database and the Gen
AI. The rest is end-to-end our homegrown
solution. So this UI is created for our
engineers to go and check the model performance, right?
So they can see what kind of source documents we used,
and then they can also check, right,
(43:30):
the prompt: what was the prompt we passed to the Gen
AI, like the document base. And then the computation
time: guardrails, vector search, data extraction, output,
console data. You see it took 3 point 32.36.
So there are many things going on in the back end to make sure
that the answer is properly narrowed down based on the
(43:52):
inquiries, and the GenAI also generated the answer in
30 seconds, 30 seconds total, and for input
we used around 4,100 tokens for this one.
If I say "I found what I need", right,
it goes to the cache vector database now. But not
the cache vector... sorry, it comes up here,
(44:13):
in the feedback space, no?
Hey, now we have an answer: "how to change my virtual background".
So the response is that you change the virtual background in Mac OS,
iOS, Android, and these are the user steps and all those things, right?
I said, hey, this is a good
answer. Now it should go to
(44:38):
the cache vector database, right?
To index it into the cache.
Right now it is indexed, if I check. Yeah,
my background is indexed now. OK, so if I come up here: "I need
(45:06):
to change my virtual background". Sorry, it's quite
hilarious. No, it's all good.
And one thing I like about your UI is you're really showing
(45:29):
the end user all the information regarding the process, right?
The computation time, the prompt, everything.
It really helps. It's very intuitive.
Yeah, same thing. Now I just went with "I
need to change my virtual background".
The first one had a typo,
by the way: "how to hinge my virtual background", right?
(45:50):
Right now I have a different query, but the answer also came
from the cache vector database. If you see here, right, the
output came from the cache vector database in 0.31 seconds.
So if I give negative feedback, the same thing will
happen. It will go to the Webex space,
and if I say "I still need help", I need to also
(46:11):
justify it, right? I could just
say, hey, this is for test, and then send this negative
feedback, and it will come to the negative feedback space. And the
engineer will see, hey, there is negative
feedback here, right? And then we should either
(46:31):
discard or accept it. Accept means, hey, there
is no good answer, so clean the cache
vector database, right? Next time you don't want to see
this answer from the cache, but go through the classic RAG pipeline.
Discard means keeping the cache, and we
are good with this answer. This is an arbitrated feedback loop.
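That accept/discard arbitration can be sketched as a small eviction policy; a minimal in-memory stand-in with hypothetical names, where "accept" evicts the cached entry so the next ask goes through the classic RAG pipeline, and "discard" keeps the cached answer.

```python
# Sketch of the engineer's accept/discard verdict on negative feedback.
# All names are hypothetical; the real cache lives in a vector database.

def arbitrate(cache, question, decision):
    """Apply an engineer's verdict on a piece of negative feedback."""
    if decision == "accept":
        # Feedback was valid: purge the cached entry so the next ask
        # falls through to the classic RAG pipeline.
        cache[:] = [e for e in cache if e["question"] != question]
        return "evicted: next query falls through to the RAG pipeline"
    if decision == "discard":
        # Feedback rejected: the cached answer stands.
        return "kept: cached answer stands"
    raise ValueError(f"unknown decision: {decision!r}")

cache = [{"question": "how to change my virtual background", "answer": "..."}]
print(arbitrate(cache, "how to change my virtual background", "accept"))
print(len(cache))  # prints: 0
```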
(46:54):
So now let me switch to the agentic framework.
I will not have too much demo of
the agentic framework, but I'm just going to show you how we
built it. Right.
Yeah, so it's absolutely fine,
because we can also talk about some of the ideas you have
(47:15):
for what you'd be building and what you have built.
So wherever you can't show, we'll fill in the gaps with words.
OK, sure. So as I said, this is a UI where you
can come and build your own agentic framework with no
code or low code. Meaning that if you don't have
any tools, like API calls, to make, right, then you are good to
(47:39):
go. You can spin up an agent here,
and then you can go to the LLM
knowledge space, create
your own RAG pipeline, and connect it, right?
So these are all RAGs that our
internal users have currently created, right? So if I go to the licensing RAG
and then go to the edit, right, whatever I showed you, you will
(48:03):
see here. So these are the generic
configurations as well as the prompt, right?
And then you can start interacting with your licensing
bot ASAP. So these are the documents
uploaded here, and I can start asking
questions. And I'll show you: enable the debug mode, same thing as with
(48:24):
the UI. Then you will be able to see
what the question was, what the answer was, what the source
documents were, and the computation time. And
"enable previous context": as I said, it also takes into
account the previous conversation and then generates
the answer based on the conversation.
So this is one thing. Say that you created your RAG here and
(48:44):
then you want to create an agentic AI, right,
and you want to integrate your RAG into it.
Now you come up here and then create an agent.
Wait, I mean, let me scroll down,
(49:10):
right. So here you have different LLM
models, right? BridgeIT is our own Cisco
LLM platform, which is basically used internally, right.
Then you can query, with an API, GPT-4, GPT-4o,
GPT-4o Mini, and then we will have Gemini as well.
(49:30):
So these are the LLMs; you can configure
them with your APIs. And in the tools, we have
different tools we are going to
create; right now we have been creating a tools library.
So for instance, if you want to get your meeting recording,
right: yes, you had a Webex meeting, and this tool is ready,
right? So, hey, come up here,
no need to create a tool; go and fetch it or
(49:54):
integrate it in your agent. This tool is ready, and then "find
licenses by name", "case details", right?
We have many tools here that will continue to pile up.
Hey, what do you think we need to have?
By the way, there will be role-based
access control. I think no one should just go and get
(50:16):
very highly confidential data with these
tools, which is not the case, right?
But you can also say, hey, come up here:
device status, right? I have a device status tool here
that I can integrate with my...
One question I wanted to ask,
(50:38):
Mustafa: where are you storing the JSON schema,
the schema definition of the tools?
The schema definitions of these tools and everything are stored
in MongoDB. So you're basically using
MongoDB as a toolbox, basically.
Yes, yes. In the back end, whatever you see here, most of
the credentials and all this critical data is
(51:01):
stored in MongoDB, or in Atlas Vector
Search as the vector database. So in the back end, and I'm
not exaggerating, MongoDB Atlas Vector
Search and MongoDB itself are heavily leveraged.
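This "database as a toolbox" pattern can be sketched as follows: tool JSON schemas live in a collection, and at inference time only the top-k most relevant schemas are retrieved and handed to the LLM, instead of all of them. The word-overlap scoring below is a toy stand-in for a real vector search (for example, a `$vectorSearch` aggregation against MongoDB Atlas), and every tool name and field here is a hypothetical illustration.

```python
# Sketch: a toolbox collection of JSON schemas, narrowed to the top-k
# most relevant tools per query before they are passed to the LLM.

TOOLBOX = [
    {"name": "get_meeting_recording",
     "description": "fetch the recording of a Webex meeting",
     "parameters": {"meeting_id": "string"}},
    {"name": "find_licenses_by_name",
     "description": "look up licenses assigned to a user by name",
     "parameters": {"user_name": "string"}},
    {"name": "get_device_status",
     "description": "report the current status of a registered device",
     "parameters": {"device_id": "string"}},
]

def score(query, tool):
    # Toy relevance score: word overlap between query and tool description.
    q = set(query.lower().split())
    d = set(tool["description"].lower().split())
    return len(q & d) / len(q | d)

def select_tools(query, toolbox, k=2):
    """Return the k schemas most relevant to the query, ready for the LLM."""
    return sorted(toolbox, key=lambda t: score(query, t), reverse=True)[:k]

tools = select_tools("is my device online and what is its status", TOOLBOX, k=1)
print([t["name"] for t in tools])  # prints: ['get_device_status']
```

Because only the selected schemas reach the prompt, the toolbox itself can grow to hundreds of tools without exceeding the handful the model is asked to reason over.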
So I just want to talk about that design
(51:24):
pattern, because that's something I've seen a lot of developers
do: they use MongoDB to store the JSON schemas of the
tools so you can scale the tool use.
And the reason you do this is that for the LLMs, I think
OpenAI's guidance is that you give
around 10 to 20 tools for the LLM to be aware of for your tool-
(51:46):
calling or function-calling capabilities.
But when you actually put everything within the toolbox
within MongoDB, you can use vector search to get the
relevant tools at inference time.
That way you can scale the number of tools your system
is aware of, like Mustafa's done here.
So that's using MongoDB as a toolbox, your database as a
toolbox. Just wanted to double-click
(52:07):
into that, but I think for the last couple of minutes we have:
what is one of the key things you want to show the audience?
And then the second question:
what's the most important thing you want an AI engineer to
take away from building an
(52:27):
agentic framework? OK, the agentic framework is
one thing, but generally what I would like to emphasize and
call out is that non-prod and prod are
different things, and internal and external are different
things, right?
Right now with GenAI, UAT time has increased
(52:53):
exponentially, right? Testing the model,
going back, identifying the issues, and all
that stuff is a really, really longer process than ever,
right. So with all these guardrails and
integrations, to make sure that the model performs as expected
in production as a customer-facing model, you need to
(53:17):
spend even three or four times longer than before, right?
So I think the model creation is a bit
shorter, I would say, but the UAT,
making sure that the model is ready for
production, takes a long time, right?
(53:37):
This is one thing, and the other thing is that it's still a
black box for us, in that each time we get different answers
from the GenAI. You can tweak the parameters, like the
temperature, top-p, and the like. Anyway, at the end of the story,
even if you pass the same input to the Gen
(54:00):
the previous rocks, but quite different than what you're
expecting. This is another thing then.
Then that's the reason why closed monitoring mechanism
should be established before putting this model into
production. An agentic framework, it is the
really scariest awesome tool, right?
I will say that in the past, there were many things that we
(54:25):
couldn't even imagine making happen.
Right now, these kinds of things can be done
instantly with agentic frameworks, agentic AI, right?
So let me give you an example, one of the big Gen
AI ones, not agentic but GenAI: we used to track what kind of
(54:47):
commitments an engineer made to users, right? Say,
hey, I'm going to troubleshoot your problem and
get back to you by the end of today, or in two days, or
next week, right? Just think about it: without
GenAI, what a hard and really
long process it was to identify those sentences from an email, and
(55:11):
then, with only a rule-based approach and everything,
identify what the commitment was: a troubleshooting update,
right, or whatever, like "have a meeting with you",
right? And then identify the
timeline, hey, when the engineer gave that commitment, and then
(55:34):
put it on an agenda so we also notify the engineer before
that deadline, right?
And we will say, hey, in 15
minutes you need to have a Webex meeting with this one.
Because just think about an
engineer: they normally handle 15 or 20
(55:55):
different cases in one day; they type an email, go to another
case, check the troubleshooting steps, give a
commitment here, and finally
the engineer might get lost, right?
Hey, what were the commitments? Identifying them right now
is a really instant, very easy process: generate it and then
create that pipeline. And hey, engineer, here's your
(56:17):
commitment agent or commitment assistant that
will identify these things. You might not even
go with a full agentic framework; you can
spin up a single agent which will
analyse your inputs and identify, hey, this is the
customer, this is the commitment you gave that
customer, and then how I'm going to keep
(56:39):
it, right? And "I'm going to
notify you before the deadline". So all these things can be done.
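The commitment-assistant idea described above can be sketched as a small pipeline: an LLM extracts commitments from an engineer's email as structured JSON, and a reminder is scheduled ahead of each deadline. The LLM call below is a stub returning a canned response so the shape of the pipeline is clear; a real system would call GPT-4o or similar with structured output, and every name, prompt, and field here is a hypothetical illustration.

```python
# Sketch of a commitment-extraction pipeline with a stubbed LLM call.
import json
from datetime import datetime, timedelta

def call_llm(prompt):
    # Stub: a real system would send this prompt to an LLM with a
    # JSON-mode / structured-output setting and return its response.
    return json.dumps([{"customer": "Acme",
                        "commitment": "troubleshooting update",
                        "deadline": "2025-06-06T17:00:00"}])

def extract_commitments(email_text):
    prompt = ("Identify every commitment the engineer makes in this email, "
              "with customer, commitment, and ISO-8601 deadline:\n" + email_text)
    return json.loads(call_llm(prompt))

def schedule_reminders(commitments, lead=timedelta(minutes=15)):
    """Notify the engineer shortly before each committed deadline."""
    return [{"customer": c["customer"],
             "about": c["commitment"],
             "remind_at": datetime.fromisoformat(c["deadline"]) - lead}
            for c in commitments]

email = "Hi Acme team, I will send a troubleshooting update by Friday 5pm."
reminders = schedule_reminders(extract_commitments(email))
print(reminders[0]["remind_at"])
```

The point is the structure, not the stub: the extraction that once needed brittle rule-based parsing is now a single structured-output call, and the rest is ordinary scheduling code.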
And there is no limit right now.
Whatever you can think of, you can build with this agentic
framework, with these agents, right?
And then think about it: hey, we have a tools
(57:01):
library, and in that tools library we have
hundreds and hundreds of tools. And just think about that:
everything, like the JSON schemas, is captured, and then
you can go and update these schemas, and you
see it in the UI, right?
And what you need to do is say, hey, I need four or five tools,
and these are what I'm going to integrate with my agentic
(57:24):
framework, multi-agent or single agent, and I would like to also
communicate with another agent or with another tool.
We have protocols as well, right?
A2A, agent to agent. And then you can go with MCP
to communicate with the other tools from different
domains, from different entities.
Just think about an enterprise-level network where
(57:45):
these agents, these AI tools, will communicate with each other
to increase productivity and decrease costs, right?
Exactly. It's awesome.
I was going to say you hit the nail
on the head there with productivity:
increased productivity gains across all fronts, for
developers and for the enterprise in general, developer
(58:08):
productivity as well for all of us building these tools. We're
moving a lot faster, we're able to build faster,
to help optimize workflows within our organization.
Mustafa, I can't thank you enough for sharing, yet again,
more insightful experiences that
(58:29):
you've had building RAG applications, agentic
applications, and AI applications in general.
I think every time we talk, I
come away with so many
learnings; now I'm going to go try some different techniques
you've shown off, the semantic cache and the synthetic
generation of data. So is there anything you want to
say as we close off this session?
(58:51):
So just one thing, right: in the past there was a
steep learning curve to
get up to speed with AI. Right now this is not the case.
Anyone can get up to speed with AI in a really short time and
then build their own agents, with some agentic IDEs as well, right?
(59:14):
Just to our developers, our friends in this domain:
make sure you get your hands dirty with AI, is what I'd like to
emphasize. That's a good way to close
out. Thanks again, Mustafa, for
taking the time to speak with us today.
And I can't wait to have you again, maybe later in the
(59:36):
summer or later this year. Sure, sure.
Richmond, thank you so much again for having me and giving me
this opportunity to show our use cases, right.
No, thanks a lot. And thanks,
guys, for joining us today. We will see you again very soon.
We have webinars, we have YouTube shows,
so stay tuned. There is more good content
(59:59):
coming from MongoDB. Thanks, guys.