Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
If you don't have a way of observing what these agents are
doing, if you don't have a way of evaluating them, and if you
don't have a way of putting guardrails around their
behavior, you just don't have an agentic system that's going to
really survive. Hey folks.
I am Conor Bronsdon coming to you live from the Galileo
(00:21):
offices in San Francisco, where we have about three more
weeks before we move to a new location.
We are here with Chain of Thought, the podcast for
developers building with AI. I'm Conor, the head of
Developer Awareness at Galileo, and we're here for a
conversation about how AI is changing, how agents are
(00:41):
evolving. And this complexity that has
been introduced as we begin to build multi-agent systems, as we
begin to expand upon more basic RAG implementations to massive,
complex networks of LLMs interoperating with one another,
self-evaluating, sometimes self-improving.
(01:03):
And there's a new urgent challenge that is being faced by
companies around the world, which is reliability and the
ability to have the trust needed to put these AI systems into
production. It used to be easy when this was
just an experiment, when it was fun, when it was new.
And now people are going, oh, what's the actual customer
value? Can we trust these systems to stay within
(01:23):
the guardrails we have? How do we ensure these systems
are not just powerful, but also predictable, safe, and
trustworthy? So today, we're thrilled to be
joined by someone at the very heart of this intersection
between data and AI, Mickey Chandra Shaker, Staff Developer
Advocate at MongoDB, and someone you may know from
LinkedIn. Mickey, thank you so much for
joining us. So good to be here.
(01:44):
Yeah, I am excited to discuss the agent landscape, agent ops
and so much more that Mongo is at the center of here within
this evolving data and AI ecosystem.
So let's maybe start with that vantage point at Mongo DB.
I know you've only been there a couple months now, but you've
obviously been involved in the space for so long.
You've got this depth of knowledge about what's happening
(02:06):
and working with developers across different industries.
And what do you see as how MongoDB is going to push AI and data
forward in the next couple of years?
Yeah, totally. And to give a little bit of
context to, you know, to that question.
So in terms of my background, like what I like to tell people
is that I've essentially had the joy and the
(02:28):
opportunity to work at every single stage of, like, the data to
ML to now generative AI pipeline.
So part of my career was working as a data scientist, part of my
career was working as an MLOps engineer.
And then, you know, more recently, my job has been in
terms of developer advocacy and education around how to build
(02:52):
production-ready systems. Any system, whether
it's generative AI or whether it's classical machine learning
or even, you know, analytics. That's kind of been
the focus of my career, and I bring all that to my role here to, you
(03:13):
know, help advocate for developers that are trying to
build these reliable production ready generative AI applications
and platforms. So that's why I do had Mongo.
And one of the reasons why I joined is because so Mongo
already has a really rich history of providing one of the
best document data stores in the industry.
(03:36):
And when people think documents, they merely think, like, PDFs,
etcetera. But really, what, you know, Mongo
DB was able to figure out really quickly was how to create a
database that essentially, you know, has a polymorphic schema,
which allows you to be very flexible with how you design the
data that feeds your application and to be able to encompass like
(03:58):
any kind of data, right? So that's the story of, you
know, Mongo DB, the V1 of Mongo DB.
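As an aside for readers: here's a minimal sketch of what that polymorphic, flexible schema looks like in practice, assuming pymongo and hypothetical database, collection, and field names.

```python
# Minimal sketch of MongoDB's flexible document model (names are
# hypothetical). Differently shaped documents live in one
# collection, so a new field doesn't require a schema migration.
from pymongo import MongoClient

docs = MongoClient("mongodb://localhost:27017")["demo"]["items"]

# A product record and a chat transcript, side by side.
docs.insert_one({"type": "product", "name": "lamp", "price": 39.99})
docs.insert_one({
    "type": "chat",
    "session": "abc123",
    "messages": [{"role": "user", "content": "Hi"}],
})

# Queries simply match whatever fields a document happens to have.
for doc in docs.find({"type": "chat"}):
    print(doc["session"], len(doc["messages"]))
```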
But Mongo DB has also started moving into how can we enable
developers at all stages of maturity to not just excel with
(04:19):
their data store, but how can we now then help them
leverage that data store to, for example, support RAG
applications, to support agent applications. Not just from
the vector store that we have now in Atlas Search, not just
from the memory stores that we also have available now as
(04:41):
modules in LangGraph and a couple of our partners.
But how can we do all that to, like, help continue supporting
developers building the next generation of apps.
Now in terms of kind of, you know, where Mongo DB fits into
the industry now is we really see ourselves as the memory
(05:01):
store for agentic applications and agentic systems.
Totally. Yeah.
And in terms of like what that actually means, you know, it's
not just storing the data. It's not just storing, you know,
vector data for semantic search and similarity in RAG, but it's
also storing the chat logs, you know, things to help developers
(05:24):
debug, trace any kind of errors, and to figure out how they can
continue improving the underlying data, the underlying
processes, to, you know, continue improving that user experience
for their applications.
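To make that concrete, here is a minimal sketch of logging agent conversation turns to MongoDB so they can be replayed while debugging; the collection and field names are illustrative, not a prescribed schema.

```python
# Sketch: persist agent chat logs so failures can be traced later.
# Collection and field names are illustrative, not prescribed.
from datetime import datetime, timezone
from pymongo import MongoClient

logs = MongoClient("mongodb://localhost:27017")["agent_ops"]["chat_logs"]

def log_turn(session_id: str, role: str, content: str,
             error: str | None = None) -> None:
    """Append one conversation turn, with an optional error note."""
    logs.insert_one({
        "session_id": session_id,
        "role": role,
        "content": content,
        "error": error,
        "ts": datetime.now(timezone.utc),
    })

def replay(session_id: str) -> list:
    """Replay a session in order when investigating a bad answer."""
    return list(logs.find({"session_id": session_id}).sort("ts", 1))
```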
Absolutely, and I'll say that's
one of the reasons Galileo is really excited to integrate with
Mongo DB as a data platform is seeing the incredible stuff
(05:45):
being built with Mongo. There's so many cool agents that
I'm seeing developers come up with.
Are there particular use cases or particular moments of agents
in action that you've seen people build with Mongo that
get you fired up about where the future is going?
Yeah, absolutely. I mean, you know, it's really
interesting because I think you see kind of
(06:06):
two schools of thought right now in terms of the
digital sphere world, the world of the Internet.
And one thought is, you know, that, oh, agents, this is like an
overhyped fake thing that people talk about building but don't
actually build. Meanwhile, there are tons of
(06:28):
companies that are building, like, production-scale agents.
So there's a number of customer stories on the Mongo DB site as
well. But we've seen builds from both,
you know, really small start-ups.
We have a few of those partners on our site.
They're built on Mongo DB. We also have a lot of like blue
(06:50):
chip, well-known companies that have also started building
agentic systems with Mongo, with Atlas specifically.
And more recently, you know, Mongo DB announced the
acquisition of Voyage AI, which hosts a number of very
high-performing embedding and reranking models.
And bringing them into, you know, the same house means that
(07:14):
you know, any team, whether you're a solo developer that's
building their, like, first million-dollar app, or whether
you're a team at an existing, like, 20-year-old company in
telecom or in finance, in medical, healthcare, you name it,
they can all build, like, very similarly performing agentic
(07:36):
systems. It's interesting that you bring
up similarity and performance because one of the big
challenges we're seeing with agents and AI systems in general
is reliability. Obviously, non-determinism is an
incredible tool and an incredible opportunity of these systems,
but it comes with risks about trustworthiness, getting to
(07:57):
production consistently, and actually delivering on the
promise of AI and the opportunity therein.
What has been the approach for Mongo and yourself around how to
improve the reliability of these systems, especially with the
incredible data layer that Mongo provides?
Yeah, absolutely. And you know, it's really
interesting. So I, I recently gave a
(08:19):
lightning talk at the AI Engineer World's Fair and it was
about essentially how to solve memory for multi agent systems.
And so I had taken a poll. I said, you know, I asked everyone,
like, raise your hands if you are building an agent.
OK, so basically 80% of people raised their hands.
I said, raise your hands if you have an agent, right, or
(08:41):
multiple agents in production; about those same people roughly
raised their hands. Maybe there was one or two less.
And then I asked them, like, how many of you have gotten your
agents to work, like, straight off the bat, 100% of the time. No
one raised their hands, no one at all.
You know, so to me, that's a good cross section,
(09:03):
especially because that conference, it represents
the builders who are building at the edge.
So to me, that was very reflective of like probably what
a lot of teams are experiencing right now, which is that even if,
for example, you have an agent that does, like, a structured
workflow, sometimes the answers it gives are going to be
(09:25):
different. It might be 1 out of 100 runs or
it could be 1 out of 20 runs, depending on how complex the
task you have given that agent, right?
So I think that's a really, really big challenge because I
think your users and customers of whatever app or product
you build, they have such high standards now for the customer,
(09:46):
like the user experience. And most importantly, if you're
dealing with like really sensitive data.
So, for example, like, if you're dealing with an agent
that internally helps put together financial reports,
financial analysis, competitive intelligence.
Well, actually competitive intelligence is important, but
actually financial reports, that is a huge thing that if you're a
(10:07):
public company, you can't get it wrong.
Or even, we've seen some embarrassing examples
recently. So right now it's summer, and
every publication in the world is putting out their, you know,
"here's our top 50 summer reading list."
Something that happened apparently was that a few of
(10:28):
those summer reading lists that came out of reputable sites and
media channels were fake. The books did not exist.
The books did not exist, and the reading lists were still
published. And so when people went to go
look at these books, they couldn't find anything.
It's like how? I mean, that's a relatively
(10:49):
trivial example in that, you know, no one's
financial data was actively
harmed, but the reputation risk there, you know, those
sites did actually take a ding on reputation, because the
common feedback people had wasn't necessarily that they
used generative AI, for example, to write the copy, but was that no
(11:12):
one did the editorial review to check if these books actually
existed, right? So that's a relatively trivial
example. Everyone knows the story
about the guy who was able to get, I think, a Ford truck for
a dollar. Lovely support bot.
Thank you, lovely support bot.
Fun story, you know, but what about, for example, a more
(11:32):
serious case where you have an agent that is helping to put
together an initial diagnosis or doing triaging for people with
severe health conditions. Or agents that
do underwriting for loans, for insurance. Like,
those are very serious cases where the consequences can be
(11:55):
disastrous for people. Absolutely.
And I will say, I feel like I'd be remiss if I didn't mention
Galileo's wonderful case study with Magid where we talk about
how to solve these newsroom challenges.
So go check that out on our website at Galileo dot AI.
But hallucination is both a feature and a bug when it comes
to LLMs, because we want them to create, we want them to try
(12:17):
new things. We want them to be able to think
their way out of problems, which is a form of hallucination.
But what happens in the wrong areas, when it happens outside of
the guardrails we've tried to set, when it happens in a way
that affects people's lives negatively, it's a huge issue
and there's a lot of approaches being considered about how to
solve this. Obviously, we focus on the, like,
(12:39):
observation, evaluation, guardrailing side of things.
And I know Mongo is thinking a lot about agent ops as well and
how do the systems operate. Can you tell me a bit about the
approach to agent ops and what you're thinking is and how it's
evolving? Yeah, absolutely.
So to me, it's fascinating. So, you know, before the world of
(13:02):
generative AI came about, I was working in MLOps, right? So
let me approach this from kind
of like two angles. So the first angle is:
I want to address some interesting
(13:23):
trends that I see, once again, around the building
of agents in the world. So there's one school of thought
that is sort of like, well, you know, agents, because LLMs are
non-deterministic and all that stuff,
you should kind of just let them do their thing, and you should
treat them as the wonderful magical puppets that they are,
(13:43):
like a magical box, let it go do its thing, right?
And then it's OK, we'll catch the errors later and then kind
of figure something out on the application layer, right?
But the other school of thought that I personally follow and
hold on to is that agents are a software product.
We have best practices that we've established with
(14:06):
traditional software and applications and software
engineering best practices there.
And yeah, something will change a bit with agents, but we should
still approach them with a certain rigor.
We should approach them with an understanding of like we have
these best practices that we, for example, built out in the
world of MLOps. Let's see how we can
(14:31):
adapt them to agents. But let's not get away from the
fact that they are still software products.
They're still code based. So in terms of, for example, how
we're approaching it on the Mongo DB side, one of the ways
side, one of the things that we're really good at is
data. You know, it's storing data,
(14:51):
it's helping developers access data, it's helping them search
it, helping them to organize it. So that's one of the first ways
we're approaching it is how can we store like all the data that
you need not just to feed your applications, but also to help
you understand how your agents are performing, Where are the
conversations that they're having?
(15:13):
How can you then, for example, plug into observability and
evaluation providers to then be able to understand them and do
those traces? There's a few folks that are
pretty, I would say, great voices or great advocates for
that school of thought. So there is Hamel Husain and I
(15:33):
think Shreya Shankar. And you know, I love all the
stuff that they produce, because they advocate for that
rigor, for that evals-driven approach.
So that's one way Mongo DB is approaching it: we do data really
well. Let's, like, bring that
excellence and expertise to developers, and then let's figure
(15:53):
out how to build tools around it.
So tools that kind of leverage all those different sort of
abilities: the data store, the vector store, the memory store.
Yeah. And then I'd say the second thing
that we're doing, and this is really due to both Voyage and a lot
of the recent improvements and features we
(16:15):
shipped around Atlas. So for example, you know,
embedding models are obviously important if you're doing sort
of a RAG-style workflow, but also reranking models are really
important, because you want the ability to, you know, feed
the best documents, the best matches. You want to, like,
(16:39):
bring that quality in for what you feed into your, like, RAG
system, right? So that's another way that we're
approaching it: the Atlas team has shipped some really
amazing features. So now developers, for example,
can not just do full text search, not just do semantic
search, but also they can do hybrid search where they can
(17:01):
combine the best of full text and semantic to create even better pipelines.
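A rough sketch of what such a hybrid search can look like with pymongo, fusing Atlas full-text ($search) and vector ($vectorSearch) results with reciprocal rank fusion; the index names, fields, and weighting constant here are assumptions, and newer Atlas releases also ship built-in ways to fuse ranked results.

```python
# Sketch: hybrid search over an Atlas collection by fusing
# full-text and vector results with reciprocal rank fusion (RRF).
# Index names, fields, and the embedding input are hypothetical.
from pymongo import MongoClient

coll = MongoClient("<ATLAS_URI>")["demo"]["listings"]

def hybrid_search(query: str, query_vector: list[float], k: int = 10):
    text_hits = coll.aggregate([
        {"$search": {"index": "default",
                     "text": {"query": query, "path": "description"}}},
        {"$limit": 20},
        {"$project": {"_id": 1}},
    ])
    vector_hits = coll.aggregate([
        {"$vectorSearch": {"index": "vector_index", "path": "embedding",
                           "queryVector": query_vector,
                           "numCandidates": 200, "limit": 20}},
        {"$project": {"_id": 1}},
    ])
    # RRF: each document scores 1 / (60 + rank) per result list.
    scores: dict = {}
    for hits in (text_hits, vector_hits):
        for rank, doc in enumerate(hits):
            scores[doc["_id"]] = scores.get(doc["_id"], 0) + 1.0 / (60 + rank)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```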
I love your perspective here
because I completely agree with you.
We can't always just let agents run amok.
We have best practices we can put in place, we have things we can
take from prior eras of software engineering.
(17:23):
And you know, I, I know there's this hype that software
engineering is going away and maybe someday it will, but I
don't think it's happening anytime soon.
I think we should be leveraging these best practices.
I think we should take the learnings we already have and
apply them to this current era. And I'll say Hamel is
actually coming on the podcast very soon.
So very excited to talk to him and learn quite a bit 'cause I,
(17:46):
I feel like every time I talk to him, I just gain so much
knowledge and perspective. And he's actually contributed to
some of how we've designed some of our new features on our
platform as we've thought through this problem too.
You know, one of the features we recently launched that
we're very excited to have working with agents that
are leveraging Mongo DB, are some new interfaces to
(18:09):
help with the debugging, understanding and identification
of problems in multi-agent systems, including, you know, a
graph view to trace, OK, what actually
happened within this agentic
workflow. And then, you know, a timeline
view where we can look at multiple agents together and say, OK,
like who was communicating with who?
(18:31):
When was it happening? Which tools are getting called
when, can really help with debugging and understanding
where there are challenges, which you wouldn't need if you let
them run amok, I suppose. But again, I don't think you
should do that. And then a message view so you
can really dive into and unpack a particular string of messages
and understand the actual chain of thought of what
occurred. And I think it's so crucial to
(18:53):
create the reliability that we're talking about in order to
enable enterprises to really go to production.
Because, you know, we can't risk people's livelihoods or their
health. And yet there are so many
drudgery tasks that we can hand off. There are so many hours of human
time that we can free up for creativity if we apply these
(19:15):
systems in the right way. And I'm curious, from your
perspective, what are the features that you would want to
see on a reliability side? Like if you could just wave a
magic wand and make us build something like, what would it be
that you would want us to build to help support this ecosystem?
Oh, that's so interesting, so interesting.
Let me think about that. So, in preparation for
(19:39):
some of the research and talks that I've
been doing for Mongo, especially around multi-agent systems.
So, you know, what's really interesting is that, like,
it's not that people aren't building multi-
agent systems, right? Whenever people think, oh,
(20:01):
this thing will happen in the next two years, I can guarantee
you it's already happening now. Someone in the world is
building it. But I think the difference
between people who are building the products and the systems of
the future versus like the people who are building what is
available now. Like there's just this gulf of
experience and knowledge and practices.
(20:24):
And sometimes, you know, it takes a while for the rest of
the world to keep up. So, for example, multi-agent
systems, right, in production. So it's just now, and I think
we're starting to see papers come out that are really, you
know, they've taken that second school of thought, which is that
if agents are software engineering products, can we
(20:46):
apply, for example, scientific analysis to understand all the
ways that they break? So there's a couple of those
papers out. Some of them I had referenced in
my talk at the conference. One of them was "Why Do Multi-Agent
LLM Systems Fail?", which I thought was great.
And what the paper tries to do is it tries to provide a
taxonomy of potential failures in multi-agent systems.
(21:10):
So I think a lot of the focus around failure, oddly enough,
has been around single-agent systems, you know, and
thinking about memory, thinking about, like, memory for single
agents. People kind of focus on, like, OK, so you need, like, a
data store, then you need, like, a vector store, and then
(21:30):
you have a chat store where you save stuff, and then, like,
that's it. And all this, like,
for example, chain-of-thought reasoning, has been about improving a
single agent. But what we're seeing now
actually is that people are building multi-agent systems.
And I think that's the advice and best practices that's
developing now is, instead of having a single
(21:55):
agent try to do a lot of complex tasks, have, like, a
system of small agents, each one that is equipped to do, you
know, one or two or three tasks. Make sure that you have a
criteria for how to evaluate the performance of those agents.
And then you look at that team or that coalition or
(22:21):
what have you holistically, right?
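A small sketch of that idea, with placeholder agents and scoring functions: each specialized agent carries its own evaluation criteria, and the team is also scored holistically. Everything here is hypothetical structure, not a prescribed framework.

```python
# Sketch: a team of small, specialized agents, each with its own
# evaluation criteria, assessed individually and as a system.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SpecializedAgent:
    name: str
    run: Callable[[str], str]         # does one narrow task
    evaluate: Callable[[str], float]  # criteria specific to this skill

@dataclass
class AgentTeam:
    agents: list[SpecializedAgent] = field(default_factory=list)

    def run_and_score(self, task: str) -> dict:
        per_agent = {}
        output = task
        for agent in self.agents:
            output = agent.run(output)       # pass work along the chain
            per_agent[agent.name] = agent.evaluate(output)
        # Holistic score alongside the per-agent criteria.
        return {"per_agent": per_agent,
                "holistic": sum(per_agent.values()) / len(per_agent)}
```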
And so, going back to that paper, "Why Multi-Agent
Systems Fail," that I thought was so good, and there's a few
other papers that I'm happy to list and share with people as
well. The implication was, one, that a lot
(22:43):
of these failures are actually pretty predictable, and you
could kind of classify them in this taxonomy.
And then I think the second part was that all the stuff around
how to make a single agent better, we need to kind of
extend it to how to make a system of multiple agents
better. Two types of memory concepts
(23:07):
that we're starting to really think about and that's starting
to come out in papers is for example, having like a skills
library and having like a concept of like a blackboard
memory. So blackboard memory is where
you have agents kind of come together to post like partial
solutions. And essentially it's read,
write, and they kind of can pick up the, you know, the trail
(23:33):
and each can kind of bring their own unique sort of expertise or
focus to help solve that problem.
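A minimal sketch of a blackboard memory, here backed by a MongoDB collection with an illustrative schema: agents post partial solutions and any agent can pick up the trail.

```python
# Sketch: shared "blackboard" memory where agents post and read
# partial solutions. The collection and schema are illustrative.
from datetime import datetime, timezone
from pymongo import MongoClient

board = MongoClient("mongodb://localhost:27017")["memory"]["blackboard"]

def post(problem_id: str, agent: str, contribution: str) -> None:
    """An agent writes a partial solution to the shared board."""
    board.insert_one({"problem_id": problem_id, "agent": agent,
                      "contribution": contribution,
                      "ts": datetime.now(timezone.utc)})

def read_trail(problem_id: str) -> list:
    """Any agent picks up the trail of contributions so far."""
    return list(board.find({"problem_id": problem_id}).sort("ts", 1))
```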
And then a skills library is the, OK, once a pattern has been
established. It's a little different from
a cache because the cache is like you store a query result,
right, that you can kind of fetch.
But with a skills library, it's like once your
(23:56):
multi-agent system has, like, figured something out, instead of
having them re-figure it out each time, you essentially save that
pattern to this, like, skills library.
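And a matching sketch of a skills library, again with a hypothetical schema: unlike a cache of query results, it stores a reusable pattern the system already figured out.

```python
# Sketch: a skills library that saves a solved pattern once, so
# the multi-agent system doesn't re-derive it each time. Unlike a
# cache of query results, it stores a reusable procedure.
from pymongo import MongoClient

skills = MongoClient("mongodb://localhost:27017")["memory"]["skills"]

def save_skill(name: str, description: str, steps: list[str]) -> None:
    """Persist a pattern the agents figured out (upsert by name)."""
    skills.update_one(
        {"name": name},
        {"$set": {"description": description, "steps": steps}},
        upsert=True,
    )

def lookup_skill(name: str):
    """Reuse the stored pattern instead of re-solving the task."""
    return skills.find_one({"name": name})
```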
So going back to the question of, like, what I would love to see
in, like, an evaluation and observability framework is, I'd
like to start seeing a capture of those kinds of traces,
(24:22):
something that's native to, like, those types of memory concepts.
Because I think those are going to be really
important. I don't think people are really
talking about it. We've seen examples of people
doing some kind of, like, canvas, blackboard-style thing, but
what I at least predict is that, you know, all
(24:44):
the existing understanding of memory for, like, single-agent
systems, where you happen to have a group of
them in a system. We need to evolve that to think
about multi agent systems where these agents are working
together in tandem, and then to have the native, like, tracing
and observability and evaluation on those agents, because that
(25:08):
becomes very different, right? Instead of having, like, each
agent compared to, like, the same criteria, say each
agent has a different criteria that is focused on, like, their
kind of specific skill set. And I also kind of see people
playing around with different types of agent architectures.
So most of the time we think of agents in a cooperative sense.
(25:31):
But for example, what if you had competitive agents?
Yeah. It works really well actually.
Sometimes, you know. So that's my
wish list. I love this, and I
particularly like it because it directly relates back to
something you said earlier, which is, hey, there are a lot
of concepts from software engineering we can apply to
agents. And I would even extend that and
(25:51):
say there are a lot of concepts from good organizational
development and good management that we can apply to agents. Like,
so you mentioned earlier breaking down agentic systems
from, oh, this one agent that does everything, to, here are
10, 20, however many agents that are doing small tasks together.
Yeah, part of a team. Oh, you know, teamwork, never
talked about that at all. But I'm going to look at
(26:13):
software engineering. We've never taken epics and
broken them down into stories, broken down into sprints and put
those on a board and said, OK, great.
Like, here's a task we're going to do here. That certainly
is not a concept we can understand.
We don't have to break large PRs down into small PRs in order to
actually get them reviewed and done.
Wild, wild that we have to do that. Specialization,
you talked about that too. And I mean, clearly that's not
(26:34):
something we do within the economy or within teams.
We certainly don't specify and diversify and customize our
goals for those specialized people either.
They don't have different goals based on their job functions.
Maybe, for example, like certainly not something we can
apply to agents definitely can'tchange the criteria for those
folks. You wouldn't maybe, I don't
know, think of us like a sales agent with just a crazy thought
(26:56):
and, like, a software agent as having different goals.
Just yeah. Absolutely.
And I mean, like if you think about it too, like let's say for
example, you have an agentic
system that is focused on real estate.
So essentially an agentic system that will help not just couples,
you know, or individual people with dogs.
(27:19):
What's funny, I actually had worked on, I tried working on my
own real estate tech startup. Oh.
Interesting. Yeah, this was a
little bit earlier in my career, and I was, like, one of five
people plus a few contractors, and I was working on, like,
the data engineering and data architecture and pipelines and
(27:40):
all that. Data in real estate is wild.
But as part of that, you know, we had to read a bunch of these
papers. And this one paper I read was
that the biggest reason for single people to buy a house was
because they had a dog. That was one of the strongest
(28:00):
indicators of someone who was single.
Interesting. Was having a dog, you know, so I
always think about that example.
But you have real estate, anyway. So you have a real estate system
that is helping people find homes, but it's also helping
real estate brokers find listings that they can then sell
(28:21):
to these people. And let's say that's like a
multi-agent system, right? What you could do, for example,
is you can then start A/B testing and experimenting with each of
these different agent personas. So let's say for example, you
have a listing research agent. All it does is go crawl
different real estate listings, or an MLS, which is another place where
(28:44):
you could get some of this listing data.
Or, you know, in that agent example, maybe somewhere like
Mongo DB might have that data. A weird, weird place to put it,
I don't know. Yeah.
But let's say, for example, you want to test
out different language models, you want to test out different
reasoning models, you want to test out different styles of
prompts. So you have 5 or 6 different of
(29:07):
these like research listing agents.
You could essentially do this, like, parameter fine-tuning.
You can, like, experiment with different prompts. Obviously,
the data for the real estate listings you could store in
Atlas from MongoDB. And then essentially, you
have a way to, you know, pick what is the best,
(29:32):
like, configuration, right? What is the best type of agent?
In order to do that, you need to be able to observe
the performance, and you need to be able to trace it.
You need to understand why, why was the performance better with
these agents? And then you can kind of, like,
pick the golden star, you know?
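A sketch of that experiment loop, with run_agent and score as hypothetical placeholders for a real agent call and a real quantitative eval:

```python
# Sketch: A/B testing several "listing research agent" configs and
# picking the best by a measured score. run_agent and score are
# placeholders for your actual agent loop and eval metric.
from itertools import product

MODELS = ["model-a", "model-b"]
PROMPT_STYLES = ["terse", "step-by-step"]
TASKS = ["find 3-bed listings in Austin", "summarize new MLS entries"]

def run_agent(model: str, style: str, task: str) -> str:
    # Placeholder: call your actual agent with this configuration.
    return f"[{model}/{style}] answer for: {task}"

def score(output: str) -> float:
    # Placeholder: a real quantitative eval (accuracy, completeness).
    return float(len(output) > 0)

results = {}
for model, style in product(MODELS, PROMPT_STYLES):
    runs = [score(run_agent(model, style, t)) for t in TASKS]
    results[(model, style)] = sum(runs) / len(runs)

best = max(results, key=results.get)  # the "golden star" config
print("best config:", best, "avg score:", results[best])
```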
And I think too much of that decision making has been purely
(29:55):
qualitative so far, whereas we want both quantitative
benchmarks and qualitative feedback.
And it's really when that comes together that you see highly
customized, highly successful systems.
Yeah. Absolutely.
Obviously, we both care deeply, Mickey, about reliable AI
systems and enabling developers to build this bevy of agents
that they're not only building already, but building in more
(30:17):
complex systems and have these deep reliability needs.
How does Galileo's integration with Mongo DB help enable
developers around the world to build more trustworthy and
reliable AI agents? More trustworthy and reliable AI
agents. Yeah, absolutely.
I mean, so what it comes down to is, I think there's some
(30:39):
important core components of a production-ready, reliable
agentic system. You know, so Mongo DB, I mean, we cover
essentially the data store for applications, both small and,
if people want to scale it to like worldwide, we definitely
cover that. We also have the vector store
(31:00):
for semantic search and we have the memory store.
So that's all great. But at the end of the day, like
going back to these are still products, these are still
software products that are going to be released out into the wild
and interacting with real people.
They also might be interacting with other agents, but ideally
they're interacting with real people who, you know, have very
(31:22):
real kind of concerns and expectations when it comes to
the products that they interact with and the kind of experience
that they want. And I think if you don't have a
way of observing what these agents are doing, if you don't
have a way of evaluating them, and if you don't have a way of
putting guardrails around their behavior, you just don't
(31:44):
have an agentic system that's going to
really survive. And it's quite possible that,
you know, your company and product brand could also take a
big hit. So, you know, I see
the tools that Galileo offers as part of its platform really
(32:05):
helping ensure that, one, people can build the agentic systems
that they need to, that they can scale these systems.
And they can do so in a way that continues to not just improve
the user experience, but that can help continue to, like, grow
the companies and the brands that are building these
products. So really, it's
(32:26):
enabling trust, I think, by developers in the systems that
they're building. Completely agreed, and I think
that's why it's so important that companies like Mongo and
Galileo also align to open standards like OpenTelemetry,
so that it's easy for us to work together and easier for
developers to leverage data across these different systems.
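For example, here's a minimal sketch of instrumenting one agent step with the OpenTelemetry Python SDK; the span and attribute names are illustrative, and the console exporter stands in for a real OTLP backend.

```python
# Sketch: emitting OpenTelemetry spans around an agent step so any
# OTel-compatible backend can consume the same trace data. Span and
# attribute names are illustrative.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent-demo")

def agent_step(task: str) -> str:
    with tracer.start_as_current_span("agent.step") as span:
        span.set_attribute("agent.task", task)
        result = "..."  # call the model or a tool here
        span.set_attribute("agent.result.length", len(result))
        return result

agent_step("draft reply")
```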
(32:48):
How do metrics, in particular for agents, factor in?
Obviously it's something where we've developed some agent
metrics. Do you view that as an important
cornerstone of what AI builders should be thinking about as they
evaluate and observe these systems?
Yeah, absolutely. I mean, so part of my life,
(33:08):
I did competitive bodybuilding, and, you know, I'm also,
generally speaking, a very growth-minded person.
And I think the phrase, I forgot exactly how it goes, but it's
the, you can't improve what you can't
measure. Yeah, that's the one I think
about. That, and what's measured is managed.
(33:29):
It's just a lot of these little phrases.
Yeah, yeah, yeah. So I think vibe checks are fine,
but I think at a certain point you do need to put quantitative
metrics and measures in place. Because what it comes down to
is, for most people, they are building agentic systems for
(33:51):
a company. And most of the time it's
probably a for-profit company or a for-profit business.
So, you know, just building an agentic system
is not a silver bullet to building a profitable company,
but being able to figure out what's working and then being
(34:13):
able to make those decisions to be empowered to say like, OK, so
this agentic workflow or this agentic feature seems to be
doing really well. This one doesn't.
So let's really kind of prioritize what's working and
then let's deal with, you know, what isn't?
I think that's just crucial.
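A sketch of that prioritization in practice, assuming runs were logged to a MongoDB collection with hypothetical workflow and success fields:

```python
# Sketch: comparing agentic workflows by a measured success rate
# instead of vibes. Assumes logged runs with hypothetical
# "workflow" and "success" fields.
from pymongo import MongoClient

runs = MongoClient("mongodb://localhost:27017")["agent_ops"]["runs"]

pipeline = [
    {"$group": {
        "_id": "$workflow",
        "success_rate": {"$avg": {"$cond": ["$success", 1, 0]}},
        "n": {"$sum": 1},
    }},
    {"$sort": {"success_rate": -1}},
]

# Prioritize what's working; dig into what isn't.
for row in runs.aggregate(pipeline):
    print(row["_id"], f"{row['success_rate']:.0%}", f"n={row['n']}")
```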
(34:34):
You know, I think people need both, you know, and
I think this idea that, because agents are powered by, like, LLMs,
or you know, they could also be powered by multimodal models.
We definitely have these cases for multimodal agents at Mongo
DB that we've seen. But this idea that because an
(34:54):
LLM or a VLM or whatever is, like, the reasoning part of an agent,
and therefore you don't have to use, like, metrics to
measure it because it's language, I think it's totally
false. I completely agree.
I think we get too wrapped up in this piece of, like, oh
(35:15):
wait, we're language prompting and we're using NLP, and you
know, I'm a native English speaker, but English is not a
very clear language. I can say things in many
different ways. It requires context to
understand. Often numbers are different if
you apply context to them. And same is true of code.
(35:36):
And that's why I think it's important to have, like, code-
based metrics, because there are other languages out there
that have more clarity. And particularly as we look at,
like, human language and how it's evolved, the contextual
elements of it are so diverse depending on what
culture you're sitting in, what type of language you're
sitting in. And so using that as a prism
(35:58):
to look at, you know, software and agents, we have to
understand the implications of how we apply these vibe-based,
really natural-language pieces of feedback versus also having
context via numbers and providing more systematic, more
scientific approaches. Yeah, absolutely.
(36:18):
And you see this too, especially with coding agents or copilot-
style agents where the output is code.
You can definitely measure things.
Certain things, for example, like accuracy, efficiency, lines of
code. Lines of code's a terrible
measure for the record. It is a terrible measure,
absolutely, but certain things, for example, like the runtime
efficiency of, like, code, all that stuff, you can, and,
(36:42):
like, for example, the number of times someone accepts a suggestion
or things like that, those can
absolutely be measured. Yeah, lines of code is terrible.
Yeah, I used to work at LinearB, a
company that helps do metrics for software engineering
particularly, and that was, like, the number one thing: the
founders, anyone who said lines of code, they're like, oh,
(37:03):
I'm going to get them. I know, I know.
Lines of code is a, it's a very, how should I put
this? So it's a...
It's a hot metric. It likes to come up a lot.
And I think the typical pushback is always like, well, you
know, as you get more experienced in your technical
(37:24):
leadership track, lines of code is not what you should be
measuring. It should be impact.
And sometimes that impact is not in code, and all that good stuff.
But other things, for example, like within coding
agents, those should
absolutely be enumerated, yeah. Mickey, thank you for an
incredibly insightful conversation.
It's been so fun having a chance to hear your viewpoint here,
(37:48):
and I can't wait to see what you get up to at Mongo and what's
next for you. It's clear that as the power of
agents grows and as more multi-agent systems evolve, the need
for a dedicated reliability framework that works hand in
hand with data platforms like the incredible work being done
on Mongo DB, isn't just a nice-to-have, it's an absolute necessity
for anyone serious about productionizing AI.
(38:08):
So thank you so much for sharing your perspective and
your research and your knowledge with us today.
Thanks for coming on to talk about Mongo DB and Galileo's
partnership. Yeah, thanks for having me.
I mean, we're, we're super excited about Galileo's new
agent reliability platform. I mean, there's just so much
opportunity from MongoDB and Galileo.
Just continue deepening our partnership and, you know,
enabling AI builders to scale with MongoDB and Galileo
(38:31):
together. Absolutely agreed, and I know a
lot of our listeners would love to follow along with your
continued work. Where can they find you on the
Internet? Yeah, absolutely.
So if they want to get in touch with me, you know, please feel
free to do so on LinkedIn, Twitter, YouTube.
I also do have a Substack as well.
(38:53):
I'll share the links. Absolutely.
And in terms of other work that's going on, so we have an
amazing developer relations, developer advocacy team at Mongo
DB. I have a number of my
colleagues, Richmond Alake, Apoorva, Jesse Hall, you name it,
Anaiya. So all these folks are really
(39:14):
working on giving developers the tools that they need to build
these systems. So I would encourage people to,
you know, follow the Mongo DB, you know, LinkedIn account,
Mongo DB has a YouTube channel as well.
And we're present on Medium, dev.to, all that stuff.
So everything that we'll be producing including our thought
(39:37):
leadership pieces around agent ops, around solving memory for
multi-agent systems, will be on those channels.
Fantastic. For our listeners, you can learn more about
building next-generation AI applications with Mongo DB and
Galileo with Galileo's new free AI reliability platform,
available at Galileo dot AI. Check it out, try it out and
give us some feedback. Build a great application, build
(39:57):
a great agent, leverage Mongo as your data store, and let us know
how we can improve. We'll have links to everything
we discussed today in the show notes.
That's all for this episode of Chain of Thought.
Don't forget to share with your friends if you enjoyed it.
And Mickey, thanks again for coming on the show.
Absolutely. Thanks for having me.