
July 9, 2025 41 mins

The Internet of Agents is rapidly taking shape, necessitating innovative foundational standards, protocols, and evaluation methods for its success.

Recorded at Cisco's office in San Jose, we welcome Giovanna Carofiglio, Distinguished Engineer and Senior Director at Outshift by Cisco. As a leader of the AGNTCY Collective (an open-source initiative by Cisco, Galileo, LangChain, and many other participating companies), Giovanna outlines the vision for agents to collaborate seamlessly across the enterprise and the internet. She details the collective's pillars, from agent discovery and deployment to a secure, low-latency communication transport layer powered by new agentic protocols like SLIM. This groundbreaking work aims to make distributed agentic communication a reality.

The conversation then explores the critical role of observability and evaluation in building trustworthy agent applications, including defining an interoperable standard schema for communications. Giovanna highlights the complex challenges of scaling agents to thousands or millions, emphasizing the need for robust security (agent identity with the OASF schema) and predictable agent behavior through extensive testing and characterization. She distinguishes between protocols like MCP (agent-to-tool) and A2A (agent-to-agent), advocating for open standards and underlying transport layers akin to TCP.


Chapters:

00:00 Introduction

01:00 Overview of Agent Interoperability

02:20 What is AGNTCY

03:45 Agent Discovery and Composition

04:38 Agent Protocols and Communication

05:45 Observability and Evaluation

07:00 Metrics and Standards for Agents

09:45 Challenges in Agent Evaluation

14:15 Low Latency and Active Evaluation

23:34 Synthetic Data and Ground Truth

25:07 Interoperable Agent Schema

26:37 MCP & A2A

30:17 Future of Agent Communication

32:03 Security and Agent Identity

34:37 Collaboration and Community Involvement

38:28 Conclusion


Follow the hosts

Follow Atin

Follow Conor

Follow Vikram

Follow Yash


Follow Today's Guest(s)

AGNTCY Collective: agntcy.org

Connect with Giovanna on LinkedIn

Learn more about Outshift: outshift.cisco.com


Check out Galileo

Try Galileo

Agent Leaderboard


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:05):
Welcome back to Chain of Thought, a podcast that
positions builders to put AI into production.
I'm your host, Conor Bronsdon, head of developer awareness at
Galileo. You may notice, if you are watching on YouTube, that we are not at home today.
I'm not sitting in my office. I am delighted to be at Cisco's
office on Santana Rd. in San Jose, and we're going to be

(00:25):
diving deep into the fascinating world of agent interoperability
and the infrastructure that will power the next generation of
AI systems. I'm thrilled to be joined by
Giovanna Carofiglio. She's a Distinguished Engineer
and Senior Director at Outshift by Cisco, where she's at the
forefront of building the Internet of Agents.

(00:46):
And we're delighted to be part of their AGNTCY Collective
effort, with LangChain and with Galileo leading the steering
committee alongside Outshift by Cisco.
Giovanna brings an in-depth perspective to this
conversation. She's one of Cisco's youngest
distinguished engineers. She has an incredible research
background, a background in mathematics, originally from
Italy, based in Paris today (very jealous), with a remarkable

(01:09):
background that spans from reinventing Internet protocols
with information-centric networking to now architecting
the open standards that will enable AI agents to collaborate
seamlessly across the enterprise and across what we think will be
an Internet of Agents. Now she's working to reinvent
how the Internet's communication protocols will work with agents,

(01:31):
in particular. Giovanna, welcome to Chain of
Thought. My pleasure, thank you for
having me. We really appreciate you making
the time. I know you're only in the States
for a couple weeks, so it's fantastic
that we got a chance to sit down with you.
The timing really couldn't be better for this conversation,
coming off the heels of Cisco Live and some of the really cool
announcements there. Though this recording will

(01:52):
likely go out in a couple of weeks.
So if you already heard the announcements, pretend it's
happening live for you right now.
And there's so much momentum building around the AGNTCY
Collective, the open-source collective that you are helping
to lead. We're witnessing the foundation
for this emerging agentic stack, for a world where agents are

(02:14):
embedded throughout the enterprise, throughout
organizations. Giovanna, let's start with that
big picture. What is AGNTCY?
What is this collective we're talking about?
And why is an Internet of Agents
what you think the future will be?
Sure. So AGNTCY is an open-source
collective that Cisco, Galileo, and LangChain as founding

(02:35):
members launched in March. And that's because we believe
that having agents come together and collaborate,
even when developed with different agentic frameworks and
deployed remotely, really calls for a new Internet

(02:55):
revolution like the one Cisco pioneered many years ago, where
we really provide the developers, or the enterprises that deploy
these agents, the tooling for interoperable, large-scale,
distributed agentic communication.

(03:16):
And this is what AGNTCY is about.
If you want to go through the main pillars of AGNTCY,
essentially it is around identifying and discovering agents.
We think that there will be specialized agents, a bit like the
subject matter experts that we have in our teams, in given

(03:37):
domains, or for doing given tasks, or attached to tools. And for
that to come together, what's important is that we
enable discovery. So this is what we call
agent discovery in AGNTCY, and a week
ago we launched the service that powers agent discovery.

(03:59):
And this is where we're going to be able to go and search agents
based on skills, based on publishers, based on
compatibility with given tools, and based on their reputation,
which is something that Galileo really helps build.
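For readers following along, here is a minimal sketch of what a skills-based directory query could look like. The endpoint, parameters, and response fields are hypothetical illustrations; the actual AGNTCY directory service defines its own API.

```python
# A minimal sketch of querying an agent directory by skill and
# reputation. The URL, query parameters, and response shape are
# hypothetical; consult the AGNTCY directory docs for the real API.
import requests

def find_agents(directory_url: str, skill: str, min_reputation: float = 0.8) -> list[dict]:
    resp = requests.get(
        f"{directory_url}/agents",
        params={"skill": skill, "min_reputation": min_reputation},
        timeout=10,
    )
    resp.raise_for_status()
    # Each record might carry a publisher, declared skills, tool
    # compatibility, and a reputation score, as described above.
    return resp.json()["agents"]

for agent in find_agents("https://directory.example.com", skill="invoice-triage"):
    print(agent["name"], agent["publisher"], agent["reputation"])
```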
So this is the discovery phase. The second one, which is very

(04:20):
important, is compose and deploy.
So once you have located these agents, what's really important
is that they come together from wherever they are.
They can be consumed as a service; they don't have to be deployed
locally, which is most often the case these days. And they will
interact across the Internet. For that to happen,

(04:43):
what's important is that we rethink the agentic protocols.
So we have seen a lot of protocols coming out these days,
especially for agent-to-agent. ACP is a protocol we have worked
on with LangChain as an extension of their Agent Protocol.
A2A has been proposed recently by Google, and we want

(05:03):
to support all of them. MCP is a very popular protocol
these days for agent-to-tool interaction.
This happens at the application layer, but for that to be
efficient, and when I say efficient I mean
secure, low-latency, interactive, that also calls for a transport-
layer protocol. Which is fitting, because this is what we

(05:26):
were working on years ago with information-centric networking.
And this is what we're doing in AGNTCY with a protocol named
SLIM, announced in the past days, which
enables group-based, low-latency, secure, interactive
communication. That's one aspect.
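As a toy illustration of the group-based (rather than point-to-point) delivery model she describes, here is a minimal in-memory sketch. SLIM is a real transport protocol and works nothing like this simple dispatcher internally; this only mimics the fan-out semantics.

```python
# A toy illustration of group-based messaging: one publish reaches
# every agent subscribed to a channel, instead of N separate
# point-to-point connections. This in-memory dispatcher only mimics
# the delivery model; SLIM itself is a real network protocol.
from collections import defaultdict
from typing import Callable

class GroupChannel:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, group: str, handler: Callable[[str], None]) -> None:
        self._subscribers[group].append(handler)

    def publish(self, group: str, message: str) -> None:
        # One send fans out to all members of the group.
        for handler in self._subscribers[group]:
            handler(message)

channel = GroupChannel()
for name in ("research-agent", "pricing-agent", "summarizer-agent"):
    channel.subscribe("quote-request", lambda msg, n=name: print(f"{n} got: {msg}"))

channel.publish("quote-request", "Customer asks for a bundled quote")
```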

(05:47):
But I really want to go into observability and
evaluation. Our favorite topic?
Our favorite topic, which we work on with Galileo. I think
this is really key to unleashing the power of agentic
applications, because we need to help developers.
So the first step there, which we are working on together
with Galileo, LangChain, Traceloop, Pydantic, and many

(06:12):
others of the 50-plus AGNTCY partners, is to define an
interoperable standard schema for agentic communications.
This is the underlying layer for agents to be instrumented and to

(06:34):
collect all the metrics, events, logs, and traces, so the MELT
telemetry that comes out of them.
And we are trying to do that by extending OpenTelemetry
standards, because of course agents will live within
applications, and we want that to be as open as possible.
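To make the instrumentation idea concrete, here is a minimal OpenTelemetry sketch that wraps an agent step in a span and attaches agent-level attributes. The attribute keys (e.g. "agent.name") are illustrative assumptions, not the finalized semantic conventions.

```python
# A minimal sketch of instrumenting an agent step with OpenTelemetry,
# so metrics, events, logs, and traces (MELT) can be correlated across
# frameworks. The attribute keys below are illustrative placeholders.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("demo.agent")

def run_agent_step(agent_name: str, task: str) -> str:
    # One span per agent step; a shared schema is what lets tools from
    # different vendors reconstruct the same agentic trace.
    with tracer.start_as_current_span("agent.step") as span:
        span.set_attribute("agent.name", agent_name)
        span.set_attribute("agent.task", task)
        result = f"{agent_name} handled: {task}"  # stand-in for real agent logic
        span.set_attribute("agent.output.length", len(result))
        return result

print(run_agent_step("triage-agent", "classify incoming ticket"))
```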

(06:57):
And so there we are driving the communication standards.
Two days ago we released the SDK that supports the
schema, and we are working with partners such as Galileo
and the collective. That's the foundation.
But what really excites us is that it really opens the door to

(07:21):
doing a bit more than just providing visibility.
So the next step, and this is something we really
think is important, is to be able to explain this agentic
communication. Today we have seen a lot of
in-agent observability or even evaluation.

(07:44):
What we want to do next is really to provide visibility
into the agentic graph. I know Galileo released in
these days a way to reconstruct the agentic graph.
This is very important. We want to have that in AGNTCY,

(08:05):
released as one of our next components, because this will
give both the developer and the enterprise deploying the
application in production a way to understand how the agents
communicate, how the data are passed between one
agent and another, and how the tools are called.

(08:26):
And this opens up much deeper analysis.
Yeah. Absolutely agree.
I think it's crucial that we make debugging and understanding
agent interactions much easier. It's why we released our graph
view so that you can see the basic trace and understand it in
a different way much more visually, which I think is

(08:47):
especially great in order to enable not just developers, but
business users and other folks who are building agents or
seeing agent interactions. It's also why we're adding other
views like timeline view, which we recently released, so you can
actually see on a timeline as different agents are working
together: How are they communicating?
Where are they calling tools? How do they work together?
And then messages, so you can see the full trace within how that

(09:10):
agent will experience it, and how an individual will
experience the other end. And I think it's really
important we have all these views, and more to come,
because the interface for agents is so based around
natural language right now, but it's rapidly expanding to
include multiple different mediums.
And I would expect we're going to have a variety of ways that

(09:33):
we need to interface with these agents and in which they need to
interface with each other. And so being able to understand
that communication, make it more predictable, make it more
reliable, is crucial. And it's why we've released our
AI reliability features and are excited to bring a lot of that
to the AGNTCY Collective as well.
Totally. I think this is a key point if
you really want to see these applications in production.

(09:53):
So these days we see a lot of single agents, or some talk about
multi-agent; we want to go to the next level and really see
business applications adopting these agents in a trustworthy
way. And for building that
confidence, it's really important that the evaluation

(10:15):
goes deep into explaining how these systems work, because they
are much more complex than normal applications.
And this is something that we want to drive in AGNTCY,
and that companies such as Galileo are also driving.
So there I would say that there are a few aspects that are

(10:37):
really important. So first of all, the selection
of good metrics, and not too many, because we don't want to confuse
the users. We really need those metrics
that are key to capture the dynamics of these multi-
agentic systems, and we want to recommend to users which ones to use.
And I think this is one of the themes that you are

(11:00):
also working on. We want that evaluation to be
done in a cost-effective way. Today it can be a lengthy task;
it can involve a lot of models and dataset creation.
We want to have that as a seamless process that is really

(11:21):
integrated into the development, and that builds this
trust in the agentic system. Absolutely agreed.
We need to start bringing the metrics for agents, the
observability for agents, to developers, versus forcing them
to always go somewhere else and, you know, pull that information

(11:41):
into their platform of choice. It's why we're doing things like,
and it may be live by the time this releases, actually using MCP to
bring agentic metrics and visibility into the IDE.
So you can simply prompt your way through using Galileo by
ingesting our docs, ingesting the AGNTCY docs, and any AGNTCY-

(12:01):
related agent can easily add observability with
OpenTelemetry and with Galileo. That's why we're really excited
about the AI reliability features that we are rolling
out as part of our AI reliability platform specifically
built for agents. And I know something
you're really passionate about is this idea of creating
predictability for agents, because obviously the non-

(12:23):
determinism is the magic, right? It is the opportunity. But we
have to have structures within which it operates, to have some
predictability, some explainability, and reliability.
I'm using a lot of -ability words here, but it matters, I
swear. Tell me more about your
perspective on how to create predictable agents that do what

(12:46):
we need them to do, and have enough trust to actually go into
production. Yeah, yeah, we all love the fact
that these models seem to be kind of magically doing their job.
But maybe because of my background, I think it's
really important we get to explanation, because even the

(13:07):
stochastic nature can be explained.
And so what we are working on, in the collective and
also as Cisco, is really to make the agent, or the
agentic collaboration, predictable.
That means that we want to explore, by testing, all the

(13:29):
possible state space of solutions in which these agents
work, and try to characterize patterns, make sure the output
is consistent when the agents are provided with similar
input. It also goes into recreating,

(13:53):
reconstructing the normal behaviour, which for these systems
is not so trivial. Doing that, I think, is going to
be very important, especially for enterprises to have
confidence in the output of these systems.
And so this is something that is really important in my view,

(14:15):
yes. Absolutely.
And there's also the challenge of latency, which we were
talking about a little bit before we started recording,
because we have this expectation of real time.
Often LLMs will not fulfill all the needs of evaluation systems.
Often human feedback will not fulfill all the needs of

(14:36):
evaluation systems. Now, there are places for those.
LLM-as-a-judge is, at this point, a kind of well-known
technique; continuous learning through human feedback,
Galileo has it, and many other folks are
starting to bring in human feedback.
I know Cisco's leveraging it, leveraging those SMEs
while also creating digitized SMEs with an LLM-as-a-
judge, which can provide a lot of excellent observability and

(15:00):
evaluation feedback to AI systems, and agents in
particular. However, if you need them
operating in real time, you also need guardrailing.
You also need lower-latency metrics.
Do you have any insights into how you think that should be
approached by teams? Sure, sure.
I think you're touching upon very interesting aspects.

(15:23):
So I would say that we have been working on the low latency
from an agent communication perspective. As I was mentioning
before, we're working on this group communication, where you
can think that multiple agents with different skills will,

(15:45):
when provided with a question, be able to answer
and combine their outputs.
And that really calls for a low-latency communication
protocol, because otherwise it's really unusable.
So this is very important. I think in terms of evaluation of
this system, we also want to monitor latency.

(16:09):
But I would say that the step further that I find really
exciting is to be able to provide recommendations for
improvement. Yeah.
And so doing what you can call an active evaluation.
So not just observe and score the existing, but really try
to quantify the margin for improvement and provide

(16:30):
recommendations, whether it's to the developer, the composer of
this application, or even the enterprise running it in
production, to optimize: optimize for latency, for sure;
optimize for costs, and costs, as you mentioned before,
both related to running this application and related to

(16:52):
the evaluation. I think what you're doing is
absolutely awesome and key. If you want to have this 360°
evaluation, it must be cheap and fast.
So, yes, to me, the frontier
that I find really exciting is to go beyond just
evaluating and really provide remediation, help with root

(17:15):
cause analysis, and fix errors even in real
time when the application is running.
So I think this is where we should go next.
Completely agreed. And obviously we've done a lot
of research on that with our team here at Galileo, as part of
creating our Luna family of evaluation models.

(17:35):
Folks who have checked those out may know that they enable lower-
latency evaluations; they enable real-time guardrailing for
things like PII, toxicity, prompt injection attacks, and more.
And we have actually now released Luna specific metrics

(17:58):
for agents as well. So tool selection quality;
action advancement, is the agent actually taking actions to move
forward; action completion, did it actually fulfill the task you
wanted; tool errors. We now have a family of Luna
metrics, fine-tuned SLMs
we've developed, so that we can have significantly lower-latency
agentic evaluations and feedback, as well as that guardrailing to

(18:20):
meet that production challenge that you were talking about
here. And we're really excited about
that, and excited about how that's also fueling our insights
engine, where we can say: hey, great, based off of
these extremely low-latency, real-time metrics, here are the
insights that we have, here are suggestions for changes
you can make. And to me, that's the magic. I
see this huge opportunity to keep leaning in there.
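To sketch how a low-latency guardrail might sit in front of an agent's output, here is a rough example. The scoring function is a stand-in for a small, fast evaluation model (in the spirit of the Luna family); the risk labels and threshold are illustrative assumptions, not Galileo's actual API.

```python
# A rough sketch of real-time guardrailing with a small evaluation
# model. `score_with_slm` stands in for a fine-tuned SLM; the risk
# labels and threshold are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class GuardrailResult:
    passed: bool
    scores: dict[str, float]

def score_with_slm(text: str) -> dict[str, float]:
    # Placeholder: a real implementation would run a small, fast
    # classifier returning per-risk probabilities in milliseconds.
    return {"prompt_injection": 0.02, "toxicity": 0.01, "pii": 0.00}

def guardrail(agent_output: str, threshold: float = 0.5) -> GuardrailResult:
    scores = score_with_slm(agent_output)
    # Block the response if any risk score crosses the threshold.
    # Because the judge is small, this can run inline on every
    # response without adding seconds of latency.
    return GuardrailResult(
        passed=all(s < threshold for s in scores.values()),
        scores=scores,
    )

result = guardrail("Here is the refund policy you asked about.")
print("deliver" if result.passed else "block", result.scores)
```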

(18:43):
Yeah, especially think about it: if you're
going to be able to do it live, to embed
this evaluation within the systems and provide
recommendations as it goes, that would really, I think, check
all the boxes, from giving transparency and control to
even the margin for improving the systems.

(19:04):
I really hope that you're getting there.
I think we're there, which is really, really
exciting. The next step for me is
going to be: OK, how do we now enable rapid implementation of
that? Do we do it in the IDE
through MCP, like we talked about earlier, where you say, OK,
I'm building this application, I'm

(19:26):
experimenting, oh gosh, here's the
problem, great, let's hit complete,
we're good to go? Or do we do it through
integrations with agent providers and folks who are
building agents? An example of this might be n8n, where
you can look at our agent graph and say, oh, here's where
tool erroring is happening, the suggestion is to move this
step later. Great, just hit accept and it

(19:48):
flows back into n8n and your agent changes.
Now we're not quite there yet, but I think that's where things
are going, and I'm really excited about that.
That's what the future looks like, especially once we can solve
these protocol challenges around actually having great
communication. Because imagine the self-improving agents that
folks are working on. And I think there's so much

(20:08):
opportunity for self-improving evaluation, self-improving
agents, to say: hey, let me take the inputs of these
evals, these observations that are coming through the small
language models, these Luna models that we have,
and then apply them in real time to improve myself based off of
whatever guardrails have been set for me.
Which implies other AI and reinforcement learning, and

(20:29):
that's also great. Yeah, you were mentioning
the Luna models. I think what's also interesting
there is that Galileo allows for custom metrics.
It's really important that we do this evaluation in a way
that is really targeted to the intent of the agentic
system. So this is something that we

(20:49):
also want to open source as examples for the developers.
And so one of the things that, again, is coming in AGNTCY is
this metric computation engine, where a few examples of important
metrics, related to the evaluation of the agentic
communication, the agentic framework, task delegation,
and workflow efficiency, will be there.
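As an illustration of what a code-based metric in such a computation engine might look like, here is a minimal sketch scoring task delegation over a trace. The trace structure and field names are hypothetical examples, not the engine's actual schema.

```python
# A minimal sketch of a code-based custom metric in the spirit of a
# metric computation engine. The trace shape (a list of step dicts
# with "delegated_to" and "completed" fields) is hypothetical.
from typing import Callable

def delegation_efficiency(trace: list[dict]) -> float:
    """Fraction of delegated steps that the receiving agent completed."""
    delegations = [step for step in trace if step.get("delegated_to")]
    if not delegations:
        return 1.0  # nothing was delegated, nothing to penalize
    completed = sum(1 for step in delegations if step.get("completed"))
    return completed / len(delegations)

# Registering a metric could be as simple as mapping a name to a
# callable over traces.
METRICS: dict[str, Callable[[list[dict]], float]] = {
    "delegation_efficiency": delegation_efficiency,
}

trace = [
    {"agent": "planner", "delegated_to": "researcher", "completed": True},
    {"agent": "planner", "delegated_to": "writer", "completed": False},
]
print(METRICS["delegation_efficiency"](trace))  # 0.5
```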

(21:12):
Absolutely agree with you. And it's the same reason we've
added automatic insights and metric suggestions within our
platform as well. And we're really excited to work
with AGNTCY to bring some of that to the collective, because I
think it's so essential that if I come in with a customer
support agent, yeah, I can suggest four or five metrics
from the Galileo platform that are already in there and likely will

(21:34):
work, and probably a couple of open-source ones with AGNTCY:
yeah, you should probably have context adherence and tool
selection quality. But maybe we have four custom
metrics that we've developed, or that we can suggest we
develop with you, that you can run live with an LLM-as-a-judge or
an SLM or by code. And I think that is where we can

(21:56):
really start this data flywheel moving as we get customization
around these different systems. I mean, enterprises have such
complex internal systems that not everything is going to run
open source, not everything is going to be simple.
So how can we enable the base standards to work together and
then customization based off of the direction that each system
goes? Absolutely.

(22:18):
What else are you thinking about when it comes to agents right
now? What is top of mind for
you when it comes to making sure they succeed?
Beyond evaluation, or we can dive deeper into
evaluation. For sure, I'm really thinking
about bringing the evaluation in a different way, observability
and evaluation. So today we are going through
the standard route of instrumenting, collecting

(22:42):
data, and computing upon it, which is required.
And then, as we said, there are a lot of things already there to
do, a lot of challenges. But I'm wondering whether it's
possible, and this is just a thought
these days, to really embed observability and evaluation as
agents within the system, that would be able to

(23:06):
work in a localized manner while the application is running.
I think there is a lot of AI that
will help not just in terms of what we observe, but how we
observe. So yeah, I think the future will

(23:27):
bring a lot of new opportunities for disrupting
also how we're doing that.
Another thing that I'm really passionate about is the capability to
do evaluation without a lot of ground truth data.
I mean, you know this better than me at Galileo.
You're trying to remove this need for good ground truth data

(23:50):
sets, because in many, many cases you just don't have them.
Well, it's also why we have synthetic datasets.
Yeah. So I mean, I agree, we're
not always going to have ground truth to work off of.
And it's part of why we've added synthetic dataset generation to
our platform, so you can generate your ground
truth, especially when you're creating these custom metrics.
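As a toy sketch of synthetic ground-truth generation: real platforms typically use an LLM to produce richer variations, but even a template-based generator illustrates the idea of manufacturing labeled cases when none exist. The intents, tones, and fields here are invented.

```python
# A toy sketch of synthetic ground-truth generation for evals. Real
# systems usually drive this with an LLM; templates are enough to
# show the idea. The intents, tones, and fields are invented examples.
import itertools
import json
import random

INTENTS = ["refund request", "shipping delay", "password reset"]
TONES = ["frustrated", "neutral", "polite"]

def synthesize_cases(n: int, seed: int = 0) -> list[dict]:
    random.seed(seed)
    combos = list(itertools.product(INTENTS, TONES))
    return [
        {
            "input": f"A {tone} customer writes in about a {intent}.",
            # The expected label doubles as ground truth for a metric.
            "expected_intent": intent,
        }
        for intent, tone in random.sample(combos, k=min(n, len(combos)))
    ]

print(json.dumps(synthesize_cases(3), indent=2))
```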

(24:10):
It may be like, oh, we want this idea. But I think part of this,
too, comes down to continued fine-tuning of your metrics and
of your ground truth, because what you may initially establish
may change over time. You may get SME feedback.
You can use auto-tuned metrics in Galileo, for example, to provide
that human feedback and auto-tune the metrics: go back to the
SLM or the LLM and say, hey, great, adjust based off of

(24:32):
this. But I think it's a really
interesting challenge, to your point, where we increasingly
have, I mean, datasets built on datasets and AI built
on AI datasets. And I'm really optimistic
about what we're doing with synthetic datasets, not just at
Galileo, but all over the world.
Yeah, also to reduce the need for human feedback,

(24:55):
because this is there, but... It doesn't scale as well as we
need it to, right? It's crucial to have at certain
points, but if you try to do it across the board: not enough
time, not enough humans. What else are you thinking about
as far as observability and evaluations when it comes to
AGNTCY? Where do you see the next 6-12
months going? First of all, I think we
should succeed in making this interoperable agentic schema and

(25:23):
instrumentation standard. And this is something that is
happening in OpenTelemetry. To me this is the key to everything,
because for sure, agents will be developed with multiple
frameworks and interconnected. The entire Internet of Agents
principle lies upon that. So this is something that I see
happening, again, even accelerated by the AGNTCY

(25:47):
collective, where we see a lot of partners,
and companies in general, pushing for that.
And the other thing that, again, companies such as Galileo
are promoting is really the capability to have this
evaluation done in a compact, cost-effective way.

(26:11):
And then this will, I think, bolster the development of more
complex applications, because people will be less scared
about the complexity of the system.
So this is to me something that is hopefully coming in the next
6 months. While of course after that we'll
see the evolution, and we will try to adapt both observability

(26:33):
and evaluation to the very rapid development of these agentic
systems that we are observing.
Let's get specific for our audience.
I know a lot of folks listening are familiar with the Agent Connect
Protocol. A lot of folks listening are
familiar with the Model Context Protocol, but some aren't.
Some may only know one, and I think it's important to set the

(26:55):
ground truth of this conversation.
What are the differences between the Agent Connect Protocol and
the Model Context Protocol, how can they work together,
slash, when should each be used? Yeah.
First, let me say that with protocols, and this is true in
general beyond the agentic, you see the emergence, and

(27:16):
you can have multiple protocols trying to achieve the same thing
before a few of them really become the reference.
So I see this happening today. And in AGNTCY, we want to
support them all. Now you mentioned specifically
MCP and ACP. Well, they're trying to achieve

(27:38):
two very different objectives here.
So MCP is really aimed at agent-to-tool interaction, which is
very important, especially when the
tools are remote and not integrated within the agent.
So this is a protocol that we want to support.

(28:02):
We're already integrating and supporting it in AGNTCY, also in
terms of observability. Now again, this, to me, related
to agent-to-tool interaction, is coming sooner and is very
popular because it's much simpler for these agents to use
tools, and for tools to provide a way, like an API, to be used

(28:28):
by agents behind MCP servers. That's MCP.
It's really becoming popular.
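For listeners newer to MCP: it is JSON-RPC 2.0 under the hood, so an agent calling a remote tool boils down to messages like the sketch below. The "tools/list" and "tools/call" methods come from the MCP specification; the tool name and arguments are invented for illustration.

```python
# MCP is JSON-RPC 2.0 under the hood: an agent discovers tools with
# "tools/list" and invokes one with "tools/call". The tool name and
# arguments below are invented for illustration.
import json

list_tools_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",  # ask the MCP server what tools it exposes
}

call_tool_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",  # invoke one tool behind the server
    "params": {
        "name": "get_invoice_status",           # hypothetical tool
        "arguments": {"invoice_id": "INV-42"},  # hypothetical arguments
    },
}

print(json.dumps(call_tool_request, indent=2))
```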
And again, we want to support it. Agent-to-agent
communication is the trickier one.
We have tried even there to start from existing protocols, at

(28:48):
the time, the Agent Protocol from LangChain.
Now we see the emergence of other protocols, A2A for
instance. But there are many
others. I think this will really
need to progress, and we'll see what's really required.

(29:10):
But to me what's really important is that we power
that with a lower-level transport layer that supports
them all. It's a bit like what
TCP has done for the Internet protocol stack
for a long time. We want to have that good

(29:32):
foundation, and this time, I would say, again based on my past work,
we want to make it secure by design, low-latency, and group-
based. Because when I think really beyond, in agents we're
beyond simple pairwise communication in most of the
cases; especially when the communication is driven by

(29:54):
a natural language question. Totally. Multiple agents will
collaborate. And this is also one of the
frontiers of agentic communication.
We want these agents, which today we're stitching into agentic
graphs, to be able to autonomously collaborate.
Yeah. And that really calls

(30:14):
for a bit more than just one protocol, or a point-to-point
connectivity. It's a great point, and Giovanna,
I'd love to understand more about that vision of the future
as we move from a couple of agents interacting to thousands,
millions of agents. What are the biggest technical
challenges that you're anticipating, and how is AGNTCY

(30:37):
seeking to solve them? Well, you know, as Cisco, we
always say this in three dimensions. Connectivity, and this
is what we were just talking about:
the transport-layer protocols that we are
pushing with SLIM should be able to scale.
That's why it's going to be a data-centric protocol that

(31:03):
by definition is not connecting points.
This is something that has proven to be a good
fundamental principle for the design of protocols.
The second one is security. There as well,
there is a lot to come. We haven't touched upon that,
but there is definitely a lot of work that starts with identity.

(31:24):
It goes into being able to connect agents in a secure way, even in
the enterprise, with zero-trust agentic concepts.
And three, we always come back to observability,
which we need to make work at
scale. We did not pay her to say this,

(31:46):
I promise. It's really something
that is close to my heart these days, because we are working on it
and I think there is a lot of potential there.
But, as we said, also as a requirement.
So there we need to make this observability and evaluation,
they are all part of the same dimension, be

(32:07):
scalable. You mentioned security,
and I actually had that as my next
question, so I'm glad you brought it up.
One of the most compelling things happening around security
is agent identification and frameworks around ensuring agent
identity. Otherwise, hey, how good is a
communication protocol if agents are lying to one another and

(32:28):
stealing information? Why is identity such a critical
challenge for autonomous agents, and how do Cisco and the
AGNTCY Collective plan to solve that problem?
That is good to mention. We just released an agent
identity component in AGNTCY. It all started for us, first of
all, from defining a schema for identifying agents; we call it

(32:53):
OASF, and it is one of the components of AGNTCY.
We call it like that because we want to replicate what
OCSF, the Open Cybersecurity Schema Framework, has done for
security, becoming this lingua franca for defining
cybersecurity data. And so in this case, we want to

(33:16):
really characterize what defines an agent. We pushed a first
version of OASF, and we really welcome contributions there.
Also because the definition of an agent itself, as we were
mentioning before, can be challenged.
And so we want these agents to be uniquely

(33:37):
identified, to have a clear provenance, whether it's for the
agent or the data, to have a clear way to
spell out their skills, so that they can be discovered based on
skills and they can be evaluated, or you can have a reputational
touch on the skills. And beyond that, we want to have

(34:03):
an identification of agents and of tasks. And then there's a lot
more work coming, but this is definitely one of the important
dimensions for AGNTCY today.
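To illustrate the kind of record such a schema implies, here is a hypothetical sketch of an agent identity entry with provenance and skills. The field names are invented for illustration; the actual OASF definitions live in the AGNTCY repositories.

```python
# A hypothetical sketch of an OASF-style agent record: unique identity,
# provenance, and declared skills. Field names are invented; consult
# the actual OASF spec for the real schema.
agent_record = {
    "id": "urn:agent:example:invoice-triage:1.0.0",
    "name": "invoice-triage",
    "publisher": "example.com",
    "provenance": {
        "source_repo": "https://example.com/agents/invoice-triage",
        "signature": "<detached signature over the record>",
    },
    "skills": [
        {"name": "classify-invoice", "description": "Route invoices by type"},
        {"name": "flag-anomaly", "description": "Flag suspicious line items"},
    ],
}

def can_handle(record: dict, skill: str) -> bool:
    # Discovery and reputation both key off the declared skills.
    return any(s["name"] == skill for s in record["skills"])

print(can_handle(agent_record, "classify-invoice"))  # True
```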
Absolutely. And we're also excited to release a new agent leaderboard,
an open-source one (you may have seen our first
version), that is going to include a lot of focus around agent

(34:26):
success with certain verticals and tasks.
Because we agree there's a huge opportunity here
to understand success, to define what we want out of hopefully
predictable agents. And there's a huge need to
secure and observe that. I'm curious, what do you want to
see from Galileo as we build over the next few months?

(34:47):
Like, what are the things that we would, or could, be
building that would be exciting to you and useful?
Well, you know that I want Galileo to push some
of this awesome work you are doing in defining metrics,
especially for multi-agentic systems and agentic contexts

(35:08):
and communication, into AGNTCY.
I think this is very important. We can have a few sample metrics
in the metrics computation engine, as one of the elements
that we want to provide the developer to start
to go through this journey of putting together an

(35:31):
agentic application. But this is coming,
and I know we're working on that. Yeah, we are really excited
about Galileo bringing all this expertise in defining metrics
and recommending metrics into AGNTCY, so that really, together,
we can have a more predictable and reliable agentic

(35:53):
system. How can Galileo's Luna models
unlock more opportunities for lower-latency evaluations?
Well, I think evaluation should be constrained in budget, because
you don't want it to be too expensive, and constrained in
time. So what Galileo is doing with
the small language models in Luna, it's amazing.

(36:15):
This is a key step for having a deep evaluation, a total
evaluation of these systems, so kudos for that.
Well, we're excited to hopefully bring some of that to open
source as well. So I think there's going to be a
lot of fun coming here. Thank you so much, Giovanna, for
the conversation and for bearing with me as we knock out a couple

(36:37):
last things here. I think we're all set, honestly.
I absolutely agree. I think there is a major
opportunity for us to collaborate with Cisco and
others within the collective, LlamaIndex, LangChain,
so many other folks who are there, to create systems for AI
agents that help them stay as reliable as we need, stay on

(36:59):
task, and succeed. Because really, in the end, we
have a trust problem to solve. And we want to enable every
enterprise, every business, hopefully every builder around
the world to leverage AI agents in a way that helps make their
lives easier. And as you've pointed out
throughout this conversation, there are keys to that.
There are basic layers we have to solve.

(37:19):
And I think we're starting to get there, but
there's so much opportunity to build upon that.
You know, we talked a bit earlier about the small
language models Galileo's using, our Luna models, which we just
released new versions of. And I'm really excited to see:
are there open-source opportunities around that?
Are there opportunities to extend that?
How can we make things more real-time so that guardrailing is

(37:42):
more successful? And you know, some of this I'm
sure will stay within the paid platform.
Some of it will come to AGNTCY and be fully open
source. But it's just such an exciting
time. There's really nowhere else
I'd rather be than part of this as we build,
I mean, the future of the Internet and the future of
knowledge work, which I can see your excitement about, too.

(38:03):
Really agree. Let's go back to work.
Giovanna, it's been such a pleasure.
Thank you for joining us all the way from Paris.
We're so glad we could snag you while you were here in the Bay
Area. This has been such a fascinating
deep dive into the future of agents, frankly, the current
situation with agents, and the infrastructure that is going to
power the next generation of AI systems.

(38:25):
Thank you for sharing both the technical vision for AGNTCY and
the practical lessons that you're learning from building
these systems. Where can our listeners go to
learn more about the AGNTCY Collective and to see the work
of the steering committee, with Outshift, Galileo, and
LangChain? Well, it was my pleasure.
First of all, thanks a lot for this very interesting
conversation. I think it really shows that

(38:48):
we have a path ahead of us, and we want to get there
very quickly. So for everyone who wants to
know about our work in the collective, I would say go to
agntcy.org. And specifically, on some aspects
such as observability and evaluation, we have a working
group. So I would encourage people to
just go there, participate in the meetings, and take

(39:12):
this journey with us. Absolutely.
And you can find links to all that in the show description,
as well as at galileo.ai if you want to try any of the
features we've talked about today. And very much check out
the GitHub as well; agntcy.org will have links to all that.
There's a lot of opportunities to contribute, and we're so
excited to continue to include developers from around the world,

(39:33):
and we're very excited about the opportunities to keep
building these protocols and collaborating with A2A, MCP, and
everyone else. To our audience: if you're
building agents, thinking about agent interoperability,
evaluations, observability, predictability, reliability, all
these phrases, all the abilities, this is a community
we want you to join. And the Internet of Agents isn't some
distant future. I know it kind of feels that

(39:54):
way sometimes as we talk about the future and what it looks like,
but it's being built right now by teams like Cisco, by folks like
Giovanna, and the open-source collective needs you to help
shape the future. So thank you so much for tuning
into this special open-source-focused episode of Chain of
Thought. Be sure to subscribe wherever
you get your podcasts. And if you know a developer, a
builder, whether they're a data scientist or someone else, someone
who can contribute and who you think would be interested in the
collective, send them a link to this episode.
Have them check it out. We'd love their thoughts, their
feedback, their PRs if they want to make a submission.
And you can always find way more content like this, much more in
depth, everything from perspectives
on open source to so much more, on the Galileo YouTube

(40:39):
channel, with more episodes and deep dives into the world of
productionizing AI. This has been fantastic.
Really, really appreciate you sitting down, and thanks for
letting us steal your time. Nice meeting you.
Nice meeting you as well. Yeah, we definitely have to stay in
touch.