Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Bret (00:10):
Welcome to the first episode
of my new podcast, Agentic DevOps.
This episode
is kicking off what I think is going to be a big topic for my entire year,
probably for the next few years, around wrangling AI into some usable format
for DevOps. You've probably heard of AI agents by now, or the MCP protocol.
(00:35):
I guess I should just say MCP, since the P stands for protocol.
And these two things together are creating potentially something
very useful for platform engineering, DevOps, and that stuff.
It has so much potential that,
in the first quarter of 2025, I kind of thought this was gonna be a big deal.
This was gonna be, uh, if we can figure out how to keep these things
(00:58):
from hallucinating and going crazy in our infrastructure, this could
potentially be the AI shift for infrastructure that I was waiting for.
So I started this podcast.
We recorded our first episode at KubeCon at the beginning of April
2025, and this is gonna be a series of very specific episodes around getting
(01:21):
AIs to do useful automation and work for DevOps, platform engineering,
infrastructure management, cloud, you know, all those things,
beyond just writing YAML, right?
So, the intro for this podcast: there's a separate episode for the intro.
It kind of goes into my whole theory of why I think this is gonna be a thing.
And in this episode we really try to break down the basics and fundamentals for
(01:42):
those of you that are catching up.
Because it's a lot.
There's a lot going on.
It seems like we have announcements every day this year around AI agents or
agentic AI, however you wanna call it.
I am calling it Agentic DevOps, and hoping that name will stick.
Now, this episode, since it's from the beginning of April,
and it is technically now just getting released at the beginning of June,
(02:06):
we're a little bit behind on launching this new podcast.
Um, I think everything in it is still relevant.
There's just been a lot more since.
And I don't know the frequency yet.
I don't know how often this podcast is gonna happen.
It could be potentially every other week.
It could be weekly.
I just don't know yet, because we are not gonna do the same thing
here as on my usual podcast.
If you're someone who knows that one, DevOps and Docker Talk, that I've
(02:28):
been doing the last seven years, that one is still gonna have AI in it.
But this one is very specific, and there might be a few episodes that have
syndication, or whatever you wanna call it, of the episodes on both podcasts.
But most of the time we're gonna keep the focus of just everything
DevOps, everything containers on the DevOps and Docker Talk show.
(02:49):
And this one is gonna be very specific around implementing useful
AI-related things for Agentic DevOps, or automating our DevOps with robots.
So I hope you enjoy this episode with Nirmal from KubeCon London.
(03:10):
Hey, I'm Bret.
And we're at KubeCon.
We are. Hi, Nirmal.
Nirmal (03:15):
I'm Nirmal Mehta.
I'm a principal specialist solution architect at AWS, and these are
my views and not my employer's. But this episode is all about
Bret (03:24):
AI
Nirmal (03:25):
agents
Bret (03:26):
for DevOps and platform engineering.
Ooh.
So let's just start off real quick with: what is an AI agent?
Okay.
So we've heard of AI; we know AI, gen AI, ChatGPT.
We've talked about
running LLMs, running inference on platforms.
Yep.
And that we are managing the workloads that provide other people services.
(03:47):
Absolutely.
So how are AI agents different than that?
Nirmal (03:51):
This is the bleeding edge.
Yeah.
This is it, right?
Yeah.
Like, a year ago,
no one
Bret (03:57):
had this
Nirmal (03:57):
term
Bret (03:58):
six months ago.
I don't think anybody's
Nirmal (03:59):
talking about it.
Mm, very few people.
Yeah, very few people.
And we've seen it in the news: a lot of vendors and big companies
announcing agentic AI, that's another term. So, AI agents, agentic AI:
it's giving your LLM, like your ChatGPT or your Claude or a local LLM like Llama,
Yeah.
(04:20):
access to run commands.
On your behalf.
Or on its behalf.
Bret (04:27):
Yeah.
And we call those "tools," like, if you hear that word:
tools.
Yeah.
That's, like, the generic tool. Like, I guess a shell
could be a tool.
Correct.
Reading a file could be a tool.
Accessing a remote API of a web service is a tool.
Yep.
Searching could be a tool.
And so these tools, what makes that different than what we've
been seeing in our code editors?
(04:50):
Yeah.
How is that different?
Nirmal (04:51):
I'm a platform engineer
and I want to build out an
EKS cluster using Terraform.
That's what we use.
So I'll ask, let's say, Claude or ChatGPT.
Yeah.
I'm a platform engineer and I want to build a production-ready EKS cluster.
Please create
the assets I need. And it will spit out some Terraform.
YAML, right?
Yeah.
Bret (05:12):
And it's writing text.
Nirmal (05:13):
It's writing text.
And I can, there'll be a little button.
I copy that,
put it in. Or there'll be, if you're using Cursor, all these other tools,
you can put it into some .tf file.
Yeah.
I can then take that, and I can ask the LLM: what's the command that I
need to run to apply this Terraform?
To actually stand up what's described in this Terraform.
(05:34):
It'll spit out: okay, you wanna do terraform plan and then
terraform apply and all that,
terraform init or whatever. And I'll just copy those commands and
check 'em and run them myself.
So the LLM is not executing anything on my behalf.
On, on your behalf.
An agent would be defining a tool set.
(05:54):
So I could give, I could define a tool called Terraform, or a tool
called Shell. I could describe what that tool does in natural language.
Bret (06:05):
Okay.
Nirmal (06:05):
And then I can give
the LLM system a list of these
tools and their descriptions.
And tell it:
okay,
back to the same scenario.
I'm a platform engineer and I want to create an EKS production cluster
using Terraform, and I want you to create it, right, for me. Because
it has the access to those tools,
(06:27):
now it internally reasons: okay, I need to create some Terraform,
I need to validate it in some kind of way, and then I need,
I need to execute this Terraform.
Are there any tools that I have in my toolbox?
Bret (06:40):
In this case, sorry, the
"I" is, you're referring
to yourself as the AI, right?
Yeah.
Sorry.
It's no longer the human doing this, right?
No.
We gave it instructions, and we sit back.
Nirmal (06:48):
From the perspective, from
the perspective of the LLM, the
gen AI tool itself, the LLM system: that's the "I" in this scenario.
Yeah.
I, the LLM, is deciding.
The gen AI tool is looking at its list of available tools and matching what it needs
to them. It's reasoning about what the end goal is, and it looks and
(07:13):
says: there's this tool called Terraform that allows me to use infrastructure as
code to deploy resources on the cloud.
That sounds like what I need.
Maybe.
And it
generates the Terraform just like it did the first time around.
It knows what command to run.
It generates the command, and then, the magic here: a little box will
(07:36):
show up and says: do you want me to execute this on your behalf?
You click the button, you click the button, and then it executes that
terraform apply. Uh-huh. And it sounds very simple, but it's a very different paradigm
in terms of thinking about how we interact with infrastructure, or systems in general.
(07:58):
Like, broadly, systems in general.
Because, in this way of looking at it, or thinking
about it, I, as the human, am no longer executing those commands.
I am
trusting, to a certain extent, that the LLM can figure out what it needs
to do, and giving it a guardrailed set of tools to use and execute.
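[Editor's note: the flow Nirmal describes, a tool list with natural-language descriptions, the model picking a match, and a human confirmation gate, can be sketched in a few lines of Python. The tool names and the keyword match in `pick_tool()` are illustrative stand-ins for a real model call, not any specific framework.]

```python
# Each "tool" pairs a natural-language description (what the LLM reads)
# with a callable the agent software runs on the model's behalf.
TOOLS = {
    "terraform": {
        "description": "Use infrastructure-as-code to deploy "
                       "resources on the cloud from .tf files.",
        "run": lambda: "terraform plan && terraform apply",
    },
    "shell": {
        "description": "Run an arbitrary shell command.",
        "run": lambda: "sh -c '...'",
    },
}

def pick_tool(goal: str) -> str:
    """Stand-in for the LLM's reasoning step: match the goal against
    the tool list. A real agent sends the descriptions to the model
    and lets *it* choose."""
    for name in TOOLS:
        if name in goal.lower():
            return name
    return "shell"

def agent_step(goal: str, confirm) -> str:
    """Pick a tool, then gate execution behind the human
    'do you want me to execute this?' confirmation box."""
    name = pick_tool(goal)
    if not confirm(name):
        return f"declined: {name}"
    return TOOLS[name]["run"]()

print(agent_step("create an EKS production cluster using Terraform",
                 confirm=lambda name: True))
# -> terraform plan && terraform apply
```

The key shift is that the human's role moves from typing the commands to approving (or declining) them inside `confirm`.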
Bret (08:23):
Yeah.
And so we're giving the, we're giving the Chaos Monkey the keys. I
mean, it's automation, right?
We could actually classify this as just automation.
It just happens to be
figuring out what to automate in real time,
rather than the traditional automation where we have a very deterministic plan
of steps that are repeated over and over again by a GitHub Actions runner
or a CI/CD platform or something.
(08:43):
Yeah.
Nirmal (08:43):
And the agent part is the
piece of software that enables
the LLM to execute.
Bret (08:51):
Yeah.
Nirmal (08:52):
and pull, pulls this all
together in one. So, back to what I was
talking about with the infrastructure, and there was a part where I said,
okay, how do we define what tools are available for the agent system to use,
and how do I want the agent to call those tools
(09:13):
and reason about them? And there's a protocol called
MCP, Model Context Protocol,
just outlining a standard way of defining the tools, the system prompt
for that tool, and a description.
Bret (09:25):
And this is like an API, where
you, like, define the spec of an API.
Nirmal (09:27):
It's a defined spec of an
API, and the adoption of that API is
Bret (09:33):
just exploding right now,
Nirmal (09:34):
essentially.
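[Editor's note: the "defined spec of an API" idea can be made concrete. Below is roughly the shape of what an MCP server advertises in response to a `tools/list` request: each tool has a name, a natural-language description, and a JSON Schema for its inputs. The `terraform_apply` tool is an invented example, not part of the protocol itself.]

```python
import json

# Approximate shape of an MCP server's tool listing. An agent sends
# these descriptions to the model so it can decide which tool, if
# any, matches the task it is reasoning about.
tool_listing = {
    "tools": [
        {
            "name": "terraform_apply",
            "description": "Plan and apply the Terraform in a working "
                           "directory to create cloud resources.",
            "inputSchema": {
                "type": "object",
                "properties": {
                    "workdir": {
                        "type": "string",
                        "description": "Directory containing .tf files",
                    },
                },
                "required": ["workdir"],
            },
        },
    ],
}

print(json.dumps(tool_listing, indent=2))
```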
Bret (09:34):
Yeah.
So, to under... okay, sorry, lemme back up a second.
That's a very valid point, because that's the reason I wanted to record this.
I don't wanna be a hype machine.
Correct.
But I'm super excited right now,
if you can see inside my, in my enthusiastic brain. I've
only been paying attention to this for a little over a month.
If you asked me two months ago what an AI agent was, I'd say,
(09:56):
I don't know, a robot that's AI.
I don't know.
I now think I've got a much better handle on this.
I've been spending so much of my life right now deep diving into this, to
the point that you and I are talking about changing some of the focus
this year on, on all these topics.
Absolutely.
Because I think this is gonna dominate the conversation.
This is, these are, there's gonna be a lot of predictions in this, and we're
not gonna talk forever, 'cause it's gonna need to be multiple episodes to
(10:18):
really break down what's going on here.
But we now have the definitions:
AI agents, what are tools.
The protocol behind it is essentially MCP right now.
Although that's not necessarily gonna be the only thing; it's just the thing right
now that we're agreeing on, by one company.
Exactly.
Nirmal (10:33):
We have to caveat this with: this
is, like, this is early, like, Docker days.
This is like
Bret (10:40):
Docker at day 60, right?
Yes.
Like, we were, like, right after
PyCon in 2013, when we gave that, when he gave that demo, Solomon.
Like, we all saw it and didn't understand it fully, but it
felt like something, right?
And, like, you and I both, that's why we were early Docker Captains, is
we saw that as a platform shift.
(11:00):
We've seen these waves before, over, over our careers of many decades,
where we earned this graybeard status with effort and toil.
And I feel like this is maybe the moment
that was the moment of 2013. And, yeah, I'm not alone in that feeling.
Yes.
Nirmal (11:18):
And, there's, just to be clear,
there's massive differences between,
like, paradigm shifts in terms of, like, virtualization, cloud, containers,
and the tooling of software development and systems development
and, right, systems operations. It's still in that same vein, but,
yeah, we're not replacing...
Bret (11:36):
This is not replacing infrastructure
or containers or anything like that.
This is just gonna change the way we work.
Nirmal (11:41):
Correct.
And also, it's broader than
just, like, IT infrastructure.
Like, this has implications for software design, or application,
like, what an application does.
And I want to think of this as a teaser trailer
for subsequent new series episodes.
A new series.
Yeah, absolutely.
We're gonna have to
Bret (11:57):
come up with a name.
I'm toying around with the idea of Agentic DevOps, and just classifying
that as the, absolutely, as the theme of a certain set of podcast episodes.
You've heard it here first.
Heard it here first.
This
Nirmal (12:07):
is Agentic DevOps.
Another term we're seeing is AI for Ops.
Again, this is early days.
None of this is like
Bret (12:13):
Yeah.
Set in stone at all.
Yeah. And if you're at KubeCon today with us, if you were here at this
conference all week, AI was a constant topic, but it wasn't about this.
Actually, there was only one talk in an entire week that even touched on the
idea of using AI to do the job of a DevOps or operator or platform engineer.
Like, people are, what we're talking about at KubeCon for the last three
(12:35):
years has been how to run the inference and build the LLM models.
And so we are just still using human effort to do that work.
But this, I feel like I'm gonna draw the line in the sand and say this is the
month, or the, definitely the year, that kicks off
what will be a multi-year effort of figuring out how we use automated
(12:59):
LLMs, essentially, with access to all the tools we want to give it, with
the proper permissions, and only the permissions we want to give
it, right, to do our work for us.
In a less Chaos Monkey way, right?
Like, a less chaotic way.
Potentially.
Potentially.
It could. This thing can easily go off the rails.
Absolutely.
I will probably reference in the show notes Solomon Hykes' recent talks about
(13:21):
how they're now using Dagger, which is primarily a CI/CD pipeline tool.
So he's talking, and a lot of my language is actually from him, iterating
on his idea of what this might look like when we're throwing a bunch
of crazy, hallucinating AI into what we consider a deterministic world.
Correct.
Nirmal (13:40):
I think with containers and cloud
and on the infrastructure APIs we have,
we were chipping away and really aiming at deterministic behavior
with respect to infrastructure.
Ironically, maybe not ironically, I don't know.
Now we're introducing a paradigm shift that reintroduces a lot
(14:02):
of non-determinism right into
a place where we have been fighting non-determinism for a long time.
Bret (14:10):
We have been working
to get rid of all that.
And now we're, that's why I keep saying Chaos Monkey, because we're
throwing a wrench into the system
that, in some ways, feels like we're going back to a world of, I don't
know, what's the status of the system?
I don't know.
And this will probably be another episode. I feel like this agentic
approach, where we actually can have the potential to pit
(14:30):
the LLMs against each other, right?
And have different personas of these agents.
One is the validator, one is the tester,
one is, one is the builder.
And they can fight amongst each other,
and it all works out.
It actually, ha, happens to actually work out better.
And so, if you're like me, and for the last three years of
understanding, ever since GPT-
(14:50):
3.5 or whatever came out:
we all saw ChatGPT as a product, and then we started with GitHub Copilot, and
we started down this road. As a DevOps person, I haven't had a lot to talk about,
because I'm not interested in which model is the fastest or the most accurate.
'Cause, you know what?
They all hallucinate, still, even today, years later.
(15:11):
Code agents, and you can see this on YouTube, you can watch basically
thousands of videos on YouTube of people trying to use these models to
write perfect code, and they just don't.
And so we in ops, we look at that, I think, and the people I
talk to, even for years now, are like, we're never gonna use that for ops.
But now my opinion has changed.
(15:32):
Yeah.
Nirmal (15:32):
yeah.
And if you're listening to this and your gut reaction is, wait, we
have, like, APIs that are deterministic...
Like you just
Bret (15:40):
Yeah.
Nirmal (15:40):
We can just call an API.
We can have an automation tool call an API to stand up infrastructure. And, like,
why do we need to recreate, like, another layer that makes it non-deterministic,
and looks like an API but isn't an API, and you don't really know what it might
do or which direction it might go?
Yeah.
And you're feeling, I don't know,
(16:01):
that doesn't seem like it would solve any problems for me,
and it seems like it might introduce a lot of problems.
You're in the right place, because that's exactly what we're gonna explore.
Bret (16:09):
Yeah.
Nirmal (16:09):
One thing for sure,
though, is: it's here, right?
And so I feel like, as good engineers, as good system admins and operators...
Bret (16:20):
We enjoy, we love our crafts.
We, we look at this as an
art form of brain power, and, right,
reaching for perfectionism in our YAML and in our infrastructure
optimization and our security.
Nirmal (16:32):
And we have a healthy
sense of skepticism on new tools,
new processes, new mechanisms.
Yeah.
When you, when availability of your services is paramount, and reliability,
you want to introduce new things in a,
in a prudent manner.
And so we're gonna take that approach, but we're not going
to dismiss that this exists.
(16:54):
Clearly there's a lot of interest, energy, integration happening,
experimentation happening, and some people are already starting to see value.
Yeah.
And we're gonna explore with you where that goes.
Bret (17:08):
Yeah.
This, just to be clear, this is KubeCon, April 2025, and almost
no one is talking about this yet.
It feels like it's right under the surface of a lot of conversations, and
a lot of people maybe are thinking about it, but I'm not even sure that
we're honest with ourselves around
(17:28):
that this is coming, whether we like it or not.
And only because, yeah, not only, but one of the large reasons is business.
Okay.
Lemme back up.
You know how, in a lot of organizations, Kubernetes became a mandate, right?
So there's lots of stories that came out over the course of Kubernetes'
lifetime of teams being told that they need to implement Kubernetes.
(17:51):
It didn't come from a systems engineering approach of solving a known problem.
It came down,
because an executive decided, after they read a CIO magazine article
that said Kubernetes was a cool new thing, and they did it, right?
I hear this all the time.
I confirmed this multiple times this week with other people, and I now feel
like we're not talking about it yet.
(18:13):
But I did hear multiple analysts say the organizations that they're working with
expect that we are going to reduce the number of personnel in infrastructure
because of AI.
The only way that's possible is if we use agents to our
advantage. Because we can't, yeah.
(18:33):
I still don't believe we're replacing ourselves.
I don't think the agents will ever, in, in the near term,
and as far as we can see out, let's say five years, they will, they
won't be running all infrastructure in the world by themselves.
They can't turn on servers.
Well, maybe you can actually PXE boot and do a power-on over PoE or whatever, but,
like, we still need someone to give them orders and rules and guidelines to go
(18:56):
do the work. But, to me, I'm starting to wonder if, very quickly, especially
for those bleeding-edge organizations that are looking to squeeze out every cost
optimization they can out of their staff, they're going to be mandated to
not just take AI as a codegen for YAML, but to start using these agents to
increase the velocity of their work. And my, one of my stories is,
(19:20):
over the last 30 years, I do this in talks, is every major shift has been
about speed, cost reduction and speed.
Sometimes we get 'em both at the same time.
Sometimes they're one or the other.
We get a cost reduction, but we don't go any faster, which is
fine; or we're going faster, but it's not necessarily cheaper yet.
Nirmal (19:35):
Right.
Bret (19:36):
And
I feel like this is maybe the next one, where we're gonna be feeling the
pressure, because all the devs are gonna be writing code with AI, which
in theory is going to improve their performance, which means they're writing
more code, shipping more, or needing, or wanting to ship more code, potentially.
And if we're not using AI ourselves
(19:56):
to automate more of these platform designs, platform build-outs,
troubleshooting when we're in production and things are problematic and we
don't wanna spend three hours trying to find the source of the problem.
If we're not starting to use agents to, to automate a lot of that, and reduce the
time to market, so to speak, for a certain feature or platform feature, then I don't
(20:17):
think these teams are gonna hire more of us to help enable the devs to deploy.
What could end up happening is we end up with more shadow ops, where
the developers are so fed up with us not speeding up. If they're
gonna go 10x, we have to go 10x. Yeah.
If they're gonna go 3x, or whatever the number ends up being in the reports,
and Gartner puts out, like, the AI makes it efficient, more efficient
(20:38):
for developers to, to code with AI.
And the models get better, and the way they use it is better.
And so they're shipping code faster, and they can do the same speed with three
times fewer developers. Or they can just
produce three times more work, which I think is more likely, because if it's
the common denominator and everyone has it, then that means every company
can execute faster. And they're gonna, they're gonna want to do that, because
(20:58):
their competitors are doing that.
So that's a, that's a very loaded and long prediction.
Nirmal (21:03):
That's a hypothesis.
It's, I think there's a lot of prediction here.
It's gonna take some time for us to even chip away at that hypothesis,
but it's a good starting point.
If we're, but assuming that is, like, the hypothesis that organizations
are looking at to adopt these tools, that's a great starting point
for us to help you figure out
(21:23):
what they are, why they are, what they do.
Yeah.
And how to use them.
Bret (21:27):
This is, by the way, a
little bit of that opinion
of mine, and there's more to come, 'cause I've got a lot more written
down than we're ever gonna get to.
But a significant portion of that is actually coming from what I've learned
this week from analysts whose job it is to figure this stuff out for their
organization and their customers.
Interesting.
And so I, I am a little weighted by their
(21:50):
almost unrealistic expectations of how fast we can do this.
'Cause we are still humans.
An organization can't adopt AI until the humans learn how to adopt AI, and
the humans have to go at human speed.
So we can't just flip a switch and suddenly AI is here and
running everything for us.
At least not until we have Iron Man's Jarvis.
Or whatever.
Like, until we have that, we still have to learn these tools and still have
(22:12):
to adapt our platforms to use them.
Yes.
And adapt our learning to use them.
And that's gonna take some time
Nirmal (22:16):
And
I'd like to, like, the parting thought for this is, okay.
And here, like you said, there's an under-the-surface kind of thing happening.
Yeah.
So, whispers...
Bret (22:26):
It's almost like murmurs, and under
Nirmal (22:28):
the surface.
Yeah.
AI agents, AI agents, Agentic
Bret (22:32):
DevOps.
Ooh.
This is our ASMR
moment of the podcast.
Nirmal (22:37):
Like MCP protocol.
Bret (22:38):
Yeah.
Nirmal (22:39):
You mentioned HAProxy on
the previous podcast, about load
balancing and figuring out the, like, token utilization of
GPUs and tokens and all that stuff.
And we had a conversation at the Solo.io booth, and they were talking about having
a proxy for an MCP gateway. One of the things that we're seeing the early
signs of is these new workloads, right?
(23:01):
This agentic kind of thinking, around even just executing the agentic platform,
if you will. And everything from looking at the tokens and optimizing
load balancing to inference endpoints. Or, MCP doesn't behave the same
way as, like, just an HTTP connection,
necessarily.
(23:22):
And Solo,
we were talking to them, and they have an MCP gateway.
We're seeing a little bit more of a trend of AI gateways.
Istio, the project, has an AI gateway. And so this is not just another workload
that looks like just a web server.
And the networking and everything is gonna be different.
Not dramatically different, but, but different
enough that we need to be aware.
(23:43):
'Cause even if you're not using any of these tools, someone in your
organization is probably gonna say,
oh, we need to integrate this stuff into our software, into our, right,
whatever we're delivering.
And we'll need to know it even at that layer.
So we're gonna also cover that component as it relates to
the Kubernetes ecosystem, right?
And cloud native.
Bret (24:02):
Yeah.
I think, if we had to do, like, an elevator pitch for this podcast, it would
be: we now have an industry idea around these terms, agent, and then it uses an API
called MCP, to allow us to give more work
to these crazy robot texting things that we have to talk to in human
(24:24):
language and not with code, right?
It's running code, but we're not talking to it with code.
And it can now understand all the tools we need to use, and we can just give
it a list of everything I wanted to use:
here's my Kubernetes API, here's all my other things that you have
access to, and here's my problem.
Go solve it.
And that paradigm.
(24:44):
Three months ago, two months agofor me, I didn't know existed.
And that's why I've been sittingon the sidelines with ai.
Like it's cool for writing programsthat mostly work in a demo.
It's cool for adding a feature tosomething I already have, but it's
not doing my job as a platformengineer or DevOps engineer.
It's just helping me write text faster
(25:05):
than I can type on my keyboard.
And that was not that interesting.
That's why you didn't see a lot of me talking about that on this show:
it just wasn't that interesting.
This is an interesting topic for ops and, absolutely, for engineers on the platform.
Nirmal (25:17):
Yep.
Bret (25:18):
So
Nirmal (25:19):
Stay tuned.
Yeah.
And I, I love crazy texting robots.
Crazy
Bret (25:24):
texting robots.
Maybe that's the title.
TBD.
Alright.
Alright.
See you soon, man.
See
Nirmal (25:32):
you.
See you.
Bye.
Bye.