Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
It's a bit of a shortcut to think AI is non-deterministic.
(00:03):
We need to be deterministic, so we shouldn't use AI for that.
I think the right analogy is AI has the ability to replace human beings on certain tasks, or to help human beings, to augment them.
So it's not about replacing everything that people are doing, but in terms of analogy, it's really about building an agent that can behave and think and
(00:25):
act exactly like a human being would do.
So, for instance, your platform engineer or your DevOps engineer working for you is actually non-deterministic.
Humans are non...
Humans are non-deterministic.
Right.
This is the Agentic DevOps Podcast.
I'm your host, Bret Fisher, and today I have my guest, Sam Alba of Mendral.
(00:49):
Sam goes way back, uh, one of the early devs at Docker, co-founder of Dagger, and now co-founder of Mendral.
So he's been focused on cloud native and now agentic products for 15-plus years at least.
And this is a wide-ranging conversation.
We mostly focus on Mendral, the tool.
(01:11):
And the elevator pitch, I guess, that I would give is: this tool currently looks at GitHub. It's focused on GitHub Actions right now, but GitHub as a platform, particularly on Actions and the workflows and the events that happen there, but not just Actions, and it tries to act like a junior DevOps engineer, essentially.
It's one of the closest, if not the closest thing I have had as someone
(01:34):
who focuses on GitHub Actions and CI in there, and just administering and managing that platform for DevOps teams in particular. Uh, it feels like the closest thing to an AI buddy that is constantly looking at my GitHub, finding problems, not necessarily in the code that the application developers are making, but in everything else around it.
(01:55):
The linters, the testing infrastructure, the GitHub Actions workflows and those pipelines that are happening in there, the logging events, uh, anything misconfigured in security, the Dependabot and Renovate stuff, like just all the stuff that is focused around the code: managing the platform for storing code, running workflows
(02:16):
and automations on that code, and then eventually shipping it.
They operate in this middle space between the developers and the production deployments and infrastructure.
And I think that's a sweet spot for me.
Like, that's exactly what I wanted to talk to Sam about.
And we spend quite some time digging into the use cases for this thing.
I actually run through some of my own experiences, because they onboarded
(02:38):
me to the platform over a month ago, so I've been using this thing irregularly, but as a single operator of over a hundred repos now in my own GitHub. Some of those are actual infrastructure things in production for my own use, but also a lot of examples for my courses, a lot of demos and sample tools and
(02:58):
sample code that I have to manage.
I treat this like my operations, right?
So I'm doing the Dependabot, I'm doing the security reviews of things.
I'm automating things with GitHub Actions.
So while I'm not necessarily the picturesque large team managing large, big projects on there, I do operate my own little business of one.
(03:19):
I operate that in a very similar fashion, and I need help.
I need a lot more DevOps help than I admit, because I need to operate my business and I don't necessarily have time to manage the platform automation stuff around my code.
And that's really the problem I think that Mendral's trying to go after.
So we break down what it does, what they're doing as an early-stage startup.
(03:42):
They just graduated YC.
And we then get into even more details around AI, because they're building today and they graduated just a few weeks ago from Y Combinator. They are one of these new AI companies that are building with AI.
They're using AI in the product, and their product ships AI features to us as the users of it.
(04:03):
So they're kind of the triple threat of AI.
So we lean into that a little bit, talking about what they're using AI for, what they see for the future of their product, how we're going to experience these things in the future where we're all operating our own AI harnesses to manage our agents like Claude Code and OpenCode and whatnot. When, how do we use this tool in the future?
(04:24):
Anyway, we get into all that.
It's a great conversation.
I was excited to have it, and we went on so long that I, at some point, just had to say, okay, we're gonna have to stop talking and make this another episode, because we could have gone for, uh, hours, I believe, on this.
So please enjoy this episode with Sam Alba of Mendral.
Welcome to the show.
Hi Bret.
Thanks for inviting me.
(04:45):
We started this new company, Mendral.
We are now building an AI DevOps engineer.
We basically see, you know, the emergence of coding agents and how they are shaping the future of CI/CD and software delivery.
And so we're building an agent that can unblock teams.
And now, thanks to AI, we can automate certain things that we could not automate before.
(05:06):
And so today we have an agent that monitors and fixes some of the software delivery issues: flaky tests, slow builds, broken release processes.
So, let's get into it. Because, so for the audience, Sam and I talked, I don't know, a couple months ago at least, I think, and we had talked about Mendral.
(05:26):
I'm building GitHub Actions courses and content, and I've been a GitHub Actions consultant for probably half a decade at least, or more.
I think Actions, I argue, is like the most popular, certainly for open source, but the most popular CI platform.
I'm just kind of going to call it an automation platform from now on, because people do a lot more than just CI in there, and deployments and stuff.
(05:47):
Solomon had clued me in to you all because we were staying in touch around Dagger, and that's still going.
We started talking about your focus, and it felt like a tool that I should have in my toolbox.
It also felt like something where, typically with large platforms, just like a cloud platform, like AWS or Google Cloud, GitHub Actions
(06:07):
is a very raw platform to me.
I feel like it's got a lot of features.
It's got a lot of sharp edges.
But it also stops short of what a team typically needs out of everything, automation.
And we all talk about private runners, and we talk about sometimes like custom dashboards or, you know, org-level statistics and awareness,
(06:28):
more observability into GitHub Actions. Those are common questions.
And I don't often have great stories for that.
Because I think that market, historically, is where people have tried to have very niche little products, pre-AI, that solved a little pain point for GitHub Actions.
One, GitHub Actions wasn't quite as popular five years ago, right?
(06:50):
It hadn't risen past Jenkins and Travis and a lot of the other ones.
And at the same time, there wasn't the popularity, but these tools, since they were pre-AI, they were very limited, I feel like, in things that they could help with.
And so we saw, I saw personally, little companies starting out, like little hobby products, that's almost what it felt like.
They weren't true, they weren't Y Combinator companies trying to come out
(07:12):
and be an actual full-fledged company.
They were more like side projects, and someone figured out, oh, I could spin up runners faster.
Which is like a whole new segment of the market where there are now many companies that host your runners for you, and they're faster and cheaper and better.
And then we've had little companies experiment with, like, GUIs or web dashboards that do more than what you might get out of the
(07:33):
basics of GitHub Actions there.
When we walked through yours, I was very excited about it.
Because rarely on this podcast, and we're about to hit 200 episodes of this and the DevOps and Docker Talk, rarely is there something there, especially in the Kubernetes land, that I feel like is meant for me.
That's something that solves my problems, even as a solo developer
(07:53):
and as a consulting DevOps engineer.
So, could you talk a little bit about, like, when you both said, hey, look, we're going to start this whole new company because we believe this is the right time for this thing.
Like, where was your headspace?
What problems were you trying to solve?
Yeah, so, CI is a really broad area.
You actually mentioned it.
It's not exactly just for running your tests.
(08:15):
It's more like a workflow engine.
It's a lot about orchestration and automation, and we do a lot of things in CI.
CI was always a bottleneck.
Every time, you know, you have teams, as soon as you have some CI, you start with some workflows, like a linter, a builder, on your GitHub Actions or some other CI system.
And CI is always a bottleneck because, first of all, it's a central place
(08:39):
for integration and for running your tests when you ship your code.
When we started Dagger, we wanted to help people with this bottleneck, building programming tools so engineers could actually solve their CI issues more efficiently.
When we started this new company, Mendral, Andrew and I saw an opportunity to finally automate certain things that we could not automate before AI.
(09:02):
You know, for instance, there are some release processes that you need to run manually.
Every team that's growing has some sort of manual operations in their release process.
And on the other side, they have issues that they don't spend time on fixing, because it's never the priority. You want to build your product.
You don't want to build your CI.
And the problem is, these problems are piling up.
(09:23):
That bottleneck is even getting bigger with AI, because now you have coding agents that push a lot of code to your CI system.
And so this bottleneck that was already a problem is getting worse.
And it's only the beginning.
And so we thought that, well, now the problem is bigger: on one side there is more demand for it,
(09:45):
and on the other side, we finally have the tools, thanks to AI, to solve these problems efficiently.
So the idea with Mendral is not so much to make your CI better and equip developers; it's actually to replace that work, to automate that work entirely, so those developers can focus on their applications.
So that's really the TL;DR of why we started that.
And obviously, software delivery and CI/CD in general is really broad.
(10:09):
There is security involved.
There is, you know, quality control, regression testing; like, a lot of things go through CI.
And so we started initially by building, we built a data platform, really, that looks at everything that's going on on your CI system: the logs, the code changes, the past incidents, all of the events, the behavior of the team, like all
(10:33):
of this data. And then we started to build specific agents on top.
That's really what we're building with Mendral.
You know, when we started YC, we had a working MVP, but we didn't know what would be interesting for people.
We ended up spending a lot of time fixing flaky tests and,
(10:54):
you know, reliability problems.
We have some teams actually using the agent to improve the performance of their CI. Like, for instance, implementing sharding strategies on top of their pipelines to have some parallelization, so they can ship faster because the CI can complete in a shorter amount of time.
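As a concrete picture of the sharding pattern Sam mentions, here is a minimal sketch of what it can look like in a GitHub Actions workflow; the shard count and Jest's --shard flag are illustrative choices, not Mendral's actual output:

```yaml
# .github/workflows/test.yml -- illustrative sketch, not Mendral output
name: tests
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]   # four parallel jobs, each running a quarter of the suite
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm ci
      # Jest's built-in sharding: run shard N of 4
      - run: npx jest --shard=${{ matrix.shard }}/4
```

The wall-clock win comes from the matrix fanning the suite out across four runners at once, at the cost of a little per-job setup overhead.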
There are also some teams using us for security reasons.
(11:15):
Like, for instance, our agent is looking at security alerts; it will look at a CVE, see if it's exploitable on your code base, auto-remediate if needed.
So these are really things that you would expect a senior DevOps engineer to do for you.
Uh, some of those teams actually don't have specific roles for that.
So they use the agent like it's a person.
(11:35):
Yeah.
I think I've been in tech for 30 years, and for 20 of those years I've been dealing with some sort of code management system.
And that doesn't always have an automation system built in; like, we didn't always have something like GitHub, where the automation system and the code storage system were the same thing.
But I'm just thinking back, and I don't know a time where I would agree that
(12:01):
the CI or the automation system for code was a first-class citizen in the team.
It always feels like it's, I don't know if this is an American phrase, the redheaded stepchild.
Like it always, yeah, it always felt like it was just barely working.
And especially when it came to Jenkins, we're not sure if we could exactly
(12:23):
get, we don't know how long it would take to recover from a server failure.
Sometimes those servers are under someone's desk.
Sometimes those servers are unique. Often, especially with Jenkins, not to pick on it, but it was the most popular and it was self-hosted, so you always had these special snowflake servers that were built typically by developers, not usually by ops, where the professional sysadmins were operating, right?
(12:46):
Where they were the ones creating systems to manage and control servers.
But often I would walk in and find the dev team had their special CI thing, and it might be running on someone's machine.
It might be under the desk, it might be in the closet.
And those days are mostly behind us, I think, because we're pretty much all using at least some sort of cloud API for managing our automation.
(13:06):
And we might have runners in different places, but we're leaning into more cloud stuff, although it's amazing how much I still see Jenkins and talk to people that are still using Jenkins.
I often look at this as, like, we have needed tools like this for so long to help clean up the... to me, it's like the janitor. Because while you're saying you're building the DevOps engineer, I know that every DevOps team wishes they
(13:28):
had another DevOps engineer to help them.
And I feel like right now, at least in this moment in AI, management seems to think that they're going to be reducing the number of DevOps engineers, not increasing them.
I just recently launched something called the Agentic DevOps Guild for people, personal plug.
But this is a membership program where DevOps engineers are coming in to help accelerate their AI learning and onboard AI tooling.
(13:51):
And I do these onboarding calls.
And so for the last three weeks I've been having multiple calls with engineers as they come onto the program.
And one theme that's maybe not a majority yet, but it's a consistent theme, is DevOps engineers are worried about unreasonable expectations from their management to do magic, and they don't even know these tools yet, right?
Tools like Mendral that run on top of maybe an AI infrastructure,
(14:14):
they don't yet know Claude Code.
They're just dipping their toes into Copilot sometimes.
I think operators and DevOps engineers are maybe a little bit behind application software developers in terms of their expectations to onboard development or coding tools for AI.
And of course, just six months ago I was talking to people at KubeCon that were saying, well, we're never going to put AI into infrastructure management
(14:36):
because that's non-deterministic and we need full determinism.
And yet when I use your tool, which, I don't actually know your architecture, we'll get into that, but it feels like there's AI in the background.
I'm writing, I'm editing plans for execution with it in human language, right?
Like, I'm not checking boxes.
I'm talking, I feel like I'm writing back. I'm not literally chatting, but
(14:57):
I'm writing a plan, helping it edit the plan so that it executes properly.
I'm writing paragraphs of, sort of, my rules for how it should do certain things in infrastructure.
Um, so it feels very AI-based, even though I'm not literally chatting with a chatbot yet.
But at the same time, I feel like we DevOps engineers are just in this unfortunate situation right now.
We're getting hit from all sides.
(15:18):
We're expected to keep operating our understaffed, janitored, or caretaken infrastructure, right? Our struggling infrastructure.
We're worried that we're going to have less staff here pretty soon.
We haven't yet completely consumed all the AI madness that the app developers have, because we know that basically we're one prompt away from production going down, because we typically have the keys to the kingdom. And a lot of teams have keys
(15:43):
that, you know, I was just talking to one of the engineers onboarding that said, I have the Terraform keys on my machine.
I have all the production AWS and kubectl keys on my machine.
I need to learn how to sandbox this AI agent.
Or if it does one thing wrong, I'm probably losing my job.
Right?
And that just feels much higher stakes than an app developer that got the wrong font on something and has to recommit a new PR.
(16:06):
It just feels higher stakes for all of us in the operations realm.
So, all that being said, it feels like you've kind of nailed it, without sounding like too much of a fanboy yet, but it feels like you're solving a problem for me.
And so, for the audience: over the last week, I've actually been working with the team to try to get more of my problems solved. I mean, I have over 100 repos.
(16:27):
The majority of those are training repos or sample repos for learning Docker, Kubernetes, GitHub Actions.
So there's lots of sample code, but it's app code, right?
There's lots of sample code for GitHub Actions, kubectl, like all sorts of various infrastructure stuff.
I have an impossible time to keep... like, I cannot keep up with all of it.
I can't even keep up with the npm updates, much less, you know, CVE scans,
(16:51):
failed linting jobs, failed test runs.
There's just so much stuff that's happening in the background.
I just basically ignore it until it becomes a problem for my students or somebody bugs me in an issue.
And I feel like finally, with Mendral, from a user's perspective, it's giving me an opportunity to shortcut that.
(17:11):
It doesn't fully automate everything for me yet, so I want to ask about that, like where your vision's at.
But it doesn't fully automate, doesn't solve all my problems automatically.
It just feels like it's raising the stakes of what's more important for me, and helping me not get distracted by the stupid stuff that I might not need to worry about, because it's not really an issue.
You know, a failed linting job isn't as important as a failed test job. Um, so it's nice that it elevates things that are
(17:35):
important, and it also rolls things up.
It's like, if you had this issue a hundred times in a hundred repos, maybe fix that one first.
And so where do you see all of this going?
If this thing is, you know, helping, if it's giving intelligence and insight, which is what I feel like it's doing, to the failures that I have in my automation platform?
Is this thing eventually, like, learning from me?
(17:57):
And is it going to start solving some of these automatically?
Like, where do you see that going?
Yeah.
So, yeah, it's an interesting question, and there is a lot to say about what you said earlier about deterministic and non-deterministic and the use of AI in pipelines that are deterministic, actually.
And so, really briefly, I'll explain, just for context, how Mendral is
(18:18):
behaving and how the product works.
Usually teams onboard with a single GitHub App install, one click, and then we start ingesting all the CI logs and events on our platform.
And we basically run the agent in such a way that the agent can see everything that's going on on your CI.
It's like someone looking at your logs, your events, your code
(18:39):
changes, everything that's going on, and looking for opportunities to be helpful.
You mentioned a linter failing; that's one. It can be one of your pipelines is slower by 30 percent this week.
And it was faster last week.
Why?
It can be GitHub is down, like, uh, what's going on?
Everything is broken.
The agent actually is able to spot these kinds of problems
(18:59):
and tell the team, don't worry, GitHub is down, it will come back.
It's not you.
Uh, it is looking for many opportunities to be helpful. The problem with most AI tools is that, you know, people tell you, oh, it can do anything, it's very powerful, but you still have to find the right thing to ask the chatbot.
You know, one of the key architecture decisions when we started to build the product was
(19:21):
that we didn't want to give people yet another dashboard, yet another chatbot.
And that's why it's an agent that joins your Slack and starts working for you, exactly like a human being would do.
One thing that's important to keep in mind: for me, it's a bit of a shortcut to think AI is non-deterministic, and we need to be deterministic, so we shouldn't use AI for that.
(19:44):
I think the right analogy is AI has the ability to replace human beings on certain tasks, or to help human beings, to augment them.
We have some of our customers who already have DevOps engineers and big platform teams, and actually they have Mendral joining that team and augmenting them.
So it's not about replacing everything that people are doing, but in terms of
(20:06):
analogy, it's really about building an agent that can behave and think and act exactly like a human being would do.
So, for instance, your platform engineer or your DevOps engineer working for you is actually non-deterministic.
Humans are non...
Humans are non-deterministic.
Right.
And the output and the work they do can be deterministic.
(20:28):
Like the example of a linter: the linter breaks.
We have to understand the mistake, what's the problem, why it broke.
All of that can be done by AI.
Then fixing the linter and pushing a PR, that can be totally deterministic.
Uh, so the way Mendral works today is, we didn't push the cursor too far in terms of automation, because
(20:49):
we care a lot about security.
Like, we have fairly large teams using the agent in production.
And so we're very cautious about the kind of changes we make, because the agent actually has the capability to open PRs and push code to people's repos.
And so what we do is, the agent will ask every time. When it wants to do something, it will ask for permission.
So you have to confirm that yes, you can go ahead and implement that.
(21:13):
Yes, you can go ahead and do this or that.
Over time we have people asking us more and more to actually automate more.
So if the level of confidence of the agent is greater than, let's say, 85 percent, I want a PR to be opened automatically.
Because we actually have a pretty high merge rate from the pull requests that Mendral opens, the pull requests that are accepted by teams.
(21:35):
And the reason for that is because we have fairly long coding sessions.
So when the agent implements something, it's fairly similar to what you get with Claude Code or Cursor.
But the main difference is that it will wait for the CI to complete, wait for the logs to show up, and actually wait for the confirmation that it actually fixed the problem.
So if it's fixing your linter, the agent itself will wait to get the confirmation
(21:58):
that the problem was solved before saying, yes, okay, my PR is ready.
Exactly like a human being would do, right?
And so, that's the main thing.
But yeah, over time, we're going to push for more automation and we're going to do a lot more.
You asked also about the, um, the learning phase.
That one is actually very interesting.
I don't know if you want to react to what I just said, or if I should
(22:20):
expand on the learning aspect.
Yeah, let's talk about the learning.
So on the learning side, what's very important is, exactly like when someone joins your team, the person doesn't have context.
So they start by watching the team, joining the team, making themselves useful, right?
But then over time, the person will come to know a lot of things: the problems, the tools, the best practices, and then they will get better and better.
(22:44):
Same thing with the agent.
So the agent, when it sees something that needs to be remediated, or it sees problems, it can identify patterns and maintain a list of what we call insights.
They are basically opportunities for the agent to do something: can be a failure, can be a performance regression, security alerts, et cetera. All of that is being tracked by the agent and constantly being refreshed with new data.
(23:08):
That's why, for instance, if it spots a problem the first time, the level of confidence on the resolution might be low.
And then, as the problem appears several times, the level of confidence will get higher, to the point that we can entirely automate the resolution.
And that happens based on the agent being able to constantly update this living memory and taking that into account whenever there is a problem.
(23:32):
So when it sees the problem, it doesn't just look at the problem in the logs; it sees all the context of the problems that happened before.
It has the ability to look at, you know, similar issues or similar patterns that it saw in the past.
And I can dig into the implementation of that, because it's actually quite interesting how we architected this agent.
But then also, teams sometimes talk to the agent on Slack and say, oh,
(23:56):
whoa, whoa, whoa, you did that, but we actually don't do that.
Like, let's say, you know, we always follow that benchmark, so we always use this tool, or we always do this, and basically people react to the agent and say, hey, keep in mind we do this and not that.
And the agent also has a memory system and maintains its memory.
(24:17):
You can also review the memories and edit them, et cetera.
But yeah, the idea is to make that learning part entirely automated, so the team doesn't have to care about what the agent knows and what it does not.
And so that's really the key part about the non-deterministic way.
I think the right way to frame it is really, think about what a human would do, and think about the fact that an LLM can reason and navigate
(24:41):
through our problems in the same way.
And so that's really what makes this kind of automation possible today; it was not possible before LLMs were good enough.
Yeah, and it's subtle.
The more I feel like we're leaning into just trying to use AI in various scenarios, the more I feel like my mind expands to understand
(25:06):
where it actually could apply to things that I didn't even think about.
Like, I feel like to me, Mendral, the premise to a non-AI person, what I would call a blue piller.
I keep a blue pill and red pill Matrix, 25-year-old dated reference.
But the blue pill people would be the ones who were not necessarily anti-AI, but very, you know, very much not pro-AI.
(25:28):
They're not leaning in hard; they're like, yeah, yeah, it might help me with my code completion, but I'm not looking to put AI everywhere.
And that's fine, there's people like that.
And then there's the red pillers, and I used to consider myself the blue, and now I'm basically all red and looking for new opportunities.
In this use case where we're saying, okay, yeah, it's going to be helpful in CI, at first, when you have this three or four years of experience of hallucinating
(25:49):
agents and things going crazy, that sounds like a wild premise.
But what I find is interesting, and I'm using this also in my own work with things like Claude Code and OpenCode, and learning how skills and other new tools, all patterns that us humans are using to help guide the AI, give it more context, give it more guardrails so that it won't hallucinate.
(26:11):
And, like you said, we get to, like, the 85 percent trust level, you know, we get to this certain level of trust, particularly with a certain model or a certain harness, and then we start to relax.
And I've noticed recently, like, even my prompts are getting sloppier.
Like, I'm not prompt engineering anymore, right?
Where we were consumed with that a year ago, where you've got to have the best prompt, the only way you're going to get a good, reliable AI is the best prompt.
(26:32):
And you go to these websites and they have all these listed prompts, and now I'm just like, hey, can you fix that?
I'm very vague, I'm very casual, like I would be with an employee.
But for me, I know that's because I'm consistently using skills, which are these large documents full of context, and that I'm operating on larger and larger, whether we want to call it context or memory or sessions, whatever.
(26:53):
In the AIs, I constantly continue to use the same session, which it compresses, and I expand it and it compresses.
What I'm finding in my AI conversations is I'm able to stick with the same conversation, the same session, much longer, because now we have Claude Code with a million-token context.
And that just unlocks, I feel like, for me... I mean, yes, I'm using more tokens because I'm in the same session, so I'm not a Ralph Loop fanatic where I'm constantly
(27:16):
dumping context and starting fresh.
But I find that I'm actually able to have these conversations that extend for more than a few days, but into weeks.
And it remembers the thing that I told it two weeks ago.
And that's where it starts, I think, to get really, really interesting.
And that feels a little bit like what I'm playing around with with Mendral.
I wanted to call out one feature that I just started playing with a couple days
(27:39):
ago, and this very specific problem of, we've got so much... what's the term we use for teams that have all the canon inside their brains and it's not documented?
I'm trying to think what that...
It's tribal knowledge.
Tribal knowledge.
Okay.
So, a while ago, and I love that term, tribal knowledge, like
(28:02):
these are the things that, you onboard that new DevOps engineer, and they're not going to just read docs.
They're going to make mistakes.
And then a team member is going to say, oh, no, no, no, we don't do it like that.
This is what we do.
I ran into that, where it wasn't documented in any of my agent files yet.
It wasn't anywhere in code, or documented, that when I run super-linter, particularly
(28:27):
against GitHub Actions... because since I'm teaching GitHub Actions, consulting on GitHub Actions, I'm all into that, and I've always been very concerned with the security side of it.
And now, we've had a rough year. GitHub Actions has had a rough year.
This last year has not been kind, to me, to the GitHub team, in terms of security attacks.
I'm not even going to call them vulnerabilities.
I'm just going to call them sharp edges and misconfigurations, a lot of times.
(28:48):
Because it turns out they're not hacking GitHub.
They're just finding people that didn't configure things correctly on an open source repo.
And this is all happening.
So I'm always scanning my GitHub Actions.
I use actionlint, and I use something called zizmor, I think that's how you say it.
Zizmor, it's zizmor.
Um, and zizmor is very focused on security stuff.
(29:08):
And so down this rabbit hole I'm going to go with you for a second. What was happening was, they added a new rule to the linter that says, from now on, all of your actions need to be pinned. Which, for those of you out there, if you're using GitHub Actions, hey, if you don't know about pinning, I've got a bunch of videos and courses on that.
Like, you absolutely should be pinning all your actions.
And rightfully so, the linter now warns me when I'm not doing it.
(29:34):
And in reusable workflows, there is a pattern that surfaces with teams where, if they're controlling the reusable workflow, they don't tend to pin to the SHA hash from their calling workflows, because they control everything centrally from the reusable workflow.
And that's the point of them: to centrally control things, so that we don't have a hundred different repos that I have to update every
(29:56):
time a different workflow changes.
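To picture the two kinds of references being contrasted here, a hedged sketch; the org, repo, workflow names, and the placeholder SHA are illustrative, not from Bret's repos:

```yaml
# Illustrative sketch of the pinning pattern under discussion.
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # Third-party action: pin to a full commit SHA, not a mutable tag.
      - uses: actions/checkout@<full-commit-sha> # e.g. the SHA behind a v4 release
  call-central-ci:
    # Reusable workflow the team controls centrally: left on a branch ref
    # on purpose, so one change there updates every calling repo at once.
    uses: my-org/central-workflows/.github/workflows/ci.yml@main
```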
And so this security tool doesn't know that. Mendral doesn't know that. That is not documented anywhere in any of my systems.
It's not in my Notion, it's not in my repos.
So that's tribal knowledge.
And what I was able to do... it was flagging a bunch of these things, and Mendral was thinking the fix to this is that it needs to
(30:16):
pin the SHA, but in my team of one, the real fix is, no, we need to just write an ignore rule for the zizmor linter.
And that's very specific to my workflow, not something that other people would know.
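As a concrete footnote on that fix: zizmor reads exceptions from a config file, so the ignore rule Bret mentions is a small edit along these lines. A minimal sketch, where the rule id and the workflow filename are illustrative for this scenario:

```yaml
# zizmor.yml -- minimal sketch; rule id and filename are illustrative
rules:
  unpinned-uses:
    ignore:
      # The calling workflow that invokes our centrally controlled
      # reusable workflow; unpinned by design, so suppress the finding.
      - call-reusable.yml
```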
And so, instead of documenting that somewhere and then somehow connecting some MCP in some convoluted way to let Mendral know, it
(30:37):
has this memory feature where I can just sort of dump in this tribal knowledge, just copy and paste, or write it in stories, basically user stories, that help guide its decision of how it's going to treat future failures.
And, sorry for the audience, this is a really long story, but I feel like it's very tactical and relevant as an example.
(30:58):
In that scenario, if I had a junior engineer, I would say, okay, we're going to need to update all of these calling workflows; a hundred repos now need this new line inside of the GitHub Actions workflow, for this particular calling workflow that calls a reusable.
And then, from now on, everyone in the team needs to know we don't
(31:20):
SHA-pin in our calling workflows, we only SHA-pin in the reusables.
And with Mendral, I was able to put in one memory. And I haven't seen the outcome yet, but I'm gonna ask you: in theory, I guess that means that from now on, every time Mendral sees this error, its new PR plan
(31:40):
that it's going to give me isn't going to say, hey, I'm just going to replace this with a SHA hash.
It's going to say, oh, I'm going to put in a new rule. I'm going to basically ignore this in the zizmor linter, so that we know that this is okay and we don't flag it.
Is that kind of the expected outcome of that?
Yeah, that's right.
There are many ways you can influence the behaviors of the agent today.
And yes, indeed, you can actually create a memory.
(32:02):
It makes sense when there is a pattern that you want to be widespread and applied everywhere.
It's the sticky note that you want to put on the desk of your engineer, that you want to make sure is never forgotten.
Another thing you can do is, for instance, let's say you have this new linter that you want to migrate to from an old one, or you want to implement a new security tool or something like that.
(32:25):
So the outcome will be deterministic, but asking for that will most likely be done in a non-deterministic way.
Maybe there are some things that you don't know.
Like, for instance, the agent has the ability to do a web search, or, like, look into best practices or things like that.
And so what you can do is also ask the agent, hey, this is my plan.
This is what I noticed, and this is a problem for me, I'd like to address it.
The agent will actually create an insight for it and track it, and propose an implementation when the level of confidence is high enough.
And yeah, it also has the ability to update findings.
So sometimes the agent, you know, you talked about tribal knowledge.
There is also some of the tribal knowledge and our own experience
(33:08):
as engineers, with Andrea and Olivier, in the agent itself.
So we constantly change the way, you know, Mendral does stuff, and that's why it feels magical sometimes when people onboard with it. Because what they actually don't realize is, the agent has our own experiences hardcoded in some of the sub-agents that we wrote.
(33:28):
And so you can actually influence that too.
And the agent has the ability to update its own insights based on your input.
So if there are certain things you disagree with... because, let's say, you know, if you would hire me as a DevOps engineer, I have certain things I would like to do.
And you're like, well, no, actually, Sam, I'm paying you, so I need you to do it this way instead.
You can ask Mendral the same thing, and it will be fine.
(33:51):
So yeah, the customizability is important, because that's the kind of control you need from someone you would hire as well, right?
You expect this person to bring their experience, but also you have your own requirements.
So yeah, that's something we constantly improve and make available to teams.
Yeah.
And that's the kind of thing where, I mean, testing tools have this problem, linting
(34:15):
tools have this problem, really any tool.
But I think, like, those are the two areas, where maybe the linters might be the worst, where something new is added, a new rule to a linter.
Uh, I think I had another tool recently, actually.
Well, that was a Neovim tool.
Okay.
It's like everybody's refactoring all their apps now with AI.
(34:35):
So it broke a lot of things, and I spent an hour on it, and AI was able to solve it so much faster.
So, but not related. That was just something that happened yesterday.
Because, I mean, if someone's, like, in the trenches as a DevOps engineer, that scenario I gave, while it's just a linter, not inherently high stakes,
(34:57):
it wasn't a testing failure or a production deployment failure or anything.
But that's the kind of thing, when I work with DevOps teams, that's the toil, right?
That's the thing that they don't want: to have to go change literally a hundred repos of microservices and back-end things, because they've implemented some sort of central reusable workflow.
But now the crux of all that is, they have this thing everywhere and something breaks, and now they don't have the tooling
(35:18):
to update.
They basically end up spending days writing scripts to create the PRs in 100 different repos to fix it, and then automate the creation of the commit, the creation of the PR, the acceptance of the PR, and then the merging of the PR.
Because they don't want to literally spend 3 or 4 days just mindlessly clicking through 100 repos to do this.
(35:40):
And I've lost count of how many times I've watched teams go through this toil of, like, your week is going to be fixing a stupid linter rule across 50 to 100 repos, because the dev teams don't want to do it, or whatever.
Or it's now our job to do it for some reason, even though we're not the application engineers.
Sometimes I have worked with teams where it's the DevOps engineers
(36:00):
literally implementing the linting rules for the software engineers.
I don't know why that happens, but sometimes we get saddled with work that's not ours.
with work that's not ours.
But that automation part,I feel like is another.
it's almost like a hidden feature thattalk about raising issues, we talk about,
helping understand the nuance of thingsand the patterns that you're seeing
so that it can be more intelligent.
But at the very end of this really isall about, to me, saving the toil of
(36:26):
manually checking repos for things.
Because, constantly strugglingwith workflow failures that
aren't rising to the top.
Because when you have a bigenough team, you've always got
workflows running, you're justconstantly inundated with workflows.
And the challenges is the alert fatiguein Slack or whatever tool you're
using because you're, working with.
(36:46):
just can't keep up.
So you start to say, okay, well nowwe're linterfailures, we're no longer
alert in Slack, so now we're goingto just remove those from Slack.
We're too busy to do that now.
We're only going to deal withdeployment failures or only testing
failures on these particular repos.
Like you end up having to forceyourself to ignore a whole series
of problems because you just can'thandle, there's just too much work.
(37:08):
And to me, the more exciting thingis that I might be able, you know,
and maybe someday I don't know if itdoes it today, if it would be able
to go and simply apply a hundred PRsfor that particular calling workflow.
because I name it thesame thing everywhere.
But my hope is that someday, if nottoday, like that would be able to get
automated so that I could just be donelike basically a hundred PRs later,
(37:31):
it's taken 30 minutes or something.
I didn't have to write a script,I didn't have to worry about
nuking or breaking some reposbecause I wrote the wrong script.
I can, give that to Mendral,which is exciting for me.
Yeah, I can tell you more, actually, about that, because there are a lot of things we're working on today, and some of it might actually be available by the time this podcast is out, so...
(37:53):
Okay.
Yeah, we're working quite fast on these things, because there is a lot of demand for it, you know. As soon as people onboard the agent, they see the value that it can unlock, thanks to the data layer and what we usually refer to as the agent harness, which is kind of the combination of the tools and the context and the way we build that and keep the agent accurate at any given
(38:15):
time. That, actually, is really the key.
But yeah, in terms of what works today: the agent is not stuck with one repo.
Sometimes in CI, when you use products, you realize that a repo maps to a project, which could sometimes map to a team.
Some companies do it this way.
And the issue is, well, it's great when you have, you know, that
(38:39):
mapping is great and works fine and you're okay with those boundaries.
The problem is, most teams are not, and sometimes having a repo is just an implementation detail.
And exactly like you said, you want to think about something you would apply to all of your repos, all of your code.
You don't want to think about whether this repo behaves like this or like that, et cetera.
The TL;DR is, Mendral has been designed so it's not tied to a repo.
It's actually tied to an organization today.
Uh, it has been designed for one or many repos.
We have teams with, like, a gigantic monorepo, and that's fine.
And usually what they do is they have this mapping inside folders and subdirectories.
And then you have some teams who have lots of repos, actually, and there is no mapping
(39:23):
whatsoever from a team or role to a repo.
And Mendral doesn't care.
Like, insights can be applied to many repos.
One thing that we were working on that I'm actually very excited about is... so we have this data that comes in, and that knowledge, and the agent running based on events and teams asking for stuff and things appearing and background tasks and all of that.
(39:45):
And so the agent is constantly working for you and constantly looking at its data.
We are building the ability for you to have your own agents on top.
So basically, you will have a simple way to define your own sub-agent on top of Mendral.
And so it will really feel like you're giving some specific instruction, like almost a mission, to Mendral behind the scenes.
(40:07):
In terms of architecture, it's actually a real agent.
It's like a real agent that's linked to everything else.
And it's a sub-agent that will be called by the main Mendral agent whenever it needs to be called.
And so you'll have the ability to say, whenever there is a code change, or whenever... like, we'll actually add mapping to other sources of data.
It could be whenever there is a new exception on Sentry, for instance,
(40:31):
because we are actually expanding beyond the CI logs also. I want to do X, Y, or Z, I want to do this, and I want to be notified on Slack or not, you know, because I don't want the noise.
So you'll have the ability to build your own. I call that an agent because it's de facto like a real agent.
Some people might call that, like, agentic workflows or something,
(40:54):
but it really depends on what your use case will be.
But yeah, I realized that it's actually a dream to onboard an agent that is doing work for you.
And that dream is true today, actually. I'm saying that, you know, not overselling the thing, but that works.
That said, every team has a unique CI, right?
Every time, you have a unique set of tools, a unique set of best practices;
(41:18):
you are using several external services and infrastructure, data that you are managing somewhere.
And so all of that is unique to a team.
And you cannot have an agent that will figure this out entirely.
And so we're going to make the ability to plug that data and these integrations into the agent, so it can actually work in a way more specific to your needs.
(41:39):
So that's really what we are building today.
And there is a lot of demand for it.
Um, actually, two things: there is a lot of demand for customization and plugging into other data sources and other workflows.
And there is another demand for having more automation.
So it's very interesting to me to see, and you said it well earlier, that people actually trust AI tools a lot today.
(42:03):
Like, some people ask us, like, can you actually, for some of those problems, automatically open and merge the pull requests?
And like, well, you know, it's going fast...
So, so...
Well, that's the...
...talk.
We'd get there.
Yep.
The trust, right?
Like, on the live stream we had a couple weeks ago, I was talking about that. I don't remember the last time
(42:24):
my coding agent locally... and I use Opus, I use Sonnet, basically any state-of-the-art model, I use GPT-5, I have all the subscriptions...
I don't remember the last time I would have classified it as a true hallucination.
It has been months.
Right now it gets things wrong, but it's usually because I was lazy and didn't give it context.
Right.
It just made the wrong choice, because it acted like it's its first week
(42:45):
as being the engineer on my team.
And I blame myself for that, right?
It's a me, not you, problem.
And so I have certainly established more trust.
Like, I might be at that 85 percent level we keep talking about.
And if I did this in CI, if I was using Mendral... and, you know, technically I don't have to know what agent, what models
(43:07):
you're using on the back end, right?
Like, I just know this thing is able to give me plans reliably that I agree with, and it gives me this implementation plan to fix a problem.
And I go, yep, that sounds like a great plan.
That's exactly what I would see in a PR from a junior engineer while I'm reviewing it, right? I just happen to be reviewing this pre-pull-request implementation plan.
And if I did that a hundred times over the first couple of months of
(43:29):
onboarding a tool, and I never saw, or, you know, rarely ever saw anything wrong, and those wrong things weren't that wrong, they were just maybe a preference,
I would absolutely be more trusting.
And I can imagine myself right now in a team where I'm onboarding, uh, an orchestration agent engine like Mendral, where I'm going to tell the team, okay,
(43:49):
they have this new automation feature that they've just launched, so that we can fully allow the AI to make the PRs and then commit them automatically.
And maybe it's two different models with two different contexts.
That one makes it, another one reviews it.
Right?
We've been discussing lately how, like, when are we all going to be comfortable with the AI writing the software, and then a different AI reviewing the software?
And does it need to be a different model?
(44:10):
Does it need a different system prompt?
Like, you know, these are questions that are coming up actually within, uh, the guild meetings that we are having every week.
And I can see myself very quickly saying, well, we're going to allow it to auto-merge any linting failure fix.
Because it hasn't been wrong in its implementation plan in two months, or whatever, right?
And we're going to tiptoe in with low-stakes stuff, and then, you know, maybe
(44:33):
we can make a rule where Dependabot... this is another big pain point for me, right?
When you have dozens and dozens of repos, and Dependabot or Renovate comes out with a minor update.
And typically, it's a wonderful life if you're in a monorepo right now, because if you're in a bunch of microservice repos, you just have sprawl of repos. You know, one JavaScript dependency module update, and suddenly you have 20 PRs
(44:54):
to approve, and they're all the same PR.
And I don't believe that Dependabot has a "just do this for me in all 20 repos" mode, right?
It doesn't automate that process.
So then I'm literally going through and clicking. I think I'm probably at the point of being comfortable with my local AI saying, hey, just use the GitHub command line tool and look through all of my repos.
(45:15):
This would be a very long prompt, but look through all my repos for this one particular Dependabot update.
It's got this exact title, because they're all going to have the same title.
And if you see that in there, go ahead and accept it and merge it with the GitHub command line tool.
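For the curious, the loop Bret describes maps to plain gh CLI calls. A hedged sketch; the owner name and PR title are placeholders:

```bash
#!/usr/bin/env bash
# Illustrative sketch: merge the same Dependabot PR across every repo you own.
# OWNER and TITLE are placeholders; adjust before running.
OWNER="my-github-user"
TITLE="Bump lodash from 4.17.20 to 4.17.21"

# List all repos for the owner, then look for an open Dependabot PR
# with that title in each one.
gh repo list "$OWNER" --limit 200 --json nameWithOwner -q '.[].nameWithOwner' |
while read -r repo; do
  pr=$(gh pr list --repo "$repo" --author "app/dependabot" \
        --search "in:title \"$TITLE\"" --state open \
        --json number -q '.[0].number')
  if [ -n "$pr" ]; then
    echo "Merging PR #$pr in $repo"
    gh pr merge "$pr" --repo "$repo" --squash --delete-branch
  fi
done
```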
I do think I'm at that level. And I'm not even the most aggressive AI person I know; like, I haven't even installed OpenClaw, right?
(45:37):
I haven't installed any of these crazy orchestration engines.
I'm sure that there's lots of people that are absolutely comfortable with this, and I am all about this, because there has never been a good time to tell the story of DevOps automation platforms as a thing that I can implement in a reasonable amount of time for a reasonable amount of money.
Everyone that I know that's struggling with this, you
(45:59):
know, doesn't want to be entirely invested in a proprietary tool sometimes, and they want to open source everything themselves, which is a tough road ahead for anyone that's trying to do that.
Because I'm doing this in my courses, where I'm trying to explain how to do all these AI workflows that help you do a lot of automation.
And I can tell you that tools like Mendral are the easy button, versus trying to do it yourself with a bunch more workflows that all do the
(46:21):
things that Mendral's already doing.
But, you know, we've had all these different automation engines over the years that have tried to become a market-dominant force for just easing the toil on DevOps engineers.
And I never feel like anyone's really cracked it.
I don't go into any shop and find that they're all consistently, or even a majority of them, are using one tool beyond GitHub Actions to automate things.
(46:45):
And very rarely do I see people with GitHub Actions... only, like, the most mature teams that I see in GitHub Actions are doing things where they might have, like, a central repository of actions that are aggressively doing things on other repos in an automated way. Like checking the security settings across all my org repos, in order to make sure that we're not exposing a security risk by allowing, you know, forked pull
(47:07):
requests to automatically run actions, for example, which is one that's really biting people in the foot right now.
Like, that thing needs to be locked down on every repo, and there's no tool built in to tell you what that setting is, so you have to literally either hand-code something to check it yourself, or create actions that do all this automation.
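As a sketch of that kind of org-wide settings audit: the GitHub REST API does expose per-repo Actions permissions, so a small script can sweep an org for one risky default. The org name is a placeholder, and the specific fork-PR approval setting Bret mentions lives behind its own settings endpoints, so this checks the default workflow token permissions as a representative example:

```bash
#!/usr/bin/env bash
# Illustrative audit sketch; ORG is a placeholder.
# Flags repos whose Actions GITHUB_TOKEN still defaults to write access.
ORG="my-org"

gh repo list "$ORG" --limit 200 --json nameWithOwner -q '.[].nameWithOwner' |
while read -r repo; do
  perms=$(gh api "repos/$repo/actions/permissions/workflow" \
            -q '.default_workflow_permissions')
  if [ "$perms" = "write" ]; then
    echo "WARN: $repo grants write-all to workflow tokens by default"
  fi
done
```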
And it feels like we're right at the cusp of just, I just write either
(47:28):
a skill for this, or I have a tool that automatically does this for me.
I do like this agent idea, because I would like to put in, like, a DevSecOps agent.
That's the next one for me, where I want it to go and actually, like, look at these org and repo settings and report back when it finds one that's not set properly for me.
Maybe something that I don't want to be a linter, but I want it to be,
(47:49):
you know, security checks on my repos and on GitHub itself, so that an engineer doesn't have to run that script manually every day and then do all the work of fixing the things every day.
I'd rather just have an AI do that, 'cause it's a binary thing.
It's like, either this setting's checked or it's not, and if it's not checked, you need to check it.
This is a very basic thing.
Yeah.
There are a lot of things like that.
Like, for instance, when you start putting together some compliance,
(48:10):
like, you know, SOC 2, for instance, there are a lot of controls that you start implementing that are very important, and sometimes time-consuming as well.
Like, you need to have a human checking that constantly, or regularly.
And so, yeah, this compliance is another thing.
And security in general is something that, uh, we think we can help with.
You mentioned Dependabot and the noise that it causes, and we had some early
(48:33):
prototypes of having some rules where, when some rules are met, actually, I don't even want to know about the PR, and I just, like, merge the thing. And so we...
If it's a point release, yeah, if it's a patch release, just do it.
Just do it.
Yeah.
Yeah, especially if the build works, like, the CI passes; I don't want to deal with that.
And so, yeah, we had some prototypes where we automated entirely
(48:56):
some of those use cases, yeah.
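For listeners who want that rule today: GitHub's own docs describe an auto-merge pattern very close to what Sam sketches, combining Dependabot's metadata action with gh's auto-merge. A minimal sketch to adapt per repo:

```yaml
# .github/workflows/dependabot-automerge.yml -- sketch of the documented pattern
name: Dependabot auto-merge
on: pull_request

permissions:
  contents: write
  pull-requests: write

jobs:
  automerge:
    runs-on: ubuntu-latest
    if: github.actor == 'dependabot[bot]'
    steps:
      - id: meta
        uses: dependabot/fetch-metadata@v2
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
      # Only queue the merge for patch bumps; it completes when CI is green.
      - if: steps.meta.outputs.update-type == 'version-update:semver-patch'
        run: gh pr merge --auto --squash "$PR_URL"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```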
And I think, from the feedback we're getting from our customers, people are getting there, actually. Um, they are getting ready, and so, yeah, it's very interesting.
And I think very soon you'll have the ability to implement your own DevSecOps agent.
We're going to keep adding more agents ourselves too, because it's magic, plug and play.
People like it when they onboard, because you don't have a lot of time to evaluate another
(49:20):
tool, so you prefer to onboard it, let it run on the side, and see if it's valuable.
And then, if it's valuable, you want to invest more in it, which means defining your own agents. And then, on top of that, we do orchestration on top of this fleet of agents,
so they are called at the right time with the right context.
You mentioned something also about hallucination.
I think it's very interesting, because I think you're right that LLMs got
(49:43):
better recently, in the last few months.
And they're definitely a lot more powerful in terms of how they think and the kind of mistakes they make.
We actually built a lot of engineering around the LLM to deal with this.
And so we realized that, you know, the era of RAG is kind of over now.
Like, you don't need to pull, like, an entire context and try to guide
(50:05):
every single thing that the LLM should do, or should consider.
Instead, the prompts are getting smaller, and you put a lot more intelligence in the tools themselves.
And what I mean by intelligence is, for instance, in the case of Mendral, we do static analysis on the tool calls.
And so we are able, at runtime, when the agent calls some tools, to detect
(50:27):
some drift from the initial mission.
And the nice thing is, you have the ability, when you do that correctly, with the result of the tool call, to actually influence the thinking of the agent.
So it's almost like...
Yeah.
That's great.
The agent starts somewhere and starts doing something, and then at some point it goes off on the side on something, you know, uselessly calling
(50:48):
commands or, you know, some things that are not actually very useful.
You can actually steer it back to the plan.
And so we spot those things in the tool calls and say, no, no, actually, don't do that.
Do this instead.
So for the agent, it's like, I'm calling a tool, and it's weird, because the tool is telling me to do something else.
So that really works.
(51:09):
But the nice thing at the end is that you have almost a dynamic prompt.
So instead of having, like, a very long prompt that you pass initially, you let the agent pull the context, and from that context, you can actually dynamically change that prompt at runtime.
And that's really what gave us the best results.
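This is not Mendral's actual code, but the steering pattern Sam describes can be sketched in a few lines: wrap each tool, compare the call against the mission, and when it drifts, return the correction as the tool result so it lands back in the model's context. All names and the deliberately naive drift check here are illustrative:

```python
# Illustrative sketch of steering an agent through tool results.
# A real harness would do proper static analysis on the call and its arguments.

def make_steered_tool(tool_fn, allowed_commands, plan_reminder):
    """Wrap a tool so off-plan calls return a correction instead of running."""
    def wrapped(command: str, *args, **kwargs):
        if command not in allowed_commands:
            # Drift detected: don't execute; answer with steering text.
            # To the model this looks like an ordinary tool result,
            # which nudges its next step back toward the plan.
            return (
                f"Tool refused '{command}': not part of the current mission. "
                f"Reminder: {plan_reminder}"
            )
        return tool_fn(command, *args, **kwargs)
    return wrapped

def run_shell(command: str) -> str:
    import subprocess
    out = subprocess.run(command, shell=True, capture_output=True, text=True)
    return out.stdout + out.stderr

# Hypothetical usage inside an agent loop:
shell = make_steered_tool(
    run_shell,
    allowed_commands={"npm test", "npx eslint ."},
    plan_reminder="fix the failing linter, confirm via CI, then open a PR",
)
```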
Coupling that with sub-agents is actually very important.
(51:31):
Claude Code is doing that really well too.
Like, when it explores a codebase, it's doing that in a sub-agent, because you don't need the whole thing, all the context of the exploration, back in the main loop.
You don't need that. Once you've got the results, you want just that result to be in the context of your LLM moving forward.
And so that's also what we do.
And so, yeah, it's very interesting to see those patterns, and LLMs getting better.
(51:56):
I think all of that is getting closer and closer to the kind of work that a human could do for you.
And so, yeah, that's really what motivated us to start this company in the first place.
Because you can sort of foresee, I can imagine that you can foresee, that these things are getting better at a steady pace, and it's not just the models that are getting better; we're understanding better how to, you know... because we don't just
(52:16):
take a blind model and put it into a random situation where you need to have it write code.
To me, it was like bringing a kid to school straight out of university, who just learned how to program in that language, and sitting them in the chair on day one and saying, okay, now write me some code.
Commit it, do all these things, without any context, right?
(52:36):
And of course we had smaller contextwindows, so that was also a struggle.
But like, for me recently, I use OpenCode a lot more than Claude Code now. Every time I keep trying to go back to Claude Code, even though it's got some really cool things that aren't yet in OpenCode; they're back and forth, they're my two favorites. And I talk about this a lot, I think I probably mention it on every show nowadays, because I'm just obsessed with it all day long. But OpenCode started to integrate LSPs, which I think is leveling up code accuracy.
(53:01):
We don't really see it happening, but I just feel like my OpenCode is better because LSPs are in there out of the box, whereas with Claude Code, I think you have to actually add the extensions manually, where OpenCode dynamically injects it when it sees a language in real time. And I just feel like OpenCode for me is a little bit better. And I don't have a way to prove this theory, but I think it's maybe because of that LSP in the background, where it's constantly helping tools, helping keep it on the
(53:23):
rails, essentially, of how it's writing.
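As a rough sketch of what that LSP feedback loop might look like (this is invented for illustration, not OpenCode's actual code), the harness can append diagnostics to the edit tool's result so the model corrects itself:

```python
# Hypothetical sketch: after the agent edits a file, run language-server
# diagnostics and append them to the tool result so the model self-corrects.

from dataclasses import dataclass

@dataclass
class Diagnostic:
    line: int
    message: str

def run_lsp_diagnostics(path: str, source: str) -> list[Diagnostic]:
    """Stand-in for a real LSP client; here it flags one fake error."""
    if "undefined_name" in source:
        return [Diagnostic(line=1, message="name 'undefined_name' is not defined")]
    return []

def apply_edit(path: str, new_source: str) -> str:
    """The edit tool's result includes diagnostics, keeping the agent on the rails."""
    diags = run_lsp_diagnostics(path, new_source)
    if diags:
        report = "\n".join(f"{path}:{d.line}: {d.message}" for d in diags)
        return (f"Edit applied, but the language server reports:\n{report}\n"
                "Please fix before continuing.")
    return "Edit applied cleanly; no diagnostics."

print(apply_edit("app.py", "x = undefined_name\n"))
```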
And this is all, I feel like, leading us to what some of the experts out there have been saying for over a year now. I think last year I saw a great talk from the president of, I think it was Gradle, talking about how AI is coming for DevOps and operations because it just has to, because the software development lifecycle can't be optimized for agentic coding without
(53:49):
the rest of the pipeline also improving. If we're going to improve 2x, the entire pipeline, the entire lifecycle, has to be improved 2x. We can't just have developers tripling their PR rate and then the rest of us all act like nothing's changed. We're going to have to accelerate; we're going to have to use AI as well to accelerate, unless we're suddenly going to double the number of ops people, which nobody... I don't see any teams doing that.
(54:10):
They're going to have to use AI. And in that premise, if we're possibly at this moment in time where we consider these more junior engineers, not so much senior engineers, in terms of their overall intelligence and accuracy... if that's the case, then we're going to need better guardrails, we're going to need more testing and, you know, more rules and more guidelines for them to follow.
(54:31):
And then we're also going to need to remediate faster. A lot of times when people talk about remediation of failures or recovery from failures, at least in, like, the Kubernetes world that I live in, a lot of people are talking about that from just being able to detect failures in production. Where I live is more in the CI world, and that remediation is more important to me.
(54:51):
And I feel like that hasn't been clearly unlocked. And it feels like tools like Mendral are a way forward in that regard, in terms of they're going to help me recover from failures faster so that I can... you know, this is all in Git. Like we were saying, this entire conversation is talking about Git. Git is a protocol that allows me to undo mistakes.
(55:11):
So we're so apprehensive... we sometimes in DevOps, especially in ops, we get so apprehensive about change, and we're constantly fighting: one side of our brain wants to not change anything because it works right now and we're just fine with it. The other half of our brain's like, this all needs to be better. It could be so much better. Let me fix things, let me improve things. And that tension is just naturally in our brains.
(55:32):
And this feels like a way for me to go faster. Also, understanding that maybe it's not going to get it correct 100 percent of the time; when it fails, it's also going to elevate the failures. It's also going to find the failures and fix them faster, so I could go faster. If that means, well, that outage might be 10 minutes on that test failure: we broke a test, we fixed it within 15, 20 minutes, nobody
(55:55):
even noticed, we're all fine. So what's the real risk here, if this thing is really just committing PRs against my infrastructure? It doesn't feel like a huge risk, because I'm in a Git repo.
Right.
At least that's how I take it, is the stakes are actually a little lower because I'm in Git.
(56:15):
One thing you said, actually, that's very interesting is just about going faster. I heard people telling me that the production of code has been solved, it's going to be fine. And I think it's about to be solved, honestly; we're using a lot of AI ourselves. But I think there are some physics ruling the world that are not going to change, which means you can write a lot of code in parallel, but at a given time,
(56:38):
there is only one version of your code that goes to production. And so you always need that, no matter what happens. And I'm sure software delivery is going to change. We're going to participate in that change. It has to change, that's for sure.
Like, even just a detail, but when you look at how PRs are being reviewed today, you realize that maybe it's not the right paradigm.
(57:02):
It could be done differently, or possibly with different tools, actually.
But in any way, the...
It's like how we used to do human QA. Now, if you're doing human QA, you're legacy. Like if you... yeah, this could happen.
You still need to do it. It's actually a very good example, because you're right: even though it's automated, you still need to do it.
(57:24):
And I think the same thing happens with software delivery. At a given time, you need an integration loop that's going to make sure that you can actually ship the one single version of your software. You don't have this constraint when you publish code, thanks to Git, actually; you can open any branch, any PR, anything, but you still need to integrate them at some point.
(57:45):
And I think that's going to stay. And so even though all the processes and the way people work are going to be different... but yeah, no, that's actually very interesting to hear, in terms of going faster and what are all the things needed to go faster. It's not just about writing code.
Let me ask you real quick on a very specific subject. Our friend Viktor Farcic, you might know him, Docker Captain alumnus and
(58:07):
YouTuber, who has been doing a lot of AI videos over the last year. We had a conversation recently where he believes, and I totally agree with this statement, that the harness, our local harness, whether that's Claude Code or OpenCode or Copilot in VS Code or however you want to roll, that's going to be the way that not just devs, but maybe DevOps and operators
(58:32):
and platform engineers and SREs work. This is going to be like our window to the world, and the more we can stuff context into it, the more we can give it.
I learned yesterday from another DevOps engineer that's making his own... he has his own skills, essentially. He's not using skills for this particular thing yet, but he's creating a "me" skill, and that's how I heard about this, and I'm going to practice this in the next week and
(58:52):
see if it helps. There's a rising theory that we shouldn't be telling the AI, hey, you need to be an expert marketer, you need to be in this role, you need to be an expert SRE. It's more important that we tell it about us, how we work, and what we expect, rather than tell it what it is. And so one of these engineers in the guild was saying that he's had a lot better
(59:14):
output from his LLM by describing himself. And I think he injects it as a command or something in his harness. I see that as more of a skill that I just need to flood into each conversation at some point, so that every session, the AI knows more about me and my role and how I want to operate.
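For what it's worth, a "me" skill like that can be as simple as a short markdown file the harness loads into every session. This one is entirely made up, just to illustrate the shape:

```markdown
# Skill: about-me

Load this context into every session.

- I'm a solo DevOps engineer managing ~100 GitHub repos.
- I prefer small, reviewable PRs over large rewrites.
- Assume GitHub Actions for CI; never push directly to main.
- When unsure, ask before changing infrastructure code.
- Explain fixes in one short paragraph, then show the diff.
```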
And so, getting back to Viktor: Viktor feels like this harness will eventually
(59:35):
know us personally better. It will have maybe more docs from the team. It'll have more access to Confluence or Jira or Notion or whatever you might have that gives it more context about your environment. So that seems to be the best way forward for interacting with all of our systems, not just our code. That's the theory he has. I think this feels like a pattern, a thing that I want.
(59:55):
Do you see Mendral as being something in there, rather than in Slack, which is the primary chat interface? You know, obviously there's this great dashboard; you're probably always going to have the dashboard. I don't gravitate to Slack right now. Like, I'm not someone who jumps into Slack to have a conversation with AI. I always think about it being in my harness. Do you see it being like an MCP or an A2A? I don't really understand exactly how agents talk to each other.
(01:00:18):
I don't actually currently have any of my own. How does that work? What does that future look like for Mendral?
So, definitely, it's a very interesting topic, and the short answer is yes, yes. I do also have a single interface, and I constantly tweak it and improve it and customize it to my profile, my needs, my skills, all of that. And so, yes.
(01:00:39):
And today Mendral has a lot of knowledge, and it's building a fairly large context at a given time about the state of your software delivery.
And we started to get people asking us, hey, can I use that knowledge locally? Because we are actually doing a lot of things locally. You know, I mentioned software delivery is gonna change. And I think one of the biggest changes that's gonna happen in CI/CD is
(01:01:01):
a lot more things will be done locally on the machine, that's for sure. And so, you know, starting with reviews of code, like what Anthropic released, there are more and more people doing reviews locally before they land, and I think that's only the beginning. I think, eventually, we can expect
(01:01:22):
the code that lands on your CI to be more and more perfect, because there is a lot more thinking and calls and back and forth happening on your local machine. And so, yes, indeed, we want Mendral to be available locally, so it can be included entirely with the way you do work already, and also bring this knowledge before it lands on CI. It kind of sucks today
(01:01:44):
that every time, there is this big surprise that happens when you have to kick in a CI runner to verify certain things that you cannot verify otherwise. I think that needs to change.
And so yes, in terms of integration...
Right? Like, this is why you were working on Dagger for seven years, yeah.
Exactly. That's exactly right. And in terms of implementation, you mentioned MCP, A2A. I think
(01:02:06):
where most people are moving right now is by having really good, well-documented CLIs.
'Cause again, LLMs actually perform like the human brain, not at the same level, but it's very similar in the way it thinks. And so, do you think your software engineer would behave better with an API and an MCP server, or with a great CLI? And when you look at Claude and skills, it
(01:02:27):
actually works much better by using CLIs. And so I think MCP is good for certain things, but I see it exactly the same as an API. API, MCP, same thing. It actually works better usually when you have remote MCP servers, but a remote API or a remote MCP server, at the end of the day, it's not very different. And so I think the best way to integrate with some of those APIs is
(01:02:49):
to have a really good CLI that your harness, as you said, can integrate with. And so that's what we're planning to do eventually: to have a CLI that gives you all the capabilities you could get on Slack or the dashboard. Those are just front ends to the engine that we run on the backend.
So yeah, short answer, yes, and I'm glad you asked, because I think it shows also
(01:03:10):
that you're pretty advanced with your own harness, because not a lot of people are like that today. You know, sometimes it feels, when we're talking to each other, we're like, yeah, of course it's obvious; you know, you need to improve your skills, your customization, your profile. And I do that too. It's just that there are so many people out there who are still figuring out what they should do with AI, you know.
(01:03:31):
Well, and the reality is that six months ago I knew none of this, right? We didn't even have skills until six months ago. Like, there's just... yeah, there is so much. And I think the only reason any team that is at our sort of level of maturity or beyond is getting productive is because the AI is doing the work. Like, we're so consumed with having to learn patterns.
(01:03:52):
I'm reading constantly. I've had to adopt Readwise Reader in the last year as a critical part of my learning workflow, where I just dump every tweet, every blog post, every YouTube video around AI that I think is interesting and that I probably should consume. I throw it in. Shout out to Readwise Reader; Readwise is the company, Reader is the product, the app where you can consume all these
(01:04:15):
different types of media in one place. And I can have it summarized, I can tag it, I can do all these great things with it. It's kind of, to me, the graduation of the old Feedly or the Google Reader or the RSS readers that we used to have. But it's kind of becoming my podcast and YouTube player at this point too. Instead of using the algorithms, I just dump things into it that I think are interesting from the algorithm.
(01:04:35):
And then when I want to be focused and learn, I go there. So that's like a hack for me to keep up, because, well, nobody's keeping up. Like, this is a crazy time. It's insanity. No one can actually know it all. No one's an expert yet.
So it's exciting to see that. Because, like, do you know specifically how we would implement that, in terms of me having Claude Code in front of me?
(01:04:56):
How would that talk to Mendral? Getting into your architecture for a second: is that an A2A? I'm not smart enough to know. Is A2A the thing that would allow my agent to somehow talk to, like, an API on your system that has an agent? Do you know anything about this stuff yet?
So, what I mean by a CLI is... I think a good example of that would be to look at the difference when you use either OpenCode or Claude Code, or even Cursor.
(01:05:20):
Look at the difference of interacting with GitHub with, on one side, the GitHub MCP, which is actually quite good, and on the other side, the GH CLI. I don't know if you've tried both, actually, but I invite you to do it. You'll see that there is a huge difference between the two. Basically, the TL;DR is one works really well, the other is a bit clunky.
(01:05:44):
I'll let you guess which one.
Well, the CLI is great, because when you look at the agent session and how it navigates the CLI, it basically reads the help. It requires your CLI to have really good error messages so the agent can actually react to them. Basically, it's the same thing as for you: your CLI needs
(01:06:07):
to be intuitive so a human can use it. If a human can use it well without much documentation, it means an LLM will use it pretty well, because it can read those messages and can interact with the CLI. And so, yeah, we got a lot more success with really good CLIs when we integrate with services than with anything else.
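As a toy illustration of what "agent-friendly" can look like in practice, here's a hypothetical Python CLI (the ci-doctor tool is invented) whose help and error messages tell the caller, human or LLM, what to try next:

```python
# Hypothetical sketch of an "agent-friendly" CLI: helpful --help text and
# error messages that tell the caller (human or LLM) what to do next.

import argparse
import sys

def main() -> int:
    parser = argparse.ArgumentParser(
        prog="ci-doctor",  # invented tool name
        description="Inspect CI runs and suggest fixes.",
    )
    sub = parser.add_subparsers(dest="command")
    status = sub.add_parser("status", help="List recent workflow runs.")
    status.add_argument("--repo", required=True, help="owner/name, e.g. acme/api")

    args = parser.parse_args()
    if args.command is None:
        # An actionable error: an agent can read this and self-correct.
        print("error: no command given. Try 'ci-doctor status --repo owner/name'.",
              file=sys.stderr)
        return 2
    if "/" not in args.repo:
        print(f"error: '{args.repo}' is not in owner/name form. "
              "Example: ci-doctor status --repo acme/api", file=sys.stderr)
        return 2
    print(f"ok: would fetch runs for {args.repo}")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```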
(01:06:28):
And then it's obviously your job as... I mean, our job for Mendral, to make it work well, you know, with our API and the CLI. But for the user, and for both the user and the agent, your local agent, it doesn't matter. If the CLI works well, is able to expose the right context, is able to grab the right input and output, and have the right integration,
(01:06:51):
all of that is obviously complex, but it's the problem of the person building the CLI and the API behind it. So, yeah, I wouldn't think too much about the best way to integrate. Like, you can actually write a skill that says, hey, you have the CLI X, and the CLI is there to do this, here's some context. That's it. The integration is done, and it will work much, much better than any MCP server.
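As a made-up illustration of that kind of skill (the mendral CLI name and its subcommands are hypothetical, not a documented interface):

```markdown
# Skill: mendral-cli

You have access to a CLI called `mendral` for CI/CD operations.

- Run `mendral --help` to discover commands; read error messages
  carefully and adjust your next call based on them.
- Typical flow: `mendral status` to list failing workflows, then
  `mendral diagnose <run-id>` for details, then open a PR with the fix.
- Prefer the CLI over calling any raw API directly.
```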
(01:07:15):
Yeah, I just realized while you're saying that, maybe a question I should start asking products that are on this Agentic DevOps podcast is: are you prepared for the AI to sign up and use your tool, versus the human?
Exactly.
Like, now we have Stripe doing this, we have people booking their flights with AI, and OpenClaw's doing all this. And I'm wondering, for companies that are built in the AI era, and actually
(01:07:38):
using AI, just, like, AI-centric, right? Are they also, presumably, thinking, well, let's see how far Claude Code can get just signing up and using the tool, implementing our preferences, since presumably your local harness knows more about you and your infrastructure than Mendral does, on day one at least. Is that a usage scenario that you're considering?
Yeah. And I think that's, you know... there are even startups right now,
(01:08:01):
like, I think there were a few in our YC batch, that are specialized in giving access to all the services out there, like booking a flight or anything, giving access to agent harnesses. So, for instance, actually creating a CLI or an MCP server to interact with some of the services that today are only accessible through a dashboard and some clunky UX.
(01:08:22):
So yeah, I think the web of tomorrow is gonna be agentic, and it's very interesting, because it's not so important anymore to make an interface that's really good and beautiful for humans; it's more important that it's ergonomic to an agent. So yeah, definitely.
Right now, for us specifically, Mendral is giving you a turnkey solution
(01:08:43):
in a few clicks, but I realized that the integrations with other agents are gonna be key moving forward, because what everyone wants is not a single agent, it's actually a team of agents. So even when you use Claude Code, you are actually already using several agents underneath, locally on your machine. And when you call out to remote services, you already have a team of agents, with
(01:09:04):
some agents running on your machine, some other agents running remotely. And I think that's the future. That's what people want.
Yeah.
All right, we're going to do some rapid fires real quick. But before that, I think I have one more question on the future of Mendral. Like, we've been talking about GitHub. Not everybody that listens to this podcast is maybe using GitHub. We've even got some people... I'm hearing news recently that some teams
(01:09:24):
are leaving GitHub for various reasons. So where do you see Mendral going for the rest of 2026? Do you have plans for other tooling, other platforms, anything like that?
We're already working on some of it. Yeah, we started to work with bigger companies lately and realized that some of them have different CI needs; you know, we got questions for CircleCI, Buildkite, so yeah, we're gonna support them.
(01:09:48):
Mendral today... we talked a lot about GitHub because, I would say, based on what we see from people, it's probably 80% of the demand, at least. So it's still big, but I do agree that it's going to disappear over time, or at least reduce. You know, for me, GitHub is more like a protocol nowadays.
Because when you look at it, you can bring a GitHub app that replaces GitHub
(01:10:11):
Actions with another CI system, or even replace the GitHub Actions runners, or bring another review tool on top. Some people migrated to Linear instead of GitHub Issues. So yeah, everyone is grabbing a piece of GitHub and integrating with GitHub because they have to. So it became more like an integration protocol than anything else. And so we have to follow that. And so we made Mendral CI-agnostic by design, from an
(01:10:34):
architecture point of view. We haven't built all the integrations yet, because there is still a high demand for GitHub Actions, but it's gonna come.
Yeah, I secretly wish that they would just open source the non-open-source parts, the API essentially, for GitHub Actions. That way we can have local runners; we can do this all locally. You know, I feel like at some point the automation
(01:10:55):
engine behind GitHub Actions is just going to be commodity. And I'm looking forward to that future, because we do have all these rough edges. It does feel like a very low-level, raw tool to me. But, you know, I absolutely love it. I use it every day. I make money by selling courses on it and stuff, so I obviously love it. It's just that there's a lot there that could be improved, and, you know, they would need another hundred engineers on that
(01:11:17):
Actions team to move at the pace that I think it needs right now in AI.
I mean, you talk about this as, like, we need these other tools; there's a future with these other things. And even GitHub's own former CEO has left and started a new company, funded, I think, $60 million by Microsoft, or at least partly by Microsoft, to help solve this agentic coding problem.
(01:11:37):
They sound like they're going to be a layer on top of GitHub, which, again, makes me feel like GitHub is becoming like a cloud provider in a sense, just for code. Because they already have been, but they're just going to be this thing that we maybe don't touch that much, that lots of people are using things on top of but don't ever actually have to go to, because the AI is the one submitting. You know, I don't type Git commands anymore, right? There's so much I don't do, and the only reason I think I'm even still going
(01:12:00):
to GitHub is because of habit. Like, I got a lot of bad old habits to break. I've got a lot of things that I should be asking my local agent to do or look at or go find out that I'm manually going and doing, and I don't know why. It feels like I need to sometimes just break the habit and see how far I can go in the day without ever actually going to the GitHub website. Um, because you're right, the GitHub CLI is doing a lot more.
(01:12:23):
And now, I mean, even now wehave the Gmail, CLI like a Google
Workspace launched recently, a CLI.
so you can access all of these workspacetools, Google Docs, Google Gmail, Google
Drive and all that stuff from a CLI.
So yeah, that, that's awesome.
the CLI future, I feel like isstrong and I'm here for all of that,
'cause one of my favorite things touse go for is to buy, to make CLIs
for solving my own little problems.
(01:12:44):
And I've already got, like mostof us, I think probably half dozen
local projects that I'm just makingCLIs for my own use to feed back
to the AI to do things for me.
So it's just a fun time.
We could talk forever. I love talking to you about this platform. But a couple of quick fires for the audience. I think I'm gonna need to start putting these in my show, just to start asking engineers that are on the show. Current favorite harness?
(01:13:06):
Oh, I use Claude Code a lot, actually, for even more than code. Yeah. I've started to customize it for automating some of my non-technical work too, actually.
Okay.
Are you into, like, the new Dispatch and the Computer Use yet? Is that, like, for personal stuff or for things that aren't necessarily code? Are you leaning into some of the cloud stuff with Claude?
(01:13:27):
So usually I use a combination of a couple of things. I like a lot of the Anthropic products, and so I use Claude Code, the CLI, locally for most of the code. I've started to automate a lot of my boring work, even, with Claude Code and some integrations locally. And so I maintain, like, a catalog of skills, of personal skills, that
(01:13:48):
I use to automate some of my job. The problem with Claude Code is that it's very specific to code. So I'm aware of that, and so sometimes I go to Claude Desktop, which I think has, you know, a way of managing the context that's entirely different. The way it manages memory and context is very different. So I like Claude Desktop.
(01:14:08):
I do not like Claude Cowork, actually.
That's one of the things that people have started talking about, and, like, I do not like it because I think it's not as advanced as Claude Code in terms of context management and multi-agents and all of that. I think it will get there eventually.
Yeah,
And then I started lately, and this one might make you laugh, but it's kind of weird: I
(01:14:30):
started to use Claude Code in the iOS app. The Claude app has a Code tab or something; you have kind of a tiny version of Claude Code inside your mobile, and it's using sandboxes. And so I started to use that to make some very simple PRs on the repo. I use that only for very simple stuff, you know, like, oh, I need to update
(01:14:51):
that on the landing page or something. I do that from my phone. And that's kind of scary, because it works quite well for simple stuff. So it's almost like a preview of what we're going to be able to do tomorrow, you know. Like, almost like you talk to your phone or something, and work happens and code gets pushed to the repo.
Yeah.
So,
(01:15:11):
I feel like we're so close to, you know, the Tony Stark, Iron Man, Jarvis thing. I feel like we're getting so close, at least for developers. Like, I feel like we're the first wave of really the people that are onboarding with this.
Last night, for the third time in two weeks, I went and saw the movie Project Hail Mary, because I read the book. My wife and I loved it. We're big Andy Weir fans, and this new movie is amazing and perfect,
(01:15:35):
and Ryan Gosling is fantastic. And we sat next to some people that were really, really into it. And so we started talking afterwards in the movie theater. I'm the kind of person that loves a full movie theater; I think the reason you go to movies is for the experience with other people. So the guy next to me leans over at the end, and we started talking about the movie, and how it's a positive view of the future where the world's actually collaborating and working
(01:15:56):
together to solve esoteric problems. And he said that there's this subreddit he's a fan of called, I think it's called Humanity, Hell Yeah, or Humans, Hell Yeah, or something like that. That's, like, a post-war look at the future of civilization where we all tend to agree, and we're solving bigger problems, we're dealing with aliens or whatever, and then we all just hold hands.
And so we start talking back and forth, and I get home and I'm walking the dog, and I
(01:16:19):
want to know more about this subreddit. So I'm in the OpenAI app, the ChatGPT app essentially, and I'm having a speaker conversation in the audio mode. And I'm just talking back and forth with the AI, and I'm living in a little bit of an urban area, so there's other people walking their dogs, and I'm realizing that people around me are hearing someone... I sound like the old guy on speakerphone, walking around talking
(01:16:42):
to someone on speakerphone. But I'm actually just talking to an AI about a subreddit and sci-fi, and I'm realizing that I'm walking my dog and I'm way more advanced than everyone around me. I'm, like, doing things that are sci-fi-level future, where I'm talking to a robot, but also I look like grandpa talking to someone on a phone, on speakerphone, that's way too loud. It's, like, a really weird scenario.
(01:17:03):
So I feel the same way. Like, I think these things are all happening so fast, and it's very cool to do. Um, okay, last question. I'm assuming you're using Opus a lot. Are you an Opus 4.6 person, or are you an Opus 4.5 person? I know some people that didn't jump onto 4.6.
I like Opus 4.6. But I'm actually not using it for everything. I'm a big believer in the right model for the right task.
(01:17:24):
Mendral also is built this way. We use Anthropic models today, and we're going to move to multi-model, multi-provider soon too. I believe there are models that are really good for certain things. Opus is really good for complex reasoning, but it's also very slow. And sometimes you don't need that, like if you summarize an email or something, or if you want to rewrite some things that you wrote.
(01:17:47):
So yeah, I'd say if I have to pick one, yes, that would be my go-to, but usually I go multi-model based on what I'm doing.
Nice.
Thank you so much, Sam, for being here. We could talk for another hour. People are already going to go, Bret, you can't keep having these multi-hour podcast episodes. But I feel like there's so much to talk about, and it's great to
(01:18:08):
actually talk to not just founders, but engineers in the thick of it who are also living and drinking, you know, the Silicon Valley Kool-Aid a little bit. It's great to see the different levels, or I guess maturity levels, of everyone that we have on the show, and to see where everyone is at. And it's always fun to have people on the show who are ahead of us. And I feel like, you know, you're touching the AI even closer to what we think is
(01:18:31):
the utopia of this Star Trek future I feel like we're in. So I'm excited that you guys are progressing so quickly on the product, and I'm looking forward to using more of it, especially since it scratches an itch for me.
I'm looking forward to having you back on the show, maybe later this year, and talking through some of the advancements. We'll probably have new models by then. Supposedly Claude's going to have this amazing new model later this year.
(01:18:52):
We'll see whether it lives up to the hype, but it'll be interesting to see what you can do with some of the new stuff that's coming out.
Yeah, thanks a lot for the opportunity to share all of that. Very, very cool. It didn't feel like any time at all.
I'm glad.
I'm glad.
Uh, so yeah, you can find Mendral at mendral.com, M-E-N-D-R-A-L dot com.
Well, thanks again, man. Thanks for joining us, and I'll see you in the next episode.