#19 No Capes Required: How Kroger’s Captain America Team Got Superpowers

Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:06):
Imagine this: you push out a shiny new app update to thousands of your users,
only to have your support desk flooded with 700 tickets in a week about
your app being an absolute mess.
It sounds like a nightmare, right?
In fact, it actually happened at Kroger, America's largest supermarket chain.

(00:30):
Now, picture the turnaround: within a month,
those weekly support tickets were slashed from 700 to seven.
How?
Okay.
Kroger flipped the script by embracing an AI-powered observability platform
that cut through the chaos and pinpointed issues before customers even noticed.

(00:53):
That led to fewer panicked calls, happier developers, and a smoother
experience for millions of shoppers.
If that feels like tech magic, stick around. Today, we're unpacking
how Dynatrace made it a reality.
So welcome back to Thriving in Ambiguity, the podcast where I break

(01:15):
down the complexities of technology, ideally without losing our minds.
And I'm your host, Steve Mancini, and in the last episode I dove into
infrastructure observability, which is the foundation of proactive IT
operations, and saw how AI and automation restored control to ops teams.

(01:37):
This week, in episode two of our Dynatrace series, we're leveling
it up to application observability.
We will explore how to keep your applications running smoothly
with less operational effort.
We'll cut through the noise and the complexity that typically bog
down your developers and your IT teams, and ultimately allow them

(02:00):
to eliminate firefighting so your teams can focus on innovation.
Now, today's conversation centers on a real-world story
from Kroger, the grocery giant.
They underwent a massive digital transformation,
moving into a complex hybrid, multi-cloud environment.

(02:21):
With that complexity came a flood of monitoring challenges.
They had multiple tools, they had siloed data, and it led
to consistent war room calls.
We'll hear directly from Jay Cotton, the performance engineer lead at Kroger,
about how his team tamed that complexity.
I'll play selected clips of an interview Dynatrace did with Jay in

(02:45):
two separate segments, and after each one, I'll break down the insights
with some context from my own experiences as a former IT director.
By the end of the episode, you'll understand how Dynatrace's
application observability,
with features like PurePath, Davis AI, automatic instrumentation

(03:07):
via OneAgent, and the Smartscape topology model, can turn monitoring
mayhem into a catalyst for innovation.
So grab a coffee or a Red Bull and find a comfy spot,
and let's dive in.
Let's set the stage by hearing directly from Jay.
My name's Jay Cotton.

(03:27):
I am the lead application performance tech lead for Kroger and Kroger Company.
Kroger, about five years ago, went on a very aggressive approach to change
our digital presence in the world.
We had very little, almost none, and it was essentially a rebirth for us.
We had to start from scratch.
We took the steps forward to essentially reinvent ourselves. Through that

(03:48):
reinvention process, there came a lot of changes and a lot of things
that we hadn't really thought of that we needed to do to move forward.
And that's when I was brought in and when we started investigating
additional monitoring tools and different ways that we could actually put
things inside of our infrastructure.
So we are affectionately known as the Captain America team.
It's very important to build bridges inside the IT community.
Far too often there are bridges that are broken down.

(04:11):
And your network team says one thing and your database team says another
thing, and nobody's talking the same language, and you've got developers
and infighting, and nothing is working and nothing is going well.
The main goal of the Captain America Truth and Justice team was to bring
that common lingo to the teams, to make sure everybody was talking together
and we got to resolutions of issues faster. We wanted to
make sure we looked at all the products that were in the space.
Dynatrace was a clear winner.

(04:32):
The whole idea behind AI, the AI operations, being able to alert
teams automatically, not having to set up the alerts, not having to
do the things that we need to do
by having to maintain 15 different kinds of agents, depending on technologies.
When I'm on a monitoring team and we're trying to make sure we can
monitor the new technologies, now I have 16 different kinds of agents to
monitor all the different technologies.
With Dynatrace, I just have one agent.

(04:53):
With new oncoming technologies, you need to keep developers engaged
and developers doing new things.
The second that you make them stop, I have to figure out a solution and
figure out a way to monitor what they're building and what they're doing.
It slows them down.
Dynatrace has helped us move forward fast.
Captain America Truth and Justice team.
How awesome is that?
I love when you see creativity in IT teams, but think about what he said.

(05:17):
16 different tools just to monitor one company's apps and infrastructure.
They actually needed to be superheroes to deal with that.
I mean, I've lived that complexity in past roles.
One team watching infrastructure tools, the networking team monitoring their world,
another team tracing apps, and, I mean, come on, nobody

(05:41):
agreeing on the source of truth.
It's the perfect recipe for finger pointing and slow response times.
And Jay's team saw that clearly. Their solution was to consolidate onto a single
platform, which would be Dynatrace,
so that everyone from the developers to the ops teams to the DBAs could finally

(06:05):
work off the same data and insights.
And I loved how they did their homework in the observability space.
I mean, last week Dynatrace was named number one in the Magic Quadrant
for observability for the 15th straight year, but what does
that actually mean for customers?
Honestly, it's a great starting point in the research, looking at Gartner's

(06:30):
research and then bringing in the top choices and saying, okay, now prove it to me.
Prove it in my environment.
I really can't stress that enough.
Everybody's environments are different, so have vendors prove it to you
in your environment, and that's exactly what Kroger saw firsthand.

(06:52):
Now, in Kroger's case, Dynatrace's OneAgent technology was key.
Okay.
OneAgent automatically instrumented the entire stack, replacing those
16 separate monitoring agents and eliminating the blind spots.
See, when you deploy OneAgent, there's no manual configuration

(07:12):
or tagging that's needed.
It auto-discovers everything in real time.
And the result is what Jay called one clear view across
their entire environment, a common language for all teams.
Now, adopting a new platform is only worth it if it delivers results.

(07:33):
So what happened after Kroger rolled out Dynatrace?
Well, here's Jay describing the impact. In an enablement team,
now our team is two people and we're staying up on top of work.
We don't have any, we're not working late hours.
We're able to do what we need to do.
We're able to provide that value.
I don't have to be involved.
The other teams don't have to be involved.

(07:53):
It's seamless, it's super documented, and it takes teams
less than 30 seconds to fix.
So it's fantastic and the teams are allowed to self-serve.
It's been absolutely smooth, and my job's gotten so much better
and so much more relaxing.
We brought in Dynatrace ONE for a premium offering.
I now do onboarding.
I do a half an hour.
I sit down with the team.
I tell 'em about functionality, capabilities.
I tell 'em what's out and available and show 'em what their code looks

(08:15):
like and how their code is running and how different things are there.
And there's this premium support offering.
There's the live chat.
When you have questions, go talk to the live chat.
They're there to help, and they have these one-hour coaching sessions,
and they come out of them with their foot on the gas and they go.
It enables us to go and enables us to move. Primarily hybrid cloud right now,
but it's gonna be multi hybrid cloud.
So we're going, we're talking about Azure, we're talking about GCP, we're talking

(08:37):
about keeping things on prem because you're gonna have to keep things on prem.
And being able to utilize that new AI two engine and see what it provides
and see how the extra stuff gets a little bit deeper, some of the custom
metrics, some of the networking metrics, being able to marry all that stuff
together into a single platform and get closer to a single pane of glass.
It will be fantastic, and it will,
it will absolutely proliferate throughout the entire environment

(09:00):
and the entire organization.
It'll be huge.
There's a lot to unpack in that one statement, and it hits on our key themes.
First, full stack observability in one clear view.
That's huge.
It means Dynatrace is capturing everything from the end user experience
in a mobile application through all of the service calls and cloud

(09:24):
functions behind the scenes, down to the CPU and memory on the hosts.
Having that end-to-end visibility in a single pane of glass is a game changer.
No more jumping between disparate dashboards, trying to correlate
an app slowness with a spike in, say, database latency.

(09:45):
It's all correlated for you.
Second, Jay mentions Dynatrace's AI delivering precise answers
about performance anomalies.
This refers to Dynatrace's AI intelligence, Davis AI. Instead of
flooding your inboxes with 500 alerts, which nobody needs, Dynatrace

(10:07):
uses AI assistance to automatically eliminate false positives and pinpoint
the root cause behind a problem.
In other words, it doesn't just say, "Hey, there's a high error rate
on service A and high CPU on host B," leaving you to play detective.
It goes way further and it tells you service A is slow because

(10:30):
host B's CPU is maxed out due to process X, all in one alert.
So as Jay experienced, that level of precision is what lets you resolve issues
faster and avoid firefighting all day.
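
To make the "service A is slow because host B's CPU is maxed out due to process X" idea concrete, here is a minimal Python sketch. It is not Davis AI's actual algorithm, which the episode does not describe; it only assumes a hand-written dependency map (`DEPENDS_ON`) and a list of detected anomalies, all hypothetical names, and follows the dependencies from the symptom down to the deepest anomalous entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    entity: str   # where the anomaly was observed
    symptom: str  # what was observed


# Hypothetical dependency map: each entity lists what it depends on.
DEPENDS_ON = {
    "service A": ["host B"],
    "host B": ["process X"],
    "process X": [],
}


def root_cause(start, anomalies, depends_on):
    """Walk from `start` through anomalous dependencies; return the deepest anomaly."""
    by_entity = {a.entity: a for a in anomalies}
    current = by_entity[start]
    seen = {start}
    while True:
        # Pick the first dependency that is itself anomalous and not yet visited.
        nxt = next(
            (d for d in depends_on.get(current.entity, [])
             if d in by_entity and d not in seen),
            None,
        )
        if nxt is None:
            return current
        seen.add(nxt)
        current = by_entity[nxt]


def single_alert(start, anomalies, depends_on):
    """Collapse several related anomalies into one message."""
    cause = root_cause(start, anomalies, depends_on)
    return f"{start} is affected; probable root cause: {cause.entity} ({cause.symptom})"


anomalies = [
    Anomaly("service A", "high error rate"),
    Anomaly("host B", "CPU maxed out"),
    Anomaly("process X", "runaway CPU usage"),
]
print(single_alert("service A", anomalies, DEPENDS_ON))
```

The point of the sketch is the shape of the output: three raw anomalies become one message that names a probable cause, instead of three separate alerts.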
And in fact, Dynatrace helped Kroger cut the number of support
tickets after that new app launch.

(10:51):
Yeah, like I said earlier, from 700 to just seven per week, within a month.
Fewer tickets and alerts mean less noise for the team, which means more
time to focus on what really matters.
And I love this quote. Jay says, Dynatrace has given us back so much time.

(11:15):
We're now able to truly focus on innovation and doing cool things
instead of worrying about monitoring and finding the root cause of
performance problems. And get this, he even noted that, personally,
Dynatrace improved his work-life balance. With the platform
proactively catching issues,

(11:36):
Jay isn't getting dragged into those 2:00 AM war room calls anymore.
That's a benefit
you just can't put a price on.
I mean, I remember being the on-call guy, and I remember one
night having to rebuild an application server, and it was like midnight.
And the progress bar said it was going to take at least three

(11:59):
hours for it to be completed.
So me and one of my engineers, we actually went to a midnight showing
of Cloverfield back in 2008.
So at like 2:30 in the morning, after having our heads spinning from that
first-person aspect ratio of that movie, which was really crazy,

(12:20):
we get back to the
office and there's still 20
minutes left on the progress bar.
So while it was a memorable night, I would gladly have traded that for
sleep rather than finishing up at like 4:00 AM only to have to
return to work a few hours later.
That was a crazy movie though.

(12:41):
I don't know, if you haven't seen Cloverfield,
I mean, it's a little bit old now, but you should check it out.
So let's summarize the transformation.
Tool consolidation and clarity: Kroger replaced a web monitoring tool and 16 different
agents with one unified platform, giving all teams a single source of truth.

(13:04):
This eliminated cross-team confusion and shaved off the overhead of
maintaining multiple systems.
Noise reduction:
Dynatrace's AI filtered out false alarms and correlated issues automatically,
so the team only saw actionable problems. During one app launch,

(13:26):
support tickets dropped 99%, from 700 to seven per week, after Dynatrace was
in place. Faster
troubleshooting: with precision root cause insights down to code-level
anomalies and impacted components,
war room calls became rare and resolution times plummeted.

(13:48):
The team could identify and fix issues before customers felt pain, which also
meant fewer customer-found defects.
I mean, that's all amazing.
And ultimately, and this is the point that I keep trying to get people to understand,
it allows for more time for innovation.
By eliminating routine firefighting, developers and IT engineers can get

(14:13):
back significant time to devote to innovation instead of remediation.
As Jay said, Dynatrace gave them bandwidth to do cool things
again. It even freed up personal time
that they had lost because they had to keep fixing problems
overnight and on the weekends.

(14:34):
So not a bad story, right?
Kroger's story is like a masterclass in making applications run smoothly
with minimal ops effort by leveraging the right observability approach.
Now, you might be wondering how one platform can actually do all of that.
So let's break down the Dynatrace magic that powered these results.

(15:00):
Dynatrace often gets described as an all-in-one observability platform, and for
good reason. Under the hood, it combines several powerful capabilities that work
in concert, and from my perspective,
four standout features make application observability essentially hands-off for

(15:20):
your team while delivering deep insights.
The first one is OneAgent:
automatic instrumentation.
You install a single agent and you're done.
Dynatrace automatically discovers and instruments every component
of your application ecosystem.
There's no manual configuration, no code changes.

(15:42):
OneAgent injects itself
into all of your hosts, containers, services, even serverless functions,
and immediately starts monitoring.
This means whether your app is calling on AWS Lambda or Kubernetes,
or it's a traditional database,
you've got coverage with zero effort.

(16:04):
OneAgent is the foundation that feeds data to everything in the platform.
See, this eliminates the need for multiple monitoring tools
where you're Frankensteining things together.
Next up is Smartscape.
It's real-time topology mapping.
See, Dynatrace's Smartscape technology automatically understands how every part

(16:27):
of your tech stack is connected, from user clicks and business transactions all the
way down to the underlying infrastructure.
It's like getting a living, interactive map of your entire application environment.
And at a glance you can see what talks to what: which microservice is calling

(16:48):
which API, which database that API hits,
which host it's running on, and so forth.
Crucially,
Smartscape is always up to date, no manual diagrams, so when something
goes wrong, you can drill right down in and see high-level services and
even log details in just a few clicks.

(17:11):
And that context is golden
when you're troubleshooting complex applications. You never have to play find
a dependency in the middle of an incident.
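
As a small illustration of what a "what talks to what" map gives you, here is a Python sketch of a dependency lookup over a topology graph. The entity names and the `TOPOLOGY` dictionary are invented for the example; this is not Smartscape's data model or API, just a plain breadth-first walk over an assumed adjacency list.

```python
from collections import deque

# Hypothetical topology: each entity maps to the entities it calls or runs on.
TOPOLOGY = {
    "cart-service": ["pricing-api"],
    "pricing-api": ["orders-db"],
    "orders-db": ["host-01"],
    "host-01": [],
}


def downstream(entity, topology):
    """Return everything `entity` depends on, nearest dependencies first."""
    order = []
    seen = {entity}
    queue = deque([entity])
    while queue:
        node = queue.popleft()
        for dep in topology.get(node, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


print(downstream("cart-service", TOPOLOGY))
```

With a map like this kept current automatically, "find the dependency" during an incident becomes a lookup rather than a hunt through stale diagrams.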
Next up is PurePath, which is end-to-end distributed tracing.
PurePath is Dynatrace's patented distributed tracing technology, which
follows each user transaction end to end across every tier of your application.

(17:36):
With code-level visibility, every time a user interacts with your app, say
adding an item to their cart, Dynatrace tags that session with a unique ID and
traces it through all the services, processes, and database calls it touches.
The beauty is that this happens with no manual code instrumentation. PurePath

(17:58):
captures method-level timings, the SQL statements, web service calls,
and more, automatically. Even for the most modern cloud native architecture,
think Kubernetes, microservices, serverless, service mesh, PurePath
still gives you a coherent transaction trace across it all. In practice,

(18:24):
this means when there's a slowdown or an error, you can pinpoint not
just which service is failing, but the exact line of code, the query,
or the external call causing it.
As someone who used to hunt through log files for hours to find why that app was broken, I

(18:44):
can't overstate how much time this saves.
It's like having x-ray vision into your whole application's behavior.
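
The general idea behind distributed tracing, tagging a transaction with one ID and recording a timed span at every hop, can be sketched in a few lines of Python. This is a generic illustration, not PurePath's implementation; the `Trace` and `Span` classes, the service names, and the durations are all made up for the example.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Span:
    trace_id: str       # shared by every span in the same transaction
    service: str
    operation: str
    duration_ms: float


@dataclass
class Trace:
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list = field(default_factory=list)

    def record(self, service, operation, duration_ms):
        """Attach one hop of the transaction to this trace."""
        span = Span(self.trace_id, service, operation, duration_ms)
        self.spans.append(span)
        return span

    def slowest(self):
        """Return the span that took the most time."""
        return max(self.spans, key=lambda s: s.duration_ms)


# One hypothetical "add item to cart" transaction crossing three tiers.
trace = Trace()
trace.record("web-frontend", "POST /cart/items", 12.0)
trace.record("cart-service", "CartService.addItem", 30.0)
trace.record("orders-db", "INSERT INTO cart_items", 410.0)

worst = trace.slowest()
print(f"trace {trace.trace_id}: slowest hop is {worst.service} ({worst.operation})")
```

Because every span carries the same trace ID, the slow database call can be tied back to the exact user action that triggered it, which is the property the episode is describing.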
And then finally, Davis AI.
AI is huge, right?
Well, Davis AI is a causal AI engine.
All that data from OneAgent, Smartscape, PurePath, it all

(19:07):
feeds into Dynatrace's AI engine.
And Davis continuously analyzes billions of dependencies and metrics in real time.
Its job is to detect anomalies and determine the root cause behind system
degradations and failures, effectively doing all the heavy analysis work that the

(19:28):
whole team of humans might struggle with.
What you get is an AI-powered alert that says, for example,
transactions are slowing down due to a database connection pool issue on
service X, complete with the proof:
stack traces, metrics.

(19:48):
Davis also prioritizes issues by business impact, meaning it'll tell you if an
anomaly is critically affecting user experiences or if it's just a minor blip.
Perhaps most importantly, Davis dramatically reduces noise by
automatically correlating related events and ignoring benign functions.

(20:11):
It frees you up from, like, the alert fatigue.
And in Kroger's case, this AI-assisted insight was pivotal in cutting down
those support tickets and letting the team focus on the real problems,
not false alarms or guesswork.
All of these capabilities are unified within the platform.

(20:33):
One UI and one data model.
That's the secret sauce.
Instead of juggling separate tools for APM, infrastructure monitoring, logging, et
cetera, Dynatrace brings it all together so that these features amplify each other.
The OneAgent plus Smartscape plus PurePath plus Davis AI combo

(20:55):
means you get precise answers from unified data in context.
That's the key.
As a result, you can drive automation like self-healing or alert workflows
with confidence, and your team spends less time stitching together information.
So in essence, Dynatrace's application observability

(21:18):
gives you the clarity and control
you've been missing in a chaotic cloud-centric world.
Okay, so the big takeaway here is that application observability
isn't just a fancy dashboard or a buzzword. When done right,
it's about making your life easier.
It's about ensuring that when something goes bump in the night, you

(21:42):
already have the answers by the time you grab your coffee, or Red Bull
in my case.
Kroger's story showed us how eliminating complexity and noise translates
into real-world business value: less downtime, happier customers,
and teams that can spend their time innovating instead of firefighting.

(22:04):
And I'll tell you, as a former IT director, there's nothing
more satisfying than seeing your smartest people freed up to tackle new projects
rather than swatting the same old bugs. Dynatrace's approach to observability
made that possible by watching the apps for them 24 by seven with a

(22:27):
level of detail and intelligence that humans alone simply can't match.
Before we wrap, here's a little food for thought.
Does your team find out about application issues before your users do, or
are you still hearing about problems from customers with complaints or

(22:48):
frantic Slack messages at odd hours?
If it's the latter, it might be time to rethink your observability game.
The tools and the methods we discussed today can seriously tilt the
balance from reactive to proactive.
Now, on the next episode of our Dynatrace series, we'll shift our focus from

(23:12):
what's happening inside your applications to how users are experiencing them.
We'll dive into digital experience monitoring, everything from real
user monitoring and session replay to synthetic checks,
mobile, web, and even Internet of Things applications.
So if you've ever wished you could see exactly what your users saw before they,

(23:36):
like, rage quit your apps, you'll be able to get ahead of those performance
issues before they even happen.
So episode three will be for you.
If you've enjoyed this discussion or it sparked some ideas, do me a
favor: subscribe to the podcast so you won't miss upcoming episodes,

(23:58):
and I'd love to hear your feedback.
Feel free to reach out or leave a comment on how you're handling
application observability in your work.
Have you tamed the alert storm, or are you still fighting fires?
Send in your questions or war stories.
I might give you a shout-out in a future episode. Until next time,
stay curious, keep innovating, and as always, keep thriving in ambiguity.

(24:24):
Thanks for listening.
