Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:04):
The original title for this episode was AI Killed the QA Star, which would make more sense if you knew the eighties lore of the very first music video played on MTV. Um, that's a music television TV channel. Back, back in the days, how you watched videos before the internet. That was in 1981 and it was called Video Killed the Radio Star.
(00:25):
But I decided that a deep cut title was too obscure for this conversation. Yet the question still remains: could the increased velocity of shipping AI-generated code cause businesses to leave human-based QA behind? Presumably because we're not gonna hire any more of them, and we don't want to grow those teams and operations teams just because we created AI code.
(00:47):
And would we start relying more on production observability to detect code issues that affect user experience? And that's the theory of today's guest, Andrew Tunall, the President and Chief Product Officer at Embrace.
(01:07):
They're a mobile observability platform company that I first met at KubeCon London this year. Their pitch was that mobile apps were ready for the full observability stack, and that we now have SDKs to let mobile dev teams integrate with the same tools that we platform engineers and DevOps people and operators have been building and enjoying for years now. I can tell you that we don't yet have exactly a full picture on how AI will
(01:30):
affect those roles, but I can tell you that business management is being told that, similar to software development, they can expect gains from using AI to assist or replace operators, testers, build engineers, QA, and DevOps. That's not true, or at least not yet. But it seems to be an expectation.
(01:51):
And that's not gonna stop us from trying to integrate LLMs into our jobs more. So I wanted to hear from observability experts on how they think this is all going to shake out. So I hope you enjoy this conversation with Andrew of Embrace.
Hello.
Hey, Bret.
How are you doing?
Andrew Tunall is the president and chief product officer of Embrace.
(02:14):
And if you've not heard of Embrace, they are, I think your claim to me the first time we talked was, you're the first mobile-focused or mobile-only observability company in the Cloud Native Computing Foundation. Is that a correct statement?
I don't know if, probably in CNCF, right? Because we were certainly the first company that started going
(02:38):
all in on OpenTelemetry as the means with which we published instrumentation. Obviously CNCF has a bunch of observability vendors, some of which do mobile and web RUM, but none completely focused on that. Yeah. I would say we're the first. Yeah.
I mean, we can just, we can say that until it's not true. The internet will judge us harshly for whatever claims we make today.
Yeah.
(02:59):
Well, and that just means people are listening, Andrew.
Okay, I told people on the internet that we were going to talk about the idea that you gave me: that QA is at risk of falling behind if we're producing more code, if we're shipping more code because of AI. Even with the pessimistic stuff, you know, we've seen some studies in the last
(03:20):
quarter around effective use of AI; it is anywhere between negative 20% and positive 30 percent in productivity, depending on the team and the organization. So let's assume for a minute it's on the positive side, and the AI has helped the team produce more code, ship more releases. Even in the perfect world that I see, teams automating everything, there's still usually, and almost always,
(03:45):
in fact, I will say in my experience, 100 percent of cases, there is a human at some point during the deployment process. Whether that's QA, whether that's a PR reviewer, whether that's someone who spun up the test instances to run it in a staging environment. There's always something. So tell me a little bit about where this idea came from and what you think might be a solution to that.
(04:06):
Yeah. I'll start with the belief that, I mean, AI is going to fundamentally alter the productivity of software engineering organizations. I mean, I think the CTOs I talk to out there are making a pretty big bet that it will. You know, there's today, and then there's, okay, think about even just the pace that AI has evolved at in the past year, and, you know, what it'll look like given the investment that's flowing into it.
(04:29):
But if you start with that, the claim that AI is going to kill QA is more about the fact that we built our software development life cycle under the assumption that software was slow to build and relatively expensive to do so. And so if those start to change, right, like a lot of the systems and processes we put around our software development life cycle,
(04:50):
they probably need to change too. Because ultimately, if you say, okay, we had 10 engineers and formerly we were going to have to double that number to go build the number of features we want, and suddenly those engineers become more productive to go build the features and capabilities you want inside your apps, I find it really hard to believe that those organizations are going to make
(05:11):
the same investment in QA organizations and automated testing, etc. to keep pace with that level of development. The underlying hypothesis was, like, productivity allows them to build more software cheaper, right? And cheaper rarely correlates with adding more humans to the loop.
Yeah, the promise. I make this joke and it's probably not true in most cases, cause it's a little old, but I used to say things
(05:34):
like, yeah, CIO magazine told them to deploy Kubernetes, or, you know, whatever executive VP level, CIO, CTO, and you're at that chief level. So I'm making a joke about you, but,
Funny.
but that's when you're not in the engineering ranks and you maybe have multiple levels of management, you get that sort of overview where you're reading and you're discussing things with other suits.
(05:56):
You're reading the suits' magazines, the CIO magazines, the, uh, IT pro magazines and all that stuff. Yeah, the point I'm making here is that people are being told that AI is going to save the business money. I think I've said this on several podcasts already, but you weren't here. So I'm telling you that I was at a media event in London, KubeCon, and they give me a media pass for some reason, because I make a podcast.
(06:18):
So they act like I'm a journalist, standing around all the real journalists and pretending. But I was there, and I watched an analyst who's, from my understanding, I don't actually know the company, but they sound like the Gartner of Europe or something like that. And the words came out of the mouth, at an infrastructure conference, the person said, my clients are looking for humanless ops.
(06:39):
And I visibly, I think, chuckled in the room, because I thought, well, that's great. That's rich, that you're at a 12,000-ops-person conference telling us that your companies want none of this. These are all humans here doing this work.
(07:00):
The premise of your argument about QA matches my exact same thoughts, which was: nobody is budgeting for more QA or more operations personnel or more DevOps personnel just because the engineers are able to produce more code with AI in the app teams and the feature teams. And so we've all got to do better, and we've got to figure out where AI can help us, because if we don't, they're just going to hire the person that
(07:21):
says they can, even though maybe it's a little bit of a shit show.
My belief is that software organizations need to change the way they work for an AI future. So that might be cultural changes, it might be role changes. It might be, you know, words like human in the loop get tossed around a lot when it's about engineers interacting with AI, and the question is, okay, what does that actually look like, right?
(07:44):
Are we just reviewing AI's PRs and, you know, kind of blindly saying, yep, they wrote the unit tests for something that works? Or is it like we're actually doing critical thinking, critical systems thinking, unique thinking that allows us, as owners of the business and our users' success, to design and build better software with AI as a tool? And it's not just QA, it's kind of like all along the software development
(08:07):
life cycle: how do we put the right practices in place, and how do we build an organization that actually allows us, with this new AI-driven future, whether it's agents doing work on our behalf or whether it's us just with AI-assisted coding, to build better software? And, yeah, I'm interested in what that looks like over the next couple of years.
That's kind of the premise of my new Agentic DevOps podcast.
(08:30):
And also, as I'm building out this new GitHub Actions course, I'm realizing that I'm having to make the best practices. I'm having to figure out what is risky and what's not, because no one has really figured this out yet in any great detail. And, in fact, at KubeCon London in April, which feels like a lifetime ago, there was only one talk around using AI anywhere
(08:53):
in the DevOps and operations path to benefit us. It was all about how to run infrastructure for AI. And granted, KubeCon is an infrastructure conference for platform engineering builders and all that, so it makes sense. But the fact that really only one talk, and it was a team from Cisco, which I don't think of Cisco as like the bleeding edge AI company, but it was a
(09:13):
team from Cisco, simply trying to get a workflow, or maybe you would call it an agentic workflow, for PR review. In that case, I'm presuming that humans are still writing their code and AI is reviewing the code. Actually, just yesterday I was on GitHub trying to figure out if there was a way to make branch rules or some sort of automation
(09:35):
rule that if the AI wrote the PR, the AI doesn't get to review the PR.
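To make that concrete, here's a rough sketch of the kind of guardrail being described: a small check a pipeline could run that fails when the account that authored a PR is also the account that approved it (for example, the same bot identity). This isn't a built-in GitHub feature; the REST endpoints are GitHub's standard pull-request ones, but the PR_NUMBER variable and the idea of treating author-equals-approver as a failure are assumptions for illustration.

```python
import os
import sys

import requests

# Hypothetical CI guard: fail if the PR's author also approved the PR.
REPO = os.environ["GITHUB_REPOSITORY"]   # e.g. "my-org/my-repo" (set by GitHub Actions)
PR_NUMBER = os.environ["PR_NUMBER"]      # assumed to be passed in by the pipeline
TOKEN = os.environ["GITHUB_TOKEN"]
HEADERS = {
    "Authorization": f"Bearer {TOKEN}",
    "Accept": "application/vnd.github+json",
}

pr = requests.get(
    f"https://api.github.com/repos/{REPO}/pulls/{PR_NUMBER}", headers=HEADERS
).json()
reviews = requests.get(
    f"https://api.github.com/repos/{REPO}/pulls/{PR_NUMBER}/reviews", headers=HEADERS
).json()

author = pr["user"]["login"]
approvers = {r["user"]["login"] for r in reviews if r.get("state") == "APPROVED"}

if author in approvers:
    # Exit non-zero so the pipeline blocks the merge until an independent review lands.
    sys.exit(f"PR author {author} also approved this PR; require a different reviewer.")
print("Author and approvers are distinct accounts.")
```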
Yeah, yeah, right.
We've got this coming at us from both angles. We've got AI in our IDEs. We've got multiple companies, Devin and GitHub themselves. They now have the GitHub Copilot coding agent. I have to make sure I get that term right.
(09:55):
GitHub Copilot coding agent, which will write the PR and the code to solve your issue. And then they have a Copilot PR code reviewer agent that will review the code. It's the same models, different context, but it feels like that's not a human in the loop. So we're going to need these guardrails and these checks in order to make sure
(10:19):
that code didn't end up in production with literally no human eyeballs ever in the path of that code being created, reviewed, tested, and shipped. Cause we can do that now. Like we did, we
Yeah, totally, you're totally good. And I mean, you can easily perceive the mistakes that could happen too, right? I mean, before I took this role at Embrace, I was at New Relic for four and a
(10:39):
half years, and before that I was at AWS. And so, obviously, I've spent a lot of time around CloudFormation templates, Terraform, etc. You can see a world where, you know, AI builds your CloudFormation template for you and selects an EC2 instance type because the information it has about your workload says it's optimal for this EC2 instance type. But in the region you're running, that instance type's not freely
(11:00):
available for you to autoscale to. And pretty soon, you go try to provision more instances, and poof, you hit your cap. Because that instance type just doesn't have availability in Singapore.
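As a concrete illustration of that failure mode, here's a minimal sketch of the kind of pre-flight check a human or a pipeline step could run before trusting a generated template: ask EC2 whether the chosen instance type is actually offered in the target region. The region and instance type below are made-up examples; the call itself is the standard boto3 describe_instance_type_offerings API.

```python
import boto3

# Made-up example values; substitute whatever the generated template picked.
REGION = "ap-southeast-1"
INSTANCE_TYPE = "m7i.xlarge"

ec2 = boto3.client("ec2", region_name=REGION)
offerings = ec2.describe_instance_type_offerings(
    LocationType="availability-zone",
    Filters=[{"Name": "instance-type", "Values": [INSTANCE_TYPE]}],
)
zones = sorted(o["Location"] for o in offerings["InstanceTypeOfferings"])

if not zones:
    # Catch the "looks fine in the template, can't actually autoscale" problem early.
    raise SystemExit(f"{INSTANCE_TYPE} is not offered in {REGION}; pick another type.")
print(f"{INSTANCE_TYPE} is offered in: {', '.join(zones)}")
```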
And, us as humans and operators, we learn a lot about our operating environments, we learn about our workloads, we learn about the character, the peculiarities of them that don't make sense to a computer, but are, like, based on reality.
(11:25):
Over time, maybe AI gets really good at those things, right? But the question is, how do we build the culture to kind of be guiding our, you know, army of assistants to build software that really works for our users, instead of just trusting it to go do the right thing because we view everything as having a true and pure result, which I don't think is true. A lot of the tech we build is for people who build consumer
(11:47):
mobile apps and websites. I mean, that is what the tech we build is for. And you can easily see, you know, some of our engineers have been playing around with using AI-assisted coding to implement our SDK in a consumer mobile app, and it works quite well, right? You can see situations where an engineer gets asked by somebody, a product manager, a leader, to say, hey, you know, we got a note from marketing.
(12:09):
They want to implement this new attribution SDK that's going to go build up a profile of our users and help us build more, you know, customer-friendly experiences. You have the bots go do it, it tests the code, everything works just fine. And then, for some reason, that SDK makes an uncached request out to US West 2 for users globally, and, you know, for your users in Southeast Asia that ends up
(12:33):
adding an additional six and a half seconds to app startup, because physics. And what do those users do? If you start an app and it sits there hanging for four, five, six seconds, and you have to use the app because, you know, you're waiting for your boarding pass to come up and you're about to get on the plane, you probably perceive it as broken and abandon it.
(12:53):
And to me, that's like a reliability problem that requires systems thinking and cultural design around how your engineering organization works to avoid. One that I don't think is immediately solved with AI.
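A rough sketch of what measuring that scenario could look like, written in Python for brevity (a real mobile app would use the Swift or Kotlin OpenTelemetry SDK instead): wrap the third-party SDK's initialization in a span during app startup so a multi-second regression shows up in telemetry instead of only in angry reviews. The function and span names are hypothetical.

```python
import time

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Minimal OpenTelemetry setup; a real app would export to a collector, not the console.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("app.startup")


def init_attribution_sdk() -> None:
    # Placeholder for the vendor SDK's init; imagine it makes a blocking,
    # uncached network call to a far-away region.
    time.sleep(0.1)


with tracer.start_as_current_span("app_startup") as startup_span:
    start = time.monotonic()
    with tracer.start_as_current_span("attribution_sdk_init"):
        init_attribution_sdk()
    # Record total startup time as an attribute so it can be charted and alerted on.
    startup_span.set_attribute("startup.duration_ms", (time.monotonic() - start) * 1000)
```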
Right. Observability is only getting more complex. You know, it's a cat and mouse game, I think, in all regards, and I think everything we do has a yin and yang. I'm
(13:13):
watching organizations that are full steam ahead, aggressively using AI, and I'd love to see some stats. I don't know if you have empirical evidence or if it's sort of anecdotal stuff from your clients, but where you see them accelerate with AI and then almost have a pulling-back effect, because they realize how easy it is for them to ship more bugs.
(13:36):
And then sure, we could have the AI writing tests, but it can also write really horrible tests, or it can just delete tests because it doesn't like them, because they keep failing. It's done that to me multiple times. In fact, I think, oh, what's his name? Gene, no, it wasn't Gene Kim. There's a famous, it might've been the guy who created Extreme Programming. I think I heard him on a podcast talking about how he wished he could make all
(13:58):
of his test files read-only. Because he writes the tests, and then he expects the AI to make them pass, and the AI will eventually give up and then want to rewrite his tests. And he doesn't seem to be able to stop the AI from doing that, other than just deny, deny, or cancel, cancel, cancel. And there's a scenario where that can easily happen in the automation of CI and CD,
(14:19):
where, you know, suddenly it decides that the tests failing were okay, and then it's going to push it to production anyway, or whatever craziness ensues.
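One hedged way to approximate that "read-only test files" idea today is a CI step that fails whenever a bot-authored change touches anything under the test directory. This is a sketch, not a standard: the base branch, the protected path, and the "bot" author convention are all assumptions you'd adapt to your own repo.

```python
import subprocess
import sys

# Assumptions: main branch is origin/main, protected tests live under tests/,
# and bot authors can be recognized by "bot" in the commit author name.
BASE_REF = "origin/main"
PROTECTED_PREFIX = "tests/"

author = subprocess.run(
    ["git", "log", "-1", "--pretty=%an"],
    capture_output=True, text=True, check=True,
).stdout.strip()

changed = subprocess.run(
    ["git", "diff", "--name-only", f"{BASE_REF}...HEAD"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

touched_tests = [path for path in changed if path.startswith(PROTECTED_PREFIX)]

if "bot" in author.lower() and touched_tests:
    # Block the pipeline: a bot rewrote or deleted the tests it was supposed to satisfy.
    sys.exit(f"Bot-authored change modified protected test files: {touched_tests}")
print("Test files untouched, or the change was human-authored.")
```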
Do you see stuff happening on the ground there? That's
Did you read that article about the AI-driven store, a fake store purveyor that Anthropic created, named Claudius? That, when it got pushback from
(14:42):
the Anthropic employees around what it was stocking in its fake fridge, its fake store, it called security on the employees to try to displace them. I mean, I think what you're pointing to is that the moral compass of AI is not necessarily, like, that's a very complex thing to solve, right? Is this the right thing to do or not? Because it's trying to solve the problem however it can solve the problem, with whatever tools it has,
(15:03):
not whether the problem is the right one to solve. And right is, I mean, obviously, you know, even humans struggle with this one.
Right. I've just been labeling it all as taste. Like the AI lacks taste.
Yeah, that's
Like in a certain team culture, if you have a bad team culture, deleting tests might be acceptable. I used to work with a team that would ignore linting.
(15:24):
So I would implement linting. I'd give them the tools to customize linting, and then they would ignore it and accept the PR anyway. And did not care. They basically followed the linting rules of the file. Now, they were in a monolithic repo, thousands and thousands of files, a 10-year-old Ruby app. But they would just ignore the linting. And I always saw that as a culture problem for them.
(15:46):
As the consultant, I would always tell the boss that I was working for there that I can't solve your culture problems. I'm just a consultant here trying to help you with your DevOps. You need a full-time person leading teams to tell the rest of the team that this is important and that these things matter, over the course of years, as you change out engineers. You can't just go with, well, this file is tabs,
(16:07):
this file is spaces, and this one follows, especially in places like Ruby and Python and whatnot, where there are multiple sort of style guides and whatnot like that. But I just call that a taste issue or culture issue. AI doesn't have culture or taste. It just follows the rules we give it. We're not really good yet as an industry at defining rules. I think that's actually part of the course I'm creating is
(16:49):
figuring out all the different places where we can put AI rules. We've seen them put in repos. We've seen them put in your UI and your IDEs. Now I'm trying to figure out how do I put that into CI, and how do I put that into even, like, the operations, if you're going to try to put AIs somewhere where they're going to help with troubleshooting or visibility or discovery
(16:49):
of issues, like they also need to have taste. Which, I want to change all my rules files to taste files. I guess that's the platform I'm standing on.
Yeah. Cause it,
Yeah, I mean, at the same time, you probably don't want to be reviewing hundreds of PRs that an AI robot is going through just changing the casing of legacy code to meet your rules. Change obviously introduces the opportunity
(17:11):
for failure, bugs, etc. And it's somewhat arbitrary, simply because you've given it a new rule that this is what it has to do. I mean, I had a former co-worker many years ago, like 15, 20 years ago, who was famous for just going through periodically and making nothing but casing code changes, because that's what he preferred. And it was just this endless stream of trivial changes on code we
(17:33):
hadn't touched in months or years. And that would inevitably lead to some sort of problem, right? And because,
The toil of that. I'm just like, yeah, you're giving me heartburn just by telling me about that story.
Yeah, well, I've gotten over the PTSD of that. It's a long time ago.
So, okay. So, what are you seeing on the ground? Do you have some examples of how this is manifesting in apps?
(17:57):
I mean, you've already given me a couple of things, but I'm just curious if you've got some more.
I kind of alluded to it at the very beginning. Obviously, in my role, I talk to a lot of senior executives about directionally where they're going with their engineering organizations. Because I think the software we build, it does a bunch of tactical things, but in a broader sense, it allows people to measure reliability as expressed through, you know, your customers
(18:20):
staying engaged in your experience, by virtue of the fact that they, otherwise liking your software, are having a better technical experience with the apps you build on the front end. And ultimately measuring that, and then thinking through all of the underlying root causes, is a cultural change in these organizations and how they think about reliability. It's no longer just, are the API endpoints at the edge of
(18:41):
our data center delivering their payload, you know, responding in a timely manner and error-free, and then the user experience is well tested and, you know, we ship it to prod, and that's enough. It really is a shift in how people think about it. When I talk to them, I mean, a lot of CTOs really are taking a bet that AI is going to be the way
(19:01):
that productivity gains, I won't say make their business more efficient, but they do allow them to do more with less, right? You know, like it or not, especially over the past few years, I mean, in B2B, we're in the B2B SaaS business, there's been, you know, times have definitely been tougher than they were in the early 2020s, for kind of everyone. I think there's a lot of pressure in consumer tech, with tariffs and everything, to do
(19:24):
things more cost-effectively. And first and foremost, on the ground, this is a change we are seeing, whether we like it or not. And, you know, we can argue about whether it's going to work and how long it'll take, but the fact is that, like you said, you mentioned the CIO magazines, leadership is taking a bet this is going to happen. And I think as we start to talk about that with these executives, the question is, is the existing set of tools I have
(19:47):
to cope with that reality good enough? And, yeah, I guess my underlying hypothesis is it probably isn't for most companies, right? If you think about the world of, you know, web as an example, a lot of companies that are consumer-facing tech companies will measure their Core Web Vitals, in large part because it has SEO impact.
(20:08):
And then they'll put an exception handler in there, like Sentry, that grabs, you know, kind of JavaScript errors and tries to give you some level of impact around, you know, how many are impacting your users and whether it's high severity. And then you kind of have to sort through them to figure out which ones really matter for you to solve. So take existing user frustration with the human efficiency of delivering
(20:31):
code and the pace, the existing pace. Users are already frustrated that Core Web Vitals are hard for them to quantify, what that really means in terms of user impact and whether a user decides to stay on the site or not. And the fact that they're overwhelmed with the number of JavaScript errors that could be out there. Because really, I mean, you go to any site, go to developer tools, look at the number of JavaScript errors you see, and then, you know, take your
(20:53):
human, experienced idea of how you're interacting with the site, and chances are most of those don't impact you, right? It's an analytics pixel that failed to load, or it's a library that's just barfing some error but is otherwise working fine. So take that and put it on steroids now, where you have a bunch of AI-assisted developers doubling or tripling the number of things they're doing, or
(21:15):
just driving more experimentation, right? Which I think a lot of businesses have always wanted to do. But again, software has been kind of slow and expensive to build. And so if it's slow and expensive to build, my thirst for delivering an experiment across three customer cohorts, when I can only deliver one given my budget, just means that I only have to test for one variation.
(21:36):
We'll now triple that, or quadruple it, and, you know, multiply that by a number of arbitrary user factors. It just gets more challenging, and I think we need to think about how we measure things differently.
Once the team has the tools in place to manage multiple experiments at the same time, that just ramps up exponentially until the team can't handle it anymore. But if AI is giving them a chance to go further, then yeah, they're
(21:58):
just, they're going to do it.
I mean, yeah, you're going to get overwhelmed with support tickets and bad app reviews and whatever it is, which, I think most people are, most business leaders would be pretty upset if that's how they are responding to reliability issues.
I was just gonna say, we've already had decades now of pressure at all levels of engineering to reduce personnel.
(22:18):
You need to justify every new hire pretty significantly, unless you're funded and you're just, you know, in an early-stage startup and they're just growing to the point that they can burn all their cash. Having been around 30 years in tech, I've watched operations get, you know, merged into DevOps. I've watched DevOps teams, which we didn't traditionally call them that, we might have called them sysadmins or automation or build engineers
(22:40):
or CI engineers, and they get merged into the teams themselves. And the teams have to take on that responsibility. I mean, we've got this weird culture where we go to this Kubernetes conference all the time, and one of the biggest complaints is devs who don't want to be operators, but it's saddled on them because somehow the industry got the word DevOps confused with something, and we all thought, oh, that means the developers can do ops.
(23:03):
That's not what the word meant, but we've gotten to this world where I'm getting hired as a consultant to help teams deal with just the sheer amount of ridiculous expectations a single engineer is supposed to have: not just the knowledge you're supposed to have, but the systems you're supposed to be able to run while making features. And already, I feel like a decade ago, it felt unsustainable.
(23:26):
So now here we are, having to give some of that work to AI, when it's still doing random hallucinations on a daily basis, at least even in the best models. I think I was just ranting yesterday on a podcast about SWE-bench, which is like an engineering benchmark for AI models and how well they solve GitHub issues, essentially. And the best models in the world
(23:49):
can barely get two thirds of them correct. And that's if you're paying the premium bucks and you've got the premium foundational models and you are on the bleeding edge stuff, which most teams are not, because they have rules or limitations on which model they can use, or they can only use the ones in-house, or they can only use a particular one. And it's just one of
(24:10):
those things where I feel like we're being pushed from all sides. And at some point, it's amazing that any of this even works. It's amazing that apps actually load on phones. It just feels like garbage on top of garbage, on top of garbage, turtles all the way down, whatever you want to call it. So where are you coming in here to help solve some of these problems?
Yeah, that's fair, yeah. I'll even add to that: all of that's even discounting the fact that new tools
(24:33):
are coming that make it even simpler to push out software that, like, barely works, without even the guided hands of a software engineer who has any professional experience writing that code. Some of the vibe coding tools, like our product management team uses, largely for rapid prototyping. And I can write OK Python. I used to write OK C#.
(25:15):
I have never been particularly good at writing JavaScript. I can read it OK, but when it comes to fixing a particular problem, I quickly get out of my depth. That's not to say I couldn't be capable of doing it. It's just not what I do every day, nor do I particularly have the energy, when I'm done with a full workday, to go teach myself JavaScript. And I'll build an app with one of the vibe coding tools
(25:15):
as a means of communicating how I expect something to work.
And I'm like, ah, it's better to just delete the project
(25:38):
and start all over again. And, you know, if you can make it work, that doesn't necessarily mean it'll work at scale. It doesn't necessarily mean that there aren't a myriad of use cases you haven't tested for as you click through the app in the simulator. And so, you know, I think the question is, okay, given the fact that we have to accept more software is going to make its way
(26:01):
into human beings' hands, because we build software for human beings, it's going to get to more human beings. How do we build a reliability paradigm where we can measure whether or not it's working? And I think that stops focusing on, I guess to go back to the intentionally inflammatory title of today's discussion, it stops focusing on a zero-bug paradigm where I test things, I test every possible pathway for my users,
(26:22):
I, you know, have a set of requirements, again, around trivial performance things and stuff, and try to put up these kinds of barriers to getting code into human hands. Instead, I just accept the fact more code is going to get into human hands faster, at a pace I can't possibly control. And so, therefore, I have to put measurements in the real world, so I use a lot of different tools around my app so that I can be as responsive
(26:44):
as possible to resolving those issues when I find them. Which is, I guess, my, you know, Charity Majors, who co-founded Honeycomb, she and I were talking a few months ago, and she's a big fan of stickers. She shipped me an entire envelope of stickers, and they're all like, you know, ship fast and break things, right? I test in production, stuff like that. And somehow I feel like, you know, in our world, because we build
(27:05):
observability for front-end and mobile experiences, like web and mobile experiences, I feel like that message just hadn't gotten through historically. Part of it's because release cycles on mobile were really slow; you had to wait days for an app to get out there. Part of it was software is expensive to build and slow to build, and so getting feature flags out there, where you can operate in production, was hard.
(27:26):
Part of it was just the observability paradigm hadn't shifted, right? The paradigm of measure everything and then find root cause had not made its way to front end. It was more like measure the known things you look for, like web exceptions, like Core Web Vitals, or on mobile, look at crashes, and that's about it. And the notion of, okay, measure whether users are starting the app successfully, and when you see some unknown outcomes start to occur or users start to abandon, how can you then sift through the data to find the root cause?
start to abandon, how can you then siftthrough the data to find the root cause?
That hadn't really migrated its way over. And that's what we're trying to do. We're trying to bring that paradigm of, how do you define the, for lack of a better term, the APIs, the things that your humans interact with in your apps, the things they do. You don't build an API in your app for human beings; you build a login
(28:10):
screen, you build a checkout screen, a cart experience, a product catalog. How do we take those things and measure the success of them, and then try to attribute them to underlying technical causes, where your teams can better have those socio-technical conversations so that they can understand and then resolve them, probably using AI, right, as we grow? But it allows better system knowledge and the interplay
(28:33):
between real human activities and the telemetry we're gathering.
Yeah. Have you been, I'm just curious, have you been playing with having AI look at observability data, whether it's logs or metrics? Have you had any experience with that? I'm asking that simply as a generic question, because from the conversations
(28:54):
I've had in the last few months, it sounds like AI is much better at reading logs than it is at reading metrics or dashboards or anything that sort of lacks context. You know, it's not like we're putting in alt image descriptions for every single dashboard graph. And that's probably all coming from the providers, just because if
(29:15):
they're expecting AI to look at stuff, they're gonna have to give more context. But it sounds like it's not as easy as just giving AI access to all those systems and saying, yeah, go read the website for my dashboard, Grafana, and figure out what's the problem.
I've seen it deployed in two ways, one of which I find really interesting and something we're actively working on, because I think it just has a high degree
(29:36):
of utility, and, you know, given the kind of state of LLMs, I think it's probably something that's relatively easy to get right. Which is the notion of, when you come into a product like ours, or a Grafana, or, you know, a New Relic, you probably have an objective in mind. Maybe you have a question you're trying to ask: what's the health of this service? Or, I got paged on a particular issue.
(29:58):
I need to build a chart that shows me the interplay between latency for this particular service and the success rate of, you know, some other type of thing, more like database calls or something like that. Today, people broadly have to manually create those queries, and it requires a lot of human knowledge around the query language or schema of your data. And I think there's a ton of opportunity for us to simply ask a human question of, you know, show me a query of all active sessions on this mobile app for
(30:21):
the latest iOS version, and the number of traces, like startup traces, that took greater than a second and a half, and have it simply pull up your dashboard and query language and build the chart for you quite rapidly. Which is a massive time savings, right? And it also just makes our tech, which, you're right, can get quite complex, more
(30:46):
approachable by your average engineer. Which, you know, I'm a big believer that if every engineer in your organization understands how your systems work, and the data around it, you're going to build a lot better software, especially as they use AI, right? Because now they understand how things work, and they can better provide instructions to the robots. So I think that's a really useful, interesting way, and we've seen people start to roll that type of assistant functionality out.
(31:10):
The second way I've seen it deployed, I see mixed results, which is, I mean, an incident: go look at every potential signal that I can see related to this incident and try to tell me what's going on and get to the root cause. And more often than not, I find it's just a summarization of stuff that you, as an experienced user, probably would come to the exact conclusions on. I think there's utility there, certainly; it gets you a written summary quickly of what you see.
(31:33):
But I do also worry that it doesn't apply a high degree of critical thinking. And, you know, an example of where, lacking context, it wouldn't be very smart, right, is, you've probably seen it, every traffic chart around service traffic, depending upon how it runs, tends to be pretty lumpy with time of day. Because most companies don't have equivalent distribution
(31:57):
of traffic across the globe. Not every country across the globe has an equivalent population. And so you tend to see these spikes where, you know, you have a number of service requests spiking during daylight hours, or the Monday of every week, because people come into the office and suddenly start e-commerce shopping. And you see it taper throughout the week, or you taper into the evening.
(32:19):
I think that's normal. You understand that as an operator of your services, because it's unique to your business. I think the AI would struggle, lacking context around your business, to understand somewhat normal fluctuations, or the fact that, you know, marketing dropped a campaign where there's no data inside your observability system to tell you that that campaign dropped. It's not a release.
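A toy example of the kind of context-aware check a human applies implicitly: compare the current hour's traffic to the same hour in previous weeks before declaring an anomaly. The numbers are invented; a real version would read from your metrics store, and it still wouldn't know about the marketing email drop.

```python
from statistics import mean

# Made-up counts for the same hour on previous weeks (e.g. Mondays, 10:00-11:00).
same_hour_previous_weeks = [12_400, 11_900, 13_100, 12_700]
current_hour = 19_800

baseline = mean(same_hour_previous_weeks)
ratio = current_hour / baseline

if ratio > 1.5:
    # The signal alone can't tell an incident apart from a campaign the
    # observability system never heard about.
    print(f"Traffic is {ratio:.1f}x the weekly baseline - incident, or marketing email?")
else:
    print("Within normal weekly variation.")
```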
(32:39):
So your AI is lacking context. It's lacking the historical context that the humans already have implicitly.
Yeah, and I mean, that context might be in a Slack channel where marketing said, we just dropped an email, and you know an email drop, so expect an increased number of requests to this endpoint as, you know, people retrieve their special offer token or whatever that will allow them to use it in our checkout flow.
(33:02):
Today, if we provided that scope to an AI model within our system, we would unlock that type of context.
Yeah. I'm not creating any of these apps, and I'm sure they've already thought of all of this, but the first thing that comes to mind is, well, the hack for me would be, give it access to our ops Slack room.
Right.
We're probably all having those conversations of, oh, what's going on here?
(33:23):
And someone who happened to get reached out to from marketing was like, well, you know, yesterday we did send out a, you know, a new coupon sale or whatever. So, yeah, having it read all that stuff might be necessary for it to understand. Because you're right, it's not like we have a dashboard in Grafana that's the number of marketing emails, you know, the emails sent per day, or the level of sale we're expecting,
(33:46):
based on, in America, it's the 4th of July sale, or, you know, some holiday in a certain region of the world. Or a social media influencer dropping some sort of link to your product, so that suddenly, you know, it's a new green doll that people attach to their designer handbags. I don't know anything about what the kids are into these days, but it seems kind of arbitrary, and I would struggle to predict that.
(34:09):
Let me put it that way.
They're into everything old. So they're into everything that I'm into. Yes.
All right. So if we're talking about this at a high level, we talked a little bit before the show around how Embrace is thinking about observability, and particularly on mobile, but, you know, anything front end there.
The tooling ecosystem for engineers on web and mobile is pretty rich,
(34:34):
but they all tend to be just, like, hammers for a particular nail, right? It's, you know, how do we give you a better crash reporter, a better exception handler, how do we go measure X or Y? Some of the stuff that we're thinking about is really how we define the objective of observability for the coming digital age, right? Which is, you know, as creators of user experiences,
(34:54):
I think my opinion is that we shouldn't just be measuring, like, crash rate on an app. We should be measuring, are users staying engaged with our experience? And when we see they are not, sometimes, crashes, I mean, the answer is obviously they can't, right? Because the app explodes. But, I think, you know, I was talking to a senior executive at
(35:16):
a massive food delivery app. And it's, listen, we know, anecdotally, there's more than just crashes that make our users throw their phone at the wall. You're trying to order lunch at noon and something's really slow, or you keep running into just a validation error, because we shipped you an experiment thinking it worked, and you can't order the item you want on the two-for-one promotion.
(35:37):
You're enraged because you really want the, you know, the spicy dry-fried chicken, you're hangry, and you want two of them because I want to eat the other one tonight. And you've already suckered me into that offer, you've convinced me I want it, and now I'm having trouble completing my objective. And broadly speaking, the observability ecosystem on the front end really hasn't measured that, right?
(35:58):
We've used all sorts of proxy measurements out of the data center, because the reliability story has been really well told and has evolved over the past 10 to 15 years in the data center world, but it just really hasn't materially evolved on the front end. And so a lot of that is shifting the objective, from how do I just measure counts of things I already know are bad, to measuring what user engagement looks like
(36:21):
and whether I can attribute that to change in my software or defects I've introduced. So, that's kind of the take.
Just about anybody who has ever built a consumer mobile app has Firebase Crashlytics in the app, which is a free service provided by Google. It was a company a long time ago, Crashlytics, that got bought by Twitter and then got acquired by Google. It basically gives you rough-cut performance metrics and crash reporting.
(36:44):
Right. I would consider this the foundational requirement of any level of app quality, but to call this app quality, I think, you know, our opinion is that would be a misnomer. So we're going to kind of go through what this looks like, right? Which is, it's giving you things you would expect to see: a number of events that are crashes, et cetera, and
(37:04):
you can do what you would expect here: crashes are bad, so I need to solve a crash, so I'm going to go into a crash, view stack traces, get the information I need to actually be able to resolve it. And, you know, I think we see a lot of customers, before we talk to them, who are just like, well, I have crash reporting and I have QA, that's enough.
(37:25):
There are a lot of products that have other features. So, like, Core Web Vital measurements on a page level; this is Sentry. It's a lot of data, but I don't really know what to do with this, beyond, okay, it probably has some SEO impact. There's a, you know, bad Core Web Vital, or a slow, you know, slow something for a render on this page. How do I actually go figure out root cause?
(37:49):
But again, right, this is a single signal. So you don't know whether or not the P75 Core Web Vital here that is considered scored badly by Google is actually causing your users to bounce. And I think that's important, because I was reading this article the other day on this notion of a performance plateau. There's empirical science
(38:09):
proving that faster Core Web Vitals, especially, like, contentful paint and interaction to next paint, et cetera, improve bounce rate materially; people are less likely to bounce if the page loads really fast. But at some point, if it's long enough, there's this massive long tail of people who just have a rotten experience, and you kind of have to figure out, I can't make everyone globally have a great experience.
(38:33):
Where's this plateau where I know that I'm improving the experience for people who I'm likely to retain and improve their bounce rate, versus I'm just, you know, going to live with this? And so we have kind of a different take, which is, we wanted to center our experience less on just individual signals and more on these flows, these tasks that users are performing in your app.
(38:53):
So if you think about the key flows, I'm breaking these down into the types of activities that I actually built for my end users. And I want to say, okay, how many of them were successful, versus how many ended in an error, like something went truly bad, right? You just could not proceed. Versus how many abandoned, and when they abandoned, why, right?
(39:14):
Did they abandon because they clicked on a product catalog screen and saw some stuff that they didn't like? Or did they abandon because the product catalog was so slow to load, and images so slow to hydrate, that they perceived it as broken, lost interest in the experience, and ended up leaving? And so the way you do that is you basically take the telemetry we're
(39:38):
emitting from the app, the exhaust we collect by default, and you create these start and end events that allow you to, then we post-process the data. We go through all of these sessions we're collecting, which is basically a play-by-play of the linear events that users went through, and we hydrate the flow to tell you where people are dropping off. And so you can see their actual completion rates over time. You know,
(40:00):
obviously it's a test app, so there's not a ton of data there. But what gets really cool is we start to build out this notion of, once you see the issues happen, well, how can I now go look at all of the various attributes of those populations under the hood, to try to specify which of the things
(40:21):
are most likely to be attributed to the population suffering the issue? So that could be an experiment. It could be a particular mobile version they're on. It could be an OS version, right? You just shipped an experiment that isn't supported on older OSes, and those users start having a bad experience. And then each of those gets you down to what we call this user play-by-play
(40:45):
session timeline, where you basically get a full recreation of every part of the exhaust stream that we're gathering from you interacting with the app or website, just for reproduction purposes. Once you've distilled it here, you can say, okay, now let me look at that cohort of users. And so I can do pattern recognition, which I think is pretty
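For readers who want the mechanics, here's a simplified sketch of the flow measurement described above: classify each session that entered a flow as completed, errored, or abandoned based on its event stream. The event names and classification rules are hypothetical, not Embrace's actual schema or processing.

```python
from collections import Counter

# Hypothetical per-session event streams; a real pipeline would read these
# from collected session telemetry.
sessions = {
    "s1": ["app_start", "checkout_start", "payment_submit", "checkout_complete"],
    "s2": ["app_start", "checkout_start", "payment_error"],
    "s3": ["app_start", "checkout_start"],   # user gave up mid-flow
    "s4": ["app_start", "product_view"],     # never entered the flow
}


def classify(events: list[str]) -> str | None:
    """Classify one session against the 'checkout' flow's start/end events."""
    if "checkout_start" not in events:
        return None  # session never entered the flow
    if "checkout_complete" in events:
        return "completed"
    if any(e.endswith("_error") for e in events):
        return "errored"
    return "abandoned"


outcomes = Counter(c for c in (classify(e) for e in sessions.values()) if c)
total = sum(outcomes.values())
for outcome, count in outcomes.items():
    print(f"{outcome}: {count}/{total} ({count / total:.0%})")
```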
Hmm.
So for the audio audience that didn't get to watch the video,
(41:09):
what are some of the key, sort of, if someone's in a mobile and front-end team, and this is actually going back to a conversation I had with one of your Embrace team members at KubeCon in London, what are some of the key changes or things that they need to be doing? And I guess if I back up and say the premise here, it's
(41:32):
there's almost like two archetypes. What am I trying to say here? There are two archetypes that I'm thinking about. I'm thinking about me, the DevOps slash observability system maintainer. I've probably set up ELK, or, you know, I've got the key names, the Lokis, the Prometheus, the Grafanas. I've got all these things that I've implemented. I've brought my engineering teams on board.
(41:53):
They like these tools. They tend to have, especially for mobile, they tend to have other tools that I don't deal with. They might have platform, like the, the
Traditionally, right? We're obviously trying to change that. But yeah, traditionally, they have five or six other tools that don't play into the observability ecosystem you have set up.
Yeah. So we're on this journey to try to centralize, bring them
(42:16):
into the observability world. You know, traditional mobile app developers might not even be aware of what's going on in the cloud native observability space, and we're bringing them on board here. Now suddenly they get even more code coming at them that's slightly less reliable, or maybe presents some unusual problems that we didn't anticipate.
(42:40):
So now, you know, we're in a world where suddenly what we have in observability isn't enough.
Yeah.
You're a potential solution. What are you looking at for behaviors that they need to change, or things that people can take home with them?
And
I mean, I guess the way I think about it is, right, the reason observability became so widely adopted in server-side products was because, in
(43:04):
an effort to more easily maintain our software and to avoid widespread defects of high blast radius, we shifted from a paradigm of monoliths deployed on bare metal to virtualization, you know, various container schemes, which right now has most widely been around Kubernetes and microservices,
(43:25):
because you could scale them independently and you could deploy them independently, right? And that complexity of the deployment scheme, and the different apps and services interplaying with each other, necessitated an X-ray vision into your entire system, where you could understand system-wide impacts to the end of your world. And the end of your world, for the most part, became your API surface
(43:48):
layer, the things that served your web and mobile experiences. And, you know, there are businesses that just serve APIs, right, but broadly speaking, the brands we interact with as human beings serve us visual experiences that we interact with.
Right. It's the server team managing the server analytics, not so much the client device analytics.
(44:11):
Right. The world has gotten a lot more complicated in what the front-end experience looks like. You could have a service that consistently responds, has a nominal increase in latency, and is well within your alert thresholds, but where the SDK or library designed for your front-end experience suddenly starts retrying
(44:32):
a lot more frequently, delivering perceived latency to your end user. And so I think the question is, could you uncover that incident? Because if users suffer perceived latency and therefore abandon, what metrics do you have to go measure whether or not users are performing the actions you care about them performing, and whether that's attributable to a system change?
(44:55):
In most instances, I don't think most observability systems have that. And then the second question is, right, so, and by the way, Bret, that's the underlying supposition: that in a real observability scheme, mean time to detect is as important, if not more so, than mean time to resolve. The existing tooling ecosystem for front end and mobile has been set up to optimize mean time to resolve for known problems, where I can basically just
(45:17):
count the instances and then alert you. And, you know, the lack of desire to be on call, like, I've heard this stupid saying that there's no such thing as a front-end emergency, which is ridiculous. If I'm a major travel website and I run a thousand different experiments, and a team in Eastern Europe drops an experiment that affects
(45:39):
1 percent of users globally, in the middle of my night, that makes the calendar control broken, and some segment of that population can't book their flights, that sounds a lot like a production emergency to me. That has material business impact in terms of revenue.
Or the font color changes and the font's not readable.
(45:59):
Yeah, I guess I am imploring the world to shift to a paradigm where they view users' willingness and ability to interact with your experiences as a reliability signal. And I think the underlying supposition is that this only becomes more acute of a problem as the number of features we ship grows.
(46:20):
I guess I'm starting with the belief that, from what I hear, people are doubling down on this, right? They're saying, we need to make software cheaper to build and faster to build, because it is a competitive environment, and if we don't do it, somebody else will. And as that world starts to expand, the acuity of the
(46:40):
problem space only increases.
Yeah. I think your theory here matches up well with some others. Like, we're all kind of in this, we've got a little bit of evidence, we hear some things, and we've got theories. We don't have years of facts on exactly how AI is affecting a lot of these things. But other compatible theories I've heard recently on this,
(47:03):
actually with three guests on the show, that are just coming to mind. One of them is, because of this increase in velocity, it's only going to increase the desire for standardization in CI and deployment. Which, those of us who've been living in this Kubernetes world, we've all been trying to approach that. We're all leaning into Argo CD as the number one way
(47:27):
on Kubernetes to deploy software. You know, we've got this GitOps idea of how to standardize change as we deliver in CI. It's still completely wild, wild west. You've got a thousand vendors and a thousand ways you can create your pipelines. And hence the reason I need to make courses on it for people, because there's a lot of art still to it.
(47:48):
We don't have a checkbox of, this is exactly how we do it. And in that world, the theory right now is that maybe AI is going to allow us, or rather force us, to standardize, because we can't have a thousand different workflows or pipelines that are all slightly different for different parts of our software stack. Because then when we get to production and we start having problems, or if the
(48:10):
AI is starting to take more control, it's just going to get worse, because the AI
Yeah, and at the end of the day, right, standardization is more for the mean than the outliers. And I think a lot of people assume, they're like, oh, we can do it right because we have hundreds of engineers working on, like, our, you know, our automation and stuff. It's, do you know how many companies there are out there that are not technology companies? They're a warehouse company with technology.
(48:32):
Right. And standardization allows them, more often than not, to build good technology, to measure correctly, to deploy things the right way. Like, we're all going to interact with it in some way, right? And as the pressure for more velocity and them building technology speeds up, the need for, you know, a base layer of doing things, where it's consistently right, only increases.
(48:55):
And I think that's, you know, for the world it's a challenge; for us it's an opportunity.
Yeah. Awesome. I think that's a perfect place to wrap it up. We've been going for a while now. But, Andrew, embrace.io, right? That's,
Yeah. www.embrace.io.
Let's just bring those up. We didn't show a lot of that stuff, but, the,
(49:16):
In case people didn't know, I made a short a few months ago about Embrace, or with one of the Embrace team members at KubeCon, which we had touched on a little bit in this show. But we talked about that observability is here for your mobile apps now and your front-end apps, and that those people, those developers, can now join the rest of us in this world of modern metrics collection and consolidation of logging tools,
(49:38):
and bringing it all together into, ideally, one single pane of glass. If you're advanced enough to figure all that out, the tools are making it a little bit easier nowadays, but I still think there's a lot of effort in terms of implementation engineering to get all this stuff to work the way we hope. But it sounds like you all are making it easier for those people with your platform.
(49:58):
We're definitely trying. Yeah. I think a future state that would be pretty cool would be the ability for operators to look at a Grafana dashboard, or, you know, a Chronosphere or New Relic dashboard or whatever it is, see that, you know, they see a disengagement, an engagement decrease, on login on a web property, and immediately open an incident where they page in the front-end team and
(50:23):
the core team servicing the auth APIs in a company, and have them operating on data where the front-end team can be like, we're seeing a number of retries happening after we updated to the new version of the API that you serve for login credentials. Even though they're all 200s, they're all successful requests, what's going on?
(50:46):
And that backend team says, well, it looks like, you know, we were slow rolling it out and the P90 latency is actually 250 milliseconds longer. Why would that impact you? And they say, well, the SDK retries after 500 milliseconds, and our P50
(51:06):
latency before this was 300 milliseconds. So 10 percent of our users or something are starting to retry, and that's why we're seeing this. You know, the answer here is to increase the resource provisioning for the auth service to get latency back down, and/or change our SDK to have a more permissive retry policy.
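A back-of-the-envelope version of that retry math, using a simulated latency distribution (a real calculation would use your actual latency histograms): if the client retries at 500 ms, shifting the tail of a roughly 300 ms-median distribution up by about 250 ms pushes a noticeably larger share of requests past the retry threshold.

```python
import random

random.seed(1)
RETRY_TIMEOUT_MS = 500

# Simulated request latencies: lognormal with a median near 300 ms ("before"),
# then a slow rollout that pushes the upper tail up by ~250 ms ("after").
before = [random.lognormvariate(5.7, 0.35) for _ in range(10_000)]
after = [x + 250 if x > 400 else x for x in before]


def pct_over(samples: list[float], threshold_ms: float) -> float:
    """Share of requests slower than the client's retry timeout."""
    return 100 * sum(s > threshold_ms for s in samples) / len(samples)


print(f"Over {RETRY_TIMEOUT_MS} ms before rollout: {pct_over(before, RETRY_TIMEOUT_MS):.1f}%")
print(f"Over {RETRY_TIMEOUT_MS} ms after rollout:  {pct_over(after, RETRY_TIMEOUT_MS):.1f}%")
```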
And, you know, have teams be able to collaborate around the right design of software for their end users, and understand the problem from both
(51:27):
perspectives, but be able to kick off that incident because they saw real people disengaging, not just some server-side metrics. Which I think would be pretty neat.
Yeah. And I should mention, you all, if I remember correctly, in cloud native, you're lead maintainers on the mobile observability SDKs.
(51:49):
Is that, am I getting that right? I'm trying
We have engineers who are approvers on Android and iOS. We have, from what I'm aware of, the only production React Native OpenTelemetry SDK. We are also participants in a new browser SIG, which is a subset of the former JavaScript SDK. So our OpenTelemetry SDK for web properties is
(52:11):
basically a very slimmed-down chunk of instrumentation that's only relevant for browser React implementations. So, yeah, working to advance the kind of standards community in the cloud native environment for instrumenting in real-world runtimes where Swift, Kotlin, JavaScript are executed.
Nice.
(52:32):
Andrew, thanks so much for being here.
So we'll see you soon.
Bye everybody.