
February 28, 2025 · 35 mins

On this episode of the VOID podcast, I’m joined by Nick Travaglini, who is a Technical Customer Success Manager at Honeycomb. Nick wrote up a near miss that his team tackled towards the end of 2023, and I’ve been really wanting to discuss a near miss incident report for a very long time. What’s a Near Miss you might ask, or how is that an incident, or is it? What IS an incident? Keep listening, because we’re going to get into those questions, along with discussing whether or not it’s a good idea to say nasty things about other companies in your incident reports. 


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Courtney (00:30):
Greetings fellow incident nerds.
On this episode of The VOID Podcast, I'm joined by Nick Travaglini, who is a Technical Customer Success Manager at Honeycomb.
Nick wrote up a near miss that his team tackled towards the end of 2023, and I have really been wanting to discuss a near miss incident report for quite some time now.
What's a near miss, you might ask?

(00:53):
Or how is that an incident?
Or is it?
What IS an incident?
Keep listening, because we're going to get into those questions, along with discussing whether or not it's a good idea to say nasty things about other companies in your incident reports.
Nick, super excited to have you joining me on the VOID podcast

(01:13):
today.
For those listening, I've known Nick for some time.
So this is, I guess, in some ways, an extra special treat to have you joining me on the VOID podcast.
For folks who might not know you, lucky them.
They get to now know you.
Why don't you tell us a quick bit about yourself and we'll dive into this report.

Nick Travaglini (He/Him) (01:32):
Great.
Thank you, first of all, so much for inviting me on to the show.
I really appreciate it.
My name is Nick.
He, him pronouns.
I work at Honeycomb as a technical customer success manager.
I have a background in philosophy and organizational studies, and science and technology studies.

(01:53):
I like to dabble in a bunch of different things.
I started working in the tech industry, my first big kid job after finishing my undergrad, working at a SaaS CI/CD business that ended up getting acquired by GE.
I worked there for a while, worked at another company before doing a master's, and then from there, came to work at Honeycomb in the

(02:15):
DCS patent program.

Courtney (02:17):
I love how people who are drawn to resilience
engineering, learning from incidents, all this stuff, have these unique backgrounds, philosophy, psychology, all of these things.
It's, in some ways, a hallmark, I feel like, not a requirement, but maybe a hazard of caring about the
things we care about.

(02:38):
And it comes through in this report that you've written up, actually, and we're going to get into some of the details.
So you wrote a post for Honeycomb called Preempting Problems in a Sociotechnical System.
We'll get into some of it, especially the one big word in there.
Can you, and this is so unfair, but I do this to everyone,

(03:00):
can you do the sort of TL;DR version of what happened?
And then we'll, like I said, we'll dive into the details.
Oh, the blog post...

Nick Travaglini (He/Him) (03:11):
It's about an almost incident that we had at Honeycomb a couple of years ago.
This was 2023.
And the situation was that the OpenTelemetry project, the open source project whose goal is to create an industry-standard way of

(03:31):
instrumenting your services so that they emit telemetry data, and then you can send it to your analytics platform of choice.
They were updating their semantic conventions.
So the syntax of things, like how HTTP requests are recorded by the instrumentation and then put into a file and

(03:53):
sent to that analytics, analytics platform, that was getting updated.
And there had been an announcement that this was happening, several months before the hard cutover was set to occur.
And all of the libraries and everything that were involved, that had to make this change, for that interim period

(04:15):
before the hard cutover, had to dual-send with both the old syntax and the new syntax.
And then on a particular date, they were permitted to stop sending the old syntax.
Now, what happened was that this date came and actually went, and

(04:36):
several engineers at Honeycomb who participate in the project, we contribute quite a bit to it, they knew that this was happening, and that it was going to be required to be in release notes that this was happening.
The sort of quote-unquote normal mechanisms of letting people know that, like, a change is happening would be

(05:00):
conveyed to engineers, and it's all well and good.
A couple of days after that date, a member of my team over in the customer success department flagged it to the rest of us, saying, Hey, I see that this could be really problematic for people, because we have a lot of

(05:21):
people.
A lot of customers of Honeycomb rely on a sampling service.
We have open sourced a project called Refinery.
It's a means of sampling this telemetry data, so you only need to keep, say, 1 in 10 of the data that conforms to specific classifications, and you define those classifications and how you want to define your

(05:42):
sampling rules in a config file.
It's just a YAML file that's part of Refinery, but you have to hard-code in the syntax.
So when you're defining things like, hey, I want to keep 1 in 10 of my HTTP requests, it's got a particular, like, syntax that it's looking for.
And so if that changes, then Refinery doesn't know to do

(06:05):
that.
And that means it's probably just going to let through everything with the new syntax.
And,

Courtney (06:13):
Did you just say hard-coded YAML?
I'm like, I'm just like, Oh boy.
Sounds fun.

Nick Travaglini (He/Him) (06:21):
Yeah, I'm, you know, I'm not a software engineer.
I don't pretend to play one on TV.
Being a YAML engineer is about the best I can do.
So if anyone's looking for a YAML engineer...

Courtney (06:33):
I'm here with love and respect for that.
I just feel like I've been, I have broken an entire website with one wrong line of YAML.
Sorry, I did not mean to interrupt, though.
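For readers who want something concrete, here is a minimal sketch, in Python rather than an actual Refinery rules file, of the failure mode Nick is describing. The attribute rename shown (http.method becoming http.request.method) is one example from the HTTP semantic convention stabilization, and the matching logic is a toy stand-in, not Honeycomb's implementation.

```python
# Toy illustration: a sampling rule keyed to a hard-coded attribute name
# fails open when the semantic convention renames that attribute.
import random

RULE_FIELD = "http.method"  # the attribute name the rule was written against
KEEP_ONE_IN = 10

def should_keep(span_attrs: dict) -> bool:
    if RULE_FIELD in span_attrs:
        # Old-syntax spans match the rule: keep roughly 1 in 10.
        return random.randrange(KEEP_ONE_IN) == 0
    # New-syntax spans match no rule, so everything falls through and is kept.
    return True

old_span = {"http.method": "GET"}           # pre-cutover attribute name
new_span = {"http.request.method": "GET"}   # post-cutover attribute name
print(should_keep(old_span), should_keep(new_span))  # sampled vs. always kept
```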

Nick Travaglini (He/Him) (06:41):
Yeah, so Mike, my colleague, foresees
that people will have problems with this.
They won't catch that this has changed in their instrumentation.
They've upgraded, but potentially they've upgraded to a new version that supports this.
They don't realize what they've done, and they start sending a flood of traffic to Honeycomb, and they have to pay for that.

(07:03):
Whereas, you know, if they were sampling at a 1 in 10 rate, they're now sending 100%.
And their potential bill, or at least the, you know, the rate that they're sending traffic, goes up correspondingly.
and

Courtney (07:15):
I

Nick Travaglini (He/Him) (07:16):
we were really concerned.

Courtney (07:16):
for a second here?
First of all, a lot of companies would see that as a feature, not a bug.
But right.
So I'm just, I'm not sponsored by Honeycomb.
I have no affiliation, but I would just like to say that little bit in the report also caught my eye.
Kudos that you all were trying to do the right thing by your customers with this.

Nick Travaglini (He/Him) (07:33):
Yeah.

Courtney (07:33):
again,

Nick Travaglini (He/Him) (07:34):
So we, we figured that would be a, that would be a poor customer experience.
Another thing that Mike didn't explicitly call out was that the alerts that you can build in Honeycomb, based off the data that we're receiving, are also hard-coded.
So like triggers, or burn alerts based on SLOs.
These are all going to be affected by this.

(07:55):
And so people may not get alerts that something is up with their system where they're anticipating it.
So, like, there's, there's more than just the, like, financials at stake.
There's also the ability for them to understand, and for us to really provide a good service, right?
Like in the sense of being able to do observability.
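The same hard-coding shows up in alerting. A minimal sketch of that silent-alert failure mode, assuming a toy trigger shape rather than Honeycomb's actual trigger configuration: a predicate keyed to the old attribute name simply never matches post-rename events, so the alert stops firing instead of erroring.

```python
# Toy trigger: count events matching a hard-coded attribute name and fire
# if the count crosses a threshold. The shape is illustrative only.
TRIGGER = {"field": "http.method", "value": "GET", "threshold": 100}

def trigger_fires(events: list[dict]) -> bool:
    matches = [e for e in events if e.get(TRIGGER["field"]) == TRIGGER["value"]]
    return len(matches) >= TRIGGER["threshold"]

# 500 events that would have fired the alert under the old convention.
post_rename_events = [{"http.request.method": "GET"} for _ in range(500)]
print(trigger_fires(post_rename_events))  # False: zero matches, so no alert
```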

(08:15):
So we, myself included, go and ask Engineering: Hey, are you familiar with this?
Do you know what's going on?
And they say, yes, this is a thing that was announced a while ago.
And we're like, oh my God, this is going to cause problems for people.
And

(08:36):
I declare an incident. And so, we use Jeli, now part of PagerDuty, at the time they were an independent company. I use the Slack bot, the Jeli Slack bot, to declare an incident, and that starts rallying the troops.
We get engineers, folks in CS, folks in product.
Particularly, I want to shout out my colleague, Mary, who works in docs, who starts

(08:58):
pulling together like, hey, how are we going to communicate that this is a potential problem to customers?
While we're investigating, one of the things here is that these libraries and whatnot have been permitted to upgrade to a version that has the breaking change, but they may not have done so.
So that's the thing that we, we realized pretty quickly that we

(09:19):
need to go check, to just see what's the latest version.
And when did they release that latest version?
Right.
Pretty simple to actually go check.
You just go look in the GitHub repositories.
But we need to start communicating this, so that in case something has gone out, one of these has upgraded, we need to let customers know. We need to forewarn them in case

(09:40):
they haven't upgraded yet, but they should be on the lookout for this.
So myself and other TCSMs send quick notes to accounts that we work with, customers that we work with: Hey, go take a look at this.
We're checking on our side, but want to give you a heads up so you can, you can also do this.

(10:01):
You can also take some action here.
Turns out, fortunately, none of them had actually upgraded since the date.
It was only just a couple of days later, so we got really lucky.
So what we, what did we do from there?
We obviously sent out that initial quick communication.
We drafted up longer-form instructions like, hey, these

(10:21):
are what you should be looking for.
There's actually a blog post on the Honeycomb blog that details some of this stuff.
We reached out to more folks in the OTel community, including maintainers and whatnot: like, Hey, we've got to do more to publicize this.
Like, we're really concerned that folks are using tools like, say, Dependabot, which will automatically generate a PR that

(10:43):
engineers can accept when one of their dependencies releases a new version.
But depending on the production pressure and how busy they are, like, there's a lot of dependencies out there.
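The version check Nick describes, going and looking in the GitHub repositories, could look roughly like this sketch. The repository names are illustrative placeholders, not Honeycomb's actual checklist; the endpoint used is GitHub's public releases API.

```python
# Rough sketch: fetch each instrumentation repo's latest release and its
# publish date, to see whether anything shipped after the cutover date.
import json
import urllib.request

REPOS = [
    "open-telemetry/opentelemetry-python",
    "open-telemetry/opentelemetry-js",
]

for repo in REPOS:
    url = f"https://api.github.com/repos/{repo}/releases/latest"
    with urllib.request.urlopen(url) as resp:
        release = json.load(resp)
    # tag_name and published_at are fields in GitHub's releases API response.
    print(repo, release["tag_name"], release["published_at"])
```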

Courtney (10:55):
There's a lot in here and you, like, you refer to this
as a near miss, and you said something very early on, a near miss, an almost incident.
A near miss, what does a near miss mean to you?
I kind of want to get specific on that terminology from you, at least.
I know other people can have different meanings for this.

Nick Travaglini (He/Him) (11:12):
Sure, so a near miss, or as George
Carlin would say, a near hit, is when one is looking at trajectories, I'm going to say, of various forces, just to be as abstract as possible, and it looks like they're going to

(11:32):
intersect each other in a way that would make your life unhappy.
You would be very unhappy if these things intersected.
And then you get lucky, or maybe you've done something, maybe you've taken some action, so that they don't actually end up hitting each other.
And so you've avoided this collision, you've avoided this

(11:53):
problem.
So it assumes that things are going well, and you either luck out, or you, you know, through your own active agency, are able to avoid a problem.

Courtney (12:10):
It sounds to me like what you're saying here is that
avoiding this near miss was definitely not luck, but it was anticipation and expertise of folks who are very intimately familiar with these aspects of how the system works.

Nick Travaglini (He/Him) (12:25):
Right?
So we, we did get lucky in the sense that none of these libraries had actually upgraded

Courtney (12:30):
yeah.

Nick Travaglini (He/Him) (12:32):
during the couple days since they had
been permitted to do so.
So I totally want to say that, like, there was an aspect of luck here, I'm not going to pretend there wasn't.
But we in customer success understand our customers' behavioral patterns, that they want to sample, that they use alerts and use various features of our product, and how those are technically

(12:57):
implemented.
Such that we can trace out, we can anticipate, how various aspects of the technical parts will behave, given this social change, which is the change to a convention, a naming convention,

(13:18):
and the specific syntax that's used, not to mention the social aspects of the production pressure that engineers may be under.
When things like, if they're using something like Dependabot, or reading through a PR, like maybe they read the release notes, but they don't read the actual code, like the diffs themselves, and that, you know, they may miss something, they

(13:38):
may introduce a bug where, where there hadn't been.

Courtney (13:45):
I have a question about that.
What I like about what you just explained there, and the way you write it up, is I think it lays bare what we mean by a sociotechnical system, which is trying to drive home the fact that it's not that we work in software, but we as humans work in and with software, we are a part of that system.
And so the successes and the failures and all those

(14:08):
things are not inherently one or the other.
It's this, it's how humans work with machines.
And the central thesis early on that you have is that this stuff can't work on its own, and what we tend to have is these near misses because of the adaptability of humans in a joint system with machines.

(14:29):
Is that, would you say that's a fair restatement of your central thesis?

Nick Travaglini (He/Him) (14:33):
Absolutely.
Absolutely.
So when it comes to technical artifacts, technical objects, they are typically considered to be, at best, and I'm going to use a little bit of jargon here.

(14:54):
They're complicated.
So they are things that are built up from, they're composed or decomposable according to sort of discrete, say, digital components.
There's a part-whole relationship, and that whole can be broken down cleanly.
That's what sort of complicated means.

(15:15):
A complex system is not decomposable nicely.
A complicated one, that's sort of, like, discrete: you've got a certain number of discrete components that you can aggregate back into the same whole.
Whereas if you decompose a complex system, you actually get things that are qualitatively irreducible to each other.
That's, that's the sort of idea of complexity.

(15:36):
And certainly in an organization like a business, you've got organic beings, living things like people, and then you've got technical artifacts, which are often considered to be at most complicated, at least simple.
For it to sustain itself, you require both, because people need

(16:00):
technical objects to do things, to extend our own powers and capacities.
And there's a lot of things that have to work together that were not built to work together: in an organization, a bunch of technical objects that were not built to work together.
And humans are the things that are able to mediate that and make them work together.

Courtney (16:22):
And not just mediate that, but anticipate.
Machines cannot anticipate and adapt and plan based on accumulated expertise; that is an inherently human capacity.
By looking at this near miss, you're looking at normal work, right?
Like somebody could argue, well, why are you calling this a near

(16:42):
miss?
This is just stuff we do all day long.
It's just stuff that product teams and engineering teams do.
That's just our daily work.
Why would that be a near miss?

Nick Travaglini (He/Him) (16:51):
So one of the interesting things I
think about this is that we had an incident and nothing was broken, right?
There's no technical component here that wasn't functioning properly, or the way that we, the way that we really wanted it to.
Where the sort of breakdown was, was in the communication portion, about the change from one syntax to another in the

(17:15):
semantic convention.
Right, so Refinery working exactly as you would want, Honeycomb ingesting data exactly as you would want, the Dependabot, you know, if that, if that is being used, that's working precisely.
And so this is understanding how these technical components work when they're functioning appropriately, when they're

(17:37):
functioning properly.
So that's understanding normal work, because most of the time they are up.
We're able to keep them up.
And then understanding, having a sense for human culture and sociology, and how people work in organizations, and the sort of pressures that they face in their day-to-day activities, in a business that is a for-profit business.

(17:59):
You know, you've got to innovate.
You've got to make money so you can reinvest that, so you can stay ahead of your competition, and so on and so forth.
And so just thinking about the collisions of these, or possible collisions of these forces, is where understanding normal work allows you to get ahead.
You can anticipate.
You're not just like heads down, just

(18:20):
focused on your work in the moment.
You're not immediately engaged.
You're also somewhat temporally, and this is where the philosophy stuff comes in, you're, you're temporally disengaged.
You're actually of multiple times.
You're of the immediate present, where I have to know, like, what each of these things do in their

(18:42):
normal operations, and I'm able to take a step back and reflect on where is this going?
Right?
And that's a totally different dimension of time than the immediacy of the present.
And that's, that's something that only humans can do.
And so that's where we provide the adaptive capacity, to use some language from resilience engineering, to make sure that

(19:05):
as we understand normal work, we can actually deflect and redirect and adapt and change, to make sure that things continue
to work.

Courtney (19:14):
What was particularly interesting to me about the way
you talk about production pressures in this, is in the notion of causality.
It's something that comes up in lots of incident reports, but also in a, in a near miss.
I have seen, and I have one company in mind who I'm not going to name right now, incident reports that slag other companies in the incident

(19:38):
report, as a, as the cause, or the problem, or what have you.
I'm, generally speaking, not a fan of that approach, regardless of how egregious the situation was. But you, but you go

(20:08):
the opposite direction.
Instead of throwing OTel under the bus, you identify production pressures on your end for this incident.

Nick Travaglini (He/Him) (20:21):
What I put into the blog post is that,
you know, it's totally possible to say that, oh, OTel didn't do enough to let people know that this change is coming.
They had one post about it.
That was issued months before, on their blog. How many people are reading this blog? They should have known that.
This is a sort

(20:43):
of hindsight. Now that this problem has arisen, we know something that back at the time, they probably didn't know.
They probably weren't thinking in particular about, for example, how Honeycomb's Refinery system works, right?
They may have no, they may have had no idea that it even existed, right?

(21:04):
Whoever wrote this post, like, decides on, on announcing these things, right?
So, like, it's totally possible to say, well, they should have issued another one two weeks before the change, and a week before, and 24 hours before, and like, it's all their fault.
Like if they had just done more, we could have avoided this whole

(21:24):
situation.
And I think that that's problematic.
And I think that there's a lot of, there's a lot of folks in resilience engineering, and you, who will appreciate that there's more going on here.
There's the way that Honeycomb has designed Refinery, to use a YAML file to do this hard-coding of the syntax in

(21:45):
there, right? Like, what we have done.
And we also, like, we built Refinery not knowing that this change was coming.
So like, it's also not our fault, right?
So we needed to, we need to take some responsibility, not more than our responsibility.
Like, I'm not going to claim that, like, we, Honeycomb, had made any decision about the semantic

(22:09):
convention change, like, or that... The point is not to cast aspersions.
The point is to understand how things work, and to use our wet-brained capacities as humans to, like, just think about the situation, take a step back from the immediacy, and do something that only we can do, which is to

(22:31):
really think, to think about the situation.
Like that's, that's really what thinking is: the ability to, like, step back and be like, oh F, this is going to happen.

Courtney (22:40):
'Cause it's something you brought up that, that, and I think you linked to another talk from the Learning from Incidents conference that happened a couple of years ago, but I was also thinking about some work that Sarah Butt and Alex Elman had done on this front: how do we think about incidents and, and our systems in the context of third-party or other organizations that we work with?
Which is probably the reality for, I'm

(23:04):
going to just go out on a limb and say, almost every single software company out there now, right?

Nick Travaglini (He/Him) (23:10):
So the thing about engineering is that
because it involves people.
People are these open, energetic systems, like, we're these biological organisms, we have to ingest food in order to survive.
We talk to people, we're very, very communicative.

(23:31):
Organizations are not hard boundaries.
Right, I work with customers all the time.
It's literally my job, right?
And we have engineers who interact with engineers at our customers all the time.
This is a thing that is actually super important to the engineering process generally, is that organizations are not

(23:52):
hard, closed systems.
There's a fantastic book called Hitting the Brakes by a woman named Ann Johnson.
And it's a study of actually how the anti-lock braking system was created.
And one of the things in her study of this that she points out is that the engineers at the various organizations that were

(24:14):
competing with each other to get to market with anti-lock braking systems, and dealing with a problem in cars and vehicles, it required them to communicate with each other, outside, with employees at these competitors, right?
There had to be an industry conversation at a higher level,

(24:35):
at the industry level, in order for the engineers at a given organization, in a given business, to have the ideas to make progress on the problems that they were dealing with in their local environment.
And so this idea that businesses are, like, hermetically sealed... and that you can just, like, when things are good, I can get resources and help from others.

(24:56):
Like if you're listening to this podcast, you're, you're doing this, you're participating in this, right?
Like you're part of the software industry.
You're listening to this.
You're, you're literally engaged in this.
And when things go well for me, I will give them all the credit in the world, but when things go bad, then I'm gonna throw them overboard.

(25:16):
That is, that's just total hypocrisy, right?
Like, take, you've got to take responsibility here.
That's, that's one of the things that I really like, actually, about resilience engineering and this sort of safety science that it's coming out of, is that it really is about: let me take responsibility for what I did, in a reasonable way.
Like, I contributed to this, I will totally cop to that.

(25:38):
I will totally say, yes, I did this.
Don't then throw me overboard

Courtney (25:44):
Yeah.

Nick Travaglini (He/Him) (25:44):
and put an, an inordinate, excuse me, an inappropriate, inordinate, that was the word I was going for, yeah, an inappropriate amount of blame on me.
If something goes wrong, I already feel bad.
Like, I want to do something about that, because I want to take responsibility.

(26:07):
So like, help me to take responsibility to make this
better, right?

Courtney (26:13):
As a system, as a piece of a whole system.
This makes me think of one other note I had on the report.
And so I'm going to loop backwards a little bit, because I sort of want to talk about soft and hard boundaries if we have time.
But the thing I want to loop back to is, in the report, you're, you talk about: we weren't sure if this was an incident or not an incident.
Is it an incident?
I don't know.
"I cut the Gordian knot and declared an incident." And there

(26:36):
is like an iceberg of context underneath that, those couple of sentences in this report, because I feel like anybody who has been involved in incident response, in incident management, understands the existential question of “Is this an incident?”
Being the person who pushes the “this is an incident” button is a pretty scary place to be, and when

(26:58):
you're not sure, I think it's the scariest. Maybe I could be wrong.
People could please come and disagree with me on that, actually.
What was that process like?
And what does that look like at Honeycomb?
If you have other experiences to compare it to, that's great.
But I think that's a less examined piece of this world than has been given enough attention.

Nick Travaglini (He/Him) (27:17):
Let me say, first of all, I'm very
grateful to Jeli for having the Slack bot, the ability for anybody to just do it in Slack.
You know, that was super helpful for, for us getting together and being able to, to treat this with the severity that I thought it merited. You know, maybe

(27:39):
other, other people wouldn't, but, you know, I'm, I'm at least grateful for that.
I'm also grateful to my colleagues for actually including me in the training of how to, how to declare it and use the Jeli bot and do all this sort of stuff.
And reading the Howie guide, like, all that sort of material.
It was great.
So, one of the things, I'm going to go ahead and define an incident.
Which is that there is a certain expectation of how a system will

(28:06):
perform.
And an incident is where the behavior of that system changes in a way that somebody doesn't like.
That's it.
That's an incident.
And, you know, you start getting into all kinds of contextual questions.
Who is this person?
When did this change happen?
Under what other conditions?

(28:27):
You know, what else is going on?
It requires a lot of context.
And that's the point.
There's not going to be a single definition of an incident; it's always going to be a judgment call.
So that, for me, is what an incident is, in terms of my thinking about declaring this as an incident.
One of the things that I'm also very grateful to Honeycomb

(28:50):
for, is that we don't count the number of incidents that we had.
We don't try to track things like MTTR; they're not helpful.
My colleague Fred has written a blog post about how counting forest fires is not a great way of understanding whether or not your firefighters are doing a good job.
Right?
It may be indicative of something else, a bigger

(29:13):
systemic issue, like climate change.
It's not, it's not good for, like, are your firefighters doing a good job?
And so I felt no compunction about declaring this an incident.
In fact, it's beneficial to declare it an incident, because then we get to practice.

(29:34):
It's a low-severity incident, which, by the way, Honeycomb has moved away from severity levels and categorizing incidents as severities.
We have a typology, a different classification system.
We get to practice communicating and handling the unexpected, and learning how our colleagues work, and updating our prior

(29:55):
understandings of how the system works.
There are things about Honeycomb that I learned from participating in this incident, and from attending and reading incident reviews about incidents that have happened at Honeycomb.
It's a real learning opportunity.
And we get to practice working together in an ambiguous

(30:16):
situation during the incident.
And then afterwards, we get the chance to learn things.
So it's actually beneficial to declare this as an incident.
And to pay attention to and recognize, acknowledge, that something about how things are going has changed.
And understanding that things have changed, that something has

(30:36):
changed, and getting into the details of what has changed, is good for me going forward, right?
Because now I can take that into account. Like, now I know.
Hey, customer, look out for this change.
It's coming down the pipe.
Eventually they're going to update.
There's going to be this breaking change.
I can advise my customers how not to get caught in this trap.

Courtney (30:57):
Yeah, and, and this is the notion again from
resilience engineering, of work-as-imagined versus work-as-done.
The short version of that is there's ways we think about, maybe we're not even aware of, that we think about how our organization functions, and then there's opportunities like this, incidents, near misses, that throw into focus how it actually

(31:19):
functions.
And it's a really unique opportunity to be able to understand that distinction.
There are things you learned about how your own company works.
Honeycomb's not that big, but the system as a whole is complex enough that none of you can possibly know enough about all of it.
And this is one of the ways that you all begin to learn those

(31:42):
pieces and how things actually work, right?

Nick Travaglini (He/Him) (31:45):
Yes, 100%.

Courtney (31:48):
Sorry.
This ends my TED talk on work-as-imagined versus work-as-done.
What I find is, every time now I talk to somebody about an incident, an incident report, anything like this, all of these other things I've learned come to mind, because what I love about this field of resilience engineering is it's not a fuzzy, academic, you

(32:11):
know, unattainable field of study, which not all academic things are, but it's easy for people to feel that way about them.
Resilience engineering and the topics that we talk about in this come from the lived experiences of practitioners, of previously people who are like airline pilots or, you know, air traffic controllers or surgeons or what

(32:31):
have you.
And we're just bringing that lens to software, and it just, it stops me in my tracks sometimes how relevant these papers I read in academic journals are to, like, the experiences that you're describing to me right now.
Nick is also, I'm going to do a little pitch here, a member of this group, which is the Resilience in Software

(32:55):
Foundation.
So I'm going to put a little bit more about that at the bottom, because I feel like we've covered so much territory of these things that we discuss in that space a lot. But closing
thoughts.
You really talk a lot about humans being the pieces that keep these systems together and running and handling edge cases.

(33:16):
Do you have any closing thoughts on that before I wrap this one
up?

Nick Travaglini (He/Him) (33:20):
So the ability for people to make
technical artifacts, technical objects, work together.
That really came to me as part of, a part of my study is reading a philosopher, again, surprise, philosophy, named Gilbert Simondon.
He's got a great work called "On the Mode of Existence of Technical Objects".
It's dense, just heads up, dense, but it's really great because

(33:45):
he, he talks about how people really are these transducers, like we're able to convey and transmit information and modulate it between technical objects that were not designed to accept information from each other in particular formats.
We handle the rough edges of things. There's better and

(34:08):
worse ways to do that, you know, but it's what people do, and it's, it's essential for collaboration and just anything that we want to do, in any way that we want to be productive.

Courtney (34:21):
It's funny because I will end with a Bluetooth rant.
I was on a call yesterday with someone, and I had recently applied for a home, like a HELOC, and my phone just kept ringing.
Because, also, heads up folks, if you do an online application for a HELOC, even through your own bank, that information gets out onto the internet and then people just start calling you,

(34:42):
which is horrible.
But they started calling me during this recording, and I had my Bluetooth headphones on, and I am intimately aware of this problem about how my phone and my computer don't communicate very well.
Because if I am recording a Zoom thing and my phone rings, it hijacks the Bluetooth on my headphones, and then when I turn the phone off, the headphones start playing the last thing that my son was watching on my

(35:04):
computer into those headphones while I'm on the Zoom call.
And my adaptive capacity for that was to put my phone in airplane mode before you and I got on this recording, which, yeah, which is a perfect bow on A) how much Bluetooth sucks and B) how much humans are good at working around it.
So that's a great note to end on.

(35:26):
Thanks for joining me.

Nick Travaglini (He/Him) (35:28):
Thank you so much for having me.
Thank you for the work you do with the VOID.
It's great.
People, read it, read the reports.
Courtney's work is fantastic.