
July 16, 2025 12 mins

Is our use of reported injury measures, like TRIFR or LTIFR, 'good enough', or are these measures beset with foundational statistical flaws?


Today's report is from Hallowell et al. (2020), titled 'The Statistical Invalidity of TRIR as a Measure of Safety Performance', from the CSRA (Construction Safety Research Alliance).


Feel free to shout me a coffee to support my site & podcasts: https://buymeacoffee.com/benhutchinson


More research at SafetyInsights.Org

 

Intro/Outro: "Dark Synth Wave" by ElephantGreen (Pixabay.com)


Make sure to subscribe to Safe AF on Spotify/Apple, and if you find it useful then please help share the news, and leave a rating and review on your podcast app.


I also have a Safe AF LinkedIn group if you want to stay up to date on releases.


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:06):
The safety dashboard looks great: nearly everything is green or amber with barely a hint of red, and then you kill somebody. Are our injury measures more a curse with a poor statistical basis, or a decent enough predictor of harm? Good day everyone. I'm Ben Hutchinson and this is Safe AF, a podcast dedicated to

(00:30):
the thrifty analysis of safety, risk and performance research. Visit safetyinsights.org for more research. This well-known report from Hallowell et al., as part of the CSRA, studied the statistical basis of TRIR, or the total recordable injury frequency rate.

(00:50):
It's titled 'The Statistical Invalidity of TRIR as a Measure of Safety Performance'. So what is the TRIR, or TRIFR? The total recordable injury frequency rate is the rate at which a company experiences an OSHA recordable incident, scaled per 200,000 worker hours.

(01:12):
In other places around the world it's scaled against 1,000,000 worker hours, and there are probably other variations. This helps normalise the value to account for different working hours and, by extension, different headcounts. Now, even though the report refers to TRIR, I'm going to call it TRIFR, since that's what we call it in the land of Oz.
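To make the scaling concrete, here's a minimal sketch of the normalisation (my own illustration with made-up numbers, not from the report):

```python
def recordable_rate(recordables: int, worker_hours: float,
                    per_hours: float = 200_000) -> float:
    """Recordable injury rate normalised to a fixed exposure base.

    OSHA's TRIR uses per_hours=200_000 (roughly 100 full-time workers
    for a year); Australian TRIFR commonly uses per_hours=1_000_000.
    """
    return recordables / worker_hours * per_hours

# Hypothetical example: 6 recordables over 930,000 worker hours
print(round(recordable_rate(6, 930_000), 2))             # TRIR  ~ 1.29
print(round(recordable_rate(6, 930_000, 1_000_000), 2))  # TRIFR ~ 6.45
```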

(01:33):
What were their methods? The researchers had direct access to over 3 trillion worker hours of internally reported incident data from partner organisations within the US, covering a 15-year period. The data set included monthly counts of recordable injuries, fatalities and worker hours.

(01:54):
They used a range of statistical analytical approaches, both parametric and non-parametric tests, and these serve different purposes. Just very quickly: the parametric approach, in particular the Poisson distribution, treated recordable incidents as discrete events, which means that they either occurred or didn't occur, and recognised that they vary

(02:16):
over individual worker hours. Therefore the TRIFR was logically represented as a series of Bernoulli trials, and the Poisson distribution was deemed the most accurate representation. This is pretty appropriate for modelling these types of discrete events, in particular discrete rare events that occur over a fixed interval of time, and it describes recordable injuries within worker hours really well.
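As a rough illustration of that framing (my own sketch with hypothetical numbers, not the authors' code), each worker hour can be treated as a Bernoulli trial with a tiny probability of producing a recordable, and the resulting counts are well approximated by a Poisson distribution:

```python
import random

random.seed(42)
p = 1.29 / 200_000   # per-hour probability implied by a TRIR of 1.29
hours = 500_000      # exposure in one simulated period

# One period of worker hours as Bernoulli trials
recordables = sum(random.random() < p for _ in range(hours))
print(recordables)   # a small count, e.g. 2 to 5

# Expected count under the equivalent Poisson model
print(p * hours)     # lambda = 3.225
```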

(02:38):
They also used some non-parametric approaches, but I really recommend you check out the report for a description of what these tests are and why they were used. The reason I jumped into that background on the parametric tests, particularly the Poisson distribution, is that several people on social media were quick to argue that the Poisson distribution in this report isn't appropriate for this type

(03:01):
of modelling, and that the authors should have used a
negative binomial distribution. First of all, some existing
research has already shown that incidents are appropriately
modelled via the Poisson distribution and that the
assumption of independence isn't actually violated.
I'm not going to go into that here, though.

(03:21):
Further, the applied statistics in OHS textbooks from Janicak specifically describe the appropriateness of Poisson distributions for this purpose, because the Poisson is well geared for events that are statistically rare compared to the total exposure. However, a recent review study of best practices for accident underreporting research from Bazzoli and Probst in 2025

(03:45):
in Safety Science found that both Poisson and negative binomial models were appropriate for this task. In fact, they argued that Poisson and negative binomial models return similar results in medium and large sample sizes, showing that the Poisson actually is appropriate. That said, based on that research, negative binomial models did return a higher accuracy.

(04:08):
But again, the Poisson was a good estimator relative to the negative binomial and is appropriate for statistically rare events, which is what the statistician Janicak in his OHS statistics textbooks, and Hallowell et al. in this current report, actually argued to begin with. Anyway, I'll get off my soapbox now.
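If you want a quick sanity check on your own data (a sketch assuming you have monthly recordable counts; the numbers here are hypothetical), compare the variance of the counts to their mean. The Poisson model assumes they're roughly equal, while the negative binomial allows the variance to exceed the mean:

```python
import statistics

monthly_recordables = [2, 0, 1, 3, 1, 0, 2, 4, 1, 2, 0, 1]  # hypothetical

mean = statistics.mean(monthly_recordables)
var = statistics.variance(monthly_recordables)

# Dispersion index near 1 supports Poisson; much greater than 1
# suggests overdispersion and a negative binomial model instead.
print(f"mean={mean:.2f}, variance={var:.2f}, dispersion={var / mean:.2f}")
```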

(04:29):
So what did they find? Well, let's jump into the core findings of this report. One: no link between the TRIFR and serious accidents and fatalities was found. There was no discernible statistical association between TRIFRs and fatalities. Therefore, trends in the TRIFR aren't statistically associated with fatalities, suggesting that they happen for

(04:50):
different reasons, and because of this lack of association, the TRIFR is not a proxy for high-impact incidents. Further, the authors argue that all of the safety activities associated with improving TRIFR performance may not necessarily help to prevent fatalities. Two: the results

(05:10):
indicated that changes in the TRIFR are due to 96 to 98% random variation. The authors discussed that recordables don't occur in predictable patterns, and that's likely because safety is a complex phenomenon impacted by many different factors. The models were tested to see if historical TRIFR performance predicted future TRIFR performance. What was found was that at least 100

(05:31):
months of data was needed for reasonable predictive power. It was argued that because the TRIFR is normally used to make monthly or annual comparisons, this finding (that around 100 months of data is the base level needed) indicates that for all practical purposes, the TRIFR is not predictive in the way that it's used. In plain language, injuries are

(05:53):
largely due to chance and don't follow a consistent statistical pattern. But what does random chance, or chance, mean in this context? It's been taken out of context by some practitioners to mean that there are no underlying causes or contributing factors. Is that what we're arguing, that there are no causes in life? That's actually incorrect.

(06:14):
We're talking about statistical randomness, not some ontological
or positivistic sense of the world.
What this means is that the occurrence of recordable
injuries doesn't follow predictable patterns or occur at
regular discernible intervals. So while a safety system aims to
reduce risk, the actual manifestation of incidents over

(06:36):
short periods is highly variable and statistically behaves like a random process. There's also the issue of statistical noise, given that 96 to 98% of the variation in the TRIFR was due to random variation. This implies that when you see an organisation's TRIFR go up or down from one period to the

(06:57):
next, it's overwhelmingly likely to be statistical noise rather than a direct reflection of some sort of improvement or deterioration in the underlying safety system. You may as well throw dice to predict the next accident based on tracking these data. And because recordables are relatively rare events, this infrequency contributes to a

(07:19):
high degree of random variation within the TRIFR, especially when measured over typical short timeframes of months or years. So, in other words, this doesn't mean there's no causality or underlying contributing factors. It highlights that the many causal factors at play in a complex system lead to recordable incidents that are, at a sort of macro statistical level over

(07:41):
typical reporting periods, highly unpredictable and random.
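To see how much a rate can bounce around with nothing changing underneath (a hypothetical simulation of my own, not the report's data), simulate monthly recordables from a fixed-rate Poisson process and watch the annualised rate swing:

```python
import numpy as np

rng = np.random.default_rng(7)
monthly_hours = 80_000
true_rate = 1.5  # 'true' TRIR per 200,000 hours, held constant throughout
lam = true_rate / 200_000 * monthly_hours  # expected recordables per month

for year in range(1, 6):
    recordables = rng.poisson(lam, 12).sum()  # 12 months of counts
    trir = recordables / (monthly_hours * 12) * 200_000
    print(f"year {year}: TRIR = {trir:.2f}")  # swings widely around 1.5
```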
All right, so let's move on to the next finding. Three: the TRIFR isn't precise. It lacks precision and shouldn't be communicated with multiple decimal points unless hundreds of millions of worker hours are amassed.
The confidence intervals are just too wide for the TRIFR to

(08:02):
be reported accurately to even one decimal point.
And perhaps my favourite part of this paper follows that, where it's shown that if you were to report the TRIFR to two decimal places, for instance, say you have a TRIFR of 1.29, you would need approximately 30 billion worker hours of data to support that claim. So on this point, the authors

(08:23):
state that the TRIFR for almost all companies is virtually meaningless, because they do not accumulate enough worker hours.
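You can get a feel for that figure with a back-of-envelope calculation (a rough normal-approximation sketch of my own, not the authors' exact method; it lands in the same tens-of-billions ballpark as the paper's roughly 30 billion figure):

```python
def hours_needed(rate: float, half_width: float, base: float = 200_000,
                 z: float = 1.96) -> float:
    """Worker hours needed so a 95% interval on the rate is +/- half_width.

    Normal approximation to the Poisson: the count over H hours has
    variance ~ rate * H / base, so sd(rate) = sqrt(rate * base / H).
    """
    return rate * base * (z / half_width) ** 2

# To trust the second decimal place of a TRIR of 1.29 (+/- 0.005):
print(f"{hours_needed(1.29, 0.005):.1e}")  # ~4e10 worker hours
```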
Four: the TRIFR is statistically invalid for comparisons in nearly every practical circumstance. It's statistically invalid to use the TRIFR to compare companies, business units, projects, or teams, because most companies

(08:46):
again don't accumulate enough worker hours to detect
statistically significant changes.
On this point, the authors state that the TRIFR shouldn't be used to track internal performance or to compare companies, et cetera. And, by extension, the TRIFR cannot be a single number, because if the TRIFR is largely random, a single number doesn't represent

(09:07):
the true sort of reflection of safety performance. Instead, the TRIFR should be expressed as a confidence interval: a range of potential values over extended periods. Therefore, single point estimates of the TRIFR, broken down into decimal places over really short periods of time (you know, months to years), are said to be statistically meaningless for almost every organisation.

(09:29):
In plain language, instead of reporting 'our TRIFR is 1.29', companies should at least report a range, like 'our TRIFR is likely between 1 and 4'.
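For what reporting a range could look like in practice, here's a sketch using the standard exact Poisson (Garwood) interval, with made-up numbers; the report itself doesn't prescribe this exact recipe:

```python
from scipy.stats import chi2

def trifr_interval(recordables: int, worker_hours: float,
                   base: float = 1_000_000, conf: float = 0.95):
    """Exact (Garwood) Poisson confidence interval for a recordable rate."""
    a = 1 - conf
    lo = chi2.ppf(a / 2, 2 * recordables) / 2 if recordables else 0.0
    hi = chi2.ppf(1 - a / 2, 2 * (recordables + 1)) / 2
    return lo / worker_hours * base, hi / worker_hours * base

# Hypothetical: 6 recordables over 2.4 million worker hours
lo, hi = trifr_interval(6, 2_400_000)
print(f"TRIFR likely between {lo:.1f} and {hi:.1f}")  # roughly 0.9 to 5.4
```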
The next key finding is that the TRIFR is only predictive over very long periods: predictive when you have at least 100 months, more than eight years, of

(09:50):
data. So for practical purposes, given typical month-to-month reporting periods, it's not predictive. Therefore, what your TRIFR was last month or last year probably doesn't tell you much about what it will be next month or next year, unless you have many years of data behind it. And even by the stage when you have enough data, it may not be relevant anymore to your current

(10:12):
operations. Therefore, the TRIFR is inadequate for measuring intervention impact. It's entirely inadequate for attributing changes in safety to specific interventions or investments. What can we make of the findings? Well, first of all, a misconception is that the measures themselves are invalid, that, you know, we're

(10:33):
saying the TRIFR is somehow invalid. But this isn't actually what the report indicated. It indicates that the statistical basis of how these measures are typically used is invalid. Again, using the example of dice: you may as well throw dice. We're making causal claims about getting two sixes when statistically we can't

(10:54):
differentiate the sixes from chance. We assume there's some sort of causal factor behind why we keep throwing sixes, but statistically we can't demonstrate that. In any case, some practical implications: I think maybe, instead of thinking about abandoning these measures, some effort to help them suck less is probably warranted, and the report already gives some

(11:15):
suggestions. Use a range instead of a single point estimate. Carve off those decimals: 1 instead of 1.1. Decouple the measures from decision making. For instance, they say if an organisation is using the TRIFR for performance evaluations, then they're likely rewarding nothing more than random variation.

(11:36):
Find ways to increase the sample size, and hence the statistical power: the time span, the numbers that you're evaluating, et cetera. Don't use them to track internal performance, or at least not between projects or teams. By extension, maybe apply caution when they're used for gauging contractor performance and in tendering. And of course, be cautious of their use in incentive programmes.

(11:59):
Also, I liked how the researcher David Oswald suggested coupling quantitative indicators with qualitative indicators. For instance, every number that you present should have a qualitative descriptor, a narrative. One gives you the what, that's the number; the other gives you the rich narrative on how and why it actually matters.

(12:20):
There are also other tools to help improve the use of injury measures. Control charts are a favourite of mine. I use control charts for lots of stuff, and they incorporate Poisson and negative binomial thinking. There's also a range of statistical methods like significance testing, confidence intervals and more.
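As one example of that control chart thinking (a minimal u-chart sketch with hypothetical monthly data; the limit formulas follow standard SPC practice rather than anything prescribed in the report), only the points outside expected Poisson variation get flagged as signals:

```python
import math

# Hypothetical monthly recordables and exposure (units of 200,000 hours)
counts = [2, 0, 1, 4, 1, 0, 2, 1, 6, 1, 0, 2]
exposure = [0.40, 0.35, 0.40, 0.45, 0.40, 0.40,
            0.35, 0.40, 0.40, 0.45, 0.40, 0.40]

u_bar = sum(counts) / sum(exposure)  # centre line: pooled rate

for month, (c, n) in enumerate(zip(counts, exposure), start=1):
    u = c / n                                # this month's rate
    ucl = u_bar + 3 * math.sqrt(u_bar / n)   # upper control limit
    lcl = max(0.0, u_bar - 3 * math.sqrt(u_bar / n))
    flag = "SIGNAL" if (u > ucl or u < lcl) else "noise"
    print(f"month {month:2d}: rate={u:5.1f}  limits=({lcl:.1f}, {ucl:.1f})  {flag}")
```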

(12:41):
Anyway, that's it on Safe AF. I'm Ben Hutchinson. Please help share, rate and review, and check out safetyinsights.org for more research. Finally, feel free to support Safe AF by shouting me a coffee; links are on LinkedIn and in the show notes.