January 24, 2023 40 mins

In this week's episode Greg and Patrick discuss how we might flip the traditional null and alternative hypothesis testing procedures to move us from tests of literal equality to tests of practical equivalence. Along the way they also discuss tough love, horseshoes and hand-grenades, Patrick’s Driving School, Cheyenne Mountain, So Long and Thanks For All the Fish, isn't that convenient, why people hate us, systolic blood pressure, *real* doctors, I Can’t Drive 55, splash zones, Gallagher, Dilbert, and being precisely equal. 

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Patrick (00:04):
Welcome. My name is Patrick Curran, and along with my dolphin-loving friend Greg Hancock, we make up Quantitude. We're a podcast dedicated to all things quantitative, ranging from the irrelevant to the completely irrelevant. In this week's episode, Greg and I discuss how we might flip the traditional null and alternative hypothesis testing procedures to move us from tests of literal equality to tests of practical

(00:27):
equivalence. Along the way, we also discuss tough love, horseshoes and hand grenades, Patrick's Driving School, Cheyenne Mountain, So Long and Thanks for All the Fish, isn't that convenient, why people hate us, systolic blood pressure, real doctors, I Can't Drive 55, splash zones, Gallagher, Dilbert, and being precisely equal. We hope

(00:51):
you enjoy this week's episode.

Greg (00:57):
It would be an understatement to say that you
have had a lot going on lately.
Am I right? That is right.

Patrick (01:03):
We've got two root canals a lip surgery and a major
shoulder reconstruction.

Greg (01:09):
Poor guy but if I may, that's not all.

Patrick (01:12):
No, I got some sad news. Listeners, a presence over the last three and a half years on the podcast was my 90-year-old mom. And indeed, Mom was on an early episode with Aunt Joanne. Mom, through the beauty of your Irish accent, kept sending me for a pack of smokes, and was with Aunt Dottie as part of the

(01:36):
funeral for MANOVA. But in the first part of December, my mom passed away. So sorry. Thank you. She was one of seven born in a period of nine years. Wow. On a dairy farm in Wisconsin in the 1930s. She and her two sisters went to college. The three farm girls pooled their

(01:56):
money, bought a car, and were going to drive to California to be teachers. And in Denver had a car accident. My mom was pretty seriously injured. They had to stay in Denver for her to recover, and the three of them never left. She was a high school teacher, retired, became a published author, and was an absolutely remarkable woman, and she will be

(02:21):
missed.

Greg (02:23):
Well, I will always remember that she desk rejected the limerick that I wrote for her to do on our episode.

Patrick (02:32):
Early on we wanted her to read the limerick as part of that episode, and Greg wrote it out. I printed it. I was in Denver, my Aunt Joanne was there as well. Gave it to my mom. She read it and said, Yeah, this is no good. I'm gonna rewrite it. Come back later this afternoon. And I thought, Welcome to where I came from as a human being.

Mom (02:53):
And oh, Patrick, this is your mother. I have a limerick I would like to share with you and Greg. There once was a quantitative mother whose two sons outdid each other. That Pat knew he was through when she said, Why can't you get a real job like your brother? Don't forget to call on a Sunday.

(03:14):
Honey, I have a question about my computer.

Patrick (03:18):
No offense to your Limerick but I think it came out
in

Greg (03:22):
Well, what would we do? Would we raise a pint of Guinness to your mom? Is that what we would

Patrick (03:27):
do? And then you can send me for a pack of smokes. So anyway, thank you for your thoughts, and she will be deeply missed.

Greg (03:36):
So now that you yourself are getting a little bit older, do you see some of her in you, like how you do things? How you parent your kids?

Patrick (03:43):
Absolutely. There are two things that I see. One that lock, stock, and barrel I got is a sense of tough love. I've told this story before, where my kid was running barefoot, stubbed their toe, ran crying. Mom brushed the hair out of their eyes and said, Well, honey, that's why God made shoes. I got that.

(04:04):
Which, as you are aware of this story, translated where my kids were training for a 5K. One of my kids was complaining about how her leg hurt. I said, yeah, you hurt when you run, take an Advil, go to bed. And two days later we found out she had a fracture. So there's that element. But another one that I really

(04:26):
admired with her is you can make a lot of progress with close enough. That notion of don't let the perfect be the enemy of the good. That that is close enough that we can do that and move on to other things. My limerick notwithstanding. Okay, there is a caliper here that it can be within.

Greg (04:47):
You're saying it wasn't even close enough. Limerick

Patrick (04:50):
was outside of the caliper, man. But a really funny thing happened a while ago where I saw this with my own kids. My kids are 18 now. They've been driving for a good year and a half. But when we were teaching them to drive, there's the letter of the law and how do you drive, but there's also the spirit of the law. Hmm. And it involves speed limits. And I

(05:12):
made a comment at one point when I was teaching the kids how to drive that there's the posted speed limit. But if you have 35, 45, 55, you've got six, eight, nine miles an hour that you can go over that before a cop is gonna bother pulling you over. To be clear,

Greg (05:28):
you're offering advice, and you're the one who could wallpaper your entire office with traffic tickets. Am I correct? On that point,

Patrick (05:35):
I could open my own driver's ed school based on how
many I have attended,

Greg (05:42):
okay, by all means, give your kid advice.

Patrick (05:44):
The problem was they encoded that as that was actually a law. If you had a posted speed limit, you could go 10 miles an hour over, legally. And this would not have been a problem if one of them had not raised this in driver's ed class as a fact. And they came home and I got excoriated because they got laughed at in the

(06:09):
class. Is that if it's posted 45, well, legally you can go 55, because my dad said so. My dad said so. So yes, that notion of how close is close enough?

Greg (06:26):
What's the old saying, close only counts, and what was the version of that that you grew up hearing? Close only counts in

Patrick (06:32):
horseshoes, hand grenades and nuclear war. Wait,
you got

Greg (06:35):
nuclear war at the end that was added. Okay.

Patrick (06:39):
NORAD is the home of the United States nuclear tracking, and it's in Cheyenne Mountain. I grew up like 30 minutes from Cheyenne Mountain. And there was always all this talk of, Oh, the Russians are going to put a missile through the front door of Cheyenne Mountain. And when the Soviet Union fell, and all the secret

(06:59):
documents became available, the Soviet Union couldn't have hit Colorado, much less the front door of Cheyenne Mountain. So yeah, close enough for horseshoes, hand grenades and nuclear war. Okay. You know,

Greg (07:15):
believe it or not, all of this actually relates to
hypothesis testing.

Patrick (07:18):
Really? All right, I'm gonna refill my coffee here, and I'm gonna see how you're gonna relate my mother to speeding to nuclear war. Go. Okay.

Greg (07:32):
Let me start with this particular question. When you reject a null hypothesis, like for the difference between two means, what do you actually conclude? What are the words that you teach your class when you reject the null hypothesis?
Patrick (07:44):
If you get a p value less than alpha, which we will say is .05, I would say it is unlikely that you would have observed a difference between your sample means this large or larger if there was truly no difference in the population, and therefore you infer that I would have a probabilistic basis

(08:06):
to reject the null that they are precisely equal in the population, and therefore are different.

Greg (08:15):
All right, so that there's some difference, because it's kind of hard to believe that there's no difference. You just rejected that. And you would do the same thing for a correlation, right? When you get a statistically significant correlation coefficient or predictor in regression, you wind up saying something that, you know, if it were the case that that correlation were really zero in the population, or that predictor really had no predictive value in

(08:37):
understanding Y, it would be very unlikely that I would have observed this magnitude of relation. And so I reject that null. Or sometimes we say nil hypothesis, right? Because there's nothing going on, in favor of saying there is something going on.

Patrick (08:53):
Well, that's the cowardly part. We've talked about this before, which is either the means are precisely equal or they're not.

Greg (09:02):
Yeah. So then let me flip this on you. And I'm not trying to pop quiz you here. I just want to hear the words that you say. So that was when we reject a null hypothesis. What words do you use when you don't show that? That's a good one.

Patrick (09:20):
Well, from a statistical standpoint, there is insufficient empirical evidence to say that my observed difference between means would have been unlikely given the null hypothesis. That's statistical. Substantively, what I would say is, from a Popperian standpoint, we have insufficient

(09:43):
empirical evidence to falsify our null hypothesis.

Greg (09:47):
Wow, that was a clinic on why people hate us. Did your mom
teach you that? Thanks mistersee mom

Patrick (09:55):
also. Okay, so if we're talking parenting, here's another one I learned from my mom, and folks, for those of you who have young kids, this is an awesome one. And you can thank Dolores for this. Kids are all about making their own decisions. So Greg, it's completely up to you. Do you want to take a bath before dinner? Or do you want to take a bath after dinner? You

(10:16):
decide, 'cause you're a big boy now. I was like 17 when I was like, son of a bitch, I'm taking a bath either way.

Greg (10:25):
Well, you know, the cagey language that you use, in the end, when you retain the null hypothesis, we find ourselves in this really weird place. So I like what you say, that, well, we didn't have enough evidence to reject the null hypothesis. People are often tempted to conclude that there is no difference then, right? That if we retain the null hypothesis, and even the language that I'm using is intentionally guarded.

(10:49):
Some textbooks will actually say that we accept the null hypothesis, and I hate that language. That's just incorrect. Yeah. Right. Because that almost gives someone license to say, well, I guess the means are equal, then. You know, I grew up softening it to retain, but there really is merit in saying, clumsily, that we failed to reject the null hypothesis, that we failed to find evidence of a difference, which is not the

(11:10):
same as saying there's no difference. But I will tell you, there are other times when we do exactly that, we make that exact inference. Think about if you were doing a classic t test, right, just a pooled variance t test. Some people just plow right ahead and do that t test. Other people will say, Well, you know, that rests on an assumption of homogeneity of variance. Okay, good. So

(11:32):
how are you going to test that assumption of homogeneity of variance? And the answer might be doing an F test, a ratio of the variances of the two independent groups. It might mean doing a Levene's test. But what happens when someone doesn't reject that test? What do they do? They plow right ahead with the pooled variance t test, thinking that, Oh, I guess the variances in

(11:54):
those populations aren't different. So I've met the assumption that underlies that particular t test. And when they were testing the assumption of homogeneity of variance, they actually took a failure to reject as saying, Oh, I guess things are equal. So we do exactly that.

Patrick (12:12):
And we also pick and choose, right? Is that if we fail to reject the null of homogeneity of variance, that's because the variances are equal. But if we fail to reject the equality of the means, it's because we don't have sufficient power to detect the effects.
Greg (12:27):
Isn't that convenient? Yeah, we do that with invariance testing, too, right? That we wind up setting up this measurement model for these different groups, that we want to know that these variables are loading the same. And in the end, we wind up doing a test where we might have loadings constrained across these populations against a model where the loadings are not

(12:47):
constrained. And if we get no significant difference, what do we say? They're equivalent across the two groups. There you go. And so we have this logic problem, where we're using the non-significance of a test as evidence that there is no difference. And we're doing exactly what you say, we're sort of picking and choosing, right? There are these times where we are kind of sort of hoping for equivalence. So we take that

(13:10):
failure to reject as evidence of equivalence. And there are other times where we kind of sort of really want a difference, and we failed to find it. So we say, well, we just didn't have enough power to find it. How would

Patrick (13:23):
you differentiate equality from equivalence? Because you've been like a dolphin in and out of these terms. Okay. Up, down, up, down, up, down. Have you seen the movie Hitchhiker's Guide to the Galaxy? Read the books, saw the movie. Well, the greatest opening film sequence of any

(13:45):
movie is that. Folks, if you haven't seen this, either watch the movie, which is brilliant, or go to YouTube and watch the opening. It's a song about dolphins, and it's So long, so long, so long, thanks for all the fish.

Insert (14:00):
The last ever dolphin message was misinterpreted as a surprisingly sophisticated attempt to do a double backward somersault through a hoop while whistling The Star Spangled Banner. But in fact, the message was this: So Long, and Thanks for All the Fish.

Patrick (14:16):
Thanks for all. So sad that it should come to this. Anyway, equality versus equivalence. Go.

Greg (14:24):
Oh, so you kind of saw through everything right there. I don't know if you got that from your mom. You must have, because you couldn't have come up with that on your own. Yeah, you're exactly right. So when things are equal, or have equality, then they are dead on. Those means are exactly the same. That correlation is exactly zero. That predictor has exactly no contribution. Those loadings are exactly the same

(14:45):
across the different populations. But equivalence to me has a little bit more fuzz, and it feels a little bit more like, was it Annie who was the one who was driving? Yes. Right. And so if we say the speed limit is 55, I mean, is it exactly 55? Is it 55 plus or minus five miles an hour,

(15:08):
plus or minus 10 miles an hour? There's a very specific definition of limit. But in practice, there is this practical equivalence to 55 miles an hour. And I think our hypothesis testing lives are so set up to

(15:29):
aim everything at rejection, we forget that there are a lot of cases where we are more interested in the equivalence of things than we are actually in establishing a difference. But we can't use the hypothesis testing framework the way it's currently set up to try to get at that. And these kinds of ideas of equivalence come up all the time. You know, like when we're testing assumptions of things, or imagine we have

(15:53):
intact groups that we are doing as part of a study, but we want to know whether or not those groups are equivalent at baseline. More broadly, if we are in the pharmaceutical industry, and we have a new drug that's a lot cheaper, we might want to know whether that drug performs the same, or at least equivalent, to some existing drug. Does it lower blood

(16:16):
pressure, if not exactly the same amount as a more expensive drug, close enough? So this concept of equivalence exists all the time in things that we do, but we haven't really turned our hypothesis testing lens properly on these kinds of research questions.

Patrick (16:32):
And at the end of the day, it's how big is big enough, right, to say that there's a difference that's meaningful and worthwhile,

Greg (16:40):
which is something that we have to do all the time when we're talking about power analysis and planning for a study in which our goal, our hope, is to be able to find something. But here, we're going to flip it the other way, and talk about what it means for two things to be equivalent. So both of our kids took the LSAT last year. And if they wanted to take

(17:00):
a prep course, they could have taken very, very expensive prep courses, they could have taken cheap online stuff. You know, someone might make the claim that our online materials are as good as one of those really expensive courses. Well, what does that mean? Does that mean that the scores you get under one course are exactly the same on average as the other? Or might we call them equivalent if, on average, they don't differ by more than 10 points or 20

(17:23):
points? We could define what that threshold is for what we might call practical equivalence. Or if there are two medications that are supposed to lower systolic blood pressure, or diastolic, I get those mixed up, which one is which? Which one is which? Yeah, which

Insert (17:43):
shivers the bad one. Hi, this is Dr. David justice, MD, board certified pediatric hematology oncology and transfusion medicine physician at Boston Children's Hospital. Epidemiological and treatment studies suggest that systolic blood pressure should be the primary target of antihypertensive therapy, although consideration of systolic and diastolic pressure together improves risk

(18:05):
prediction. Come on, guys, you can just google this stuff like us

Greg (18:08):
real doctors do. Well, thank you, real doctor. But if there are two medications, we might say, yeah, these medications are practically equivalent if systolic blood pressure is within 10 points. Or if you are correlating two variables, and I say I think those variables correlate zero. I mean, they probably don't correlate exactly zero, but

(18:30):
maybe you define zero as within plus or minus point one of zero, or plus or minus point oh five of zero. So the point is that we can define what we mean by equivalence. We can define that band around your speed, your speed limit, which seems to only be an interval on the upper end. We don't tend to worry. No,

Patrick (18:51):
but that's a real thing. I like to think of it as a caliper. Like if you're within this tolerance, however you define that, we're just going to consider those all to be the same. You're going 65, 66, 67. If we had sufficient equipment with sufficient accuracy, we could differentiate 67 from 66 miles

(19:12):
an hour, but nobody cares. If you're going above 56 miles an hour and below 74 miles an hour, nobody cares. Yeah, that caliper. But if you're going slower, you might get pulled over because you're impeding traffic. If you're going faster, you're gonna get pulled over because you're driving too fast. But it seems that Howard Wainer: It Don't Make No Nevermind. If

(19:35):
you're under 10 miles an hour of the posted limit, you're probably going to be fine. Nobody cares.

Greg (19:39):
And so if we want to do a formal statistical test of this, the null hypothesis significance testing that we're accustomed to doing, the way it is framed, we're always aiming to sort of reject out, right? Using the mean difference example or the correlation example, we're always aiming to reject zero, or whatever the value is that we're looking at. But imagine

(20:00):
flipping that test and thinking instead about what I'm really kind of clumsily calling rejecting in. Imagine that you have set up a null hypothesis. It's a weird null hypothesis, right? The null hypothesis we're accustomed to doing is something like, I'll do one for correlation. For example, our null hypothesis is that rho, the population correlation, is zero.

(20:24):
But we could instead set up a null hypothesis that says the correlation between two variables is point one or greater, or, to compound the null hypothesis, negative point one or lower, which means stronger. So imagine the null hypothesis defines this region that's outside, where we would say, oh, yeah, that's a real correlation.

(20:47):
And then the alternative to that is that our correlation in the population actually falls somewhere between negative point one zero and positive point one zero, a region where we might say that is practically equivalent to there being no correlation. Well, if we flip the null and alternative hypotheses like that, we can actually use hypothesis testing

(21:08):
procedures to try to test whether or not we could consider that correlation to be practically equivalent to zero, or the difference between two means being practically equivalent to zero,
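
The flipped hypotheses Greg lays out for a correlation can be sketched as a confidence interval check. This is only an illustration under stated assumptions: it uses the Fisher z transform for an approximate interval, a 90% level (chosen here so that each side acts like a one-sided test at .05, which is not specified in the conversation), and hypothetical names (`correlation_ci`, `practically_zero`) and sample values.

```python
from math import atanh, tanh, sqrt
from statistics import NormalDist

def correlation_ci(r, n, level=0.90):
    """Approximate CI for a population correlation via the Fisher z transform."""
    z = atanh(r)                      # Fisher z of the sample correlation
    se = 1 / sqrt(n - 3)              # approximate standard error on the z scale
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return tanh(z - crit * se), tanh(z + crit * se)

def practically_zero(r, n, bound=0.10, level=0.90):
    """True when the whole interval sits strictly inside (-bound, +bound)."""
    low, high = correlation_ci(r, n, level)
    return -bound < low and high < bound

# Hypothetical samples: same observed r, very different sample sizes.
print(practically_zero(0.02, 2000))   # narrow interval, inside plus or minus .10
print(practically_zero(0.02, 100))    # wide interval, cannot rule out .10
```

With the same observed correlation, only the large sample supports calling it practically zero. A small sample fails to reject the null of zero, but that alone is not evidence of equivalence.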

Patrick (21:21):
and that maps exactly on to this issue of Irish factor loadings, as in O'Shaughnessy: oh three, oh five. If you have a factor loading that has a leading digit of zero, nobody cares, right? Oh seven, oh three, negative oh four. It's like, yeah, it's just hovering around zero. Nobody cares. But you make some caliper

(21:41):
where if the factor loading is above point one, or below negative point one, as a cross loading, then I gotta figure out what to do with that. So we think about this all the time, and indeed, we grouse about it a lot. Because in CFA, going to that null hypothesis, we say that factor loading is zero. Yeah, and here we're shrugging. You work for a state university, I

(22:05):
work for a state university, is to say, I don't believe it's exactly zero in the population. But as long as it has a leading digit of zero in the value, I can sleep at night. So there

Greg (22:15):
are two things going on here. One is setting up what you think that threshold is, or those boundaries are, for defining practical equivalence, and then figuring out the appropriate statistical test for you to do, or actually statistical tests, in fact, right? Because in the example we were talking about, you could be convinced there actually is a nonzero correlation if it was something sufficiently positive,

(22:39):
or if it was something sufficiently negative. So how would we go about figuring that out? Well, there are hypothesis testing ways to do it, and those sometimes go by the name equivalence tests. But I think there's a very easy way to think about it just in terms of confidence intervals. And the confidence intervals wind up, for most things, accomplishing this

(22:59):
idea of having two hypothesis tests that are testing inward from the left and inward from the right. So I want to think about things from the perspective of a confidence interval. First, let's think about the difference between two means. Let's think about the LSAT scores. And imagine we have two LSAT prep courses, and we decide if they produce scores

(23:22):
with mean differences of less than 10 points, positive or negative, we will go ahead and call them practically equivalent. Well, so if we go ahead and do a study and create a confidence interval for the difference between those two means, that confidence interval actually is informative from a hypothesis testing standpoint. Now, when I think about this example, I'm going to think

(23:43):
about it in a really weird way. You know, I used to take Sydney to the dolphin show. I don't even think dolphin shows exist anymore. Like at our Baltimore aquarium, which is a really nice aquarium, there's no more dolphin show. I took Sydney to the dolphin shows. I couldn't take the boys to the dolphin show. So whether we're thinking about a dolphin show, or, this is so dated, a Gallagher comedy

(24:03):
show, do you remember Gallagher at all? Oh, yeah,

Patrick (24:05):
he smashed the watermelons. Exactly. helped me to despatch this melody. It's a little known fact: early in his career, he actually smashed dolphins. But Peter got to water. Okay.

Greg (24:30):
That makes what I'm about to say that much more disturbing. Imagine you and I are sitting in the front row of, whether it's a dolphin show or a Gallagher show or an early Gallagher show. Which is the best of both worlds.

Patrick (24:50):
You did not want to sit in the front row for his early
shows.

Greg (24:55):
You and I are in the splash zone, or at least potentially in the splash zone, and I am going to have you seated at a seat that is marked with zero, and that zero represents no population difference between these two LSAT prep courses on average, and I am going to go sit out at 10 points. And so we have the dolphin show, the Gallagher show, or the

(25:18):
combination thereof. And in the end, there's a certain splash that occurs as a result of this. This is disturbing to think about. The question is, how big is the splash? Right? Does the splash include zero? Does the splash include 10 points? Does it include one, both, neither? And those lead us to different potential conclusions, just like

(25:38):
a confidence interval would, right? So we build a confidence interval around the difference between two means, and it might include zero, it might include 10 points, it might include one, might include neither. There are four different possibilities for what we're talking about. The first one is, let's imagine you and I both get wet. And wet,

(25:58):
let's just keep it at water or watermelon.

Patrick (26:01):
I caught the blowhole.

Greg (26:04):
I'm telling you, the art for this episode is just creating itself. Alright, so imagine that you and I are both in the splash zone. Imagine that the confidence interval captures both zero and a 10 point difference. All right. So if that's the case, essentially, we can't reject zero as a possibility using traditional null hypothesis significance

(26:24):
testing. But we also can't reject 10 points. So this is an example that is not conclusive, really, in any way. We can't say anything about practical equivalence, we can't say anything about difference. So we're kind of stuck. So now imagine a different scenario where only you are in the splash zone. The zero gets wet, the point that says there's no

(26:45):
difference between the two population means, but I, the 10 point difference, am outside the splash zone. I am outside the confidence interval. That's equivalent to a regular traditional null hypothesis significance test not being statistically significant, because it contains zero. But the equivalence test, the test of whether or not it differs from 10 points, that is statistically significant. So in that case,

(27:09):
what we can say is that we have rejected the idea of it being 10 points in favor of it being something smaller. We have determined no practically important difference between the two. So let's flip that now. And imagine that you didn't get wet, you're not in the splash zone, but I'm in the splash zone. So zero is outside the confidence interval, but 10 points is in

(27:32):
the confidence interval. That means that a traditional null hypothesis significance test is statistically significant, and it rejects zero, but the hypothesis test does not reject 10 points. So in that case, we would say something like, well, there is a difference, but we can't say if it's practically important, right? And I have to be very careful in my language, because

(27:53):
I didn't establish that it isn't practically important. I just don't have enough evidence to be able to say that it is trivial or not trivial. So in that case, I wasn't able to establish anything or make any comments specifically about practical equivalence. Exactly. Equivalence, yes, but not practical equivalence. And then the final case is when the splash doesn't get either of us wet. You, as zero, are sitting outside the splash zone.

(28:16):
I, at 10 points, I'm sitting outside the splash zone. So it's a little splash that came between zero and 10 points. And what that means is that a traditional null hypothesis significance test rejected the idea of there being no difference. So we believe that there is a difference. But we also believe, because the confidence interval didn't include 10 points, that it is not a practically important

(28:37):
difference. So yeah, we have evidence that there's a difference. But we also have statistical evidence that it is not a practically important difference. And so from a practical standpoint, we might be able to consider these two LSAT prep courses equivalent. So it's a very easy way of taking confidence intervals to essentially be conducting different types of hypothesis

(29:00):
tests, and then being able to reach conclusions about not literal equality, but practical equivalence.
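
Greg's four splash zone cases can be written down as a small lookup on the confidence interval. This is a sketch under assumptions: the function name `splash_zone` and the label strings are hypothetical, and it follows the simplified picture in the conversation, with seats only at zero and at positive 10 points. A fuller version would also check the negative bound.

```python
def splash_zone(ci_low, ci_high, bound=10.0):
    """Classify a CI for a mean difference by whether it covers 0 and +bound."""
    zero_wet = ci_low <= 0.0 <= ci_high       # interval contains zero
    bound_wet = ci_low <= bound <= ci_high    # interval contains the bound
    if zero_wet and bound_wet:
        return "both wet: inconclusive"
    if zero_wet:
        return "zero wet only: not different from zero, below the bound"
    if bound_wet:
        return "bound wet only: different from zero, importance unresolved"
    if 0.0 < ci_low and ci_high < bound:
        return "neither wet: different from zero, but below the bound"
    return "outside the four cases discussed"

# Hypothetical intervals for the difference between two prep courses.
for interval in [(-3.0, 14.0), (-2.0, 6.0), (4.0, 13.0), (2.0, 7.0)]:
    print(interval, "->", splash_zone(*interval))
```

The last return covers intervals that lie entirely above the bound or entirely below zero, which the four cases in the conversation do not address.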

Patrick (29:06):
And this is such great fun to think about, because one is we're expanding the usual null hypothesis testing in ways that we sometimes don't think about, or, in fairness to folks out there, are not taught about. But also, Greg has been very careful in his language about what is meaningful, and this becomes an inherently subjective, theoretically

(29:30):
motivated determination. And this gets really interesting really fast. And it does make me think of Dilbert. So I'm a big Dilbert fan. And I had this cartoon on my door for a while. He goes bungee jumping, and the guy is tying him up. And the guy says, How much do you weigh? And Dilbert says, Why do you need to know? And he said, well, it determines how much tension I

(29:53):
put in. And Dilbert says, I weigh 700 pounds. But I love this topic for these reasons. It's moving beyond the, is it zero, yes or no. But then it moves us into, well, how big a difference is big enough? And how are you justifying that? Because you

(30:14):
might say a 10 point difference is worth the $1,000 LSAT prep course. And I might say, I want a 30 point difference. Yeah. Okay. Well, both of those are equally defensible. Yeah, exactly.

Greg (30:27):
One of the things I really like about this is that it puts the onus on the researcher to define what it means for things to be equivalent. If that seems like too onerous a task, it's exactly what we ask you to do when you're doing a power analysis for a study, to decide what is that minimum detectable effect that you actually care about. We're doing the same thing, it's just that we're doing it for a purpose going the

(30:49):
other way, right? Looking at being able to statistically establish something that is under that rather than over that. And although we don't have to, we can use confidence intervals to try to get at that.

Patrick (31:00):
And I have not always been a big fan of confidence
intervals. And the reason is,many people treat them as if
there's something unique andnovel ly different than a
critical ratio. So you have apoint estimate a standard error,
that gives you a critical ratio,and you look it up in a table
and get a P value. And peoplesay no, no, no, or you're a

(31:20):
horrible person, you aresingularly responsible for
everything that's bad in thesocial sciences. And instead of
dividing by the standard error,we should do plus or minus two
times the standard error andthen see if it contains zero.
And I grouse like a grumpy oldman, because of course, that is

(31:41):
the same thing. You can look atyour P value and accept or
reject your null hypothesis witha critical ratio. Or you can
compute a confidence interval,see if it contains zero, and
then accept or reject your nullhypothesis is exactly the same
thing when used that way. Now,what I'm fascinated with and you
actually need to throw me a boneand help me with this. Because

(32:03):
as you've been talking, I'mseeing well, there actually is
not one test. But there'sactually two tests. There's one
above and there's one below. Areyou in the dolphin Gallagher
splash zone? Right, I'm notgonna sleep well tonight, given
this whole visualization. ButI'm starting to wonder, does our

(32:24):
standard confidence interval kinda tell us about both of
those at the same time? Or am I thinking about that wrong?
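
As an illustration of Patrick's point about the critical ratio and the confidence interval being the same test of zero, here is a minimal Python sketch. It assumes a normally distributed estimate and uses the exact normal critical value (about 1.96 at alpha = 0.05) in place of the "plus or minus two" rule of thumb. The function names and the numbers in the usage lines are hypothetical and do not come from the episode.

```python
from statistics import NormalDist


def critical_ratio_rejects(estimate, se, alpha=0.05):
    """Two-sided z test of zero: compare |estimate / se| to the critical value."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(estimate / se) > z_crit


def interval_excludes_zero(estimate, se, alpha=0.05):
    """Build the (1 - alpha) confidence interval and check whether zero is outside it."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower = estimate - z_crit * se
    upper = estimate + z_crit * se
    return lower > 0 or upper < 0


# Hypothetical estimates with a standard error of 1.0: the two rules always agree.
for est in (3.0, 1.0, -2.5, 0.2):
    print(est, critical_ratio_rejects(est, 1.0), interval_excludes_zero(est, 1.0))
```

Used this way, the interval is only a rearrangement of the same inequality, which is why the two decisions cannot disagree.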

Greg (32:30):
No, I like how you're thinking about it. When I talked
about the confidence interval, I was really just using it as a
shortcut way to try to accomplish two hypothesis tests
at once. And even though, honestly, I talked about it in
my example as informing us about zero, right, informing us
about that traditional null hypothesis significance test,

(32:52):
the two tests that I'm actually referring to are the one above
zero and the one below zero. In the LSAT example, we could
imagine a positive 10 point difference, or a negative 10
point difference. And really, the confidence interval's purpose
in the context of equivalence testing is to conduct a one
sided significance test of the positive 10 points in the

(33:13):
direction towards zero, and a one sided test of the negative
10 points in the direction of zero. So the confidence
interval, whereas we're used to it serving some purpose that is
not really different from a null hypothesis test, here it
actually stands to accomplish both tests of the boundaries of
what we would call practical equivalence. So in that sense, I

(33:35):
think it's actually really useful as this proxy for those
two hypothesis tests. The trick is that the p value that we're
accustomed to is the p value associated with that typical
test of the null hypothesis of zero. In this particular
example, we could imagine two other p values. We could imagine
a p value associated with the test of the upper boundary aimed

(33:58):
in toward zero, or we could imagine a p value of the test of
that lower boundary of practical equivalence, aimed upward
towards zero as well. I think in the sense of the confidence
interval offering us something different from null hypothesis
significance testing, not really. But as a vehicle for
helping us to conduct equivalence tests, to conduct
multiple hypothesis tests simultaneously, I think it

(34:20):
actually is insightful. And that's how I was trying to use
it here. I don't know if that makes sense.
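
The two one-sided tests Greg describes can be sketched in a few lines of Python. This is a hedged illustration, not the hosts' own procedure: it assumes a normal approximation, equivalence bounds of plus and minus 10 points as in the LSAT example, and made-up values for the observed difference and its standard error. It also uses the standard result that two one-sided tests at level alpha match a (1 - 2 * alpha) confidence interval falling entirely inside the bounds. The function name `tost` and its return keys are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tost(estimate, se, lower_bound, upper_bound, alpha=0.05):
    """Two one-sided z tests of the equivalence bounds (normal approximation)."""
    # Lower bound, aimed upward toward zero: H0 is "true difference <= lower_bound".
    p_lower = 1 - STD_NORMAL.cdf((estimate - lower_bound) / se)
    # Upper bound, aimed downward toward zero: H0 is "true difference >= upper_bound".
    p_upper = STD_NORMAL.cdf((estimate - upper_bound) / se)
    # The matching interval has coverage 1 - 2 * alpha (90% when alpha = 0.05).
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    ci = (estimate - z_crit * se, estimate + z_crit * se)
    return {
        "p_lower": p_lower,
        "p_upper": p_upper,
        "ci": ci,
        # Equivalence is claimed only if BOTH one-sided tests reject.
        "equivalent": max(p_lower, p_upper) < alpha,
    }


# Hypothetical data: observed difference of 2 points, standard error of 3.
result = tost(estimate=2.0, se=3.0, lower_bound=-10.0, upper_bound=10.0)
print(result)
```

Both one-sided tests rejecting is the same event as the interval sitting inside the bounds, which is the sense in which the interval runs both tests at once.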

Patrick (34:24):
That makes a lot of sense. Thank you for that. Yeah.
Where do we go from here? How do we incorporate this into our own
work?

Greg (34:30):
Well, first of all, you don't just incorporate it unless
you have a research question or some other type of question that
is specifically about equivalence. So it might be the
case that you want to test some assumptions prior to doing some
other method, whether that assumption is normality, or the
assumption is homogeneity of variance or homogeneity of
dispersion, like we test with Box's M test, rather than just

(34:53):
saying, well, I didn't get a significant departure from that,
so I guess that assumption holds. With regard to testing
assumptions, what we can and maybe should do is figure out what the
boundaries are of normal enough or homogeneous enough,
and then conduct our test in this flipped way to see whether
or not we have violated that assumption. Of course, that

(35:14):
requires work on the front end to decide what departures are
troublesome, whether it's from normality or homogeneity of
variance. But that's our job. So that's when the tests have to do
with testing the assumptions that are associated with other
things. But like I said at the beginning, there are actual research
questions about equivalence. Is this drug good enough? Is this

(35:36):
teaching method good enough? Is group therapy as effective, or at
least close enough to as effective, as individual therapy?
Boy, we'd want to know that, because group therapy could be a
heck of a lot cheaper. So there are a lot of research questions
that are about equivalence. And then it just comes down to the
researcher defining, in the context of whatever the field

(35:56):
is, what stands for equivalence. But you and I know how to embed
this in the context of darn near anything, right? We could talk
about whether this structural path is zero, or at least close
enough. Not saying, oh, it's not significant, I guess it's zero.
But is it practically zero? Or you and I within a structural

(36:17):
model could say, is this variable's value in
understanding Y practically equivalent to this other
predictor's value in understanding Y? That's
something you and I know how to do, because we can encode the
difference between those two things within a structural
model. We talked about doing this in our Lego episode, just a
few episodes ago. You can code darn near anything you want as a

(36:37):
parameter. And then once you have done that, you are able to
talk about it in terms of equivalence. We could do it in
terms of invariance testing. Even if we don't believe that a
set of loadings are exactly the same across two populations, is
there a way that we could quantify close enough? And there
are ways that we can quantify close enough. We could quantify

(37:00):
it in terms of these loadings are no more than plus or minus
10% of those loadings. That's something we actually could do.
These are things where we can reengineer our hypothesis tests to
be able to say, all right, I want to see whether or not I can
statistically fall within that close enough zone, that
practically equivalent zone. And so it has widespread

(37:22):
applications. And then we can power for those tests, just like
we power for other things. So I think this is a nice complement
to the types of tests that we already do very commonly.
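
Greg's remark that we can power for equivalence tests can be made concrete with a rough sketch. The Python function below uses the usual normal-approximation formula for the power of two one-sided tests with symmetric bounds. It assumes a known standard error and a normally distributed estimate; the function name, the 10 point margin, and the standard errors in the usage lines are hypothetical choices for illustration, not values from the episode.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def equivalence_power(true_diff, se, margin, alpha=0.05):
    """Approximate power of two one-sided z tests with bounds -margin and +margin."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    # Probability that the estimate lands where BOTH one-sided tests reject:
    # -margin + z_crit * se < estimate < margin - z_crit * se.
    power = (
        STD_NORMAL.cdf((margin - true_diff) / se - z_crit)
        + STD_NORMAL.cdf((margin + true_diff) / se - z_crit)
        - 1
    )
    # If the standard error is too large, no estimate can reject both tests.
    return max(0.0, power)


# Hypothetical planning values: true difference of zero, margin of 10 points.
for se in (6.0, 4.0, 3.0, 2.0):
    print(se, round(equivalence_power(true_diff=0.0, se=se, margin=10.0), 3))
```

As with ordinary power analysis, shrinking the standard error (for example through a larger sample) raises the chance of landing inside the close enough zone when the true difference really is small.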

Patrick (37:32):
And we'll put up show notes on this for some of these
readings. James Rogers, Kenneth Howard, John Vessey, "Using
significance tests to evaluate equivalence between two
experimental groups," in '93. So this is 30 years old at this
point. And I found this to be a really nice overview of a lot of

(37:56):
the things that you've helped walk us through, of what do we
mean by exact, what if it's not exact, what is big enough to be
big. And I would recommend looking at that, but really it is
just thinking about things a little bit differently. I'm
looking at the paper, and they have a wonderful figure in here
that corresponds to what you were describing about how you

(38:19):
could have a one tailed test on the lower part. And you can have
a one tailed test on the upper part. And he shows two
distributions, and then combines them into the single
distribution. This is really cool stuff.

Greg (38:33):
I totally agree. So that's equivalence tests in a
nutshell, just taking your good old hypothesis testing skills
and turning them inward on questions of equivalence. It
doesn't use any new skills, but it helps you to look at things a
little bit differently, and plan to be able to answer those kinds
of questions. So it's very, very cool stuff.

Patrick (38:50):
And I think it's really fun to juxtapose equality with
equivalence, because those are not the same. And in the spirit
of my middle school English teacher mom, a couple of times
during this episode I have said "precisely equal." That is sloppy

(39:10):
thinking. "It is precisely equal." In memory of Mom: they are
equal. Thanks, everybody.

Greg (39:19):
Thanks, Mrs. C. Take care, everybody. Take care. Bye bye.
Thanks so much for joining us.
Don't forget to tell your friends to subscribe to us on
Apple Podcasts, Spotify, or wherever they go for 100%
dolphin friendly content. You can also follow us on Twitter,
where we are @quantitudepod, and visit our website,
quantitudepod.org, where you can leave us a message, find organized playlists

(39:41):
and show notes, listen to past episodes, and other fun stuff.
And finally, you can get cool Quantitude merch like shirts,
mugs, stickers and spiral notebooks from Redbubble.com,
where all proceeds from non-bootleg authorized merch go to
donorschoose.org to help support low income schools. You've been
listening to Quantitude, the podcast where equivalent isn't
equal and equal is only equal if it's precisely equal.

(40:08):
To close today's episode, rather than having our usual sponsors,
I would like to offer a limerick in honor of Mrs. Curran. There
once was a mother named Dolores, of whose praises we could surely
sing a chorus. And even without meeting face to face, she has
left us all in a better place, giving us Patrick, who will
continue to bore us. Mrs. C, I'm sure this limerick would have

(40:30):
been better if you'd been the one to edit it. Cheers. And as they
say in Gaelic, sláinte.