
January 10, 2023 54 mins

In this week's episode Greg and Patrick revisit a topic they addressed in their 2nd-ever episode: statistical power. Here they continue their discussion by attempting to clarify the power of what, and they explore ways of obtaining meaningful power estimates using the structural equation modeling framework. Along the way they also discuss tearing arms off, German dentists, booby prizes, Dr. Strangelove, making it look like an accident, shrug emojis, the whale petting machine, baseball and war, where's Waldo, whale holes, the big R-squared, throwing reviewers against the wall, DIY power, in fairness to me, eggplants, and screw you guys, I'm going home. 


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Patrick (00:05):
Welcome. My name is Patrick Curran, and along with my dark Kherson Campbell lightning friend Greg Hancock, we make up Quantitude, a podcast dedicated to all things quantitative, ranging from the irrelevant to the completely irrelevant. In this week's episode, Greg and I revisit a topic we addressed in our second-ever episode: statistical power. Here we continue our discussion by attempting to clarify the

(00:27):
power of what, and we explore ways of obtaining meaningful power estimates using the structural equation modeling framework. Along the way, we also discuss tearing arms off, German dentists, booby prizes, Dr. Strangelove, making it look like an accident, shrug emojis, the whale petting machine,

(00:48):
baseball and war, where's Waldo, whale holes, the big R-squared, throwing reviewers against the wall, DIY power, in fairness to me, eggplants, and screw you guys, I'm going home. We hope you enjoy this week's episode.

Greg (01:08):
I'm looking at you to see how you're doing, but I can't really tell; you're very stoic.

Patrick (01:12):
Oh, dude, I am falling apart at the seams. What's going on? You know, when I came up to visit you over the summer, you surprised me by saying, "I'm going to take you to this place, you're going to love it." And I was foolish enough to say, "Where are we going?" and you said, "You'll see when we get to the parking lot." After we drive 45 minutes. We drive 45 minutes, we

(01:36):
pull into the parking lot, and it's one of those parachute places where they blow a fan and you ride on the air. I had two immediate thoughts. One was, this was so thoughtful of you. And the second was, you're going to tear my arm out of its socket, because I have a severed rotator cuff tendon, which I had failed

(01:57):
to tell you about. I am finally sucking it up, and in a couple of weeks I'm going to have reconstructive surgery on the shoulder. As a booby prize, a couple of days after I scheduled the surgery, I had to have an emergency root canal.

Greg (02:13):
Oh, boy.

Patrick (02:14):
I want you to picture something. All right, you and I are producers on a movie. We say we've got a couple-minute scene, but we need central casting to send a brilliant endodontic surgeon who has a German accent and seems to way overly enjoy her job. And central casting says, "Okay, I got it." This

(02:39):
is exactly the person I went to. About 45 minutes into it, she starts talking to herself. It's all done through a microscope: she has this big microscope that goes off a mirror as she's working on my tooth. I can't do the German accent. Maybe I can say some lines, and you can say them. Okay, so I want you to say, "You're the perfect patient, you don't move

(03:02):
at all, you just leave me alone with my microscope."

Greg (03:05):
"You, as the perfect patient, you just leave me alone with my microscope to do this work that I'm doing on your tooth right now."

Patrick (03:15):
That was pretty much it. Okay, I have never had so much confidence in a health care provider. She was amazing. But oh my god, she just scared the living crap out of me. And at one point she stops. Now I'm going to try it, okay, ready? She stops and she says, "Too complex. You must come back." She throws in a temporary filling, and I have to

(03:38):
go back next week. That was my, "Oh, you're about to have shoulder surgery? Why, we have something for you. Wait, don't answer yet: Dr. Strangelove is going to do a dental procedure on you."

Insert (03:54):
The whole point of the Doomsday Machine is lost if you keep it a secret! Why didn't you tell the world?

Patrick (04:01):
So yes, thank you for asking.

Greg (04:04):
I mean, at some point, we're just going to have to have you put down.

Patrick (04:06):
Almost verbatim, that's what one of my kids said. And the other one said, "Make it look like an accident, or else we won't get the insurance money." But what it did make me think about, and it's what we're going to segue into in today's conversation: you can start something, and then at some point you say, "This is too complex," and you just stop. We did that with our power episode.

(04:29):
Ooh. Taken out of context, I might have advocated for using emojis for power analysis, one of which was a poop emoji. And somehow an eggplant came into play as well. It came off the rails a little bit. But we in some ways were misconstrued, because I

(04:50):
have heard multiple people say, "Well, I know Greg and Patrick are anti-power." Yeah, right. And that is patently untrue. I am not anti-power; you are not anti-power. But the punch line of that episode, which came up multiple times because we're really bad at

(05:11):
editing (and if we redid it now, it would only come up once), is: the power of what? Yes. And I expressed frustration at being part of grant reviews where there is a multiple-group bivariate latent curve model with structured residuals, and the reviewer says, "The applicant must demonstrate adequate power."

(05:33):
And I don't even know what that means, because it is too complex. Too complex.

Greg (05:39):
Let me see if I can pull all this together: you are likening power analysis to a root canal, in that we didn't finish the job the first time around. We got into power analysis, but that was like episode two.

Patrick (05:52):
I know. I have repressed the first two years.

Greg (05:57):
I would characterize that as what I might call a "curse the darkness" episode: we were grousing about power analysis. Power analysis, first of all, is hard. It is complex. And to try to distill it down into one thing is just foolish. I think we talked about the complexities of it a lot, but we didn't light the candle. Because at the end of the day, that root canal

(06:17):
needs to be finished, or you're still going to have a problem. So I think what we could do to pair with that is maybe some suggestions for actually how to conduct a power analysis, how to think about it, because we have to do it. And to be crystal clear: you and I support doing power analysis.

Patrick (06:34):
You mean there's more to academia than cursing the darkness? Dude, I've been in the game for like 30 years, and nobody has ever talked to me about lighting a candle. What kind of bull is that?

Greg (06:46):
Let's put on some acoustic guitar, Patrick, and light some candles. And that probably is going to require us to do a speed recap of power analysis. In a nutshell...

Patrick (07:02):
I think that this is going to involve the whale petting machine.

Greg (07:07):
If anybody gets that reference, you've either taken a class with Patrick, and God help you, or you listened to one of our very, very early episodes. And again, I repeat: yeah, either way, you're kind of screwed. All right, tell us about your whale.

Patrick (07:21):
Every example I ever use involves either baseball or war. It turns out that those work really well as examples, unless you have no interest in baseball or war. The backstory, briefly, is I talked about Type I and Type II error, and I talked about an enemy submarine: you're in a submarine, and

(07:43):
you have to know whether the enemy sub is out there. You don't want to miss it if it's there, because it puts you in danger. But you don't want to give away your position by responding to something that's not there. Some students very good-naturedly said, "Is there any example that you can use that doesn't involve one person trying to kill another

(08:05):
person?" I came up with the whale petting machine, which is: you are underwater, and you're not looking for an enemy submarine, but you're looking for a whale. Everybody knows whales like to be petted. You don't want to miss a whale that goes by without petting it, because it's going to make it sad. Now, in the prior conversation, you told us what a sad whale was. So go ahead. Or was that me? I forget. You did.

Greg (08:28):
You. All right, let's hear it. Remind us, Patrick, what a sad whale sounds like. ... That sounded better before you needed a root canal.

Patrick (08:38):
Oh, you want me to make the whale sound? Actually, I was just living my truth. So: you don't want to miss a whale that's really there. But what's super important is you don't want to extend the petting machine if a whale is not there, because it scares away the other whales.
That's power.

Greg (08:57):
Okay, it's crystal clear, everybody. To clarify that a little bit: do you have any other language you could use?

Patrick (09:04):
Or Starburst? Or Starburst, or whale... man. God, now we're gonna hear from Tove about that one, aren't we? Spear-horned bears here.

Tove (09:14):
Hi, this is Tove Larsen, faculty member in applied linguistics and Quantitude's Swedish consultant. I am happy to report that neither listener in Sweden was offended. Right? Hendrick nodded.

Patrick (09:25):
Okay, here's the deal, folks. This is going to be a whirlwind tour through a frequentist perspective on null hypothesis testing. Right, Levy, pay attention. All right, maybe you'll learn something out of it.

Roy Levy (09:37):
Hey, this is Roy Levy at Arizona State University. When you start thinking about statistical power, a Bayesian perspective can really be...

Patrick (09:44):
Nope. Null hypothesis testing is like Where's Waldo. Okay, we have two conditions in the population that exist. Imagine that we have a treatment group and a control group, and you put 1,000 hours of blood, sweat, and tears into your dissertation. You do an intervention with kids in

(10:04):
schools to try to improve reading, and you want to know: was your reading intervention successful? Did it improve reading to a greater extent than you would expect by chance alone? The null hypothesis (and we're going to super-stress this) is that in the population the two population means are equal.

(10:24):
Okay? There is no difference in the population. For the alternative hypothesis, we're just a roomful of cowards, and we say: well, I'm going to die on the hill of the null hypothesis that they're equal, but if the null doesn't hold, I'm going to say the two population means are not equal. Ooh. Yeah, very good.

(10:47):
See, so either they are equal, or they are not equal. So that's the null hypothesis and the alternative. Now, that's what exists in the population, that we believe to be there but that we don't have access to; we want to make a probabilistic inference about it. So now think about the decision that we're going to make. You do your 1,000 hours of blood, sweat, and tears, and you find

(11:12):
the treatment group had higher reading skills than the control group did. So now you have to make a decision: are the group means different from what you would expect by chance alone, or are they not? Picture a little two-by-two contingency table. The columns are the population: the null holds, or the null does not hold. And the rows are what you decided: did you accept

(11:35):
the null, or did you reject the null? And that's the whale petting machine. Is there a whale there? Is there not a whale there? Did you extend the petting machine, or did you not extend the petting machine? Do you understand? Now, Greg, I could not have been more clear with the whale petting machine. I'm not sure I needed all this other crap.

Greg (11:56):
I'm sad having to listen to that explanation. But go
ahead. Yeah.

Patrick (12:00):
We actually have in that two-by-two table two correct decisions, and we have two incorrect decisions, or errors. And if there are two of them, I've got a hankering to call them a Type I error and a Type II error. That's where those terms come from. All right, well, what are the correct decisions? The

(12:21):
correct decisions are: you conclude there is not an effect when there really is not an effect, or you conclude there is an effect when there really is an effect. But what are those two errors? This is what we pay a lot of attention to in classes and in our work. An error of the first kind, Type I, is what our standard p-value is focused around, which is: what is the

(12:45):
probability you're going to say there really is an effect when there is not? A false positive: you extend the whale petting machine, and there's not a whale there, and you scare all the other whales away. All right, what does it mean for your dissertation? You concluded the reading intervention was successful, but it wasn't. Yeah. A Type II error is you say that

(13:05):
there is not an effect when there really is an effect. And that makes the whale sad, because there is a whale and it wants to be petted, and you're not going to pet it. And whales are highly ruminative, and it's going to go back to its little whale hole that it lives in on the ocean floor, and it's just going to ruminate for the rest of the day.

Greg (13:23):
Where did you go to school?

Patrick (13:25):
Colorado?

Roy Levy (13:27):
Not a lot of whale holes.

Patrick (13:30):
What we focus a lot on is Type I error: what is the probability that you're going to reject the null if the null is actually true? All right, so you say there's an effect when there's not. What power is, though, is: what is the probability we're going to reject the null if the null is false? What that means is: what's the probability that

(13:53):
you're going to find an effect, if an effect really exists? That's power.
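A minimal sketch of that definition in code (illustrative, not from the episode; it assumes SciPy, and the function name and numbers are hypothetical): the power of a two-sided, two-sample t-test to find one specific standardized mean difference d, using the noncentral t distribution.

```python
# Power of a two-sided, two-sample t-test for a specific effect size d,
# assuming equal group sizes and equal variances.
from scipy.stats import t, nct

def two_sample_power(d, n_per_group, alpha=0.05):
    """P(reject H0: mu1 == mu2) when the true standardized difference is d."""
    df = 2 * n_per_group - 2
    ncp = d * (n_per_group / 2) ** 0.5   # noncentrality parameter
    crit = t.ppf(1 - alpha / 2, df)      # two-sided critical value
    # Power = P(|T'| > crit), where T' is noncentral t with parameter ncp
    return nct.sf(crit, df, ncp) + nct.cdf(-crit, df, ncp)
```

With d = 0.5 and 64 people per group, this lands near the familiar 80% from Cohen's tables; the whole calculation hinges on committing to one specific d.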

Greg (13:59):
Well, it's sort of power in the following way, right? When you set up that dichotomy of the null hypothesis being true or the null hypothesis being false: the null hypothesis being true is one case. The null hypothesis being false is many, many, many cases; it's all the cases where mu one and mu two are not equal to each other. And when we talk

(14:19):
about power, it's not the null hypothesis being false, it is the null hypothesis being false in a specific way, to a specific degree. That seems to get lost sometimes in the way we talk about things. So power has to do with the probability of rejecting the null hypothesis when not only is it false, but it is false to a very specific...

Patrick (14:41):
Degree. Power is easiest to think about, and to learn, and to get our heads around, in this two-group kind of scenario. We have two population means: are they equal or not? We have two sample means: are they sufficiently different that we would not attribute those differences to chance alone if the null hypothesis were true? We don't do the field a

(15:04):
service, because what we do is teach power in these hyper-contrived situations: you have two group means, you have a correlation, you have a multiple R-squared. And I feel like that's the framework that we get for power. All right, now, all of you, as you're driving or mowing the lawn or cooking dinner or whatever you're doing right now, think about the work that you're
(15:26):
doing right now. Not hypothetically, not sometime in the future: that you're doing right now. How many of you are doing your entire research project comparing two means? Nobody? Nobody out there.

Greg (15:36):
Sit on a grant panel: when is the last time that you saw a two-sample t-test? When is the last time you even saw a multiple regression? And the answer is never. But those are the things for which we are, at best, trained to think about power. But even if I take a multiple regression, it is almost always the power associated with the omnibus R-

(15:57):
squared of the whole multiple regression model. How many subjects do you need to have enough power to get a statistically significant R-squared, to be able to proclaim that there is some nonzero population multiple correlation coefficient? That's all fine and good. But my research question is not about the big R-squared. My research question is almost never about the big R-squared.
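The omnibus-versus-predictor distinction can be made concrete. Here is a minimal sketch (illustrative, not from the episode; it assumes SciPy, and the function name and numbers are hypothetical) of power for the test of one individual predictor in a multiple regression, where f2 is Cohen's squared effect size for that single predictor relative to residual variance.

```python
# Power for ONE predictor's test in a multiple regression,
# via the noncentral F distribution (Cohen-style f^2 effect size).
from scipy.stats import f, ncf

def single_predictor_power(f2, n, n_predictors, alpha=0.05):
    dfn = 1                            # testing one coefficient
    dfd = n - n_predictors - 1         # residual degrees of freedom
    ncp = f2 * (dfn + dfd + 1)         # Cohen's noncentrality convention
    crit = f.ppf(1 - alpha, dfn, dfd)  # central F critical value under H0
    return ncf.sf(crit, dfn, dfd, ncp)
```

The same machinery works for the omnibus R-squared test by swapping in dfn = number of predictors; the point is that the two questions give different answers for the same study.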

(16:19):
My research question is almost always about the individual predictors, right? To what extent is this helping me to understand Y, above and beyond these other things that I'm controlling for? And already we're asking you to think about power in things that transcend the level that you're typically formally trained to do. You and

Patrick (16:38):
I have talked ad nauseam about the structural equation model, about its generality, about how a lot of the things that we do (t-tests, ANOVA, ANCOVA, regression, CFA) fit into that framework. Not everything, but a whole lot of things. And it turns out there are 30 freakin' years, maybe 40 years, of work on power analysis within the SEM,

(17:03):
but you don't see that a lot. And I think what we should do is pivot into the "light the candle" part.

Insert (17:10):
light a candle, Patrick?
Can the burpees light a candle?

Patrick (17:17):
Let's revisit some of these classic concepts, but through the perspective of the SEM. Because what I was grousing about on that earlier episode is: a grant reviewer says, "The applicant did not adequately demonstrate sufficient power to conduct the analyses proposed here," and I throw them against

(17:38):
the wall. Not the reviewer! One time; the judge told me those records are sealed, we can't talk about that. The reviews I throw against the wall. And I say: the power of what? Well, what we're going to talk about within this SEM framework is going to involve Coors Light and Green Day in my garage, I know.

Greg (18:01):
Yeah.

Patrick (18:10):
You're gonna go out in the garage, and you're gonna build the damn model to get the damn power estimate that is specific to your hypotheses. So we can reach inside these really complicated models and say: this is what the power is. We are going to use the SEM framework as a calculator to get

(18:31):
the power estimate for exactly what we want. Not what G*Power gives you; not what Cohen's tables say, which are brilliant, totally, and hugely advantageous, unless you're doing any of the work that we all are doing.

Greg (18:47):
Those are great for stuff you don't do. Yeah, yeah. All right. So, to draw on the analogy from regression: when we talk about fit in regression, and how successful a particular regression model is as a whole, the R-squared is often our go-to. Right? A big R-squared, and we go, "Yay, good for us. I don't know which predictors are doing it, but yeah, good for us." A small R-squared, and we are sad. And so the power for regression is often treated as power of

(19:09):
this whole, rather than power of the different parts inside. Within structural equation modeling, we also assess the model as a whole, but we don't use an R-squared; we use fit indices. Fit indices like the comparative fit index, or the root mean square error of approximation, or the model chi-square, right? There are many, many different ways of assessing
(19:29):
fit of a structural equation model, and there's a lot of question about how one should use those, if one should use those. Now, if I just asked a question about power at this omnibus level, one way to think about it is: how many subjects would I need to be able to say this is a good model? Right? That seems like a reasonable question, because we all want

(19:51):
good models. But here's the problem with that. Generally speaking, the chi-square is a funny index, in the sense that bigger values of the chi-square tell you that your model is doing worse, and smaller values of your chi-square tell you that your model is doing better. So if someone used the chi-square as a characterization of fit of a model as a whole, and said, "Now, how can I use that in some sort of a priori power analysis?", the

(20:12):
answer I would have at the end of the day is: just get a really small sample. Right? Because if you get a really small sample, you will tend to get a really small chi-square, and so you will tend to say you've got a great model. And that should bother you, on some level, deeply. There's sort of a logic problem that we have here with the chi-square. And instead, what we do is we kind of flip the logic from the chi-square.

(20:34):
The index that is used in part of this process most commonly (and it absolutely doesn't have to be the one that's used) is the root mean square error of approximation. The root mean square error of approximation has a known distribution. And here's the logic of power analysis at the omnibus level using the root mean square error of approximation. Let's imagine that you set for yourself some standard that you consider to be

(20:57):
the barrier between good and evil. Right? Bad fit, good fit. Truth.

Insert (21:05):
What it means is Old Testament, fire and brimstone coming down from the sky! Human sacrifice, dogs and cats living together... Enough! I get the point.

Greg (21:14):
What do you use, Patrick? What do you think of as the cutoff for RMSEA?

Patrick (21:17):
I wrote a paper that shows there is no universal cut point. The impact on the field was zero. So I'm just going to cross my arms and sit here grumbling.

Greg (21:29):
Someone might say, "Oh, we'd like to use .05 as the cutoff for the root mean square error of approximation," or, "We'd like to use .06." But the thing is, you have to pick some value in this world. And what Patrick said, about maybe there isn't a good value, I think is part of the challenge with this. But imagine you did lock in; let's say you locked in and said .05. So what you do is you imagine what your true model's RMSEA is.

(21:51):
"How on earth would I know the true RMSEA for my model?" The first thing I say is: exactly. But then the second thing I say is: that's not really different from other power analyses, where you have to suppose something about what's true in the population. So we're not really being held to a different standard here; it's just a different numerical description. And so imagine that you say, "I believe that my model is perfect

(22:14):
in the population." I say: okay, great. It's not, but okay, good for you. If you were to take samples of size, let's say, 50, and do studies over and over and over, you would get a confidence interval around your sample's root mean square error of approximation. And when you have a smaller sample size, that confidence interval will tend to be much wider; and when you have a bigger sample size, that confidence interval will come

(22:35):
in and be tighter. The operational question is: what sample size would you need so that, let's say, 80% of the time your confidence interval is entirely below wherever you have set that threshold, like an RMSEA of .05? So if you say, "I want 80% power," the answer is: what sample size would you need to shrink your RMSEA confidence interval down so that

(22:57):
80% of the time it is under that threshold? And essentially, you are rejecting badness of fit in favor of good fit. That's the logic of it. Then there's the realistic part, where you say: but whose model really has perfect fit? Whose model has an RMSEA of zero in truth? And the answer is probably nobody's. And so I say: well, inject a little bit of badness. But imagine that your RMSEA in truth is .02, a little bit of badness, if

(23:20):
it's kind of realistic, maybe. And now the question is: what sample size would you need to shrink your confidence intervals down so that 80% of the time they are below whatever threshold you've set for yourself, like .05? Well, it's going to take a bigger sample size now, because truth isn't sitting down at zero; truth is sitting up at .02. And so, oh my gosh, I'm gonna need a bigger sample size so that I'm under that

(23:43):
.05 80% of the time. That's with .02. What if your model's fit is .04? Right: an RMSEA of .04 in truth characterizes the fit of your model in the population. Which, by the way, I might be pretty darn happy with. Right, we can question whether or not the RMSEA is the be-all end-all characterization of fit. But frankly, if someone told me I had an RMSEA of .04, I'd

(24:04):
be pretty happy. Well, imagine that's the truth in the population. What sample size are you going to need so that the confidence intervals around your sample's RMSEA are below that threshold of .05 80% of the time? And the answer is: probably you're going to need a pretty darn big sample size, because your truth of .04 is standing right next to

(24:27):
that threshold of .05. You're gonna need a big old sample size to get a tight confidence interval around that RMSEA. This is the logic of power analysis, or sample size determination, when you're using a fit index like the RMSEA. But there are a lot of challenges associated with this.
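A minimal numeric sketch of the RMSEA logic just described (illustrative, not from the episode; it assumes SciPy, and the df, n, and RMSEA values are hypothetical). The model chi-square is treated as a noncentral chi-square with noncentrality (n - 1) * df * RMSEA^2, and we ask how often the statistic falls below the critical value set by the not-close-fit boundary.

```python
# Omnibus RMSEA power: probability of rejecting "fit is NOT close"
# (H0: RMSEA >= eps0) when the true population RMSEA is eps_true < eps0.
from scipy.stats import ncx2

def rmsea_power(df, n, eps0=0.05, eps_true=0.02, alpha=0.05):
    lam0 = (n - 1) * df * eps0 ** 2          # noncentrality at the H0 boundary
    lam_true = (n - 1) * df * eps_true ** 2  # noncentrality under the "truth"
    crit = ncx2.ppf(alpha, df, lam0)         # reject for *small* chi-squares
    return ncx2.cdf(crit, df, lam_true)      # chance of landing below that cutoff
```

Exactly as in the dialogue: holding df and n fixed, moving the true RMSEA from .02 up to .04 drags the two noncentralities together, the power drops, and the required sample size balloons.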

Patrick (24:43):
We'll put this in the show notes, but MacCallum, Browne, and Sugawara have a very important paper on all of this in Psych Methods, in 1996. What they did, and what Greg did here in the telling of the story, is actually turning two knobs at once in what we think about. The first one is exact fit versus close fit, right? So the

(25:05):
chi-square, in our usual way, has the null hypothesis that sigma equals sigma-of-theta; that is, the population covariance matrix is equal to the model-implied covariance matrix. And that is the equivalent of mu one equals mu two. Now, as Greg described, and I won't reiterate, we come to a different conclusion. On

(25:26):
that note: the t-test is saying your intervention was not effective, and you have to bring empirical data to demonstrate that it was. The null in the SEM is saying your model is correctly specified, and you have to bring empirical information to demonstrate that it is not. Karl Popper is over there mixing

(25:48):
drinks at this point, shaking his head at that little intellectual judo that we did. But what we're doing is thinking about: one, we've got this omnibus test of power for the model as a whole; and two, do we do close fit or exact fit? All right. And what MacCallum argues, just as Greg said,

(26:09):
is that rarely, if ever, does a model fit exactly in the population. And so what are we going to give? We're gonna give a little bit of .02, we're gonna give a little bit of .03. But let's go back to the R-squared in regression situation, where we said: oh, you have some power to detect a given R-squared. But that's for the model as a whole. You could have all of your predictors be significant, you

(26:31):
could have one predictor be significant. Indeed, you could align things in a way where none of your predictors is significant, but you still have an R-squared of .2 to point to. Yep. And you kind of sigh and say: well, crap. We are evaluating the SEM as a holistic entity. If we have complete data, we have a sample covariance matrix; we have a model-implied covariance

(26:53):
matrix that the model says holds, given the structure you gave it; we subtract those two, and we get a matrix of residuals. What we're doing in the test of exact fit is: are those residuals, taken jointly, larger than we would expect by chance alone under the null hypothesis? Well, we could have

(27:13):
one big honkin' residual and all the rest near zero, or we could just have dribs and drabs, right? The Irish loadings, remember? O'Curran, O'Shaughnessy, O'point-oh-three, right? If you're doing a CFA, we don't believe Y2 loads only on eta-one and on no other factors. It may have a cross-

(27:36):
loading of .01 or .02 or .03, but we're fixing those to be zero. Well, that omnibus test is just like the R-squared: all we can say is, somewhere within the confines of the model, we have a 78% chance of detecting a misspecification. I look at
that and think: okay, in one way, super exciting. In another

(27:59):
way, that's not what we're really interested in. We're interested in a treatment effect; we're interested in a nonlinearity; we're interested in the covariance between two growth factors. Those omnibus tests don't give us that. And the other poke in the eye with doing the entire RMSEA omnibus test, as you already alluded to, is we have to pick a value that

(28:21):
we believe is indicative of close fit. An RMSEA of, what, .03, or .04, or .05? There's been a fair amount of work that shows we can't do that, because it varies over degrees of freedom and model complexity and determinacy and all of these things. So, enter stage left one of my heroes in the field: Albert Satorra. I think Albert Satorra is one of the most

(28:43):
important contributors to SEM. If you use robust maximum likelihood, you can thank Albert. If you use an adjusted chi-square, a robust chi-square, you can thank Albert. He has made unbelievable contributions to the field. One is he said: wait a minute. Under the null

(29:05):
hypothesis, our test statistic follows a central chi-square. Under the alternative, which has some misspecification in it, it follows a noncentral chi-square. That noncentrality parameter represents the degree of misfit from that misspecification. Damned if I can't turn that into a power estimate. And that's the

(29:26):
Satorra-Saris method.

Greg (29:28):
Right. If we think about pivoting, then, from power and sample size for the model as a whole, to power and sample size for the parameters, where our hypotheses are actually residing within a model: he laid the groundwork for all of that.

Patrick (29:44):
It's very similar to what we do when we compare nested models within the sample, but using the Satorra-Saris method we're doing that at the level of the population. Remember that in a likelihood ratio test for nested models, we have Model A, which has some parameterization, and we have Model B, where we impose restrictions on Model A to get to Model B, so they're

(30:06):
nested. They're each going to have their own chi-square; we take the difference between them, and we can test: is there a significant difference in model fit? Well, what Albert did is said: look, go up to the whiteboard and draw out whatever model you have. Now we're starting to talk about the power of what. Draw a growth model, draw a path model, draw a CFA, go

(30:27):
nuts, man, draw whatever you want on that board, and declare that to be your population model. Now, this is a little tricky, because you're gonna have to give me every parameter value. Yeah: your loadings are .7, your regression coefficient is .5, your correlations are .3. Now, this isn't any different from what we do in

(30:47):
other kinds of things. When we say, "I have an effect size of .2," we're implicitly saying these are the values in the population. It's just that we're up at the whiteboard now. And now it's power analysis on spring break: you've got to write in communality estimates, factor loadings, all of these things. You're going to get a model-implied covariance matrix and mean vector that correspond to that model. Now, for your sample

(31:09):
planning, go up and say, "I'm going to remove these three parameters," and get the covariance matrix and mean vector that that model implies. Now keep in mind, we're still at the level of the population; we don't have any sample data. That's right. But we have the covariance matrix of your population model, and we have the covariance matrix of a

(31:31):
misspecified model that's defined by a very specific misspecification. And he showed how you can use the difference in those chi-squares to calculate: what is the probability you would detect that misspecification, given the other characteristics of your model? It's

Greg (31:52):
freaking brilliant, is what it is. You can do some version of it mathematically, without running models; you can do it with software, by running models and just setting parameter values. It translates perfectly into the noncentrality stuff that Patrick and I talked about previously. Because if you specify a proper population model, and then you take out a parameter and run the analysis on that population, you

(32:17):
will get an estimate of the noncentrality associated with that misspecification given a particular sample size, and given the context of the rest of the parameters of the model. That translates just boom, right into power, into sample size stuff. And he laid the groundwork for everything, really,
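The analytic calculation Greg describes boils down to comparing a central and a noncentral chi-square. A minimal sketch in Python, where the noncentrality value and degrees of freedom are invented for illustration, not taken from any particular model:

```python
# A minimal sketch of the Satorra-Saris style analytic power calculation,
# assuming you already have the noncentrality parameter (ncp) implied by
# fitting the misspecified model to the model-implied population covariance
# matrix. The numbers below are illustrative only.
from scipy.stats import chi2, ncx2

def sem_power(ncp, df, alpha=0.05):
    """Probability of rejecting the misspecification at level alpha."""
    crit = chi2.ppf(1 - alpha, df)       # critical value under the central chi-square
    return 1 - ncx2.cdf(crit, df, ncp)   # tail area under the noncentral alternative

# e.g., a 1-df misspecification with population fit-function value F0 = 0.05 at N = 200:
ncp = (200 - 1) * 0.05
print(round(sem_power(ncp, df=1), 3))
```

The only inputs are the degrees of freedom of the misspecification and the noncentrality it induces; everything else is the standard noncentral chi-square machinery.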

Patrick (32:33):
because it's embedded within the SEM. And all of our traditional friends from high school are members of the SEM, right? We can use this approach for a t-test, or an ANOVA, or a multiple regression, or a multiple regression with an interaction. If you want to know what is the power to detect a three-way interaction in a multiple regression, you can do

(32:56):
it using this framework, but we can then generalize it to all the other things that we can do. And so way back in the day, Muthén and I have a paper back in '97, in Psych Methods, where we use this method in a multiple-group latent growth curve model to detect a treatment effect. And I wrote a little do loop, I'm not talking like some

(33:21):
massive thing. I wrote a little do loop in SAS, and I just went through sample size one at a time, up, up, up, up, up, up from one to 1,000. And I plotted out power curves. You can build power curves across whatever sample sizes you want, whatever effect sizes you want. And this is the Coors Light Green Day

(33:42):
garage. You can't say, well, Cohen '88 doesn't have a table for a bivariate growth model where my hypothesis is about the correlation between the two growth factors? Go into the garage, write a couple of do loops. I'm not exaggerating, because we're not doing Monte Carlo simulation. This is all analytic. I just referenced a noncentral chi-square given a

(34:05):
degree of freedom, sample size, and noncentrality parameter and plotted the function, and that's your power. So when the grant reviewer says, well, you need to give power for your bivariate growth model, you say: here is the power that I would achieve with this sample size to detect a covariance between the growth

(34:29):
factors this large or larger. There you go, man, there's your power.
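Patrick's "little do loop" was in SAS; a rough translation of the same analytic sweep is below. The misspecification size F0 and the degrees of freedom are placeholder assumptions, not values from the 1997 paper:

```python
# A rough translation of Patrick's SAS do loop: step the sample size up,
# reference a noncentral chi-square at each step, and read off the power
# curve. F0 (the population fit-function value for the misspecification)
# and df are assumptions you would take from your own whiteboard model.
from scipy.stats import chi2, ncx2

def power_curve(f0, df, n_values, alpha=0.05):
    crit = chi2.ppf(1 - alpha, df)
    return {n: 1 - ncx2.cdf(crit, df, (n - 1) * f0) for n in n_values}

curve = power_curve(f0=0.03, df=1, n_values=range(50, 1001, 50))
n_needed = min(n for n, p in curve.items() if p >= 0.80)  # smallest N on the grid with power >= .80
```

Plot `curve` and you have exactly the power curves Patrick describes; the loop costs essentially nothing because no data are ever simulated.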

Greg (34:35):
I remember reading your 1997 paper pretty close to 1997.
Honestly, I remember also thinking, dang, that's kind of hard to do. You know, if I were an applied person, am I literally going to write a do loop to be able to grind through this? I certainly could, and I appreciate, you know, when you say yeah, it only took a few lines to do, but for

(34:56):
some people that might still be a big ask. A lot of the Satorra and Saris things can be done with the loops that you talked about, where you try different sample sizes; it can be done using the mathematics of power analysis; or, and this I think is where things have started to go, we can hand everything over to simulation techniques. When you do the loops that you are talking about, or when you

(35:18):
do the stuff that Satorra and Saris have come up with the mathematics for, you actually have to specify the entire variance-covariance matrix. And if I do that, what am I assuming? I'm assuming a lot about the actual data that give rise to that. I'm assuming that all the cases are there, or I might be making some distributional assumptions, all of that. It's a bit of a

(35:39):
leap, maybe, to think that you're holding the population covariance matrix in your hand, even if we grant that you got all the relations in the model right. There are still issues about the data that we're kind of sweeping away,

Patrick (35:51):
You're exactly right.
This is an asymptotic method based on a mean vector and a covariance matrix. That means you have an infinite sample size, you have continuity, independence, normality, linearity, complete data; stop me when I've hit something that might be in your own study. So what we do is we say, well, wait a minute, all of us either have been in a class or teaching a

(36:13):
class where we talk about, if you were to sample and fit your model an infinite number of times and gather all the parameters together... and we're going to say, oh, wait a minute, I actually could do that and see what proportion I would say was significant. And that's power.

Greg (36:33):
So let's give an example.
What I want to make clear from the start is that we could pick any kind of model as an example, from something as simple as the model equivalent of a two-sample t-test, to a multiple regression with three predictors, to a latent growth curve model with structured residuals. So what we're about to talk about generalizes to all of those kinds of things. But let's still

(36:55):
pick a pretty simple example. Let me pick an example where I have a latent variable path model. And in my latent variable path model, I have three factors; I am just going to call them, for simplicity, F1, F2, and F3. The model that I'm interested in is a latent path model where there's a path from F1 to F2, and a path from F2 to F3. And then I also

(37:17):
have a direct path from F1 to F3, right? So a simple structural model. And then I have indicators for all of those factors; we could just say three indicators for each of those. And imagine that this is someone's model, what they want to do. And when I say that, what I mean is what they want to be able to do sample size planning for. So then the question is,

(37:38):
what are the steps that someone would go through to be able to do sample size planning for a model of this type? So now someone has come into my office, and they have said, this is the study I am planning, this is what it looks like, I want to know about the indirect effect of F1 on F3, and maybe the direct effect that it has as well. So this is where my theory is. I say, great. So we need to plan your sample size; here are

(38:00):
the things that we need to do to start. The first thing, now that we have it drawn up on the whiteboard, is I need you to go up there, and I need you to tell me what you think the numerical values are of those structural relations that you're going to have to detect.

Patrick (38:17):
Things get a little uncomfortable, but no more so than with Satorra and Saris,

Greg (38:22):
exactly right. So no matter whether you're doing the mathematical version of this that Satorra and Saris laid out, or, frankly, you're doing the power analysis for an ANOVA and I ask you to specify the effect sizes for the differences associated with the means: power of what. You have to tell me the what in order for us to talk about power. So you say, okay,

(38:42):
well, what metric should I write it in on the board? So let's just use a standardized metric and say, okay, I think this path here is a point five, I think this path here is a point three, and I think this path over here is a point six. Is that okay? And you might have answered that based on your theoretical knowledge; you might have answered that based on sort of the smallest value that you would be interested in being

(39:02):
able to detect, and even that is the conversation that we have. So are we ready to do this? Well, there are some other things that we're going to have to do, because there's a measurement model associated with each of these factors. And so now I'm going to ask you, what are the loadings associated with each of these factors? You're like, what, I have to know the loadings of these?

Patrick (39:20):
You've got to know how many items to put on the damn factor. How

Greg (39:23):
many items do you put on there? That's right, three, five, how many have you got? And honestly, sometimes people don't even know that when they enter into the conversation. But let's assume they've nailed it down, and we'll go with the three items per factor only because they've said that's what we have. Has past research told you what these loadings ought to be? Is there reliability information on any of those individual indicators that might give us some sense of how strongly it

(39:45):
would load on a common factor? Okay, then by God, we might use that. Or is there some value that you think is sort of on the weaker side, but you're being cautious? Okay, great, let's try these things; at least let's put those in as placeholders for now. But I have to get numbers on those loadings. Do we have all the moving parts, Patrick?

Patrick (40:03):
We've still done nothing different than we did with Satorra and Saris. Oh yeah, we are building the model we believe to exist in the population. So nothing is different yet: you're still at the whiteboard, you've still drawn your model, you've still put in your hypothesized values. So this is just a repeat of what we've done so far. But now here's where we're going to

(40:24):
take a right

Greg (40:25):
turn. And this is where it gets cool, right? I can put this into Mplus, I can use a variety of R packages that do this. Once you communicate your model, this will now become a simulation. And what I mean by that is that once you specify the nature of the population, if I say something to the program like, let's take samples of size 100, what the software can do

(40:46):
with some assumptions is imagine reaching into the population, drawing out a sample of size 100, running your model, computing estimates for all of the parameters, the loadings, everything. But you are particularly interested in tracking the behavior of those three key structural parameters among F1, F2, and F3. We do that for the first sample of

(41:06):
size 100, the second sample, the third sample; we do this some reasonably large number of times, I don't know, let's just say 10,000 times. And what I will get out of that will be an empirical sampling distribution of each of those structural parameters that I care about, as well as all of the other parameters that we have. So now the statistical question is, how

(41:26):
many of those would be statistically significant? If I take the path from F1 to F2, how many of those reached statistical significance? And if the answer is 40%, then I'm like, oh heck, that's an estimate of 40% power. That's not what I want at all, right? I want something more. So that tells me that, at least with respect to that parameter, I'm going to have to increase my sample size. But while I'm at

(41:47):
it, I would check how many of the tests of the path from F2 to F3 were statistically significant, how many of the path from F1 to F3. So I'm getting these empirically generated values for power. That gives me a sense of whether or not I need to turn the sample size knob up or the sample size knob down. And what I learned pretty quickly is that there is

(42:08):
going to be a weakest link in this whole model that's going to be the one that holds everything up, right? There's going to be one parameter; the other parameters are going to be like, hey, we're okay, we've got enough power. And you've got this other parameter that's holding everybody else up, where you have to keep turning up the sample size knob to get that slowest one across the finish line. But that's essentially the setup for things under whatever assumed distributional

(42:30):
conditions we have. Currently I'll just say normal, under the assumption of complete data and all of that. This is essentially how it works, where we keep tweaking that sample size knob until we get all the parameter estimates that we care about to reach statistical significance at some threshold power level, something that we're happy with, let's say point eight oh.
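Greg's "reach in, fit, count" loop is easiest to see with a deliberately stripped-down stand-in: a single standardized path simulated as a simple regression slope. A real version would fit the full latent model each replication (in Mplus or an SEM package), but the counting logic is identical. The effect size, N, and replication count here are arbitrary illustrations:

```python
# A toy version of the simulation approach to power: draw repeated samples
# from a population with a standardized slope of beta, test the slope in
# each sample, and report the proportion of significant results.
import numpy as np
from scipy import stats

def empirical_power(beta=0.3, n=100, reps=2000, alpha=0.05, seed=1):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.standard_normal(n)
        y = beta * x + np.sqrt(1 - beta**2) * rng.standard_normal(n)  # population slope = beta
        r = np.corrcoef(x, y)[0, 1]
        t = r * np.sqrt((n - 2) / (1 - r**2))          # t test of the slope
        if 2 * stats.t.sf(abs(t), n - 2) < alpha:
            hits += 1
    return hits / reps   # proportion significant = empirical power

print(empirical_power())
```

Turning the "sample size knob" is just calling `empirical_power(n=...)` with larger values until the weakest parameter of interest clears your target.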

Patrick (42:50):
You may say, gee, this sounds an awful lot like what you get out of Satorra and Saris, and you would be right, if you meet assumptions: if you have a large enough sample size, and you have continuity, and you have normality. And if you do what Greg just described in the Monte Carlo, and you do it 10,000 times, and you compute what is the mean number of parameters

(43:12):
that I rejected at a p of oh five, that is going to converge on the Satorra-Saris analytic method; they are one and the same. Where they separate is when we start to violate those assumptions. So you're doing a longitudinal growth model in Satorra and Saris: you've got a covariance matrix in the population, a covariance matrix under some misspecification in

(43:33):
the population? Well, in thissimulation, you shrug we do a
lot of shrugging, and you say,well, let's punch out 20% of the
observations in any given sampleunder an Mar assumption, let's
make them a little skewed. Let'sintroduce a touch of dependence,
let's put in a couple of Irishloadings. And now let's do it

(43:55):
10,000 times and see what proportion we would reject.
Because now we have what's sometimes called an empirical estimate of power, where Satorra and Saris gives an analytical or asymptotic estimate of power. The empirical is: you literally say, of my 10,000 regression coefficients

(44:19):
estimated between F1 and F2, under all of those characteristics I described, 68% had a p-value less than point oh five. Your empirical power estimate is point six eight: if you do the study as you described, and your study corresponds to the characteristics as you generated the data, you have a point six eight chance of finding an effect if an effect really
(44:43):
exists. Boom, in and out, nobody gets hurt. There is your power of what.

Greg (44:50):
Sort of, sorta, kinda sorta. I'll tell you my kinda sorta here, and that is: under those specific conditions, that degree of missingness, that type of missingness, with those particular loadings, with Saturn in Jupiter's fourth house.

(45:22):
Under those specific conditions, you have a beautiful, beautiful estimate of power,

Patrick (45:28):
in fairness to me,

Greg (45:29):
which is what's important.

Patrick (45:30):
I said, if your sample corresponds to the characteristics that you defined in your model,

Greg (45:37):
yeah. Oh, that's such a pregnant if, isn't it, at this point? Right, I'm of two minds.
One mind is like, ah, screw this whole thing, I'm outta here. Right? All right, that does it.

Insert (45:49):
I'm going home.

Greg (45:53):
Seriously, because even if we just take the model that we started with: how do I know the value of those paths? Well, you don't, but you make an educated guess. How do I know the value of those loadings? I don't, I just make an educated guess. How do I know what the distributions are like? I don't know. This is such a Jenga tower of assumptions that

(46:13):
you might feel like, what the heck is the point of all of this? So there's a part of me that would lead me to throw my hands up. But the other side of this is that it sets the stage so beautifully for being able to do the equivalent of a sensitivity analysis. Think about all the different knobs that are being turned here. One is just the values that you put

(46:35):
in structurally: the point five, the point three, the point four, or whatever the numbers are that we put in there. You can try: what would it be like if those were different? What about the loadings? Well, you can try different loading values. What if there are some errant cross loadings, like Patrick said? Yeah, you can put in some, you can put in a few more, you can twiddle the knobs on the magnitude of those. What about

(46:56):
missingness? Yeah, that's right, you can try different levels of missingness. What about non-normality? Yeah, you can try different levels of non-normality. You can sort of ask yourself: what is the worst perfect storm of conditions that I can imagine? And maybe the answer is, you've got some wicked non-normality, and you've

(47:16):
got some missingness in a particular pattern, and you've got some low loadings, and you've got some cross loadings, and you've got some weak structural paths. You're like, bring it on, right? You define the worst-case scenario, you run the simulation, you turn the crank, and you power your study for that. Then you are ready for just about anything. So I would

(47:36):
say this framework gives us the ability to understand how robust our estimates are, and to hope for the best but plan for the worst. When I'm writing up that grant proposal, I don't just say I looked at the power for the root mean square error of approximation, because first of all it doesn't correspond to any of my hypotheses. I can say I did a sample size planning

(47:57):
endeavor where we essentially defined the worst-case scenario, and this is what we got. And if you can get your reviewers to buy into your conception of that worst-case scenario, and that you have a sample size to overcome it, then you are in phenomenal shape.

Patrick (48:14):
What I love to see, as you talk about sensitivity analysis, are these ranges. Our best-faith effort is we have a power of point six eight to detect this very specific effect; if things really shift sideways on us, the worst possible situation, we're going to get is point six one; and if things break our way, point eight three. Give kind of a low, medium, high

(48:37):
estimate. And that conveys a huge amount of information to the reviewer. But you know, we always talk about the reviewer; screw the reviewer, it's for you. You have an ethical responsibility to build a study that has a fighting chance to detect an effect if an effect really exists. And this is how you go about doing it. Yeah, dude, we're getting long in the

(48:59):
tooth. So let's walk to the exit.
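The low/medium/high ranges Patrick describes come from wrapping an empirical power run in one more loop over scenarios. A toy sketch using a single standardized path as the stand-in for a structural parameter; the effect sizes and missingness rates are invented for illustration, and a real run would also vary loadings, distributions, and missingness type:

```python
# A toy sensitivity analysis: empirical power for one standardized path
# under pessimistic / best-guess / optimistic scenarios. Scenario values
# (beta, missingness rate) are hypothetical placeholders.
import numpy as np
from scipy import stats

def empirical_power(beta, n, miss_rate, reps=2000, seed=7):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.standard_normal(n)
        y = beta * x + np.sqrt(1 - beta**2) * rng.standard_normal(n)
        keep = rng.random(n) > miss_rate              # listwise MCAR deletion
        m = keep.sum()
        r = np.corrcoef(x[keep], y[keep])[0, 1]
        t = r * np.sqrt((m - 2) / (1 - r**2))
        hits += 2 * stats.t.sf(abs(t), m - 2) < 0.05
    return hits / reps

scenarios = {"pessimistic": (0.25, 0.30),   # (beta, missingness rate)
             "best guess":  (0.30, 0.20),
             "optimistic":  (0.35, 0.10)}
report = {name: empirical_power(b, n=150, miss_rate=mr)
          for name, (b, mr) in scenarios.items()}
```

Reporting the resulting range, rather than a single number, is exactly the low/medium/high summary advocated above.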

Greg (49:01):
In that first power episode, episode two, season one, we talked about the challenges associated with power analysis just as an endeavor: that it's very difficult to characterize power in some single number, let alone with any degree of precision. And that was, as we said before, the curse-the-darkness power analysis episode. And in this episode, what we tried to do is pair that with: you

(49:24):
gotta do the power analysis, you still got to do the sample size planning. So how do you do it? You and I both lean toward the structural equation modeling framework, not because it's power in structural equation modeling, but because it's power in darn near anything you want to do. Not literally everything, but darn near anything that you want to do. And it has a mathematical side to it that you could do if you want, and it also has what I

(49:47):
think is incredibly versatile, this simulation-based side. And I think that for people out there who are doing honest sample size planning, this is a wonderful go-to framework for them to use to lay things out.

Patrick (50:01):
Hopefully we may have got you thinking a little bit more about the power of what. Don't let your advisor, don't let an article reviewer, don't let a grant reviewer get away with saying, well, what is the power of your model? Yeah, turn it back on them and say, well, what do you mean? Do you mean the power of an omnibus exact fit? Do you mean the power of an

(50:23):
omnibus close fit? Do you mean a one-degree-of-freedom misspecification? Do you mean a multiple-degree-of-freedom misspecification? Do you mean under asymptotic assumptions? The power of what? And with the simulation methodology, whether you use canned stuff that exists (and it is very good), or being

(50:45):
able to program this for any model you want: if you have some weird three-level cross-classified model, and you want to know what is the power to detect a random effect, you can write a Monte Carlo simulation where you generate 10,000 random effect estimates and just count up how many are significant. You

(51:06):
can, in principle, find the empirical power of any parameter in any model that you can draw on the board. And

Greg (51:16):
I think as scientists, this is really what we ought to be doing: making this full, honest attempt at trying to get at the power of the things that we care about. Right? These are my research questions. This is how I'm going to get at those research questions. Here are the sample sizes I need to get at those research questions in this particular context.

Patrick (51:34):
And you know what, I'm going to double down on what I advocated in the initial episode, because I still feel this way about the precision of power estimates. You really nicely described: well, what loadings, what residual variances, what... there are so many moving parts and subjective decisions. I still

(51:55):
think we should have a smiley face emoji, a shrug emoji, and an oof emoji for power. But now we fix those emojis to the power of what. Now it's not just, I have a shrug emoji for my entire model; I have somewhere between a shrug emoji and a smiley face

(52:17):
emoji to detect my treatment effect in a bivariate growth model over five time periods, with 22% attrition and six ordinal-level indicators. I still advocate the emojis.

Greg (52:30):
I feel an eggplant coming on.

Patrick (52:31):
That is for your work on.

Greg (52:34):
All right, thanks very much. Toodles. Everybody, take
care.
Thanks so much for joining us.
Don't forget to tell your friends to subscribe to us on Apple Podcasts, Spotify, or wherever they go when life is so overwhelming they'd rather listen to a podcast about statistics and root canals. You can also follow us on Twitter, where we are @quantitudepod, and visit our website quantitude

(52:56):
pod.org, where you can leave us a message, find organized playlists and show notes, listen to past episodes, and other fun stuff. And finally, you can get cool Quantitude merch like shirts, mugs, stickers, and spiral notebooks from Redbubble.com, where all proceeds from non-bootleg authorized merch go to DonorsChoose.org to help support low-income schools. You've been listening to Quantitude, the podcast equivalent of that weird second cousin who stays just a bit too long at holiday

(53:19):
gatherings. Today's episode has been sponsored by the bar chart. When the rest of the world hears those words, they think of the thing that tells them where to go next on their fun Saturday night pub crawl with friends. When you hear bar chart, you think of a tool to visualize distributions of data. Please tell me you see the problem here. And by Fisher's LSD, way more powerful than Tukey's

(53:41):
magic mushrooms. And finally, by stepwise variable selection methods in regression: maybe not the best idea, but still way better than whale holes. This is most definitely not NPR.