Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Patrick (00:04):
Welcome, my name is Patrick Curran, and along with my libertarian friend Greg Hancock, we make up Quantitude. We are a podcast dedicated to all things quantitative, ranging from the irrelevant to the completely irrelevant. In this week's episode, Greg and I explore all the ways we lie about things when we teach, not the least of which is that there are actually no individual growth trajectories estimated in an
(00:27):
individual trajectory growth model. We discuss why this is, how individual trajectory estimates can be obtained, and how these might be of use in practice. Along the way, we also discuss the green light button, developmental milestones, love for semicolons, Jack Nicholson, baskets of data, Oprah, "it's all
(00:48):
crap," transparencies and dittos, yelling at your steering wheel, libertarians versus socialists, carts and donkeys, catching squirrels, enemies on the field, being stung, bit, and chased, and persnickety models. We hope you enjoy this week's episode. So both of us have teenage kids, and I don't know what your experience is up north, but with
(01:11):
increasing frequency things seem to arise at the dinner table where they say, "You told us this at some point early on, and it turns out we just figured out that..." Do you have some of this happening in your own house?
Greg (01:30):
I mean, there's two ways of looking at it, right? There's the stuff I told them that wasn't entirely accurate, and then there's the stuff I just withheld telling them. So if you mean things like "the world isn't fair," "you can't grow up to be whatever you want," and "I'm not even sure I'm your dad," then there are some things that I might have withheld just because
(01:51):
I didn't want to depress them for the rest of their lives. But is there something in particular that you're thinking of, that you have had to walk back or that your kids called your BS on?
Patrick (02:01):
Silly things come up here at home, and one just made me smile, because it made me think back to my own dad when I was a kid. At one point, I asked my dad, how do stoplights work? He said, "Well, I've got a button in the car, and you push the button and it changes the light." And so we'd pull up to the stoplight.
(02:21):
We were little kids, and my brother and I would say, "Push the button, push the button!" And it was really sweet, because there was a life lesson in it. My dad would say, "Oh, we're not in a big hurry. Let's let these other people go first. I'm sure they have more important places to be." And then at some point, he'd say, "All right, well, fair's fair, it's our turn," and he would reach under the dash and the light would turn green. Yeah, so that was when I was five, but it went into sixth and seventh and
(02:44):
eighth. At one point, my dad, somewhat incredulously, said, "You're old enough to realize that we don't have a green light button, right?" It became a developmental milestone: he was worried about little PJ, that he still thinks at age 19 that I can change the light. What it got me thinking about is all the
(03:09):
things that we teach kids where either it's just easier to simplify because you don't want to deal with the complexities, or, more importantly, they're not ready to learn the particular thing. But the funny thing is, we do that when we teach grad classes
Greg (03:26):
all the time, right? And I have mixed feelings about it. But there are admittedly a lot of things that I teach early on in a stat sequence that later on I have to walk back, or later on I sort of have to admit, well, there's really more to the story than that. And I find myself in that situation all the time.
Patrick (03:43):
Part of the problem is it really is easier to teach something in a particular way. Now, it's not an overt lie. Well, sometimes it is. But it's a simplification. "Picture in your mind's eye," right? We always talk about that. It's like, "What we're trying to do here," or, "You pan back. The goal here is to..." And
(04:05):
it's a simplification, but here's part of the problem. We teach semester-long classes. I do a little something, and then I pass them off to Bauer. Bauer didn't know what I talked about, because heaven forbid we talk to one another and have a cohesive curriculum. It starts getting baked into the system. I don't
(04:25):
go back a year later and say, okay, all of us were together last year, and I described this this way; now let's dig a little bit deeper into it. Yeah. A funny example: I took a wonderful workshop years ago on the science of scientific writing. His name is Gopen, and he actually has a whole body of work on how you write well in
(04:45):
a scientific framework, and he had a laugh-out-loud conversation about the semicolon. Now, you and I have worked a lot together, and you and I have written together. You know that I have a very liberal policy on the Oxford comma, and I love semicolons. He had this thing where we should use
(05:08):
semicolons more. But when they're introduced in middle school, we're not ready for it; we're not ready to learn how to properly use the semicolon. And so we're not taught it. But by the time we are emotionally ready to use the semicolon, nobody teaches it anymore. It's beyond, it's forgotten. It was just a laugh-out-loud, funny conversation, because what he
(05:31):
did is tied it to kind of like health class and "your bodies are changing." I think a really good example of this, and it's very, very common, it's not just me and you, is the first introduction to the multiple regression model, when we talk about Y and Y hat and the residual E and the sum of
(05:53):
the squared E's. Almost every introductory class teaches it in a way that the model doesn't actually do it that way.
Greg (06:05):
I know that when I teach regression, I do it very colloquially. We've got some points there, and we say, hey, try a line. What would the predicted value, the Y hat, be for everybody with this particular line? And how do we gauge what the best-fit line is, in terms of how far the actual points get above and below the line? And we can't just talk about adding those up, so we talk about why we might square
(06:27):
those and sum them up. We wiggle and jiggle until we get that best-fit line, the line that minimizes the sum of squared vertical deviations of actual Y values from predicted Y values, of Y's from Y hats. And then in the end, I'm saying, see, everybody has a Y, but everybody also has a Y hat. And I'm saying
(06:49):
all of this stuff, almost like anthropomorphizing the process. But in the end, that's more the statistical puppet show and not exactly what's going on, right?
Patrick (06:59):
With regression, in fairness to us, you can do it that way. When I teach it, I'm almost like Oprah:
"You get a residual! You get a residual!" It is true, we can do it that way. Everybody has a Y. Now everybody has a Y hat. And there's a
(07:22):
distance between those. And we take Y minus Y hat and get the E's, and we gather those all together, blah, blah, blah. That's how we describe how we do this. But the actual analytics, that is, getting the regression coefficients, we can do entirely on covariance matrices. Yeah. If we have complete data, if you give me a covariance matrix of your outcome and your predictors, with no raw data at all, I can
(07:45):
do everything that we do in the regression model. And indeed, however you do that yourself, it is not doing these individual Y minus Y hat, E squared, adding them up. It's all at the level of the covariance matrix and the mean vector. That, to me, is kind of like the semicolon: you're not ready to think about X prime X inverse X prime Y when you're
(08:09):
very first introduced to it. I feel like Jack Nicholson:
Insert (08:14):
You want answers? I think I'm entitled! You want the truth? You can't handle the truth!
Patrick (08:23):
But the problem is, it gets baked in. And then rarely, if ever, does somebody come back and say, you know, funny story: those individual observations and the individual residuals, those don't actually ever exist within the confines of the program. We get the variances of the residuals without ever computing the individual residuals.
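(A quick numerical aside: Patrick's claim is easy to verify. Below is a minimal NumPy sketch with simulated data; the variable names and numbers are invented for illustration, not anything from the episode. The slopes come entirely from the covariance matrix and the intercept from the mean vector, so past the summary statistics no raw data are touched.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))                    # two predictors
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

# Route 1: the usual raw-data OLS, (X'X)^{-1} X'y with a column of ones
Xd = np.column_stack([np.ones(n), X])
beta_raw = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)

# Route 2: summary statistics only -- covariance matrix and mean vector
Z = np.column_stack([X, y])
S, m = np.cov(Z, rowvar=False), Z.mean(axis=0)
slopes = np.linalg.solve(S[:2, :2], S[:2, 2])  # S_xx^{-1} s_xy
intercept = m[2] - slopes @ m[:2]              # y-bar minus slopes times x-bar

print(np.allclose(beta_raw, np.r_[intercept, slopes]))  # True: identical answers
```

The two routes agree exactly, which is the point: the raw-data formula only ever uses the data through its first and second moments.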
Greg (08:44):
This is all a closed-form system that is derived through some optimization. It's not a wiggle-and-jiggle optimization like we're accustomed to in a lot of iterative procedures. It is closed form: we do some calculus, we go boom, here's the formula that gives us that, and we never touch a Y hat. We never touch a residual. And yet we keep talking about them as
(09:06):
though they're some critical part of the actual process of getting that line in the first place.
Patrick (09:10):
And I can make it even worse. Okay, please. You and I both teach growth modeling; we've had several episodes related to growth curve analysis. I go full-bore Oprah on the growth model as interindividual differences in intraindividual change, where I say each of you in the room has a set of repeated measures. Some
(09:33):
people have three, some people have four, some people have five; they're in a basket in front of you. Now what I'm going to have each of you do is fit the line that best characterizes your set of repeated measures. Now in your basket is an intercept and a slope. Now I'm going to walk around the room, and I'm going to gather all those intercepts together, I'm going to gather all those slopes
(09:56):
together, and now I've got a distribution of intercepts and slopes. Yeah, no, I don't know what that is. Crap.
Greg (10:02):
All of it is crap. And you know what? I'm gonna do it till the day I die. I know that it's crap. I know that we don't literally do that. But you can't handle the truth.
Patrick (10:14):
I would have liked to have seen Oprah and Jack Nicholson: slope, slope, you want a slope? You can't handle a slope!
Greg (10:27):
But it makes total sense, right? If you say, all right, everybody, you've got your own data. Patrick, you have a bunch of repeated measures; I want you to fit your line. Go ahead, fit your line, use the regression skills we lied to you about earlier, get your slope, get your intercept, and you give them back to me. And as you said, I'll put them back in a basket. I love
(10:49):
the pedagogical value of that. But you are totally right: that's not how it really works.
Patrick (10:54):
It's the same answer as the semicolon: it gets baked in. I do my Oprah as I'm talking about interindividual variability in these trajectories. Some people start higher, some people start lower, some people increase more rapidly, some less rapidly. And then the big reveal: we can try to predict those. I too am not going to change, mostly because
(11:16):
I'm old and I can't be fired. I know the slides have already been made. Yeah, I mean, my transparencies are doing just fine.
Greg (11:25):
I made the dittos.
Patrick (11:26):
But the problem is, you walk out of the class without an understanding of, oh, funny story: every time you've heard somebody wax poetic about individual variability in developmental trajectories of change over time, those do not exist. Yeah, what we do is we look at variances of intercepts,
(11:47):
variances of slopes, means of those, covariance between the two. And it's not only kind of like, it is exactly the same as the regression model, where we talk about having the variance of the residuals while we never compute the residuals. Here it's even a little weirder, because we have a
(12:08):
variance of the intercepts and the slopes when we never compute the intercepts and the slopes.
Greg (12:14):
We tend to teach this thing in a very forward way, you know, imagining we go from the data to the lines, to the characteristics of the lines, the summary statistics, and then doing fun things with those lines. But it works almost completely in the opposite way: we actually have all of the summary stuff to start. And then the question is, do we have a
(12:35):
reason, do we have a need, to back off that and talk about individual-level things? Like in regression, I don't have people's residuals to start, although I talk about it as though I do. I do the regression, I fit it with that closed-form process, and then I can ask what residual each individual has, if I'm interested in that. Here we are in a longitudinal setting,
(12:56):
but we have this model where we say, oh yeah, I know what the average slope is; oh yeah, I know what the variance of the slopes is, and the same thing for intercepts. But now tell me what that means for an individual. I don't even have that information. It's not like I look at how far their point is above or below a line. I don't even have a line for that individual.
Patrick (13:14):
And to be super clear, if you're yelling at your steering wheel while listening to this and saying, "But you can get residual estimates in regression!" Oh, of course you can. That's diagnostics, right? We have residuals and studentized residuals and studentized deleted residuals; we can get distributions of residuals, we can save them out and subset them, we can look for extreme residuals and do outlier
(13:38):
detection. But the big point is, we have to do extra work to get those. They do not come out of the standard X prime X inverse X prime Y, where we get all of our usual ANOVA tables and our regression coefficients and standardized coefficients and all of that. And that's actually where I want to pivot to:
(13:58):
going to the growth model, there is a reason why we teach it as "everybody has an intercept, and everybody has a slope." First of all, that's the green light button in my dad's car, which is: you're not ready to think about lambda psi lambda prime plus theta epsilon to get sigma hat. It turns out
(14:19):
that even though we do not estimate individual intercepts and individual slopes, whether it be an SEM or an MLM growth model, to be super clear, they never exist in the analysis. You want to know a super weirdo thing? I'll do my Oprah: you get a trajectory, you get a trajectory. If you have complete data, I can estimate the means and variances
(14:42):
and covariances of individual trajectories based solely on the covariance matrix and mean vector of the repeated measures.
Greg (14:52):
You don't have to see any raw data at all.
Patrick (14:54):
It rubs a little bit of the magic off the growth modeling lamp, which is: wait a minute, you talked about interindividual differences in intraindividual change and blah, blah, blah, and I need a covariance matrix. And...
Greg (15:10):
You never saw an individual. You never saw an individual.
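(That claim can be illustrated with a small method-of-moments sketch. Everything here is mine, invented for illustration, and it is not the estimator any particular SEM or MLM program uses. For a linear growth model with times t, the implied moments are E[y_t] = mu0 + mu1*t, and Cov(y_s, y_t) = psi00 + psi01*(s + t) + psi11*s*t for s != t, so the means and (co)variances of the intercepts and slopes fall out of the mean vector and the off-diagonal covariances alone, with no individual trajectories anywhere.)

```python
import numpy as np

rng = np.random.default_rng(1)
n, t = 20000, np.arange(5.0)

# Simulate linear growth: y_it = b0_i + b1_i * t + e_it
psi_true = np.array([[1.0, 0.3], [0.3, 0.25]])   # intercept/slope (co)variances
b = rng.multivariate_normal([2.0, 0.5], psi_true, size=n)
Y = b[:, [0]] + b[:, [1]] * t + rng.normal(scale=0.7, size=(n, 5))

# From here on, only the summary statistics are used
mu, S = Y.mean(axis=0), np.cov(Y, rowvar=False)

# Means of the intercepts and slopes: regress the mean vector on time
T = np.column_stack([np.ones(5), t])
mu_hat = np.linalg.solve(T.T @ T, T.T @ mu)       # approx [2.0, 0.5]

# (Co)variances: each off-diagonal element satisfies
#   Cov(y_s, y_t) = psi00 + psi01*(s + t) + psi11*s*t   (s != t, no residual term)
r, c = np.triu_indices(5, k=1)
A = np.column_stack([np.ones(r.size), t[r] + t[c], t[r] * t[c]])
psi_hat = np.linalg.lstsq(A, S[r, c], rcond=None)[0]  # approx [1.0, 0.3, 0.25]

print(np.round(mu_hat, 2), np.round(psi_hat, 2))
```

With a large simulated sample, the recovered means and (co)variances sit right on the generating values, and no person-level intercept or slope was ever computed.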
Patrick (15:13):
Here's the pivot I want to make: just as there are reasons why we want a residual out of the multiple regression model, to do everything we just talked about, I argue there are many reasons why it's incredibly helpful to also get an estimate of the trajectories from the model. Because just like the regression, we don't calculate individual intercepts
(15:36):
and slopes when doing a standard growth model, but we can use that model to get estimates of the intercepts and the slopes.
Greg (15:45):
Well, I am going to want to hear this, because there are some places where I feel, yeah, absolutely, that makes sense, and then there are other places where I go, wait, I thought that's what we literally try to avoid with some of these particular models. You know, when I talk about CFA, for example, confirmatory factor analysis, I will often talk about having a model that is a representation of the reason
(16:07):
variables relate according to some particular theory, where that theory involves one or more underlying latent variables. And students will invariably say, "So then we get scores for those factors?" First of all, they don't even understand right off the bat that we don't have scores for those, and I remind them, they're latent; that's what the word latent means. But then somehow, at the end of the process, they're like, okay,
(16:28):
okay, so now we get scores, so we can use those for other things? And my answer is usually no, usually we don't need to. So here we are in a latent growth model setting, and I know that we could talk about things in a multilevel model setting as well. I fit a growth model, and I learn so much just by getting these characteristics of the intercepts and slopes and how they covary and how things might
(16:50):
predict them. So help me understand: under what circumstances does getting estimates become useful, in the way that getting residuals in a regression is useful?
Patrick (17:00):
It depends. Are you a libertarian or a socialist? Wow, you're just... Answer the question.
Greg (17:07):
I don't have to answer the question. You can't make me answer the question, because my freedom! It is my right not to answer.
Patrick (17:13):
You're a libertarian. Excellent. Let's pan back a little bit, and I'm going to make up some numbers as we have a hypothetical example. So we have a sample of 200 people, and we have five repeated measures. Now, in anything that we talk about, we're almost always in a longitudinal design where different people have different numbers of measures, so maybe some have three, some have four, some have five, whatever. As you're listening, picture in your mind's eye your own application, where you have your data
(17:35):
matrix with five assessments on your outcome. So let's talk about development of reading skills in children. We have something that we believe to systematically increase on average over time, but there's probably individual variability around that. If you're a libertarian, which I was when I
(17:56):
started learning these growth models, you do an OLS regression for each individual separately. Now, I was all about this in the early days. Oh, yeah.
And it really was a libertarian kind of thing: these are my repeated measures, I want the line that best characterizes
(18:18):
these. Person one has five repeated measures on reading ability; we're going to regress reading ability on time. Sure, 0, 1, 2, 3, 4 is the predictor. Picture a little scatterplot with your five repeated measures, and we're going to fit a line that best characterizes those repeated measures. Well, does my
(18:39):
line impact your line? No, not at all. This is an individual regression. That's the libertarian aspect to it: I'm not imposing any functional form that's governed by anybody else. Whatever you do is your right, your freedom. You be who you are. We get a regression line for person one, we get a regression line for person two. I'm a SAS guy, so I use SAS terminology,
(19:01):
but you do whatever your home program is: you do PROC REG by ID, and we're going to do 200 OLS regressions, and we're going to gather together all the intercepts and the slopes. I actually wrote a paper, led by a wonderful scientist named Madeline Carrig, and R.J. Wirth was also
(19:23):
on it; we'll put this in the show notes. She designed a SAS macro that does this automatically. You just put in what your outcome is and what your time is, and it does all the regressions and pulls them all together. It does graphics, things like that. So this is very easy to do. This is the
(19:44):
Oprah Winfrey: you get an intercept, you get an intercept. What is super important to get across here is that my intercept and slope are not impacted by your intercept and slope.
Greg (19:52):
That's right, because it's only informed by your data, nobody else's data anywhere.
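(Patrick's version uses SAS, PROC REG with BY-group processing; here is a minimal Python sketch of the same libertarian idea, with simulated data standing in for the 200 kids. The numbers and names are mine, purely for illustration.)

```python
import numpy as np

rng = np.random.default_rng(2)
n, t = 200, np.arange(5.0)

# Simulated reading scores: each child gets their own true intercept and slope
b = rng.multivariate_normal([2.0, 0.5], [[1.0, 0.3], [0.3, 0.25]], size=n)
Y = b[:, [0]] + b[:, [1]] * t + rng.normal(scale=0.7, size=(n, 5))

# One OLS regression per person -- my line never touches your line
T = np.column_stack([np.ones(t.size), t])
coefs = np.array([np.linalg.lstsq(T, Y[i], rcond=None)[0] for i in range(n)])
intercepts, slopes = coefs[:, 0], coefs[:, 1]

# Gather the baskets: a distribution of intercepts, a distribution of slopes
print(intercepts.mean(), intercepts.var(), slopes.mean(), slopes.var())
```

Each row's fit uses only that row's five points, which is exactly the "my line doesn't impact your line" property; the last line is the walk around the room collecting the baskets.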
Patrick (19:56):
That's the libertarian. What about the socialist? Well, we're gonna spend a whole other episode talking about empirical Bayes estimation and factor score estimation, because it's a fascinating topic in and of itself. But if you run a growth model in the structural equation model, we talk about the predicted scores as factor scores; if you run it in the multilevel
(20:19):
model, we talk about them as a thing called empirical Bayes estimates. We're going to get individual estimates of your intercept and of your slope, but they're going to be a joint combination of your repeated measures and the summary statistics of the sample as a whole. And so now my intercept
(20:42):
and slope are in part informed by the means and the variances of everybody else's trajectory.
Greg (20:50):
In the socialist view of this thing, the intercept and slope are factors. And in order to get scores associated with those, and really what we mean is score estimates, because we're not going to be able to get some true intercept and some true slope, in order to do that I actually need the model, and the model itself couldn't have existed without the information
(21:10):
from everybody to get what that model is in the first place. And so now, when you estimate my factor score for an intercept and a slope, and your factor score for an intercept and a slope, it could never have been done without the information from everybody feeding into that model in the first place.
Patrick (21:25):
I am very fortunate to have a colleague of mine here at Carolina, who actually was the guy who hired me back in the day, Dave Ellison. I gave a talk to the area, and I did a libertarian kind of view on the individual trajectory
(21:45):
estimation, and he over time has really drawn me to these empirical Bayes estimates. Because to me, it just seemed inherently unfair that your trajectory would somehow impact my trajectory. Yeah, but there are many reasons why you would want to do exactly that. There's a thing called a shrunken estimate, and that is: the farther away you are and the more sparseness you have in your data, the more you're pulled back toward
(22:09):
the mean, right? It's like whatever the gangster movie: you try to leave, and they pull you back in, whatever it is.
Insert (22:18):
Just when I thought I was out, they pull me back in.
Patrick (22:23):
But you're drawn back to the mean, and that always bothered me deeply. But what you conceptualize is: look, you hitched your wagon to a donkey; you are sampling from a homogeneous population that is governed by some growth process,
(22:44):
and you're observing individual variability in those trajectories around that growth process. And logically, it makes perfect sense to use the information about the sample when you're estimating these individual trajectories. So just know that if you do an OLS trajectory estimate, that's each
(23:06):
case in isolation of all other cases; if you get a factor score or an empirical Bayes estimate out of a model that you fit to the data, that's a joint contribution of the individual data and the characteristics of the sample.
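(The contrast can be sketched in a few lines. This is a deliberately simplified shrinkage formula for the slopes only, using the known simulation variances for clarity; real software estimates those variance components from the data, so treat everything here as an illustration of the idea, not any particular program's algorithm.)

```python
import numpy as np

rng = np.random.default_rng(3)
n, t = 200, np.arange(5.0)
tau2, sig2 = 0.25, 0.49                     # true slope variance, residual variance

slopes_true = rng.normal(0.5, np.sqrt(tau2), size=n)
Y = 2.0 + slopes_true[:, None] * t + rng.normal(scale=np.sqrt(sig2), size=(n, 5))

# Libertarian: each person's OLS slope uses only their own five points
tc = t - t.mean()
ols = (Y - Y.mean(axis=1, keepdims=True)) @ tc / (tc @ tc)

# Socialist: shrink each OLS slope toward the sample mean, weighted by its
# reliability -- noisy or sparse cases get pulled back toward the mean hardest
var_ols = sig2 / (tc @ tc)                  # sampling variance of an OLS slope
lam = tau2 / (tau2 + var_ols)               # reliability weight in [0, 1]
eb = lam * ols + (1 - lam) * ols.mean()

print(ols.var(), eb.var())                  # shrunken estimates are less spread out
```

The weight lam is what makes it "socialist": it blends your own OLS slope with the sample mean, and the shrunken estimates are always less dispersed than the case-by-case ones.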
Greg (23:21):
And I'll tell you what, if there's this libertarian inside you that is bothered by that, it would have to bother you with z scores also, because we cannot compute a z score without knowing all the rest of the scores and the distribution, to figure out: do I have a high z score, do I have a low z score? So the idea of you doing something in reference to all the other information is hard-coded in everything we do, from
(23:44):
the very first course. It doesn't bother me.
Patrick (23:47):
Again, going back to a one-predictor regression where you get the residuals out: yeah, you have Y minus Y hat, and Y hat is based on X prime X inverse X prime Y. I mean, it's the same gig. Totally. Here's something that's really important; it is a double-edged sword. We are not arguing against the OLS estimates, because what is the
(24:09):
trailing edge of getting a factor score or an empirical Bayes estimate from a model? Well, those means and those variances that you're using for the sample to in part inform your individual trajectory estimate are based on a model that you defined. What if there's a subset of individuals who are governed by a different growth
(24:32):
function? What we want you to think about is: what are you trying to achieve when you consider each of these approaches in isolation? And indeed, I gotta tell you, in my own work I use both. A lot of times, you know what I'll do? Go back to the 200-person, five-time-point example. I don't know what your workflow is on this, Greg, but what I often do is,
(24:54):
before I fit any models, before I consider whether to do an SEM or an MLM, or whether to do a linear or a quadratic or whatever it might be, I will do individual regressions, get individual trajectory estimates, and plot them one person at a time. And I will page down case by case and just see what the
(25:18):
data look like with that individual line. And they're going to go up, and they're going to go down, and they're going to tilt, and they're going to do whatever they do. But it's a model-free insight into what my repeated measures look like. It's just orienting to the data. Then a lot of times, I'll just hold the page-down key, and it's like one of those old
(25:40):
daguerreotype movies, right, where it's like, man doing somersault. And a lot of times it's like, oh, okay, I see most of these are linear; I see most of these are going up and down in some generally random way. But a lot of times you can say, wow, some of these look like they're linear, and some of these look like a line may not be a good estimate. Yeah, it's no model
(26:02):
fitting, it's just, okay. Maybe this is a reflection of my own personality: it's orienting to the enemy on the field.
Insert (26:11):
Sir, Overwatch reports an F-14 Tomcat is airborne and on course for our position. Maverick's.
Greg (26:21):
There are so many ways you could have said that other than that. Yeah, yeah. You know, if we think about modeling as this endeavor that we go through, there are some people who might be ill at ease with what you just described. Some people might think that that's fishing through the data to get a sense of what the growth process is like, and then subsequently you've fit a growth process to see how well it fits. Like,
(26:43):
well, of course you picked it to fit that. On the other hand, I think it's grounding in terms of, am I even barking up the right tree here? I think it is a very reasonable first step to, quote, look at your data. The trick here is that you mean looking at some description of your data: you took the spaghetti plot and you broke it out into one piece
(27:04):
of spaghetti at a time as you went page down, page down, page down, and we can imagine all of those superimposed onto a particular plot. I think it's a very reasonable first activity. It would be like, if you had no variance in a variable, why would you try to predict it? Well, let's take a look at the variance of the variable. Here we're taking a look at the nature of the growth, to try and see whether or not it's even reasonable to talk about growth as a process that describes what's going on.
Patrick (27:28):
I would never use these individual OLS trajectories to guide the model that I'm going to fit. Sure. But what it is, is it's like a data-screening device. Yeah, we should all be looking at univariate distributions and bivariate distributions and potential outliers before we ever fit a model. Yeah. And you know where I see the danger? It's not, oh, I
(27:49):
knew it was linear the whole time. I'm going to do a series of likelihood ratio tests as I build functions anyway, so I think that's a moot point, as I'm never gonna be parking or parking or parking or whatever. The bigger danger, and I have found this in my own work as well, is you go straight to the growth model without doing this
(28:11):
initial step. And oh my gosh, your growth model can cloak a problem in a way that you would never know was there. You fit an intercept only, you add a line, you add a curve, you do likelihood ratio tests, you back up and say, of all the options, linear is the best fit. You could have a subset of cases in there that are doing something very different and not be aware of that.
Greg (28:37):
Totally. Yeah, I mean, think about it from an outlier perspective: when you're talking about univariate procedures, or even multivariate procedures, you will often try to visualize or compute a way that a case stands out or doesn't stand out relative to the others. How do you do that with regard to a line? What do I have to look at? And what you're describing is something that makes sense, so that you can even
(28:59):
visually identify whether it looks like the population you have is homogeneous, or you've got some weird folks. And when we say weird folks, we mean weird folks in terms of a line, which otherwise wouldn't be visible in your data.
Patrick (29:10):
Those are the advantages of the individual estimates. Now we go through all our usual model building. Let's say we do our full-bore growth model: we build an unconditional growth model, we identify the optimal trajectory, we bring in time-invariant covariates and time-
(29:30):
varying covariates, we do everything that we would do. Now we can get model-based trajectories, where it is absolutely the individual's raw data, but in part influenced by the characteristics of the sample as a whole. And we can gather all of those together and look at them, and I gotta tell you, those are the ones I think we should
(29:53):
use more in practice. I see the individual OLS as a data-screening and initial data-quality thing. The model-implied, I think, has multiple advantages. You can get some pretty opaque results from these models: you get fixed effects, you get random effects, you get covariances of random effects, you get residuals, you
(30:14):
get all of our usual parameters. And if you're doing a more complicated model, these tables start breeding like rabbits, where every factor has a mean, every factor has a variance, every factor has a covariance with every other factor. And it can get pretty opaque, both in communicating that to a reader, but also in developing an
(30:38):
understanding of the confidence that you have that you've appropriately modeled the characteristics of the data in your sample. But being able to output these individual trajectory estimates, and then take those out to the garage with your Coors Light and Green Day, and just play around with them, I think is a huge advantage that many people don't
(31:02):
capitalize upon when they otherwise could.
Greg (31:05):
So on the back end, what you're talking about is now actually putting the individual in the individual. Right, right, right. Which is, we talk a good game.
Patrick (31:16):
Exactly. It's the green light button all over again.
Greg (31:19):
You're all individuals!
Insert (31:30):
I'm not?
Greg (31:32):
How do we get it back to that individual level? And why would I want to do that?
Patrick (31:36):
The getting them is easy. In an SEM, any program will read out factor scores, and the same with the multilevel model: any multilevel program will give you empirical Bayes estimates at both level one and level two. What we're talking about here is level-two empirical Bayes estimates, which are case-based estimates of the random effects. And those
(31:58):
are the intercepts, and those are the slopes. So go to your favorite program, look up how to get these estimates, and you've got them.
Greg (32:04):
And don't forget, they're estimates. Please don't forget, they are only estimates.
Patrick (32:08):
That is huge, right? Because they are latent, right? We can't see them; we're estimating them. In regression, we actually don't estimate the residual, we compute it, because we have Y and we have Y hat, and we take the difference between the two, and that's E. Here, these are estimates. And what that means is there is no single way of going about doing it, and any
(32:30):
way that we do is flawed in some unknown way. There's an unreliability in the estimate of that. Exactly. And that's reflected in the SEM literature: there are many, many different ways of getting factor score estimates; there's not one way. So these are estimates. Now you've done your modeling, and you get these case-based estimates. Now, what do you do with them?
(32:52):
In my own work, I try to do two things. One is, I am a massively huge fan of the pokin' stick; we've talked about this before. As you're going for a hike with your kids, everybody gets a pokin' stick, because the world is just an absolute garden of things that need to be poked.
Greg (33:11):
Did you tell your kids
this early that they should go
around and poke everything?
Patrick (33:16):
I mean, oh, and we've
been stung and bit, and I mean,
there's some life lessons in there as well. What a pokin'
stick is, alright: again, as you're listening, think about
something that is near and dear to you. Is it your masters, your
dissertation, a manuscript you've been working on a long
time? You've got your final growth model, you've got your
(33:38):
tables, you've got your fixed and random effects. How
confident are you that you've met all the assumptions of the
model, that there are no outliers, that there's no subgroup
heterogeneity, all the things that we worry about? I love
using these case-based estimates for diagnostics. You've got them
in your back pocket. I'm a very pragmatic guy; I've got to see
(34:00):
stuff, I've got to touch stuff, I've got to feel things. We can
talk about the means of the slopes and the variances of the
slopes; I want to cut it open and look inside. And when you
have all your intercepts and all your slopes, well, now you can
go to work with those: you can rank order them and look for
potential outliers or influential observations. What
(34:23):
if there's somebody in there who has a very steep slope relative
to everybody else? Well, I might go in and pick that kid off and
re-estimate the model as a sensitivity analysis to see if
my fixed and random effects vary. I gotta tell you, these
models are persnickety
Greg (34:43):
as hell. Surprisingly, you
might think that they're robust
because they're somehow aggregating over all people. You
get a weird line in there and things change.
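One way to sketch that sensitivity analysis: fit the model, rank the empirical Bayes slope estimates, drop the most extreme case, and refit to see whether the fixed effects move. Everything here (variable names, the injected outlier) is simulated purely for illustration:

```python
# Sketch of a case-deletion sensitivity analysis on a growth model.
# One wildly steep trajectory is injected so there is something to find.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n, t = 40, 5
ids = np.repeat(np.arange(n), t)
time = np.tile(np.arange(t), n)
slopes = rng.normal(0.1, 0.03, n)
slopes[0] = 1.5                          # one extreme trajectory
y = 2.0 + slopes[ids] * time + rng.normal(0, 0.2, n * t)
data = pd.DataFrame({"id": ids, "time": time, "y": y})

def fit_growth(df):
    m = smf.mixedlm("y ~ time", df, groups=df["id"], re_formula="~time")
    return m.fit()

full = fit_growth(data)

# One row per person: empirical Bayes deviations for intercept and slope
eb = pd.DataFrame(full.random_effects).T
eb.columns = ["int_dev", "slope_dev"]

# Rank-order the slope estimates, flag the most extreme case, refit
worst = eb["slope_dev"].abs().idxmax()
trimmed = fit_growth(data[data["id"] != worst])

print("fixed slope, all cases:   ", round(full.fe_params["time"], 3))
print("fixed slope, case dropped:", round(trimmed.fe_params["time"], 3))
```

In practice you would look at the top and bottom of the ranked list rather than a single case, and, as discussed below, dropping anyone is a substantive decision, not an automatic one.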
Patrick (34:50):
I gotta tell you, you
are damn right. I have done a
lot of these models over a lot of years, and it is not uncommon
that you identify one or two cases that, when you drop them, the
variance estimate of the slopes that you were going to center
your discussion around goes to non-significant. These are very
(35:12):
sensitive to outliers. Well, if you don't estimate these
individual trajectories, you never know those exist. Now,
it's a whole nother conversation of: do you delete that case? Do
you inform the reader? And again, I don't want to get into
that rabbit warren of, okay, so you find three outliers, what do
you do with them? But at least you know they exist. I have one
(35:34):
application where I was so excited about the results of my
model, and everything was just how I wanted, and I was gonna
write this wonderful discussion section. I went through this
endeavor that we're discussing,
and there were three cases where everybody else was growing and
they were dropping. And you plot these on a single plot, and here
are these three kids who start really high, and then drop
(35:57):
precipitously. I omitted those three cases, and all my growth
effects went away. And what happened is those three kids
moved to juvenile detention.
That's why their trajectories dropped in the outcome: they
were locked up in juvie. Is that the population to which I
(36:19):
want to generalize? Well, no. You would not know that; I would
have written my discussion of "it was the best of times, it was
the worst of times" on these results that were all driven by
three kids.
Greg (36:33):
And I really want to
underscore that thing that
you're saying. We're not suggesting that you go in and
go, and I don't like that line,
and I don't like that line, and I don't like that line. What
we're suggesting is that this might be a way to
identify cases that are not part of the population to which you
wish to generalize. Patrick didn't just look at those lines
and go, oh, looks like juvie to me; Patrick went to those cases,
got additional information that helped him to understand why
(36:57):
that was going on with those cases, and then made a decision
based on the purposes of the study to pull those out. This is
no different, in terms of how you handle things with regard to
outliers, from other settings: you always have to ask yourself,
what am I generalizing to? The challenge is we can't see lines
unless we start getting some of this information. And that's
what this is helping us to do.
Patrick (37:19):
And I think that's a
really nice way of casting it:
you don't have access to these individual trajectories in your
fixed and random effects. You get them, you look at them, and
now you're informed. To me, that's the huge thing. I now
know this; now it's an entirely different conversation of what
do you do with that knowledge.
Yeah, but you have it. And I would infinitely rather have the
(37:42):
knowledge and then struggle with what to do with it, rather
than not know it at all. And the fun thing about these is, you've
got these in your back pocket in your data file. The first column
is biological sex. The second column is education. The third
column is was I in the treatment or control. The fourth column
is my intercept. And the fifth column is my slope. And so now
(38:05):
you can get really creative. So sometimes I will rank order the
intercepts and the slopes from smallest to highest, and I'll pick
off the top and bottom 1%.
Alright, it's pokin' stick time: would
I write the same discussion section if I picked off the
extremes? If suddenly my theory
(38:27):
was supported when it wasn't before, I'm not going to omit
those and never tell the reader, blah, blah, blah. This is just:
know your limitations, know the sensitivity of the model. We
make assumptions about the distributions of intercepts and
slopes, right? These are sometimes buried in the fine
print, but in all the models we do, we assume that the intercepts
(38:50):
are normally distributed, the slopes are normally distributed,
and the intercept and the slope are linearly related such that it
can be captured in a covariance. Do a
bivariate plot of intercepts and slopes. Picture in your mind's
eye: the x-axis is the intercept, the y-axis is the slope, and
you've got a scatterplot of how those two relate to one another.
These are hugely advantageous for checking model assumptions.
(39:13):
We make assumptions about homogeneity of variance. Look at
the distribution of intercepts and slopes with a box plot for
treatment versus control, for male versus female: are those box
plots more or less similar to one another? It's old school
diagnostics, but at the level of the trajectories.
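Those old-school diagnostics can be sketched in a few lines. This assumes you already have one row per person with an empirical Bayes intercept, slope, and a grouping variable; the data frame here is a simulated stand-in, and all names are placeholders:

```python
# Sketch of trajectory-level diagnostics: a bivariate plot of the
# intercept and slope estimates, plus box plots of slopes by group.
import matplotlib
matplotlib.use("Agg")                    # render off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 200
eb = pd.DataFrame({
    "intercept": rng.normal(1.6, 0.4, n),
    "slope": rng.normal(0.07, 0.03, n),
    "group": rng.choice(["control", "treatment"], n),
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Bivariate plot: does the intercept-slope relation look linear,
# as the model's covariance term assumes?
ax1.scatter(eb["intercept"], eb["slope"], alpha=0.5)
ax1.set_xlabel("intercept")
ax1.set_ylabel("slope")

# Box plots of slopes by group: a quick homogeneity-of-variance check
eb.boxplot(column="slope", by="group", ax=ax2)

fig.savefig("eb_diagnostics.png")
```

The same one-row-per-person frame supports histograms of the intercepts and slopes to eyeball the normality assumption Greg raises next.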
Greg (39:35):
You made a really good
point that we assume that these
latent variables might be normally distributed. But
imagine, for example, that you are modeling alcohol consumption
in adolescence, and you run a model that is the equivalent of
fitting lines to each of those, and your distribution of
intercepts, if you drew it out as though it was normal, you
(39:57):
would find that some kids are starting off in third grade
drinking negative five drinks per month. And the numbers on
the distribution make no sense given your substantive
knowledge, obviously. But if you just assume things are normal,
then it doesn't make sense. If you do what Patrick's talking
about, you might see the heavier skew, right? This isn't just, oh,
this is something interesting to do. This is part of your
(40:19):
understanding of the phenomenon that you're trying to study.
Patrick (40:22):
To be super clear, this
stuff has been around for 20
years plus. Raudenbush and Bryk have an entire
chapter in their multilevel book, Hierarchical Linear
Models; I think it was 2002,
so it's almost 20 years now. But it's diagnostics for a multilevel
model, and all of this stuff applies. Even if you have nested
structured data, kids within classrooms, you can get
(40:46):
classroom-level estimates of the random effects. And they walk
through really nice diagnostics in much the same way
that I'm describing now with the growth trajectory. So none of
this is new. But that's only half of what I like about it.
The first half is how confident are you in the stability of your
model: you do whatever you do with those diagnostics. And now
(41:09):
let's say you and I have finalized a model that we have
confidence in. We believe it's stable, it's not being driven by
a small number of outliers,
we're meeting the assumptions, we're done and done. And now we
want to open the cage and release it into the wild. By the
way, yesterday, I caught my first squirrel. I sent you a
(41:30):
little video clip of that. Yes, you did. Yeah.
Greg (41:33):
This is what your life
becomes, by the way, for those
of you who are out there, thisis the excitement that goes back
and forth between us, Patrickshows me animals. Oh, it's the
first squirrel of the season.
Patrick (41:42):
Yeah, I live on two
acres of wooded land, and I've
decided to remove all the squirrels. Anyway, we're going
to open it and release it into the wild. Now, it is our ethical
responsibility to communicate those results to a broad
readership who are going to consume our findings. We give a
table of means and variances and covariances, and some are raw
(42:07):
and some are standardized.
Imagine taking those and then augmenting them with graphics. So
one thing I love: the fixed effect is what is the average
starting point and what is the average rate of change. Make an
XY plot where we put the fixed effect on it, a starting point of
1.6 that increases 0.07 per year, but then overlay a
random sample of individual trajectories. Oh, these are
(42:31):
beautiful plots. And you can augment an obtuse table with
these graphics that say here are the model-based estimates of the
individual variability of the trajectories within the control
group, and the panel next to it is within the treatment group.
You have a dark line, that's the fixed effect within control, and
(42:53):
the fixed effect within treatment. And you have the
individual trajectories around the control, and you have the
individual trajectories around the treatment. It brings your
opaque table to life. I like
Greg (43:07):
that idea a lot,
especially because we have an
inherent sense of growth modeling. But at the end of it
all, we talk a good game about modeling individual processes,
but the parameters that we focus on are more descriptions of
what's happening across those individuals, rather than taking
the individual differences into account. So what you're
describing is a way to start visualizing: are those
(43:27):
individual lines really tight to the overall trajectory, or are they
all over the place? That is really valuable information. And
it's so much more meaningful to visualize, rather than just
having a variance of a slope or a variance of an intercept.
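The plot Patrick describes, a dark fixed-effect line with a random sample of model-implied individual trajectories behind it, can be sketched like this. The intercepts and slopes here are simulated stand-ins for your empirical Bayes estimates, and the fixed-effect values echo the numbers used above:

```python
# Sketch: overlay a random sample of individual trajectories on the
# fixed-effect (average) trajectory from a growth model.
import matplotlib
matplotlib.use("Agg")                    # render off-screen
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)
n = 30                                   # random sample of people
intercepts = rng.normal(1.6, 0.4, n)    # stand-ins for EB intercepts
slopes = rng.normal(0.07, 0.04, n)      # stand-ins for EB slopes
time = np.linspace(0, 4, 50)

fig, ax = plt.subplots()

# Light gray lines: each person's model-implied trajectory
for b0, b1 in zip(intercepts, slopes):
    ax.plot(time, b0 + b1 * time, color="gray", alpha=0.3, lw=1)

# Dark line: the fixed effect, i.e., the average trajectory
ax.plot(time, 1.6 + 0.07 * time, color="black", lw=3)

ax.set_xlabel("time")
ax.set_ylabel("outcome")
fig.savefig("trajectories.png")
```

For the treatment-versus-control version, repeat this in side-by-side panels, one per group, each with its own fixed-effect line.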
Patrick (43:43):
And it's no different
than any of our traditional
models that do the same thing.
We have a variance of a residual; what do the residuals
look like? All we're doing is scaling that up a
little bit. It's not even that much: instead of
individual residuals, we now have individual intercepts,
individual slopes. Another thumbtack I want to put in for maybe a
(44:04):
future topic: I will make one recommendation for what not to
do with these trajectory estimates. Ooh, and we will come
back and revisit it on some future episode.
Greg (44:18):
cliffhanger. I love it.
You
Patrick (44:20):
are going to have a
siren song. Alright, so Greg is
lashed to the mast and is sailing by the sirens so that he
can hear their beautiful song
Insert (44:32):
asleep. My name is
Ulysses Everett McGill, asleep
about the devil nice three.
Patrick (44:44):
You've got everybody's
intercept, you've got everybody's
slope, and the sirens are singing: use them as predictor
variables, use them as outcome variables. And Odysseus is going
to go right by the sirens, and is going to say, in 2,000
years, we're going to know why not to do that
Insert (45:06):
join you two ignorant
fools in a ridiculous
superstition. Thank you anyway. You boys are dumber than a bag
of hammers.
Patrick (45:13):
I am going to leave
with a cliffhanger: you may be
tempted to use those as IVs or DVs in other models, and we
cannot recommend strongly enough to not do that. There are
statistical reasons that we'll talk about on another day, but
do not use them as predictors or outcomes in a subsequent model.
Greg (45:36):
Wait a minute, do you
think that I'm not
developmentally ready to be ableto handle that information? Is
that what you're saying?
Patrick (45:41):
You know what, I'm gonna
need to push the "episode is over"
button, because I think we've talked enough. So let me get
under my desk and... goodbye, everybody.
Greg (45:52):
Dad, don't push them.
Patrick (45:58):
Thank you so much for
listening. You can subscribe to
Quantitude on Apple Podcasts, Spotify, or wherever you listen
to audio intended to punish your children while riding in the
backseat of your car. And please leave us a review. You can also
follow us on Twitter, we are @quantitudepod, and check out
our webpage at quantitudepod.org for past episodes, playlists,
(46:18):
show notes, transcripts, and other cool stuff. Finally, you
can also get Quantitude merch to secretly connect with others who
also squander their valuable time at redbubble.com, where all
proceeds go to Donors Choose to support low-income schools. You
have been listening to Quantitude, characterized by more
fabrications and embellishments than George Santos's resume.
(46:41):
Quantitude has been brought to you by: quiet quitting in academia, a
phenomenon for which our lawyers have stated that Quantitude
takes no responsibility and vehemently asserts that it is
pure coincidence that both words start with a Q; and by Wednesday's
viral TikTok dance; honestly, we're just trying to associate
ourselves with the video to burrow ourselves into your brain.
(47:08):
And we conclude with a quantitative public service
announcement. Although Patrick does indeed catch squirrels in
his backyard, these are safely captured and are immediately
released into Dan Bauer's screened-in porch. This is most
definitely not NPR