Statistics: Beyond Numbers

Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
Hello everyone, welcome to The Magnificence of Mathematics. I'm your host, Eddie Kingston.

(00:20):
Last time we talked about probability and some counterintuitive ways it can behave.
Now we're going to talk about an adjacent field, statistics. Stats is a field that hits
close to home for me, because that's what I got a master's degree in. I only have
a bachelor's in mathematics. I had applied to both the math and stats master's programs
at my university, and quite frankly the only reason I enrolled in the stats program is

(00:42):
that they got back to me first, and I figured that it would be a more lucrative option.
I'll talk more about my grad school experience later, but first I want to talk about what
stats even is.
Statistics has to do with the collection, analysis, and interpretation of data. Statisticians
use models to describe patterns in data and interpret them in ways that make sense to
us. It's important to note that models are just that, models. In the real world, no data

(01:08):
set will follow any particular distribution perfectly. There are two main flavors of stats,
descriptive and inferential. You might already be familiar with descriptive stats. Basics
of descriptive stats are what you learned in school, mean, median, standard deviation,
variance, etc. These are used to describe features of an entire population, like a census.
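
As a small illustration of those descriptive measures, here is a sketch in Python using only the standard library. The grades list is made up for the example, and the population versions of variance and standard deviation (`pvariance`, `pstdev`) are used on the assumption that the list is the whole population rather than a sample.

```python
import statistics

# Hypothetical data: final grades for every student in a tiny class,
# treated as the entire population (an assumption for this example).
grades = [70, 85, 90, 65, 90]

mean_grade = statistics.mean(grades)        # arithmetic average
median_grade = statistics.median(grades)    # middle value when sorted
variance = statistics.pvariance(grades)     # population variance
std_dev = statistics.pstdev(grades)         # population standard deviation

print(f"mean={mean_grade}, median={median_grade}")
print(f"variance={variance}, standard deviation={std_dev:.2f}")
```

If the list were only a sample from a larger population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be the usual choice instead.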

(01:29):
However, a lot of the time, getting an entire population's worth of data isn't feasible,
so researchers take a representative sample, or a subset, of the population. This just
means taking a relatively small number of people that are bunched into one particular
group. Getting a representative sample is usually done by randomly selecting members
of the population to be in the sample. Then, researchers use the second flavor, inferential

(01:52):
stats, to draw conclusions and make general statements about the entire population. This
tends to take the form of hypothesis testing, say whether or not the proportion of listeners
of this podcast who live in the US is equal to 50%, or estimation, say a more specific
estimate of the proportion of listeners who live in the US. Inferential stats can extend

(02:12):
as far as forecasting stock prices and mining for bitcoin.
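
To make the sample-then-infer idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the population of 10,000 listeners, the 60% US share, the sample size of 200, and the use of a normal approximation for both the interval and the test. It is one simple way to do estimation and a test of "is the proportion 50%?", not the only one.

```python
import math
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 listeners, True means "lives in the US".
# The 60% figure is invented for illustration only.
population = [True] * 6000 + [False] * 4000

# Simple random sample: every listener is equally likely to be chosen.
n = 200
sample = random.sample(population, n)

# Estimation: the sample proportion is our guess at the population proportion.
p_hat = sum(sample) / n

# A rough 95% confidence interval using the normal approximation.
z_95 = NormalDist().inv_cdf(0.975)
margin = z_95 * math.sqrt(p_hat * (1 - p_hat) / n)
interval = (p_hat - margin, p_hat + margin)

# Hypothesis testing: how surprising is p_hat if the true proportion were 50%?
p_null = 0.5
z = (p_hat - p_null) / math.sqrt(p_null * (1 - p_null) / n)
# Probability of a result at least this far from 50%, in either direction.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"estimate={p_hat:.3f}, interval=({interval[0]:.3f}, {interval[1]:.3f})")
print(f"z={z:.2f}, p-value={p_value:.4f}")
```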
Let's talk a bit about inferential stats. There are two main camps that statisticians
live in, the Frequentist camp and the Bayesian camp. Frequentists believe that population
parameters and probabilities are fixed. For example, the probability of landing heads
on a coin can be found by flipping a coin over and over and over again and analyzing

(02:33):
its long term behavior. You should see that that number converges to 50%. Frequentists
use what's known as hypothesis testing to either see whether there's any relationship
between two variables, like whether the number of roses the florist sells goes up when they
lower the price of the roses, or to make a guess at how much the data changes over time,
for example whether the florist can sell more than 30 bouquets if they lower the price by

(02:55):
$5. When running a hypothesis test, researchers
come up with a null hypothesis, which is a starting statement that the researcher might
try to disprove, and an alternative hypothesis, which provides a different idea of how the
data behave. For example, a professor might want to see if a change in their syllabus
in a course they teach improves students' grades. The null hypothesis could be that

(03:16):
students' grades are unchanged, and the mean grade in the course is the same. The alternative
hypothesis could be that students' mean grades are higher, or at least unequal. This, in
particular, would be a great example for which one could use Welch's t-test, named after
20th century British statistician Bernard Lewis Welch. In general, the basic idea of
Welch's t-test is that you want to compare the mean of two different populations and

(03:40):
test whether those means are equal. The defining features and assumptions of Welch's t-test
are that the variances of the populations, that is, how far the data in each population
tend to stray from the mean, are unequal, and that the sample means in both populations
are normally distributed. This means that, for example, if a professor takes a bunch
of random samples of grades from their class, the means of those samples follow a normal

(04:03):
distribution. You may have seen what's called a bell curve when learning about stats. It's
a kind of figure where a lot of the data are centered around a certain value, and there's
fewer data points the further away from that value you go in either direction. From here,
statisticians collect some sample data, compute the means and standard deviations of the samples,
then use those values to calculate a test statistic. The idea is that if the null hypothesis

(04:28):
is true, this test statistic would come from a certain distribution, in this case a t-distribution,
which is in some respects similar but in other ways different from a normal distribution.
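
For readers who want to see the arithmetic, here is a minimal sketch of the test statistic for Welch's t-test in Python. The two grade lists are invented, the function name `welch_t` is just a label for this example, and the degrees of freedom use the standard Welch-Satterthwaite approximation.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (divide by n - 1); they are NOT assumed to be equal.
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)

    # Squared standard error contributed by each sample mean.
    se2_a, se2_b = var_a / n_a, var_b / n_b

    t_stat = (mean_b - mean_a) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df

# Hypothetical grades before and after the syllabus change.
old_syllabus = [70, 72, 68, 75, 65]
new_syllabus = [78, 85, 70, 90, 77]

t_stat, df = welch_t(old_syllabus, new_syllabus)
print(f"t = {t_stat:.3f}, degrees of freedom = {df:.2f}")
```

Turning this statistic into a probability needs the t-distribution, which Python's standard library does not provide. In practice a statistics package would handle that step; for instance, SciPy's `scipy.stats.ttest_ind` with `equal_var=False` runs Welch's test.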
Then researchers use this value to calculate a p-value. A p-value is the probability of
observing a value at least as extreme as the one obtained, assuming the null hypothesis is true. If we were comparing the

(04:49):
test statistic to, say, a normal distribution with mean zero, and we observed a test statistic
of, say, three, it's extremely unlikely that that test statistic came from a normal
distribution. So a researcher would reject the null hypothesis in favor of the alternative.
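
The two test statistics in this example can be checked numerically. The sketch below assumes a standard normal distribution (mean zero and, as an added assumption, standard deviation one) and a two-sided comparison, meaning values far from zero in either direction count as extreme.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def two_sided_p_value(test_statistic):
    """Probability of a value at least this far from zero, in either direction."""
    return 2 * (1 - standard_normal.cdf(abs(test_statistic)))

p_far = two_sided_p_value(3)     # test statistic far from zero
p_near = two_sided_p_value(0.3)  # test statistic close to zero

print(f"statistic 3.0 -> p-value {p_far:.4f}")
print(f"statistic 0.3 -> p-value {p_near:.4f}")
```

A statistic of 3 gives a probability of roughly 0.003, while 0.3 gives roughly 0.76, which matches the intuition that 3 is surprising and 0.3 is not.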
If the test statistic was closer to zero, say 0.3, they would say they fail to reject

(05:11):
the null hypothesis. Threshold p-values vary from field to field, but the most common significance
level, that is, the highest p-value at which one would reject the null hypothesis, is 0.05,
or 5%. This would mean that if the p-value a researcher calculates is less than 5%, meaning
there is a less than 5% chance of observing a test statistic at least that extreme if the null hypothesis

(05:34):
was true, then the researcher rejects the null hypothesis in favor of the alternative.
Conversely, if the p-value is anything more than 5%, they fail to reject the null hypothesis,
which just means they lack sufficient evidence to rule out the possibility of the null hypothesis
being true. It's important to note that it is improper to say you accept that the null hypothesis

(05:55):
is true. One of the most common methods to introduce students to hypothesis testing is
in a court setting. The null hypothesis is that the defendant is innocent. During a trial,
evidence is collected that may or may not point towards the defendant's guilt. The jury
then returns a verdict, either guilty or not guilty. Guilty, of course, means the jury
thinks that there is sufficient evidence to convict the defendant, so they reject the

(06:19):
null hypothesis of them being innocent. Not guilty doesn't necessarily mean the jury
thinks the defendant is innocent, it just means that the jury doesn't have enough
evidence to convict the defendant, so they acquit them.
Another important idea in statistics is the idea of regression. The most common starting
point is talking about simple linear regression, meaning there is only one dependent variable,

(06:41):
also known as a response variable, y, and one independent variable, aka an explanatory
variable x. For example, x could be the price of roses at a florist shop and y could be
the number of roses sold. Researchers can take a look at a graph of the number of roses
sold on a certain day versus when the roses were a certain price, and perhaps make a linear

(07:02):
approximation of how many roses would be sold when the price is a different amount. When
conducting a linear regression, researchers find an estimated intercept, which is how
many roses would be given away if they were free, and an estimated slope, which is how
many more roses would be sold every time the price decreased by $1, or whatever interval they're
looking at. Remember y equals mx plus b from your high school algebra class? It's the same

(07:25):
idea here, b is your intercept and m is your slope.
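
Here is a minimal sketch of simple linear regression in Python, using the ordinary least-squares formulas for the slope and intercept. The price and sales numbers are invented, and `fit_line` is just a name chosen for this example.

```python
def fit_line(x, y):
    """Least-squares estimates of the slope m and intercept b in y = m*x + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # How x and y vary together, and how x varies on its own.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: price per rose in dollars and roses sold that day.
prices = [2, 3, 4, 5, 6]
roses_sold = [82, 69, 61, 48, 40]

m, b = fit_line(prices, roses_sold)
predicted = m * 3.5 + b  # prediction at a price that was never observed

print(f"slope m = {m}, intercept b = {b}")
print(f"predicted roses sold at $3.50: {predicted}")
```

With this made-up data the slope comes out negative, about 10.5 fewer roses sold per extra dollar, which is the same as saying about 10.5 more roses sold for each dollar the price drops.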
What researchers do with this information depends on their goals. It could be either
to predict how many roses would be sold if the price was another amount that wasn't
looked at yet, or to estimate how the explanatory variable affects the response variable, i.e.,
just finding and reporting about the slope. There's also multiple regression, which involves

(07:46):
multiple explanatory variables and one response variable; multilinear regression, which involves
one explanatory variable and multiple response variables; and multiple multilinear regression,
which involves multiple explanatory variables and multiple response variables. And this
is just for continuous data. There are also ways to do regression with non-continuous

(08:06):
or discrete data. This is with what's called a generalized linear model. We've barely
started to draft plans to scratch the surface of statistics. There's so much more to it,
more tests to learn, more theory to talk about, that it couldn't possibly fit in one episode.
Instead, what I'll do is talk more about my experience as a statistics student and offer
my perspective as a former graduate student in general so that those of you listeners

(08:30):
who might be thinking about graduate school can get an idea of what it's like.
So my university went by a quarter trimester system, which we call terms. In any given
school year we have three terms, fall, winter, and spring, each consisting of 10 weeks of
classes, including a few extra days in the fall to make up for Thanksgiving towards the
end of the term, and one finals week. I took my first stats class in my second term of

(08:52):
freshman year, so January to March of 2019, where I learned about the absolute basics
of stats, like what I talked about here. Then I took two terms of a class called methods
of data analysis in each of my second and third terms of my sophomore year, where I
learned more about all the different kinds of tests that exist and the basics of multiple
linear regression. After that I took two terms of mathematical statistics, which start out

(09:16):
with the basics of probability that I talked about in my last podcast, but then quickly
go into the more mathematical side of hypothesis testing and do a lot of calculations involving
calculus.
By this point I had applied and gotten into grad school, so I was fortunate enough to
have finished my bachelor's in math in three years and decided to jump right into my first
year of grad school in what would have otherwise been my senior year of college. Grad school

(09:37):
lasted five terms for me, so a whole academic year and then fall and winter terms of the
year after that. The electives I took included multivariate analysis, talking about multilinear
and multiple multilinear regression, time series, which involved the analysis of temporal
data, one of my favorites, data visualization, where I got to make pretty graphs using R,

(09:58):
real analysis, which got into the depths of measure theory, which I talked about last
time, graduate-level probability theory, which uses a lot of measure theory, and statistical
methods for genomics research. Besides these, my cohort and I took two whole three-term
sequences on statistical methods and the theory of statistics, which got super mathematical
and took up the vast majority of all of our time. Those sequences made us learn a lot

(10:23):
because we were all tested on these classes via comprehensive exams or comps in each subject,
theory and methods. We studied for those comps throughout the year and especially throughout
the whole summer after our first year, with the immense help of one of our professors
who went above and beyond to hold study sessions on past comp exams to help prepare us better.
Huge, huge shout out to Sarah if you're listening to this, we all really appreciate your enormous

(10:46):
help. I was really particularly weak in a lot of the second portion of methods and theory,
mostly because my son was born two days after finals ended that term, and I was more stressed
about when he would come than about learning the material. I had to relearn a lot of that
stuff and practice it over and over again during the summer. Thanks to Sarah, my friend
Toy, who I studied all summer with, and the department allowing us to use cheat sheets on each exam,

(11:09):
I was able to pass both exams at what was known as a PhD pass, which meant I scored
higher than the minimum required on each test to qualify for my school's PhD program,
as opposed to an MS pass, which was enough to graduate but not enough to qualify for
the PhD. I originally thought about staying and going for a PhD, but with raising an infant
son and just being burned out in general, I decided to just stop after five terms with

(11:31):
a master's, and whenever I think about it, I'm so glad that that's what I ended up
doing. I just finished this past March, so grad school for me went from September 2021
to March 2023. Don't get me wrong, grad school was overall very rewarding to me. I made a
few great friends who I still keep in contact with regularly over Discord, and got to work
with some absolutely brilliant professors, one of whom I learned in the middle of my

(11:54):
third term was the mom of a guy I went to elementary school with. But I would be lying
if I said I didn't notice a difference in my stress levels, even in my last two terms when comps were
over, versus when I was done and started a new job and moved into a new house in March.
I miss the people, but I don't miss working around the clock on homework and devoting
a significant amount of time each weekend to that plus grading for my teaching assistantship.

(12:16):
Speaking of which, those of you who went to college probably remember taking at least
one class that had a lab component where a graduate student walked you through how to
code something or how to dissect an animal or something depending on the class. I almost
never got the chance to lead a lab at all. All of my first three terms consisted of me
grading and holding office hours for an intro stats class for engineers, and my fourth term

(12:37):
was split between grading for mathematical stats and for an intro class for a graduate
data analytics program, neither of which had me leading a recitation or lab. It wasn't
until my last term that I had my first class where I actually led a coding lab, which coincidentally
and poetically was for the very class that I first took as a freshman. I was a bit nervous
about the idea of publicly speaking and leading labs, but I actually started to enjoy it after

(13:00):
a few weeks. It helped that a lot of it was just me doing some basic live coding in R,
a programming language that I had been using for four years up to that point, not having
to make much eye contact with the students. It was mostly just 20 minutes of me going
through how to do some analysis that the students would have to replicate on their homework
and an hour of me walking around the room answering questions as students worked on

(13:20):
their lab assignments. The last term of the program involved working with a professor
on a project. For the master's program, this doesn't have to be anything novel. It could
just be you learning about something you didn't touch on in any of your previous classes and
writing a 10 to 15 page report and making a presentation on it. For example, one person
in my cohort who graduated the same term as me did their project on Deming regression,

(13:41):
a special kind of regression. My project involved using social networks to model relationships
and alliances in an online game of Survivor, which was really fun and one of the highlights
of the program for me. I'll talk more about that project if and when I do an episode on
graph theory. If I had to give some advice for future grad students, here are some suggestions
I have. One, go to office hours, especially if you're in a situation like me where you'll

(14:06):
be assessed on the classes you take through not only a regular final exam, but also a
huge comprehensive one at the end of the year. Your professors want to see you succeed and
they want to do what they can to help you learn. Nobody will let you fail out of the
program as long as you do what you need to do to learn. Two, lean on your cohort for
support. I'm not saying y'all have to be best friends or even necessarily friends at

(14:27):
all, but at least work together on assignments and pick their brains every now and then.
More likely than not, y'all have different backgrounds and experiences and therefore
unique perspectives when it comes to problem solving. You might have certain strengths
and intuition to impart on your cohort and they might have different strengths and intuition
to impart on you. Three, join clubs or otherwise hang out with grad students in different departments.

(14:50):
Not only does this expand your worldview, but you have a greater chance of finding some
people who share some of your hobbies. Four, take some time for yourself when you need
it. Taking a couple hours to take your mind off of schoolwork will ultimately be more
productive than working all day on a problem. It also helps you become less burned out in
the long run. Five, don't stress about grades. As long as you get the grades you need to

(15:12):
stay in the program, which can usually be achieved by just staying on top of your work,
you'll be fine. Employers don't care about your grad school grades and if they do, they're
not a good place to work for in the first place. If you want to apply to a top PhD program,
however, then grades matter a bit more, but ultimately it's about the experiences you
get in your program. In the grand scheme of things, grad school is what you make of

(15:33):
it. I'm personally glad I went through it and took the path I did, but I'm also glad
I stopped when I did and kickstarted my career. The important thing to note is you don't
have to be an absolute genius to survive grad school. A strong work ethic, a willingness
to learn, and collaboration skills are all you need to get by. At the end of the day,
don't work yourself to death if you don't want or have to. If you only want to get your

(15:54):
master's and skedaddle, it's only a year or two of your life. If you're going for a PhD
on the other hand, that's a much larger chunk of time spent in graduate school, but I can't
speak to what that's like. Maybe one day I'll revisit this topic and include someone
who went through a PhD program to get their perspective.
Thank you for listening to this episode of The Magnificence of Mathematics. Next time, I'll

(16:15):
talk more about calculus, the foundation for modern math. See you then!
Algid Productions LLC Outro. Thank you for listening!
