Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:15):
Welcome to Base by Base, the papercast that brings genomics
to you wherever you are. OK, imagine this scenario.
You've got two people, right? And both of them carry the exact
same dominant genetic variant, one known to be linked to
serious stuff like say, really high cholesterol or certain
types of diabetes. But here's the kicker, the real
(00:35):
puzzle. One person, they developed
severe symptoms, you know, needing serious medical help.
And the other person, they're basically fine, perfectly
healthy maybe, or perhaps they have the disease, but it's super mild,
almost unnoticeable. How is that even possible?
I mean, this isn't just some abstract thought experiment.
It's a huge challenge right now in genomic medicine.
How can we possibly predict who gets sick and like, how sick
(00:58):
they'll get when the genetics seem so well, so fuzzy?
It's a fascinating puzzle, this variability.
And that's exactly what we're diving into today.
And, you know, before we even start trying to piece together
the answers, we really have to give a shout out, acknowledge
the huge effort that went into this research.
This wasn't small potatoes. Oh, absolutely not.
You're spot on. Today we're really celebrating
some incredible work by a big collaborative team, mainly folks
(01:22):
from UCLA and also the Icahn School of Medicine at Mount
Sinai. They just made really
significant strides in understanding why these genetic
diseases hit people so differently.
It's a major leap forward, honestly, in teasing apart all
those complex genetic factors that influence, you know,
whether disease shows up at all, penetrance, and how varied the
symptoms are, expressivity. Right.
(01:43):
So let's unpack that a bit. For a long time, I think we sort
of viewed many genetic conditions as pretty
straightforward, didn't we? Like one broken gene equals you
get the disease, end of story. But it's turning out to be,
well, a lot more complicated than that.
Precisely. Yeah, the old view is a bit
too simple. We're often dealing with what
are called autosomal dominant monogenic diseases.
(02:03):
Now that just means you only need one copy of a, let's call
it a pathogenic or disease causing variant in one specific
gene to potentially get the condition.
Well, what's really eye-opening is the scale.
Over 3% of the entire population actually carries one of these
dominant pathogenic variants. 3 percent.
That sounds like a lot. It is, and yet, just like you
said, only a fraction of those people, the carriers, actually
(02:25):
develop the disease they're supposedly at risk for.
And even when they do, wow, the symptoms can range from barely
noticeable to extremely severe. It's a huge spectrum.
OK. So that's the core question
then, isn't it? What explains that massive
variability? Why isn't it just like flipping
a switch? You have the variant, you get
the disease, same way every time. Exactly.
(02:46):
And this study we're looking at today tackles that head on.
It meticulously investigates three sort of main ideas, three
factors that scientists have thought contribute to this
incomplete penetrance. Right, that term meaning not
everyone with the variant gets sick.
Yep. And also variable expressivity,
which is just the scientific way of saying the symptoms vary a
lot from person to person. So the three factors they dug
(03:08):
into are first variable variant effect sizes.
Basically the idea that even within the very same gene, not
all bad variants are equally bad.
Some might only slightly mess things up, while others
completely break the protein's function.
OK, so it's not just a binary broken or not broken, but like
how broken a gradient of effect almost.
(03:29):
You've got to think dimmer switch, not on/off.
The second factor is polygenic backgrounds.
Now, this is about the combined effect of, well, thousands,
maybe even millions of common genetic variants scattered all
across your genome. It's like your overall genetic
predisposition, right? The sum total of all these
little nudges that can make you, as an individual, slightly more
(03:50):
or less susceptible to something.
It's like a background noise that can either amplify or
dampen the effect of that one single major variant.
OK. So your whole genome kind of
sets the stage and the third factor you mentioned, it gets a
bit more complex. Yes, the third one is marginal
epistasis. This is where things get really
interesting, and arguably the trickiest bit to pin down.
(04:12):
It's about how your entire genetic background, those common
variants we just talked about, doesn't just add up, but
actually interacts with and modifies the effect of that
single pathogenic variant you might carry.
Interacts, modifies, so not just adding effects together, but one
influencing how the other behaves.
Exactly. It's not additive, it's
interactive. Think of it like chemistry.
(04:32):
Mixing two chemicals might do something completely different
than just having them sit side by side.
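To make the additive versus interactive distinction concrete, here is a minimal sketch on simulated data. It is not the method used in the study: it simply fits an ordinary linear model with and without a carrier-by-background interaction term. All variable names, effect sizes, and the simulated cohort are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated inputs (all hypothetical): carrier status for one rare variant
# and a standardized score summarizing the common-variant background.
carrier = rng.binomial(1, 0.05, size=n)
background = rng.normal(size=n)

# Simulated phenotype in which the background changes the carrier effect.
true_main, true_bg, true_inter = 1.0, 0.5, 0.8
phenotype = (true_main * carrier
             + true_bg * background
             + true_inter * carrier * background
             + rng.normal(scale=1.0, size=n))


def fit(columns):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    X = np.column_stack([np.ones(n)] + columns)
    beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    resid = phenotype - X @ beta
    r2 = 1.0 - resid.var() / phenotype.var()
    return beta, r2


# Additive model: carrier and background each contribute on their own.
beta_add, r2_add = fit([carrier, background])
# Interactive model: adds a term that lets the background modify the carrier effect.
beta_int, r2_int = fit([carrier, background, carrier * background])

print(f"additive model R^2:    {r2_add:.4f}")
print(f"interaction model R^2: {r2_int:.4f}")
print(f"estimated interaction coefficient: {beta_int[3]:.2f}")
```

In this toy setup the interaction coefficient recovers roughly the simulated value, and the second model explains more variance. Real marginal epistasis tests work against millions of background variants rather than one summary score, which is why a dedicated method is needed.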
Can you give an example? That sounds quite abstract.
Sure. Let's take the MC4R gene
again. It's famously linked to obesity.
Now, some variants in MC4R definitely cause severe obesity,
as expected. But, and this is the wild part,
other variants in the exact same MC4R gene have been found to
(04:54):
actually protect people against obesity.
Wait, in the same gene? Some cause it, some protect
against it. Right.
It strongly suggests there's a complex interplay happening.
The effect of a specific MC4R variant seems to depend on the
context of the rest of your genome.
It's this interaction, this epistasis, that can flip the
script. It's incredibly challenging to
(05:15):
study but potentially holds huge predictive power.
OK, yeah, that makes it much clearer.
But wow, tackling all three of those factors, variable effects,
polygenic scores, and epistasis sounds like a monumental task.
What kind of data did they even use to get at this?
Monumental is the right word. They needed massive data sets,
so their main source was the UK Biobank.
They used exome sequences, that's the protein coding parts
(05:37):
of the genome, and linked clinical information from
hundreds of thousands of participants.
We're talking huge numbers like the 200,000 and then the 450,000
exome releases. OK, UK Biobank, that's a
familiar powerhouse. Absolutely.
But crucially, they didn't just stop there.
They made sure to replicate their key findings in a
different large data set. That was the Mount Sinai BioMe Biobank.
(06:01):
This includes, I think, nearly 30,000 participants.
And importantly, it's a more ethnically diverse cohort than
the UK Biobank. That's key, isn't it, showing
it's not just a fluke of one population.
Exactly. Replication across diverse
populations really boosts confidence in the findings.
It shows the effects are likely more generalizable.
So they had the data. What about the methods?
(06:21):
How did they actually isolate and measure these three different
sources of variability? Sounds statistically challenging.
Very challenging. They employed three really cutting-edge
statistical genomics methods, each kind of
specialized for one piece of the puzzle.
So for digging into those variable variant effects, they
used a method built on something called the ESM 1B protein
(06:42):
language model. This thing is seriously cool.
Protein language model. What does that mean?
Well, think of it like AI models trained on human language, like
GPT. ESM 1B was trained on millions
and millions of known protein sequences, learning the language
of proteins. Basically, it wasn't trained
specifically on disease data, but because it understands normal
protein structure and function so well,
(07:04):
it can then look at any possible amino acid change, any
mutation in any protein, and generate a score predicting how
likely that change is to mess up the protein's function.
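As a rough illustration of how a language model can turn "how plausible is this amino acid here" into a variant score, the sketch below uses a log-likelihood ratio between the mutant and reference residue. This is one common scoring convention for protein language models, assumed here for illustration; the episode does not spell out the exact formula. The probability table is invented, and no real model is called.

```python
import math

# Hypothetical model output: probability of each amino acid at one position,
# given the rest of the sequence. A real model would produce these numbers.
position_probs = {"L": 0.70, "I": 0.15, "V": 0.10, "P": 0.001, "other": 0.049}


def llr_score(probs, reference, mutant):
    """Log-likelihood ratio of mutant vs. reference residue.

    More negative values mean the model finds the substitution less plausible.
    """
    return math.log(probs[mutant]) - math.log(probs[reference])


# Leucine to isoleucine: a chemically similar swap the toy model finds plausible.
conservative = llr_score(position_probs, "L", "I")
# Leucine to proline: a swap the toy model finds very unlikely at this position.
disruptive = llr_score(position_probs, "L", "P")

print(f"L->I score: {conservative:.2f}")
print(f"L->P score: {disruptive:.2f}")
```

The point of the sketch is only that the output is continuous: two substitutions at the same position can receive very different scores, which is the dimmer switch idea rather than a binary label.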
Whoa. Yeah.
It allows for a much more nuanced classification than just
pathogenic or benign, especially for those tricky variants of
uncertain significance, or VUS, where we just don't know their
impact. It predicts a continuous score
(07:26):
of potential disruptiveness. OK, that sounds incredibly
powerful, almost like predicting the dimmer switch setting for
each variant, even ones we've never seen before.
That's a great way to put it, exactly.
Then for assessing the polygenic background effects, they used
polygenic risk scores, or PRS. These are pretty well
established now. Right, those combine the small
effects of many common variants. Precisely, they sum up the tiny
(07:49):
contributions from thousands, maybe millions of common
variants across your genome to give you a single score
representing your overall genetic liability for a trait
like high cholesterol. What they did here that was neat
was looking at very fine-grained quantiles, like the top 1%, top
0.1% of PRS, to really see the impact at the extremes.
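The "sum up the tiny contributions" idea and the fine-grained quantiles can be sketched in a few lines. This is a toy example under stated assumptions: the genotypes and per-variant weights are simulated, and the cohort size and allele frequency are arbitrary. It is not the scoring pipeline used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_variants = 5_000, 1_000

# Hypothetical inputs: allele counts (0, 1 or 2) and small per-variant weights.
genotypes = rng.binomial(2, 0.3, size=(n_people, n_variants))
weights = rng.normal(scale=0.01, size=n_variants)

# A polygenic risk score is the weighted sum of allele counts per person.
prs = genotypes @ weights

# Fine-grained tail cutoffs: the top 1% and the top 0.1% of the score.
top_1_cut = np.quantile(prs, 0.99)
top_01_cut = np.quantile(prs, 0.999)
in_top_1 = prs >= top_1_cut
in_top_01 = prs >= top_01_cut

print(f"people in top 1%:   {in_top_1.sum()}")
print(f"people in top 0.1%: {in_top_01.sum()}")
```

With a cohort in the hundreds of thousands, even the top 0.1% contains enough people to compare their phenotypes against carriers of a rare variant, which is what makes the extreme quantiles informative.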
(08:10):
Gotcha. More precision there.
And what about the third piece, that really complex epistasis?
Yes, marginal epistasis. For this they actually brought
in their own novel method, something they developed called
FAME, which stands for Fast Marginal Epistasis test.
Testing for interactions between a rare variant and potentially
millions of common background variants is computationally
(08:32):
brutal. It's been a major bottleneck.
FAME is designed to do this much, much more efficiently.
It quantifies how much that common genetic background
modifies the effect of the specific pathogenic variant
you're interested in. FAME.
OK, so a specialized tool to hunt for those complex
interactions at scale. And what specific conditions did
they apply all this tech to? They focused on common cardiometabolic
(08:53):
conditions, things affecting the heart and
metabolism. These are perfect candidates
because we know they're influenced by both rare variants
and common polygenic effects. So specifics included familial
hypercholesterolemia, that's the genetic kind of high LDL, the bad
cholesterol. Also familial hyperalphalipoproteinemia,
which is actually high HDL, the good
(09:14):
cholesterol, familial hypertriglyceridemia, so high
triglycerides, certain types of monogenic obesity and MODY which
is maturity onset diabetes of the young.
And importantly, they didn't just look at disease causing
variants, they also included known beneficial variants in the
same genes like ones that naturally lower LDL or protect
(09:34):
against obesity. That gives you the full
spectrum. Right, looking at both sides of
the coin.
data, the cutting edge methods, the specific conditions.
What did they actually find? What were the big results when
they put it all together? Did these three factors really
explain the variability? The short answer, Yes,
absolutely. Their findings provided really
strong statistical and clinical support for all three
(09:54):
mechanisms playing a role in that variable penetrance and
severity. It was quite striking.
So first, on the variant effect differences.
Remember that ESM 1B protein language model?
They found its scores, predicting how disruptive a variant is, were
tightly correlated with how severe the phenotype was in
carriers for, I think, 6 out of the 10 gene-phenotype pairs they
(10:15):
studied closely. OK, so the predicted brokenness
score actually matched up with real world patient outcomes?
Exactly, and the MC4R gene example was particularly
telling. ESM 1B didn't just predict BMI
differences among carriers, it could actually distinguish
between the gain of function variants, the ones that protect
against obesity, and the loss of function variants that cause
it. Wow, that level of nuance is incredible, beyond just bad or
(10:38):
good. Totally.
It means ESM 1B could offer a much more specific prognosis.
And crucially, it provided potentially meaningful
predictions for literally thousands of those VUS,
variants of uncertain significance, giving us clues
where before we just had question marks.
Plus, it actually outperformed all the other standard variant
prediction tools they compared it against.
(10:58):
That is a huge deal for clinical interpretation.
Were these ESM 1B findings solid though?
Did they replicate? They did.
That was key. They showed the ESM 1B scores were
predictive even for extremely rare variants, which are often
the hardest to understand. And yes, the results held up
when they checked them in different UK Biobank data
releases and importantly in that independent, more diverse BioMe
(11:20):
Biobank cohort. So strong evidence, OK.
Second finding: the polygenic background.
This was also really significant.
They found that polygenic risk scores had a major impact on the
phenotypes of people carrying pathogenic variants.
For conditions like monogenic obesity, high HDL, the LDL
lowering variants, high triglycerides, the carrier's
PRS score independently influenced how severe their
(11:41):
condition was. But here's the part that really
makes you rethink things. Go on.
They found that hundreds, even thousands of people without any
known pathogenic variant but who happened to have extremely high
polygenic risk scores actually had more extreme phenotypes, like
higher cholesterol or BMI, than some people who did carry a known
pathogenic variant but had a low PRS.
(12:04):
Wait, hang on. So your general background
genetic risk, your PRS could push you further into the risk
zone than someone who actually has the disease gene
but a protective background? Precisely.
It's a powerful demonstration that the common variant
background isn't just noise. It can be a major driver of
risk, sometimes even outweighing a single high impact rare
variant. It really emphasizes looking at
(12:25):
the whole genomic picture. Mind blown.
OK. And the third factor, epistasis,
the interactions. Right, marginal epistasis.
This for me is maybe the most profound finding for where
personalized medicine could go. They found widespread
statistical evidence that the common genetic background
directly modifies the effect of the monogenic variant through
these epistatic interactions. They quantified this using a
(12:48):
metric they called the Epistatic Improvement Percentage, or EIP.
This basically measures how much better your prediction gets when
you account for these interactions compared to just
knowing if someone is a carrier or not.
And what were those EIP values like?
Were they small tweaks or big changes?
They're often substantial. The EIP ranged from about 48% up
to a really staggering 170% in the significant associations
(13:10):
they found. For LDL cholesterol, for
instance, the EIP was 170%. That implies a model
incorporating epistasis could be like 2.7 times more accurate in
predicting someone's LDL level than just using their carrier
status alone. 2.7 times? That's not subtle.
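The step from 170% to 2.7 times is simple arithmetic, sketched below. It assumes the EIP is a relative improvement over the carrier-only baseline, as described in the episode; the paper's exact definition is not given here, and the function name is hypothetical.

```python
def improvement_to_ratio(percent_improvement):
    """Convert a relative improvement in percent to a multiplicative ratio.

    Assumes the percentage is measured relative to the baseline model,
    so a 100% improvement means twice the baseline.
    """
    return 1.0 + percent_improvement / 100.0


# The low and high ends of the EIP range quoted in the episode.
for eip in (48.0, 170.0):
    print(f"EIP {eip:.0f}% -> {improvement_to_ratio(eip):.2f}x the baseline")
```

Under that assumption, the quoted range of 48% to 170% corresponds to roughly 1.5 to 2.7 times the baseline.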
Not subtle at all. It strongly suggests these
interactions aren't just a minor footnote, they are major players
(13:33):
in determining how a pathogenic variant actually manifests in an
individual. It's not just the parts, it's
how the parts work together. OK, this is genuinely picking a
much richer, deeper picture of genetic influence than the
simple single gene model. So stepping back, what does all
this mean? What are the big implications of
these findings for, you know, the future of genomic medicine
(13:55):
and maybe even for us just thinking about our own health
risks? Well, I think this study really
lays a crucial foundation.
simplistic explanations. It clearly establishes that
things like incomplete penetrance and variable
expressivity in these monogenic conditions aren't mysteries
driven by just one thing. Instead, it's a combination.
The specific impact of the variant itself matters, sure,
(14:17):
but so does the cumulative effect of your entire polygenic
background and these complex interactions, the epistasis
between that rare variant and your common background.
All three are demonstrably important.
So we really need to think in terms of a genomic profile
rather than just single risk genes.
Exactly, and the implications for medicine are potentially
(14:38):
huge. This work sets the stage for
radically better prediction. Think about it, instead of just
telling someone you carry the variant for condition X, which
might be terrifying, but not actually tell them their
likely outcome, future clinical reports could integrate all this
information. The variant-specific effect
score, the person's PRS, maybe even key interaction effects.
(15:02):
This would offer a much more accurate personalized prognosis.
Imagine knowing not just if you carry a risk variant, but
getting a reliable estimate of how likely it is to affect you,
and maybe even how severely. That would be transformative for
genetic counseling, for deciding on preventative measures, even
for tailoring treatments. Absolutely.
It could change the entire conversation around genetic
(15:23):
risk. But as always in science, I'm
guessing there are still challenges, right?
Limitations to this study or next steps needed.
Oh, for sure. The researchers are very upfront
about the limitations. For one, they mainly looked at
the standard or canonical forms of proteins.
But proteins can exist in different forms, called
isoforms, in different cell types, and how those variations
(15:44):
play into this is still largely unknown.
Another tricky issue is that measuring penetrance itself
isn't static. Think about it.
Screening guidelines change, doctors get better at diagnosing
things earlier, diagnostic thresholds shift.
And maybe the biggest confounder now is widespread medication
use. Like statins for cholesterol or
the new weight loss drugs? Exactly.
(16:06):
If lots of people in a study population are taking statins,
their LDL levels might look artificially low, masking the
true effect of a hypercholesterolemia variant.
Or new obesity drugs could lower BMI, again making it harder to
see the pure genetic effect on weight.
So these real world factors complicate the picture.
Right. Successful treatments can ironically muddy the waters for
(16:26):
genetic research. It's a challenge.
And then there's the issue of diversity.
To really make these sophisticated models work for
everyone globally, we need even larger biobanks that capture the
full spectrum of human genetic diversity.
Genetic backgrounds vary across populations, and crucially,
those VUS variants of uncertain significance are far more common
in people of non-European ancestry, partly due to
(16:48):
historical biases in who gets studied.
So applying these models requires careful validation in
diverse groups to avoid making health disparities worse.
That's a critical point, ensuring equity as these
technologies advance. Definitely.
And finally, this kind of complex analysis integrating
rare variants, PRS and epistasis will likely work best for
(17:09):
quantitative traits, things you can measure on a continuous
scale like cholesterol levels, blood pressure, BMI, because you
need that fine-grained phenotype data, which biobanks often
collect. Makes sense.
But despite those hurdles, the overall potential here for more
precise, truly personalized genomic medicine feels enormous.
It really is, yeah. I mean the development and
validation of tools like ESM 1B are game changing.
(17:31):
Having AI that can look at a variant, even a totally novel
one, and give a nuanced prediction of its functional
impact, maybe even tell gain of function from loss of function,
that's huge. It opens the door to finally
classifying thousands upon thousands of VUS that currently
leave patients and doctors in limbo.
This study powerfully underscores why we need to
(17:51):
integrate everything, rare variants, common variants,
interactions to build that complete personalized genomic
profile. It's the path towards genuinely
empowering precision medicine. So if we had to boil it all
down, what's the big take home message from this really deep
dive into our genetic complexity?
I think the core message is this.
For many conditions, especially complex ones like cardiometabolic
(18:13):
diseases, carrying a single bad gene variant is
rarely the whole story. Your unique genetic blueprint,
which includes the specific punch packed by that variant,
plus your overall background, genetic susceptibility from
common variants, and the subtle ways all these parts interact.
That whole package is what really shapes your risk and
potential outcome. It's the combination, the
interplay, that matters. So the final thought perhaps is
(18:35):
what does this shift towards integrating all this layered
genetic information really mean for the future of preventing and
treating disease, maybe even for you personally?
Something to definitely mull over.
This episode was based on an Open Access article under the
CC BY 4.0 license. You can find a direct link to
the paper and the license in our episode description.
If you enjoyed this analysis, the best way to support Base by
(18:57):
Base is to subscribe or follow in your favorite podcast app and
leave us a five star rating. It only takes a few seconds but
makes a huge difference in helping others discover the
show. Thanks for listening and join us
next time as we explore more science, base by base.