Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:02):
Half the challenges of running an RCT aren't entirely technical. In fact, I would say 80% revolve around the logistics as well as the human aspects. So that involves kind of the IRB approval. We had to work with the Cedars-Sinai IRB so that they were comfortable with the amount of consent we're providing. We thought it was minimal risk because we were not impacting
(00:23):
the sonographer in any way. And also the cardiologists finalize everything so that it's ultimately the cardiologist's call. The technical implementation was important in that we directly integrated with the software that the clinicians use to assess echo. This is called the PACS, or structured reporting system, where we wanted to show the AI model tracings in the exact same way that
(00:45):
sonographers and cardiologists see it. So, Bryan He, one of the Stanford Ph.D. students, actually directly embedded the model into syngo Dynamics, our PACS. But the overall highlight, and I think that this was a Twitter post, which people asked me, how much did this cost? This study cost roughly 20 bottles of wine, 50 Starbucks
(01:07):
gift cards, and then my effort.
That's amazing.
Welcome to another episode of NEJM AI Grand Rounds. I'm Raj Manrai and I'm excited today to bring you our conversation with Dr. David Ouyang. David is a cardiologist and researcher at Cedars-Sinai Medical Center in Los
(01:28):
Angeles, and he's been at the forefront of applying AI to cardiology for many years. He took us behind the scenes of some of his biggest papers in this space, and it was really eye-opening to hear how creative and clever he was in launching a clinical trial at such an early phase of his career. This is a career stage where I think it's tempting to say that running a clinical trial is something I can only do later, when I'm much older,
(01:50):
but he really countered that narrative, and he showed us just how he did it. He also told us about his foundation in statistics and his time working with some of the giants in the field. And I think this really shines through now in his approach to AI and medicine. The NEJM AI Grand Rounds podcast is brought to you by Microsoft, Viz.ai, Lyric, and Elevance Health.
(02:14):
We thank them for their support. And with that, we bring you our conversation with Dr. David Ouyang.
Well, David, we're super excited to have you with us today on AI Grand Rounds. Thanks for coming.
Yeah. And Raj, thanks so much for the invitation. I'm really glad to be here.
David, this is a question we ask all of our guests right at the beginning.
(02:37):
Could you please tell us about the training procedure for your own neural network? How did you get interested in AI and what data and experiences led you to where you are today?
So, I was born in China, grew up in Texas, spent some time in Houston, Dallas, as well as Austin. Went to Rice University, majored in statistics and biochemistry.
(02:57):
My statistics advisor was actually Hadley Wickham, who was the creator of ggplot and the tidyverse. So really learned a lot from him. Went to UCSF for medical school, initially thought I wanted to do an M.D./Ph.D., but ultimately decided on doing an M.D. And then was at Stanford for residency and fellowship. That's where I met a lot of close collaborators and mentors, people
(03:20):
you guys already had on the podcast, including Euan and James. And then, now I'm currently an assistant professor at Cedars-Sinai Medical Center.
So we always like to pull it back a little bit further than that. So what intrigued the young David about statistics and medicine, and how did you kind of get set on that path in the first place? Let's go beyond the CV bio for just a sec.
(03:42):
So, I'll do a brief tangent, but have you guys seen the movie Arrival? Yeah, the high-level overview is that aliens come to earth. But they teach the humans a language, and that completely changes their perspective, right? In that movie, the language doesn't have a beginning or end, and they can actually then see into the future and the past.
(04:04):
But I really believe that how we think about questions, whether in science or how we apply them, is really based on our training. Obviously, medicine training is a crucible. It's one of those things where, by sheer nature of being the only person there at the end of the night, you learn a lot of medicine. But I would say that a lot of my value and a lot of my perspective
(04:24):
comes from working with Hadley in terms of data visualization: knowing how variables are structured, how you actually want to present them, how you can potentially hide things, how you want to elevate things. That actually gives me a lot of perspective on what problems I choose for research and what are tractable AI problems.
I can't believe I didn't know that you had worked with Hadley Wickham.
(04:46):
That's such an amazing factoid. I remember his ascendancy in the early 2010s when I was a grad student, and he had the grammar of graphics, and I always thought he was the coolest statistician that I had ever heard of. And so now, in hindsight, it makes complete sense that of course you worked with him, because I think you're also one of these very cool nontraditional statisticians. So, it's like Arrival, it's all coming together for me now.
(05:09):
Fun fact, and I think Hadley should speak for himself, but Hadley actually went to medical school. He is from New Zealand, went to medical school, and ended up deciding medicine's not for him. And he was a professor at Rice. And I think, similar to following in your footsteps, Andy, he ended up going to RStudio, which I think now has a new name, and went to really focus on leveraging
(05:30):
kind of technology for data science.
And maybe just some context here for our medical listeners who don't know who Hadley Wickham is. So, Hadley has launched a revolution in data science. He has created a suite of R packages for data visualization but has also reinvented how we visualize data in the first place. He's so influential that one of my good friends actually named
(05:50):
his daughter Hadley, in recognition of Hadley Wickham's contribution. So, a complete legend in the field of data science.
Yeah, when I was in medical school, he would go to all these tech companies to give master classes. And he would actually need a research assistant or someone to help walk through the crowd. And so I was really fortunate to, actually, we went down to eBay and I
(06:12):
helped some of their full-time people learn ggplot and stuff like that. So that was actually a really fun experience and also part of my initial interest in technology and I guess tech companies.
David, that's amazing. Like Andy, I can't believe I didn't know that you sort of had this arc and this kind of research experience working with Hadley. Could you tell us, my guess is that it might be R, but it might be something
(06:35):
totally different and preceding sort of your interaction with him. But what was your introduction to programming and data science and computer science? Did you start programming in high school, or was this with Hadley in college?
Yeah, so fun fact. I went to a math and science high school. I went to the same high school as Roxana. That's where I learned C++ and Java.
(06:56):
It was only during undergrad where it was primarily R and Python. And obviously with deep learning in the last couple of years, it has been much more Python specific. I would still say that for our manuscripts that I handle the figures for, I still primarily do it in ggplot.
So another unknown connection to me, that you and Roxana, another
(07:19):
guest of the podcast and editor at NEJM AI, went to high school together. That's amazing. Amazing. So, David, we want to dive now deep into some of your research. So, you're a physician-scientist: you're a practicing cardiologist, a deep learning researcher, and an echocardiographer. And I think Andy would agree with me that you're among probably the very best
(07:41):
examples of anyone that either of us know to truly blend the clinician and researcher parts of your life, the sort of physician-scientist, both sides there. You know, you do many things, but I think one of the recurring topics in your papers is about applying AI to cardiology and in particular, imaging studies and echocardiograms. I think the very first paper of yours that I read was a paper that you
(08:02):
published in Nature back in 2020. And the title of the paper is "Video-based AI for beat-to-beat assessment of cardiac function." So maybe we could start with that one. Could you tell us about what that paper is all about, and maybe the backstory as well: how that project got started, where you were, and if you could frame also where the field was at at the time and what you were trying to do with that particular paper.
(08:25):
Yeah, Raj, thanks so much for the comments. You're very kind, and it's a very generous introduction to the paper. When I was a cardiology fellow, I realized that a big part of cardiology practice is in interpreting images and videos. Echocardiography is the most common form of cardiac imaging, and it's
(08:45):
something that overnight I had to be called to do and to interpret, oftentimes with backup, but many times feeling more and more comfortable that I could do the assessment. When I was doing this as a cardiology fellow, I realized that we actually have large imaging databases that have tremendous information that spans essentially the full spectrum of disease, from healthy individuals getting echoes,
(09:10):
to patients with very sick hearts in the ICU, as well as needing heart transplant. And this was back in the era, I would say maybe from 2018 to 2020, when convolutional neural networks were getting much more excitement. The caveat is that at that time, there were very few video-based architectures.
(09:31):
Most CNNs are image-based as opposed to video-based. So that temporal relationship and information was not well captured. And we thought that echocardiography, or cardiac ultrasound, is very inherently a video-based modality, and it's really the place to actually do medical AI that integrates both very technical innovations, as well as applications that
(09:51):
leverage those technical innovations. So cardiac function, or left ventricular ejection fraction, is the most important and standard assessment in echo. It's how we decide whether someone has heart failure or not. It's how we decide whether someone can still get chemotherapy or not, if there's toxicity, and a variety of many other reasons. And that's something that's both a very
(10:13):
visual as well as temporal assessment. So, our paper in 2020 was using 3D ResNets, R(2+1)D and R3D, as well as other approaches to assess heart function, and assessed this at many places. And that's where we already did external validation at Cedars-Sinai to show that this is a generalizable model, but also showed that if you're able to
(10:36):
use a video-based model, hopefully you can actually have more precision than the cardiology assessment of these measures.
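For the ML-oriented listener, here is a minimal sketch of a video-based EF regressor in the spirit of the R(2+1)D/R3D approaches just described. The backbone comes from torchvision; the clip dimensions, loss, and lack of pretraining are illustrative assumptions rather than the paper's exact setup.

```python
# A minimal sketch (not the paper's exact code) of video-based EF regression
# with an R(2+1)D backbone; assumes torchvision >= 0.13 for the `weights` API.
import torch
import torch.nn as nn
from torchvision.models.video import r2plus1d_18

class EFRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = r2plus1d_18(weights=None)  # spatiotemporal (2+1)D ResNet
        # replace the 400-class classification head with a single EF output
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 1)

    def forward(self, clips):
        # clips: (batch, channels=3, frames, height, width), e.g., 32-frame 112x112 clips
        return self.backbone(clips).squeeze(-1)

model = EFRegressor()
clips = torch.randn(2, 3, 32, 112, 112)          # two synthetic echo clips
ef_pred = model(clips)                            # predicted EF per clip, in percent
loss = nn.functional.mse_loss(ef_pred, torch.tensor([60.0, 35.0]))  # regression loss
```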
David, for the ML folks who are listening to this, can you just quickly explain what ejection fraction is?
Yeah, the left ventricular ejection fraction is a ratio of the heart size when it is full of blood and when it pumps all the blood out.
(10:57):
That's our crudest and most common assessment of heart function, meaning that a really strong heart squeezes roughly about half the blood out in any given heartbeat. And a really weak heart squeezes maybe 10% of the blood out and potentially needs multiple heartbeats to have the same amount of blood going through the rest of your body. It's a quantitative measurement. So, it's a number between 0 and 100%, but it's also a very
(11:22):
video-based measurement, where you need to capture that relationship, where the ratio of the largest and smallest sizes is what you use to assess that function.
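To make that definition concrete, here is the standard formula with made-up volumes purely for the example: EF is the fraction of the end-diastolic (largest) volume that is ejected each beat.

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """EF (%) = (EDV - ESV) / EDV * 100, where EDV and ESV are the left
    ventricular volumes at end-diastole (largest) and end-systole (smallest)."""
    return (edv_ml - esv_ml) / edv_ml * 100

print(ejection_fraction(120, 60))   # 50.0 -> a strong heart ejecting about half the blood
print(ejection_fraction(200, 180))  # 10.0 -> a very weak heart, as described above
```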
Cool. David, that's amazing. So, you know, in that paper, right, you're applying this neural net architecture to videos. And I think, you know, you hit it there. That's the sort of key technical, methodological contribution of that paper.
(11:43):
And then you're setting up a task that is how clinicians themselves, right, how echocardiographers, sonographers interpret the studies: as videos, not single images like the sort of previous era of AI applied to similar tasks. Could you tell us maybe about the clinical significance of some of the findings there? Like, was it a sort of very clear improvement? Was it obvious to you that this was working right out of the box?
(12:06):
Or did you have to sort of refine the task iteratively, even in composing that original Nature paper, to know that this was working or to demonstrate that this had potential?
Yeah, we developed both regression as well as segmentation models, meaning that we had a model that both identified the left ventricle, so it can actually trace the part of the heart that is relevant for this measure, as well as
(12:27):
come up with a number that actually relates to the assessment of heart function. When I first saw the model output for segmentation, I was like, wow, that is actually something that is both incredibly laborious for sonographers and cardiologists to do, and it's not clear that I can tell the difference between me doing it versus the AI model doing it.
(12:49):
There's a bunch of reasons why it's better when software or AI does it. It can do it every single frame and every single beat. It can be more precise and more reproducible. It never gets tired.
It never gets tired.
And ultimately, I think we show in the paper that it actually allows for a more precise assessment.
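As a sketch of what "every single frame and every single beat" buys you, the snippet below takes per-frame left ventricular volume estimates (the kind of quantity a segmentation model lets you derive) and computes one EF per beat. The beat detection and the direct use of volumes are simplifications for illustration; they are not the published pipeline.

```python
import numpy as np

def beat_to_beat_ef(lv_volumes, window=15):
    """Per-beat EF from per-frame left ventricular volume estimates.
    End-diastole = a local maximum of volume; end-systole = the minimum before
    the next end-diastole. Simplified for illustration: a real pipeline derives
    volumes from the tracings (e.g., method of disks) and detects beats more robustly."""
    v = np.asarray(lv_volumes, dtype=float)
    ed_frames = [i for i in range(1, len(v) - 1)
                 if v[i] == v[max(0, i - window):i + window].max()]
    efs = []
    for start, end in zip(ed_frames, ed_frames[1:]):
        edv = v[start]                 # end-diastolic volume for this beat
        esv = v[start:end].min()       # end-systolic volume within the beat
        efs.append((edv - esv) / edv * 100)
    return efs

# synthetic example: four "heartbeats" of a roughly sinusoidal volume curve
t = np.arange(120)
volumes = 90 + 30 * np.cos(2 * np.pi * t / 30)   # EDV ~120 mL, ESV ~60 mL
print(beat_to_beat_ef(volumes))                   # ~[50.0, 50.0] for the fully captured beats
```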
And so where is that work now?
(13:10):
So, you know, fast forward a few years, are you applying that same algorithm in your hospital and other hospitals that you work with? Or has it sort of, was it an interesting kind of research study that led to then the next incarnation of these models to be applied in practice?
Yeah, so the next step in our assessment after the paper was published, one, we made sure that all the code and all the underlying
(13:34):
training data was publicly released. So as part of the Stanford AIMI datasets, we were really fortunate that we've had many other people use it, but we also wanted to see what it looks like in clinicians' hands. So, in 2022, we conducted a blinded RCT of this assessment. This is when I had already moved to Cedars-Sinai. So, we asked the Cedars-Sinai echo lab for the sonographers to trace about 4,000
(14:00):
echoes, showed the cardiologists 2,000 of those, where for the other 2,000 we swapped in what was being assessed by AI, and conducted a blinded prospective RCT.
That was when we showed that clinicians, one, can't tell the difference between AI and sonographer, and two, are less likely to change that assessment. So, it was, while it was a non—.
(14:22):
When the AI had generated it.
Yes, we were looking at the difference in assessment from the preliminary assessment from the sonographer or AI and the final cardiologist assessment, when you ask them to change it to what they think is the ground truth.
So, I'm jumping ahead because we want to, you know, this is really unique. I think you're a machine learning researcher and AI researcher who's been able to lead an RCT, where we have so many questions about how you do
(14:46):
that, how you get that off the ground. But maybe while we're talking about this particular study, can you just tell us about, what were the sort of hurdles to even launch this kind of a study at Cedars-Sinai? I guess maybe one way to frame it is, you know, Andy and I like to talk about the socio-technical challenges of doing research in academic medical centers and universities. And sometimes there are technical problems, right,
(15:08):
technical challenges that you have to solve, and oftentimes there are sort of social ones, whether it's organizing people around a particular task, getting approval for a study, getting funding, assigning that sort of funding to a particular study. How did you navigate those challenges, and what were sort of the main things that you solved to even launch that study? Let alone then conduct it and publish it.
(15:29):
Yeah, Raj, this is a really great question. Half the challenges of running an RCT aren't entirely technical. In fact, I would say 80% revolve around the logistics as well as the human aspects. So that involves the IRB approval. We had to work with the Cedars-Sinai IRB so that they were comfortable with the amount of consent we're providing. We thought it was minimal risk because we were not impacting the sonographer
(15:53):
in any way, and also the cardiologists finalized everything so that it's ultimately the cardiologist's call. The technical implementation was important in that we directly integrated with the software that the clinicians use to assess echo. This is called the PACS, or structured reporting system, where we wanted to show the AI model tracings in the exact same way that sonographers
(16:15):
and cardiologists see it. So Bryan He, one of the Stanford Ph.D. students, actually directly embedded the model into syngo Dynamics, our PACS. But Raj, I think the overall highlight, and I think that this is a Twitter post, which people ask me, how much did this cost? This study cost roughly 20 bottles of wine,
(16:36):
50 Starbucks gift cards, and then my effort. And so what I would say is that it's a nominal amount of money to do these.
Priceless.
This is a MasterCard ad or a Visa ad.
That's amazing.
That's amazing.
Okay.
So every time there's sort of a—.
Just to get the order of operations
there right.
Did you get buy-in after the 20 bottles of wine had been consumed,
(16:59):
or was that a thank-you gift?
That was a thank-you gift. And I would say that was not paid for through my startup. That was a personal thank-you gift.
Got it.
Got it.
That's amazing.
You know, so maybe just walk us through a little bit about the study in a little more detail. So, what was the sort of prospective component? What was the evaluation, the sort of figure of merit, what were you evaluating?
(17:22):
And it's kind of similar to the question that I asked about the first paper that we talked about: what was the magnitude of the difference between the two arms here, between the AI read and the sonographer, and how did you tell that that was clinically or potentially clinically significant?
Yeah.
Echo is a really unique place where there are two clinicians looking at every individual image, by which I mean that the sonographers, or the ultrasound
(17:44):
technicians that acquire the images, oftentimes do a preliminary assessment as part of standard clinical practice, but they themselves are very much experts. So, in the standard clinical practice, a sonographer gets the images from the patient, actually traces the left ventricle, and then has a preliminary assessment of ejection fraction that's finalized by a cardiologist.
(18:06):
Because this is a two-clinician task, we thought this was a perfect place to blind and randomize, meaning that we can inject the AI model to duplicate or simulate the work, or part of the work, that a sonographer does. And then, because it's already done asynchronously, the cardiologists have a set of echoes that they have to look at.
(18:27):
And if you shuffle and randomize on any individual study, they might not know whether it was done by AI or sonographer. Our primary endpoint was the difference between the preliminary assessment, so whether by AI or sonographer, and the final assessment of that same study that's been adjusted and finalized by the cardiologist.
(18:47):
We thought a more than 5% change in ejection fraction was a clinically meaningful change, and we were looking at it initially as a non-inferiority study, but assessing for what is the proportion that was changed in the sonographer arm and the AI arm.
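In code, the endpoint just described amounts to flagging, for each study, whether the cardiologist's final EF moved by more than 5 percentage points from the preliminary (AI or sonographer) value, and comparing that proportion between arms. The sketch below uses a simple Wald interval and made-up proportions purely for illustration; it is not the trial's actual statistical analysis plan or its results.

```python
import numpy as np

def changed_substantially(preliminary_ef, final_ef, threshold=5.0):
    """Flag studies where the final cardiologist EF moved by more than
    `threshold` percentage points from the preliminary (AI or sonographer) EF."""
    return np.abs(np.asarray(final_ef) - np.asarray(preliminary_ef)) > threshold

def compare_arms(changed_ai, changed_sono, noninferiority_margin=0.05):
    """Difference in the proportion changed (AI minus sonographer) with a 95% Wald CI.
    Illustrative only: the margin and method are assumptions, not the trial's plan."""
    p_ai, n_ai = np.mean(changed_ai), len(changed_ai)
    p_so, n_so = np.mean(changed_sono), len(changed_sono)
    diff = p_ai - p_so
    se = np.sqrt(p_ai * (1 - p_ai) / n_ai + p_so * (1 - p_so) / n_so)
    ci = (diff - 1.96 * se, diff + 1.96 * se)
    return diff, ci, ci[1] < noninferiority_margin   # non-inferior if upper bound < margin

# deriving the flag from paired EF values for a few example studies
print(changed_substantially([55, 40, 62], [58, 48, 60]))   # [False, True, False]

# toy arm comparison with hypothetical change rates, 1,000 studies per arm
rng = np.random.default_rng(0)
ai_changed = rng.random(1000) < 0.17     # hypothetical: 17% changed in the AI arm
sono_changed = rng.random(1000) < 0.27   # hypothetical: 27% changed in the sonographer arm
print(compare_arms(ai_changed, sono_changed))
```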
Can I maybe just ask a follow-up question there. One, I mean, I'm sure, because they were enrolled in a study, but were the
(19:10):
cardiologists aware that they could be grading the output of an AI system? And if so, did you capture their self-perception of whether or not it was from a sonographer or an AI system?
Yeah, the cardiologists were aware that they were participating in a trial. On any individual study, they did not know whether it was a sonographer or AI.
(19:31):
And after each study, we actually asked them if they thought it was AI or a sonographer. So even outside of the metrics of precision and accuracy that we assessed, we evaluated how often the cardiologists were correct. They were correct about a third of the time. They were incorrect a fourth of the time, and they just didn't give us an answer or said they can't tell the rest of the time.
(19:52):
Were there any interesting, like, sub-analyses of the ones where they were correct? Did they have any inherent bias? If they thought it was an AI, did they grade it differently than if they thought it was from a human?
Yeah, there was actually minimal subgroup variation. Whether they thought it was AI or sonographer, they changed it less frequently when it was the AI arm.
(20:13):
Or, there was less change in initial to final EF in the AI arm in almost all subgroups.
Because I could just imagine this, like, complex cognitive bias system where they're not actually grading the thing. They're trying to guess who made it. And depending on their own cognitive biases, if they guess that it was an AI-derived thing, then they mark it up.
(20:33):
Or if they guess it's sonographer-based, then they don't. But it sounds like that didn't really come through in this study.
Yeah, it's a really good point and a big reason why we think blinding is really helpful and necessary for the assessment. But there was no subgroup variation.
Cool.
Awesome.
And David, you said that the first author of the paper, Bryan He, that he
(20:54):
had to convert the images or convert the videos or convert the outputs, right, into the sort of the same kind of standard way that the cardiologists were used to reading these studies, right?
Yeah, so one of the benefits of AI is that AI doesn't tire. It can actually measure every single frame and every single beat, but that would remove the blinding.
(21:15):
So, we intentionally weakened the AI, and Bryan actually reverse engineered the PACS and did a SQL injection so that it actually looks exactly like, and was annotated in the exact same way as if, a manual annotation was done. So, 20 bottles of wine, something else, and one SQL injection to accomplish this RCT.
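To be clear about what the "SQL injection" amounted to here: the point was to write the AI-generated measurement into the reporting database so it renders exactly like a manual tracing, which is what preserves the blinding. The sketch below is a hypothetical illustration only; the table and column names are invented, not the actual syngo Dynamics schema, and SQLite stands in for the real database driver.

```python
import sqlite3  # stand-in for the real reporting-database driver

# Hypothetical schema: the real syngo Dynamics tables are not public, so the
# 'measurements' table and its columns below are invented for illustration.
def write_tracing(conn, study_id, ef_value, contour, author="prelim"):
    """Store an AI-generated LV tracing so the viewer renders it exactly like a
    manual one. Blinding depends on the row being indistinguishable from a
    sonographer's entry, so arm assignment is logged separately in the trial
    records, not in the clinical database. (Parameterized queries are used here,
    so no literal SQL injection is involved in this sketch.)"""
    conn.execute(
        "INSERT INTO measurements (study_id, name, value, contour, author) "
        "VALUES (?, ?, ?, ?, ?)",
        (study_id, "LVEF", ef_value, contour, author),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (study_id TEXT, name TEXT, value REAL, contour TEXT, author TEXT)")
write_tracing(conn, "ECHO-0001", 57.5, "[(10, 12), (11, 14), (13, 15)]")
```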
I was about to say, this is the only time the hospital IT staff has heard
(21:38):
SQL injection and not immediately had a heart attack themselves.
Right, right. So, this is very much, I think, hospital IT was very helpful. We did this all on the development branch of our PACS. They were definitely and appropriately skittish of us doing this in the production branch, but even having access to the development branch is something that, Raj, to your initial point, needs a lot of soft
(21:59):
power and needs a lot of coaxing and encouragement and collaboration.
So, I was going to ask a follow-onquestion about where this work is at
now, but I think this might actuallybe a good transition point to what
Andy wants to talk about, which is howyou start companies and how you move
from academia into, into products.
So, Andy, do you want to take it away?
Yeah.
You read my mind, Raj.
So, I think this is, so when I likesometimes, like we'll teach to or lecture
(22:24):
to folks from the business communityabout how you develop an AI product. And
literally mainly what I'm doing, David, istelling them about your work. Because not
only have you done the like initial modeldevelopment, the trial, you then have also
passed the academic membrane and gottenFDA clearance and have a commercially
available product.
So, could you tell us the legacyof these several studies that you've
(22:45):
done, what your commercializationstrategy and journey has been like?
Maybe we'll pull out some threadsthere that could apply to lots
of folks who are listening.
Yeah, Andy, this is a reallygreat question and something
I'm really passionate about.
In 2022, after the RCT was done,Brian and I thought this is really the
right time for clinical deployment.
(23:06):
This is not necessarily the skillset of an academic medical center.
In fact, there was a lot ofregulatory hurdles and efforts
necessary to get FDA clearance.
But we underwent the summer batchof Y Combinator, accelerator.
We fundraised and we thought this wasreally the perfect place to deploy AI.
(23:26):
I would say that our company, InVisionMedical Technology Corporation, has the
same thesis as my lab, which is that we think that AI can
help the accuracy of cardiovasculardiagnosis. It will both improve the lives
of patients and clinicians, and that wehave a unique perspective on how to get
that into clinician and patient hands.
Awesome.
So, could you say a little bitmore about your YC experience?
(23:49):
So, for listeners, YC is Y Combinator.
It's this very well-known startupaccelerator where you go in almost
pre-product, and they help you develop,they help you accelerate the idea, and
eventually lead with some seed funding.
So, can you
give us a little peek inside likewhat the Y Combinator machine is like.
Yeah, the great part of co-founding and leading a startup is that there are
(24:13):
a lot of learnings that I also take into my research lab itself. The first is to work hard and work fast. So, I would say that when we did Y Combinator, we were very much encouraged to really have something ready for demo day, to really build relationships with hospitals, and I really helped start some of those conversations, and really to think about what is the next stage, to actually just work harder.
(24:35):
The bias towards action, as people like to call it, is in and of itself a superpower, or a good reason why startups are really great and potentially have a competitive advantage compared to incumbents. But we really felt like the opportunity was there, and it was an opportunity that I think will inevitably help patients.
(24:55):
I thought I was listening to Paul Graham for a second there. That was great, David. So now putting on my entrepreneurial hat, who's buying this? What is the business case? What is the market? And what's the commercialization strategy? If I'm a hospital CIO, why do I want to, what's the use case for me? I'm putting money down on this.
Yeah, Andy, that's a great question.
(25:16):
Fundamentally, the biggest challenge for software-as-a-medical-device companies right now has been reimbursement, or what is the financial value versus what is the clinical value. I think through our papers, we show that there's strong clinical value and prove the precision of this assessment, which has a lot of downstream implications in diagnosis and treatment of patients. And it's also something that saves sonographers and cardiologists
(25:39):
time, which has business value. The challenge in this space is that oftentimes AI algorithms that reproduce or aid clinicians in the assessment of things that are standard assessments do not have a strong reimbursement strategy. I would say CMS as well as private payers will say, actually, that's something the cardiologists already should do.
(25:59):
Why should we give you additional payment to do this? This is an area where we have to pitch the idea that this very much streamlines and improves clinical value. But potentially for a hospital CIO or CTO, we need both the clinical use case and the clinical value as well as the business value, and that's where it becomes a little bit weaker.
(26:20):
In parallel, we've really focused on disease diagnosis. Particularly, we have an FDA breakthrough designation algorithm for the screening of cardiac amyloid, where we found much more alignment between all the stakeholders. Cardiac amyloid is a rare disease. It's often missed, but the necessary information is already in the echo. So, we've built an algorithm, which was a JAMA Cardiology 2022 paper, that
(26:42):
actually allows for early screening and early recognition of this disease. This is something where we've actually collaborated with life science partners, as well as, I'd say, nonprofit organizations that are interested in improving the precision of medical diagnosis, to really push a product like that forward. We recently got a CPT code for this that will actually be available in January of
(27:03):
2025, and that is where there is alignment, because this is a task that clinicians traditionally are not very good at.
That's awesome. So that's the second CPT code, I think, on the podcast, Raj. Abramoff was on the podcast a while back. He got a CPT code for his device. So, I think that makes total sense. David, I wonder if I could get your thoughts on the cognitive dissonance
(27:25):
sometimes in medical AI, specifically as it relates to commercialization. So, I think without a doubt, you, as you said, you have demonstrated the potential clinical and patient benefit of this technology. It works tirelessly. It never gets tired. It never goes on vacation. It's as accurate as the human equivalent. And yet there's this challenging value proposition that we have to make.
(27:50):
How do we fix that? Like, cause that to me has always been one of, and I know that's like a very broad and big question. We had Vijay Pande on a couple weeks ago and asked him a similar question. But like, this to me has always been some of the tension in medical applications for AI: you can actually have the world's best product, but there may not be an actual product-market fit for that because of these
(28:11):
competing incentive structures.
Yeah, no, this is a really great question, and I think that a lot of companies in this space are struggling with that. I can't say that I have the solution. The hope is that if you create great clinical value, the business value will come. But we definitely don't have an M.B.A. or necessarily a large war chest to
(28:33):
actually push forward a lot of regulatory or financial changes. The real hope is that eventually more and more people will use the product, we'll see that it's valuable, and then there's more of a business case for its value. I wish I had a better answer.
I think that's the best answer that anyone can give right now. I was just curious, given your lived experience of being on both
(28:53):
the founder side of this and the clinician side of this, but I agree. It seems intractable. If I knew how to solve it, I probably wouldn't be behind this microphone right now, because it would be a huge opportunity.
This is where I would also highlight our relationship with life science for our cardiac amyloid product, which is, that is actually an area where there is more alignment, by which I mean that in the last five years or so, there
(29:15):
are multiple new therapeutics that are targeted for this disease that both improve morbidity and decrease mortality. In fact, one of the ESC headliners is another therapeutic, but ultimately this is an area where there's alignment, because it's an underdiagnosed disease that has a really potent, or multiple potent, therapeutics.
And so would this be like a companion diagnostic kind of thing,
(29:38):
where there's a diagnosis that then you would need for treatment?
It's an area that's tough because you also don't want to self-deal. There are laws against having companion diagnostics that directly funnel to a particular company, but this is an area where there's a clear unmet need and there's a clear therapeutic. One big challenge for AI is that, almost uniformly, AI is a diagnostic,
(30:01):
where a lot of the ways that we think about medical treatments are for therapeutics, and we try to pigeonhole a lot of the metrics into the framework for therapeutics.
Got it. Yeah. And reimbursement always happens for therapies, which makes those business models much clearer, which is why you see huge rounds for biotech companies making therapeutics, but diagnostics have this kind of imbalance.
(30:26):
So, thanks, David.
I think that was fantastic.
I think this is a great time to transition to the lightning round. If you're ready.
Let's do it.
You have so far done what few have done on this podcast, which is escape LLM discussion this deep into the episode. So, we have to bring it up now since it's been such a popular topic.
(30:47):
We've heard a variety of opinions on this specific question, but I'm curious on your take given your very cross-functional background. What impact do you think LLMs will have on medicine in the next five years?
I think LLMs will have a tremendous impact, and we're already seeing it, both in kind of the submissions that we're getting for NEJM AI, as well as kind
(31:08):
of use cases from companies ranging from Epic to Abridge and Microsoft. That said, maybe this is a controversial opinion, but with trainees, I don't recommend they do LLM-based research projects in my lab. The reason I say this is that for any research project, you both want to care about the research project as well as you want to learn a reproducible
(31:30):
skill that will inevitably be useful. The challenge with a lot of these, I would say, closed-source LLMs is that you're primarily doing prompt engineering. There's no theoretical basis or empirical evidence that your strategies for GPT-2 work for GPT-3, or will work for GPT-5, etc. So, I don't believe that that is a reproducible or kind of continuing skill.
(31:52):
So obviously it's really useful to use LLMs for projects, but I don't recommend it to be the core of any particular focus. This is also different if you're using LLaMA or you're actually doing the hard work of fine-tuning and training.
That is an interesting take, because my naïve thought on this is that it actually is a boon to, like, busy medical students.
(32:13):
They can do it independently. So, if you don't have coding skill, if you don't have access to a wet lab, there's still a set of interesting yet transient questions that you could sort of self-investigate. But I agree, if your core focus is skill development as a researcher, then if you do it in LLMs, it's a skill that will require constant maintenance.
(32:37):
Because they change sort of on a monthly, if not weekly, basis. What you did for a paper now is probably not helpful or relevant for what you might have to do a year from now.
Yeah, I would definitely say for non-technical members of the lab, say potentially someone with a medical background that doesn't have any prior computational background, it might be a reasonable project.
(32:58):
But my main hesitation is that, oftentimes, it's like cotton candy. You put a lot of effort into prompt engineering. You think that you have a really robust thing, but then it becomes really brittle, or the more you touch it, you realize it's more of an assessment of the model, or an assessment of your prompt engineering strategy, than an underlying theme that's related to medicine.
(33:20):
I agree.
And in honoring the spirit of thelightning round, I will hand it
off to Raj for the next question.
I was going to say, I was like, that wasthe, it was so interesting that I didn't
want to break it up, but it was thelongest lightning round response ever.
So, congratulations on that.
I think, yeah, this is not lightning.
I think we discovered that if we don'ttalk about LLMs before the lightning
round, the lightning round turns intoa longer, a longer back and forth.
(33:43):
Okay.
Anyway, moving to the next question,David, if you weren't in medicine,
what job would you be doing?
I think I have the perfect job.
I would say that it's a crossfunctional job that does both
research and clinical work.
If I weren't in medicine, I think I'dlike to be in tech in some fashion.
My wife is a product manager.
(34:03):
That seems like a lot of fun.
Being an engineering manager ofsome sort is also really cool.
Excellent.
Awesome.
So, this is a question that much inkhas already been spilled about, but I
have to ask you, given your background,do you think medical students should be
required to take more statistics courses?
I would definitely say, like going backto that Arrival theme, that it really
(34:27):
impacts and influences how I think.
I don't think every clinician needsit, but if you want to be a clinician
in AI, it is absolutely necessary.
I'm just
reliving very long debates that Andy and I would have as post docs
sitting at, Countway Library about thistopic and we would, it would be like
hours and then we'd return back to our,what we're supposed to be doing.
(34:50):
We would return back to playing frisbeegolf on the fourth floor of Countway.
Yeah, that's what ourpostdoc was in frisbee golf.
Alright, next question.
What is an example of a frivolous thingyou do or something you do just for fun?
What's your hobby?
I like to fly drones.
I think it's really fun and ithas a different perspective.
So, I have actually, my wifecomplains, but I have three drones.
(35:12):
Wow.
See, it kind of relates.
I mean, imaging, right?
It's computer visionrelevant there too, but wow.
Super cool.
It's like playing videogames in real life.
Yeah.
Amazing.
Amazing.
Okay.
So, this is a popular onethat has been revealing.
So, we'll ask it again.
If you could have dinner with oneperson alive or dead, who would it be?
(35:35):
Oh, it's interesting.
Can it be in the future, or doesit matter the age of the person?
Again, you would, thiswould be a first there.
The future is within scope, and I'minterested to see what that would be.
Yeah.
So I, I have a one-and-a-half-year old son.
I think it'd be great if I couldsay, have dinner with him where he's
(35:55):
my age now and have a conversation.
I think that'll be really fun.
And if it was in the past, you know,similarly my dad when he was younger,
or other people that I know well inthis context, but hopefully have a
different perspective at a different age.
And just to clarify that you're notrevealing some news here, you mean
you'd like to have dinner with yourson in the future and you both at the
current age that you are now, not justhaving dinner with your son 30 years from now.
(36:18):
Yes.
Yes.
Hopefully, I will still be able tohave dinner when I am 30 years older.
Our last lightning round question.
Will AI in medicine be driven moreby computer scientists or clinicians?
That's like asking, in a bicycle,what's more important, the
front wheel or the back wheel?
(36:39):
I would definitely say it'sboth, and I think it's neither
is sufficient without the other.
Just keeping up the analogy more,when you're going downhill, you have
to know which one to stop first,you know, so you don't, uh, you
don't get in trouble, but I love it.
Alright, David, congratulations,you've survived the lightning round.
We just have a, you know, a few bigpicture kind of concluding questions here.
(37:00):
That we want to wrap up with.
So, I think we talked about this alittle bit with some of the papers
that we talked about earlier.
But I think you've donesomething that really is unique.
And I really want to emphasizethis because I think there's a lot
of amazing work that's happening.
You know, we're evaluating a lot of greatstudies at NEJM AI, tremendous number of
submissions that are very interesting.But what I think we've noticed and we've
(37:21):
been discussing, and we discussed thisactually yesterday, right, at our weekly
editorial meeting. There is this gapthat we notice between, I think, sometimes
very interesting technical papers andrigorous clinical evaluation usually in
the form of clinical trials, RCTs, thatare well conducted pre-registered where
(37:41):
there's an outcome that is very clearlyrelevant to what the authors are trying
to assess and hopefully clinicallyrelevant or clinically relevant as well.
And I think they're, itreally is a skill, right?
And, you know, we weretalking about this yesterday.
It seems like something that, if you justlook at RCTs, it should be easy, right?
Have one intervention group and youhave like one control group and then
(38:03):
randomization takes care of all thissort of confounding that you don't
really understand about the world.
And you end up with a proportiondifference or some other
outcome that you're looking at,but there's a lot of nuance.
And it's very difficult just toformulate the question correctly.
And then as we talked about,it's also sort of socio
technically challenging as well.
We have to either raise fundingor wine bottles and SQL injections
(38:26):
or other things to raise funding.
You have to convince people thatthis is meaningful and important
to do. That it's ethical to do.
And then you have to convinceproviders or sites that are then
enrolling patients, and then provideinformed consent, and enroll patients.
There's so many sorts of challengesand important steps along the way.
And so, I, I think it's no surprisethen that there's a lot of amazing
(38:48):
research that's not RCTs and thatthere is a real dearth and need
especially medical AI, but Ithink across medicine, right?
For an increase in robust evidence,typically in the form of RCTs to know
what works and what doesn't work.
And you've done something amazing,which is, you know, you figured it out.
And I think you give us some cluesthere, which is that you're creative.
(39:08):
I think you did this before youhad, you didn't have $10 million
from a company to support whatyou were doing at the time, right?
You were able to bootstrap it and becreative about how you launched it.
But how did you learnto run a clinical trial?
Like what is for other researchers,especially other medical AI, machine
learning researchers, what should theydo if they want to run a clinical trial?
(39:29):
What should they learn?
Who should they talk to?
What skills should they try to acquire to run a clinical trial and to move things from an interesting retrospective or even prospective study to a full-on RCT or clinical trial to evaluate a technology?
Yeah, this is a really interestingquestion and something that
I've thought a lot about.
(39:51):
The two ways I would frame it is, one, incentives drive behavior. So, the reason why we're not seeing as many RCTs is that to get FDA cleared, you don't need an RCT. So, many commercial companies don't find the value in doing an RCT unless they're pushed or necessarily asked to do so. So, we aren't seeing as many RCTs for commercial products.
(40:13):
Hence, I would say, our EF algorithm was RCT'ed before it was FDA cleared or before it was a commercial product. But the second piece, similarly going back to the question of incentives driving behaviors, is that there's a difference between the cost of running a trial and the business of running clinical trials. So, clinical research organizations are actually a high-margin business, right?
(40:37):
You've heard very big names in academic medicine, but they charge an arm and a leg because they can, because their primary customers are pharma companies. Because, you know, quote unquote, the EF algorithm was our baby, we were doing it at cost, and at cost it's a very small proportion. It is a small percentage of the cost of if you actually had to ask a CRO to do this.
(41:00):
I would say that the teachings that I've gotten through InVision and running a startup is you just got to do it. There is no gatekeeping. The more you try, the more you'll learn through the process, and the better you'll be because of it.
As a clinician, even going through residency and fellowship, you already see many clinical trials. So I think the chair of medicine at Stanford was previously Bob Harrington.
(41:24):
He was a trialist. And every time we meet him, we learn something, a new wrinkle or interesting idea about a clinical trial. But by purely being a clinician and in the field, you should already have some intuition of what you like or don't like about trials. The second piece is, I would say, to the first approximation, most AI technologies are diagnostics, so it should be much cheaper.
(41:46):
The turnaround time, the outcomes, all those things could potentially be much more pragmatic and, much more, uh, I think, a short timeframe that allows for more iteration. So, I think a lot of it is, I would encourage more clinicians to really just do and to push forward. And I think this is an area that CHAI and many other places are
(42:07):
trying to figure out a way to actually advocate for more prospective evaluation.
So, we had MasterCard earlier,but this is Nike now.
Just do it.
Right.
Yeah, you know, you imagine this being, I think it's very easy to imagine this being something that is for someone else to do, with decades more experience or who has much, much more in the way of resources to actually get off the ground.
(42:29):
But I think you're at least an existence proof, but you're much more, but you're an existence proof that you could do this very early, very young, and without significant resources, to launch a very interesting and impactful study.
I think that's such an interesting insight you had there, David, too. Cause frankly, as a non-trialist, I would feel completely intimidated and overwhelmed because of the perception of how
(42:53):
expensive and complicated they are. But you made this very, very good point, that it's expensive and complicated, or at least expensive, because of who is running it. Like, who wants the answer. And these are usually organizations that have lots of money, that are well capitalized and can tolerate
(43:15):
an expensive, high-margin trial. Because there could be a blockbuster drug on the other side of it. But you can actually do it. You can actually do, you know, the equivalent of a lean startup or, you know, a lean trial, especially in AI, because there are lots of things that are favorable from a cost and time perspective that are not true in a big RCT of a new drug. So, I think that's a very, very useful nugget and a
(43:36):
place where clinicians can have extreme ownership and really build a very clinically important brand around clinical trials of AI technology.
Alright, so I think we're going to moveto the last big picture question.
This is I think firmly in the categoryof Andy-style questions, which are less
(43:56):
about clinical impact and more about wild-eyed futuristic types of concerns.
And so,
I'm just like, especially givenyour geographic location on the West
Coast, you're a YC alum, you've workedwith Ewan who also has like lots of
startup and entrepreneurial things.
The vibe that we on the East Coast getfrom the West Coast is that the
(44:18):
future is going to be here any second.That GPT-5 is going to take over the world.
We should all have like ourfallout shelters for when AI
takes over the world and that therapture of the nerds is here.
Repent. Is kind of the vibe that,comes from certain parts of
San Francisco and the Valley.
I guess given that you're West Coastand in cardiology, what is your
(44:40):
trajectory, or what do you thinkcardiology will look like in 10 years?
Is it, like, a more efficientpractice enabled by AI?
Is it different ways of practicingentirely, or is it something that's
just very, very hard to predict?
Yeah, Andy, that is a verytough and important question.
(45:02):
I think that there are maybe two pieces that I would highlight on a high level, and why I think I'm really happy with my job as it is. The first is that medical AI will inevitably need people like the three of us, because, by definition of the domain and the type of data, most large-scale LLMs do not have access to this.
(45:25):
By virtue of being a clinician, I have more perspective on what is good data, where is the variability, and what are the pitfalls in the data, that I think it would be really hard for a pure nonclinical person, or an LLM, to approach. So, I think that last-mile deployment is so important that inevitably, I
(45:49):
think, to the first approximation, it'll still look relatively similar even in 10 years for the clinical practice of cardiology. One analogy I like to give is that the chatbot UI is an artifact of LLMs. It's a bug, not a feature. If you're trying to learn tennis, you probably would not learn it well if you're just using LLMs.
(46:11):
And in the same way, if you were trying to practice medicine through a chatbot, that's not what clinicians want and that's not what cardiologists want. I think that it ends up being things like our algorithm that is actually directly deployed in the clinical PACS or the clinical EHR, all these things that are directly integrated, that have more of a meaningful impact.
(46:33):
I was listening to a podcast recently with Nat Friedman, the former CEO of GitHub. And he actually said that for GitHub Copilot, the adoption and utilization really skyrocketed when the interface was essentially optimized. Their initial iterations were, why don't we just give you a huge code chunk, and then there's multiple choices and you can choose, but that's really a clunky
(46:56):
way to actually deliver information, even if you have a great model. So, in the same way, clinicians, I think, I personally wouldn't want to have to keep reaching for a chatbot to answer a simple question. But if it was embedded into my EHR, embedded into my PACS, it is actually allowing for more streamlined and efficient care.
So, I guess I'm asking less about do people want to use bad products, and
(47:21):
more about your conditional probability that the thing that you actually want to use happens and changes practice over the next 10 years?
Yeah, I might be a pessimist, but I think that clinical practice will still look 90% the same, 10% more improved.
Right.
Uh, and I would go on to say that we often forget that maybe a
(47:46):
lot of the inputs don't have all the necessary information. Even if you have the best decision making, if there's uncertainty, or if there's a lack of information in that input modality, you can only improve it so much. So, I would say that I still see cardiologists seeing patients. I see that potentially we will assist them with EKGs and echoes that can flag things
(48:07):
that they're not sure about, but at the end of the day they still reflex to the gold standard diagnostics and therapeutics that are necessary.
Okay, so that's helpful.
Let me then ask you another sort of statistical estimation question. What fraction of diagnostics in cardiology does AI do mostly
(48:30):
autonomously over a 10-year time horizon?
80%, more likely than not.
Yes. That's a pretty sizable chunk.
That's a pretty, we didn't have a chance to chat about it, but we like how EchoCLIP looks; we have new versions of kind of video-language models that very much simulate the task of interpreting images.
(48:53):
So then, okay, this will be the final question. What is the David Ouyang as a cardiology fellow in the year 2034 learning to do in cardiac fellowship if 80% of the diagnosis has been taken over by AI?
SQL injection.
(49:14):
I have never regretted learning a field deeper, whether that's medicine or kind of computational. Like, if it's hard for you, it's hard for everyone, and it's a moat or a defendable skill. I hope that the future cardiology fellows still take time to really think about what they're doing, and when algorithms or tools are incorrect, or how
(49:34):
to evaluate quote unquote bad data, and I think that will be persistent across time.
Awesome. Critical thinking never goes out of style. I think everyone has to agree with that.
I love it.
David, thank you so much.
That was amazing.
Andy, Raj, this was really fun.
Thanks for inviting me.
Yeah, thanks for coming on David.
(49:54):
This copyrighted podcast from the Massachusetts Medical Society may not be reproduced, distributed, or used for commercial purposes without prior written permission of the Massachusetts Medical Society. For information on reusing NEJM Group podcasts, please visit the permissions and licensing page at the NEJM website.