
May 15, 2024 · 55 mins

In this episode of the NEJM AI Grand Rounds podcast, Dr. Nigam Shah, a distinguished Professor of Medicine at Stanford University and inaugural Chief Data Scientist for Stanford Health Care, shares his journey from training as a doctor in India to becoming a leading figure in biomedical informatics in the United States. He discusses the transformative impact of computational tools in understanding complex biological systems and the pivotal role of AI in advancing health care delivery, particularly in improving efficiency and addressing systemic challenges. Dr. Shah emphasizes the importance of real-world integration of AI into clinical settings, advocating for a balanced approach that weighs both technological capabilities and the systemic realities of deploying AI in medicine. The conversation also explores the democratization of medical knowledge, why open-source models are under-researched in medicine, and the crucial role of data quality in training AI systems.



Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:04):
The point we made in the article is that all of these existing models are not trained on medical data and are not instruction tuned to follow up or respond to what we ask it to do. And then we just say, summarize this patient record for me and feed it a patient record. It has never seen a patient record and no human taught it what a good summary looks like.

(00:25):
And then we expect it to work.
Maybe we should be using our medical textbooks and curated material like UpToDate, ClinicalKey, or what have you, and then further tuning them on EHR data. It is possible to curate the EHR data going into that. So, just like, for example, clinical guidelines are based on results of RCTs

(00:48):
or observational studies, which are themselves based on some cleaned-up version of EHR data, we could conceive feeding in a diet of good-quality records to learn from. Care as it should be, not care as it is.
Hi and welcome to a new episode of NEJM AI Grand Rounds.

(01:10):
I'm Raj Manrai and I'm here with my co-host Andy Beam. And today we are delighted to bring you our conversation with Nigam Shah. Nigam is a Professor of Medicine at Stanford University and the Chief Data Scientist for Stanford Health Care.
Andy, this was a really fun conversation. I thought that Nigam had a lot of insights, both about working with data that comes out of the health care system, as well as how to navigate

(01:33):
collaborations between clinicians and machine learning scientists. All in all, this was a really fun and insightful conversation.
Yeah. First and foremost, it's always fun to talk to Nigam. So, it was great to finally have him on the podcast. Obviously, he's a world-class biomedical informatics researcher. He's published lots of really impactful translational papers, and I think actually really takes translation very seriously in a way that a lot of academics don't.

(01:57):
For example, he also says things like CapEx, which you don't normally hear a professor talking about, but he has this whole other side of his professional identity on the business of health care. And so he helps Stanford's health care system think about how to deliver health care more effectively to better serve their patient population. So again, I consider him kind of like a full-stack health care data scientist.

(02:19):
And not only does he write the papers, he also cares about how the health care actually gets delivered. So, it was super fun to learn from him in this conversation. And yeah, it was, it was great to sit down with Nigam and hear what he's been up to.
The NEJM AI Grand Rounds podcast is brought to you by Microsoft, Viz.ai, Lyric, and Elevance Health.

(02:42):
We thank them for their support. And now we bring you our conversation with Nigam Shah.
Nigam, welcome to AI Grand Rounds.
We're thrilled to have you today.
Well, thank you for having me, guys.
Pleasure to be here.
Nigam, so, this is a question we always like to get started with. Could you please tell us about the training procedure

(03:04):
for your own neural network?
How did you get interested in AI?
What data and experiences led you to where you are today?
Well, that is a great question. Uh, there was a, not a great straightforward descent. Uh, there were a couple of spikes in the loss function along the way, so to speak, which basically means it was a meandering path. So I started out as a doctor in India, got my M.B.B.S. degree and I was going to be

(03:30):
an orthopedic surgeon, believe it or not.
And we have a family friend who got his Ph.D. in the U.S. in the seventies and knew me since I was a little baby. And he convinced me to try my hand at research and said, apply for a Ph.D. program. And if you don't like it, come back and you can be a carpenter.
Like, what's your problem?
You know, like, okay, he has a point.

(03:50):
So, I applied, got into a bunch of places, and that was year 2000. And the Human Genome Project kind of hit the mainstream news, and I got really hooked into the notion of using computation to understand biology and medicine. And I was in a molecular medicine program, so I lobbied my committee to let me switch to the bioinformatics major.

(04:13):
And when I finished, people on my committee basically said, people who have medical degrees and who do reasoning engines, so my thesis was a reasoning engine on yeast biology, said you should go to Stanford. So I came there as a postdoc and never left.
That's an interesting fork, where you were on a trajectory to be an orthopedic surgeon.

(04:34):
What was it about that moment in your life that made you want to take a hard left and go get a Ph.D.?
Mostly the influence of this gentleman, Dilip Desai, who's been a family friend for decades. And his argument was that he thought, I'm good at logical thinking, and he said, you can be a carpenter, or you can try this thing, and if you don't like it, in one

(04:58):
year you can come back and be a carpenter.
Is carpenter shorthand for being an ortho bro?
Yeah, pretty much.
Okay.
So, you were too smart for orthopedic surgery, was his main concern, it sounded like.
Or, you know, we could spin it many different ways. There's a bunch of ortho jokes in there. But overall the point was, try this new thing, which I

(05:21):
had no exposure to in India.
And they said, if you don't like it, you can always come back. And the experimentation cost is not that high. So I think that the general advice was to, if you don't know, give it a try.
Looking back, did you make the right decision?
Oh, like I am thrilled. Maybe I would have been happy fixing hips and knees. Like, I don't know, but I'm having a blast.

(05:44):
OK. So, at Stanford then, you did a postdoc. Could you tell us a little bit about who you worked with and what you did during your postdoc?
Absolutely. So, it was with Dr. Mark Musen, who several of our listeners might know. And my thesis project was about building a reasoning engine for yeast. And at that time in 2005, I honestly believed that we would have figured

(06:05):
out how to structure medical data into knowledge bases and everybody would publish facts instead of just prose. And Mark was leading the center called the National Center for Biomedical Ontology, which was one of the 10 or 12 NCBCs. And that center's mandate was to create a portal that would catalog all the

(06:27):
different standard terminologies and ontologies and ways of representing structured biological knowledge. And I was like, if I solve this one, you know, we're going to have knowledge bases on which we can reason over, like, in two years. And so, I was like, yep, that sounds like the best thing to do as a postdoc. And that's why I joined.

(06:47):
So, this might be a question that's better posed at the end, but we think about different eras of AI. We think about the expert era, and like, ontologies and reasoning systems are definitely part of that clade of AI. How do you think about the role of ontologies and knowledge representation in the current LLM moment?
That is a great, great question.

(07:09):
I think ontologies will make a comeback.
There are already things like graph neural networks. And if we're able to store prior knowledge in a structured format in a tinier footprint, we don't have to spend the compute to relearn it from unstructured prose.

(07:31):
So far, I would say it's still a little bit of a research area as to how do we correctly inject the prior knowledge that might be in medical terminologies like SNOMED and RxNorm and whatnot into the learning process of a language model.
Well, I guess before we, we hop over, just one more thing is that, like, this is one of the big open questions in AI right now is how much world knowledge do you

(07:53):
need to explicitly model in the system?
So, it seems to me that you're one of the people who's well positioned to unify the expert era of AI with a large language model. So I'll be interested to see what you do with that going forward.
Yeah, well, I'll keep you guys posted.
So, I think that's a great transition point to your work now, Nigam. So, we want to talk about both threads of what you do, both your research

(08:17):
arc, how you run your lab, as well as a new role that we understand you have, or maybe it's not so new, you know, maybe a year or two now at this point. And so in addition to being a professor at Stanford, we understand that you are the inaugural Chief Data Scientist of Stanford Health Care. We thought maybe we could start there before we dive into your research. And so maybe you could tell us, you could just begin by telling us what the Chief

(08:41):
Data Scientist at Stanford Health Care does and why you signed up for the job.
So the one-line version of the job is to bring AI into clinical use safely, ethically, and cost-effectively. And I mean, we hear so much about AI these days, right? I mean, language models and image processors and whatnot.

(09:03):
And I encourage our listeners to sort of do this thought experiment. If you look at 100 companies or 500 companies that are selling some AI solution and add up what they believe is their total addressable market, I would not be surprised if it's larger than the total cost of health care in this country.

(09:24):
The point being that yes, these new technologies are amazing, but we have to find a way to make them sustainable. And sustainable not just from a financial standpoint, but from a burden-on-our-physicians standpoint, from a value-to-our-patients standpoint, and just the cognitive complexity of managing everything. So that's the reason it's a big, heady problem, as one of my other mentors,

(09:47):
Larry Hunter, likes to say, if after tenure you don't take on something that would have otherwise gotten you fired, then, you know, you're wasting tenure.
Can I just hop in here? You say words that I don't hear academics say often, like TAM, Total Addressable Market, CapEx for Capital Expenditure. In your role as Chief Data Scientist, how did you learn that part of what is

(10:09):
essentially the business of health care?
It's a great question.
So, what did you do, you know, for academics who are interested, actually, in making a similar move? How did you learn that vocabulary and that skill set?
So mostly by experimentation, and not on the job. Prior to the job. So, the joke on our campus at Stanford is that you need to spin out a company as a part of getting tenure.

(10:31):
And I'd spun out two companies before I took the job, actually three companies before I took the job. And, uh, as part of the founding of those companies is how I learned the business of how to think about problems, both health care and outside. And, you know, we might make fun of businesspeople, especially when we're scientists and doctors, but the discipline that they have to follow

(10:56):
is worth learning as we approach scalable solutions, or seek scalable solutions, to problems.
I also think the concept of product-market fit holds for academics, even if you aren't actually starting a business, that you actually have to have a solution to a problem that someone actually has. And so you probably learn, even outside of the economics and business models, lots of

(11:18):
reusable skills in spinning out companies.
Absolutely.
100%.
Yeah.
Nigam, could you talk about your approach for working with clinicians on these projects and bringing AI into health care at Stanford?
So, I have a very bimodal, sort of schizophrenic, view on that, and I will give both versions to our listeners.

(11:38):
One view is that you partner with them, and you understand their problems and attempt to build a solution that solves the problem that they articulate or that you observe. You know, this notion of product-market fit that, uh, Andrew was just mentioning. But the other view is, why don't we look at the overall structure of what we're

(12:00):
trying to do and imagine a solution that would not be visible to somebody who's in the midst of it at the front line. And I try to do both. It's cognitively quite challenging, but if we don't, then, you know, as the cliche goes, if you ask people what they need, they'll tell you faster horses that poop less. And so I think, as in others in my, in my role, uh, we, we sort

(12:24):
of have to try to balance this.
Yes, let's listen to the people on the front lines, clinicians, nurses, pharmacists, but at the same time, let's listen to the problem behind the problem they're telling us. So, we can come up with the solution that they really need but cannot articulate.
So a way to think about that, would it be fair to say you need to

(12:45):
listen to the problems, but maybe ignore the proposed solutions?
Yes, yeah, I think that's a great way to put it.
Okay, great, and where are you in piloting AI in clinical use now at Stanford Health Care? So, is this something that you see, and maybe the answer is both, maybe you can give us some examples, but is this something that you see more

(13:06):
as an application on the kind of back-end or administrative work that's involved in the delivery of health care? And I'll include in that even things like writing notes and documenting what's happening during the patient encounter. Or are you piloting and excited about technology to assist with diagnosis and the actual provision of care?

(13:27):
And tools to equip your physicians, to help them, a second opinion or another take on, on a given case.
So, I bucket things into the use of AI to advance the science of medicine, the practice of medicine, or the delivery of medicine, health care basically.

(13:49):
And the allocation tends to be heavier on the health care side, mostly given the way our legal system and risk tolerance play out. So, it's a lot easier to deploy a solution that is producing a bill in an automated fashion than it is to deploy a solution that is doing automated diagnosis.

(14:13):
So as a field, I think as we start, we go after the low-risk, quote unquote, safer cases, so to speak. But that doesn't mean we don't experiment with the hard ones. Because if we have to deliver on the promise of AI to improve access and improve equity in care, we have to arrive at solutions that automate

(14:38):
parts of things that physicians and nurses and pharmacists currently do. But I wouldn't start there. And so, you know, we, we got to nibble at the edges first, but in the research, we got to dive in, into the complex problems. And so again, it, it's a little bit bimodal in that sense that when I walk over to what I jokingly call my .edu self, the, you

(15:00):
know, jeans-and-t-shirt self, we want to take on the problems that are five to 10 years out in terms of their solution. And then when I go to my .org self, which is like formal pants and a shirt with black shoes, then we're talking about things that have a one-to-three-year horizon in terms of their impact. And having both perspectives is super fun, actually.

(15:23):
Can you tell us, for your .org self, uh, about some of the pilot AI studies that you are running, where you see the frontier, where you are in implementation?
So, there's a couple of pilots that are publicly announced. I'm not running them, like, my team would be in a supportive role there. There's one that, you know, everyone's heard about, using GPT-4

(15:45):
to produce responses to MyHealth messages or patient messages. So, this is basically the patient portal messages come in and using GPT-4 to draft a response that a human would then review. And, you know, without sharing too many details, it's under review right now, the amount of work doesn't change as much, but the providers feel happier.

(16:07):
They feel supported.
And their cognitive load on their brain goes down. So, it is helping in that manner, just not in the manner we would have anticipated up front.
Staying in the sort of health care delivery realm, I saw you give a talk once where you made a distinction that once I heard it, I was like, ah, of course.

(16:27):
And it was the distinction between efficiency and productivity. And that hospitals actually probably only care about productivity. Could you, if you remember the talk, could you walk us through that distinction and why it's an important one for AI?
Absolutely. So, this is the one that often gets me into trouble on occasion. And let's just work with the MyHealth, the patient portal, example, right?

(16:51):
Let's say I as a physician, or you know, you're working as a doctor, and you're getting 200 messages a day. And right now, you work between 9 to 10 p.m. or 8 to 10 p.m. to respond to those messages. So-called pajama time, so to speak. We can have an AI solution that works beautifully and

(17:11):
completely takes away that work.
Okay, let's just imagine that the 9-to-10 pajama-time work is gone. From an access-to-patients standpoint, nothing has changed. They're still getting the exact same responses and no new patients are getting care. And so from a productivity lens, have we, using the same

(17:35):
resources, produced more care? The answer is no. In fact, you could argue in a very perverted sense that productivity has gone down, because that AI solution is an additional expense we have to pay for. And then you can do some math and say, well, you have a happier physician, less burnout, less turnover, and that has some value, and hence you

(17:57):
can show, like, marginal productivity. So that is the example where it is an amazing efficiency gain, like infinite efficiency gain if it just worked out of the box. Negative or marginal productivity gain.
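To put rough numbers on that distinction, here is a toy calculation; the figures are invented purely for illustration and are not from the episode:

```python
# A toy illustration of the efficiency-vs-productivity distinction.
# All numbers are invented for illustration.
patients_served = 2000       # unchanged: same patients, same responses
staff_cost = 1_000_000       # dollars per year, before the AI tool
ai_license = 50_000          # added annual cost of the AI messaging tool

productivity_before = patients_served / staff_cost
productivity_after = patients_served / (staff_cost + ai_license)

# Pajama time disappears (a huge efficiency win for physicians), yet care
# delivered per dollar falls: the numerator is unchanged while the
# denominator grew by the cost of the tool.
print(f"{productivity_before:.6f} -> {productivity_after:.6f} patients per dollar")
```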
Now this is also an example of, you know, you ask the people at the front lines, that is their pressing need, and there's good product-market fit,

(18:23):
and it is a solution that people are asking for.
But if we take a step back and say we have to manufacture good health care for 8 billion people on the planet, is this the most important problem to solve? And then the answer is probably not. And so, when you get local, Palo Alto, Stanford Health Care, our

(18:47):
15,000 employees and our patients, this problem becomes top of mind. But if you're a new Ph.D. student trying to impact the care of a billion people, there are probably other problems that are more exciting, maybe harder, but worth solving.

(19:07):
And so, coming back to the exact question, efficiency versus productivity, if we just follow the media and the companies, they're all going after efficiency gains, assuming they can be parlayed into productivity gains. And my main point is that that assumption is not always correct.

(19:30):
Does this set us up for, so if you go after productivity gains, meaning you can produce more RVUs per physician if they're using AI, a significantly less happy workforce, though? Because now you're doing the same job. You're producing more. Um, so what's the flip side of that?
Yeah. So, it depends how we define productivity, right?

(19:52):
I mean, if productivity is defined, in a very perverse way, as more RVUs, we could destroy morale in six months if we wanted to using AI. Uh, but if we define productivity as how many primary care needs were served per unit dollar spent, it's a different way to think about productivity.

(20:13):
Why does a primary care visit cost 150 bucks or 250 bucks or whatever it costs? Can we do it for five bucks? But we're not thinking that way. Like, I don't see any AI company out there that says, I'm going to deliver a world-class primary care visit experience for five bucks or less.
Why do you think that is?

(20:34):
Just the way the payments are made, right?
I mean, they got to chase the payment and the existing budgets. And right now, there is no budget in any health system that says, we will deliver a primary care visit for five bucks. Now there are companies out there that are trying to be completely disruptive. There's one that we recently found out about, and I have no affiliation with them, which is like a pod that you walk in, and it delivers like the five

(20:57):
or 10 basic primary care services.
Uh, and there's no human anywhere in there. And it's like 99 bucks a month subscription, and you get unlimited primary care for the five or 10 things they cover.
I agree that access is important, and why should a primary care visit cost 150?

(21:17):
If I follow that argument to its conclusion, it seems like the primary care provider as a profession seems to be destined for extinction, because if you're charging five dollars a pop, there's no way there's a doctor behind that. Maybe there's an NP or a different type of provider attached to that.

(21:37):
Are we heading, if we can actually provide primary care at scale like that, to a world where we don't actually have primary care MDs anymore?
I don't think so. I don't think so. So, if you look at the history of technology, like, ATMs were supposed to put bank tellers out of business. But after ATMs came around, the ranks of bank tellers went up. Cars were supposed to put people who drive other people around out of

(22:00):
business, but the number of humans who drive other humans around went up after automobiles came around. So, I think what will happen is that the things people do will change. Like, if a computer can manage somebody's insulin dosage better than me, like, by all means, go to the computer. Use the human for tasks that a computer can't do.

(22:21):
Like, on the one hand, we have this massive physician shortage. On the other hand, we require a primary care physician to sign off on a statin prescription. Like, why? Makes no sense. I mean, nobody ever got addicted to statins.
Raj, do you have any follow-ups?
Uh, no, I, I think we can go to large language models.

(22:41):
Okay.
So, we, we've tried to forestall the large language model discussion for as long as possible, but we always, it was inevitable. We always end up here. So, I'd like to anchor on your recent perspective in JAMA. It covered a lot of ground. For our listeners, it's titled "Creation and Adoption of Large Language Models in Medicine." I think for one, it provides one of the clearest descriptions

(23:03):
of technically what's going on with large language models. And we've discussed them before, but we've probably done a disservice to our listeners and not really defined a lot of the key terms. So since you've done such a good job at this in print, could you first walk us through the basics of how LLMs work and then maybe touch on, uh, some of the advanced topics like instruction

(23:23):
fine-tuning and things like that?
Yeah, happy to, happy to. Thanks for bringing that up. So, I like to explain language models with this example. Imagine there were only two sentences in the world: Where are we going? And where are we at? And then if I asked our listeners to calculate what is the probability

(23:43):
of seeing the word going, after I've seen the three words, where are we? And most people can say, well, you know, one out of two, so 50%, 0.5. That's basically the intuition behind language models. Now just imagine playing this game over billions of documents and trillions

(24:03):
of sentences and calculating the probability, not just of seeing the word going, having seen three words, but the probability of seeing an entire paragraph, having seen 20 pages prior to that. So those 20 pages are the context, and what you're producing is the generation part.

(24:24):
And then imagine learning an equation that, instead of having 3 or 4 probabilities in it, has a billion probabilities in it, or a hundred billion probabilities in it. That's what a language model is, intuitively.
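For readers who want that counting intuition in code, here is a minimal sketch of the two-sentence example; a real LLM learns these probabilities with a neural network over trillions of tokens rather than by raw counting:

```python
# A minimal sketch of the "where are we going / where are we at" intuition:
# count how often each next word follows a given context in a tiny corpus.
from collections import Counter, defaultdict

corpus = ["where are we going", "where are we at"]

# Count next-word occurrences for every prefix (context) seen in the corpus.
counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(1, len(words)):
        counts[tuple(words[:i])][words[i]] += 1

def next_word_probability(context: str, word: str) -> float:
    """P(word | context), estimated from raw counts."""
    seen = counts[tuple(context.split())]
    return seen[word] / sum(seen.values()) if seen else 0.0

print(next_word_probability("where are we", "going"))  # 0.5, as in the episode
```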
Now, the beauty of this is that for a computer, language just doesn't mean words in English, Spanish, German, or anything of that nature.

(24:45):
They could be any sequence of symbols.
Could be amino acids, could be DNA, nucleic acids, could be sounds, could be codes, CPT, ICD codes. And so anything that has the sequential structure where one symbol comes after the other, and there's some rhyme or reason, a.k.a. a grammar, to that.

(25:10):
You can throw this technology at it and learn a language model. And when you've learned a language model, you get two things. Thing number one, which everybody's now familiar with, is you poke the language model with some words, and it tells you a few words. The chat interface, where the model's being used to generate new content.

(25:32):
There's another way to use language models, which is you feed in a sequence of tokens, and it gives you back a vector of numbers, which is a representation, or embedding, of what you fed in, in numerical form. Essentially putting your data as a line in a table.
And both of these have value.

(25:54):
We can use the generation mechanisms to have a conversation with the language models. And we can use this embedding business in order to represent things, protein structures, documents, patient records, images, EKGs, in a numeric representation on which we can do computation,

(26:17):
such as find similar patients, predict what is going to happen next, tell me how many days until a heart attack might happen. And so that's sort of the intuition behind writing that, like, to explain to our clinician colleagues what these technologies are capable of beyond just chat on the Internet.
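To make the embedding use concrete, here is a small sketch using an off-the-shelf open encoder from the Hugging Face hub; the model name is illustrative, a general-purpose encoder rather than anything trained on clinical text, and the notes are invented:

```python
# A sketch of the embedding use: turn text into vectors, then compare them.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder

notes = [
    "Patient presents with chest pain radiating to the left arm.",
    "Complains of substernal chest pressure and shortness of breath.",
    "Follow-up visit for well-controlled type 2 diabetes.",
]

# Each note becomes one row of numbers: "a line in a table," as Shah puts it.
vectors = model.encode(notes, normalize_embeddings=True)

# With normalized vectors, cosine similarity is just a dot product.
similarity = vectors @ vectors.T
print(np.round(similarity, 2))  # the two cardiac notes score closest to each other
```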
Thanks.
Oh, go ahead.
Yeah.

(26:38):
And then the sort of the second part of that is that, alright, so now we have these technologies. Uh, there's lots of people building language models. What are they feeding to the model as it's learning? And turns out the majority of the things out there have not trained on EHR data.
They've trained on something else.

(26:59):
So, the things that they produce also sound like, or read like, the something else that they've learned from. So that's one item. What are the inputs going in? And then second, as we're chatting, these things are just producing the words based on probabilities. To give a GPT-based example, if two years ago, if we had told GPT, explain the

(27:22):
moon landing to a six-year-old, it would say, explain gravity to a six-year-old. So a bunch of humans had to sit down and say, like, no, no, no, GPT, that's not true. Uh, that's, I mean, it's true, but that's not correct. What we need you to do is to say, people went to the moon, and collected some rocks, and sampled, and took some pictures, and came back.
Like, that is the right answer.

(27:43):
So, we either show the right answer, or we ask it to produce five, 10 answers, and we pick the best one. All of this is called instruction tuning or reinforcement learning with human feedback and whatnot. And the point we made in the article is that all of these existing models are not trained on medical data and are not instruction tuned to follow up or respond to what we ask it to do.

(28:07):
And then we just say, summarize this patient record for me and feed it a patient record. It has never seen a patient record. And no human taught it what a good summary looks like. And then we expect it to work.
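What instruction tuning consumes is, at its simplest, pairs of a prompt and the response humans prefer, used for supervised fine-tuning before any RLHF. A toy sketch, with invented examples echoing the ones above:

```python
# A sketch of instruction-tuning data: pairs of a prompt and the response
# humans want. The examples and the template are invented for illustration.
instruction_data = [
    {
        "instruction": "Explain the moon landing to a six-year-old.",
        "response": "People went to the moon, collected some rocks, "
                    "took some pictures, and came back.",
    },
    {
        "instruction": "Summarize this patient record: ...",
        "response": "A concise summary a clinician has marked as good.",
    },
]

def to_training_text(example: dict) -> str:
    """Render one pair in a simple prompt/response template for fine-tuning."""
    return (f"### Instruction:\n{example['instruction']}\n"
            f"### Response:\n{example['response']}")

for ex in instruction_data:
    print(to_training_text(ex))
```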
So, if I was going to say that back to you: we currently use models that are trained on text data from the Internet. And you're arguing that maybe that's not the right set of signals or

(28:29):
the right set of symbols to train a language model for medicine, and we should, we should be using EHR data. Is that fair?
That is one way to think about it, but maybe we should be using the general Internet, or maybe we should be using our medical textbooks and curated material like UpToDate, ClinicalKey or what have you, and then further tuning them

(28:50):
on EHR data, just like a med student.
Like the med student doesn't educate him or herself on Reddit.
Well, there, there are, uh, subreddits where they prepare for the board exams, and there are, like, communities like that where they actually do learn a lot from Reddit. But just the, so I, Nigam, I think the explanation is fantastic and I

(29:11):
really liked the article that Andy referenced that you wrote, but just to maybe push back a little bit on what it's trained on. Do we really know what GPT-4, for example, is trained on? Do we know that it doesn't have third-party medical data, electronic health records from some locale, some country, as part of its corpus of training data?

(29:34):
Uh, yeah, we don't.
You're absolutely right.
We don't know.
They won't tell us.
Uh, at least for the Llama models, the public domain models, they're not trained on EHR data. But for GPT, who knows?
I guess too, one of the things that comes to mind is once you start training a model on EHR data, you're not training it to do medicine,

(29:54):
you're training it to do health care.
And so, as I'm sure you know better than anyone else, the EHR data primarily exists to facilitate billing and reimbursement and is at times only loosely correlated with the actual clinical state of the patient. So how do you think about introducing all of the warts that we know exist in EHR data to something that only has sort of a platonic understanding of medicine?

(30:20):
Yeah, no, that's a great question.
But if we want to train a model to do billing, isn't that the best data to train off of?
Right. You know, so exactly. So if you want to make GPT a biller, that seems obvious to me, but if you want it to actually practice medicine, it seems less clear to me that a language model trained on EHR data is actually what you want.
Yeah. And in that case, you know, we'd probably go towards training on medical textbooks

(30:44):
and society guidelines and so on.
And it is possible to curate the EHR data going into that. So just like, for example, clinical guidelines are based on results of RCTs or observational studies, which are themselves based on some cleaned-up version of EHR data, we could conceive feeding in a diet of good-quality records to learn from.

(31:10):
Care as it should be, not care as it is.
I think it's also just, like, worth pointing out that most doctors only learn to bill after they become attendings and are therefore done with the training, that that actually is the last thing that they learn how to do after they've learned all of medicine.
Absolutely. I mean, we go to the school of medicine, not to the school of health care.
Andy, are you suggesting we should train our models the same way?

(31:33):
There's at least maybe a non-commutative order of operations there for people. So, another thing that you mentioned, you know, again, putting on, I think this would be your .org hat, is the value prop for LLMs in health care. You discussed that in this article. So how does that play in here? I mean, we had Mark Cuban on the podcast earlier and he said that

(31:54):
the shine came off of LLMs very quickly for him and that essentially, he's just using it as autocomplete. And so have we overestimated the value prop of LLMs in health care, or have we not fully understood what they're capable of yet?
I think we're overestimating the value proposition. We're, we're kind of, yeah, I would agree with that assessment. And I think the hard questions we're not asking are, what are the

(32:15):
systems that we need to build that can be driven by a language model and would have value? And the second question we're not asking is, do we really need to always use one giant language model for everything? And why do we not build specialist models? And then the third one I'd interject, and you brought up Mark Cuban, I mean,

(32:38):
he's revolutionized generics and drugs. Why do we want to be hostage, like, we as health care and doctors, be hostage to technology companies with these closed-source models? Why can we not pool our textbooks and our EHR data, which I agree, there's lots of egos and, you know, vested interests and so on, but just as a thought experiment,

(32:59):
you know, if 10 health systems came together and partnered with the top five medical publishers to first learn from the medical literature and then from their curated EHR data and put that model in the public domain, we could reduce the cost of using AI in health care.
Is the answer to that not the same answer to why haven't they pooled their data

(33:20):
and put it in the public domain though?
So, I think there's a distinction, because previously, there was no way to create the incentive to put the data out there. In this case, by sharing your data, you get value back in the form of being able to use that model to solve some problem that you had.

(33:41):
That's true.
So, you can create a shared resource that you wouldn't be able to do on your own.
I guess it's harder to imagine that being as valuable as, like, creating a shared chest X-ray model or something like that; that probably has less value than you can imagine coming from, like, a big LLM that understands all of medicine.
Yeah.

(34:02):
Raj, do you, do you want to follow up or should we hop to the lightning round?
Let's move to the lightning round.
Alright. The rules here of the lightning round are your answers have to be brief, and they don't have to just be one sentence, and they are a mix of silly

(34:24):
and serious, and it's on you to decide which category each question comes from. I actually think you've alluded to some of these already in your answers, but I'm curious to see if my language model has correctly predicted your response in this case. So what single person has had the single biggest impact on your life?
Dilip Desai, the mentor who got me to the U.S.

(34:45):
My language model is well calibrated.
If you weren't in medicine, what job would you be doing?
I was going to be an astronaut when I was a kid.
Very cool. Why did you give up on that dream?
Turned out that when I came of age, India didn't have a space program.
A valid reason.
A valid reason.

(35:05):
Bootstrapping your own space program, outside of people named Elon Musk, seems like too much to do. Next question. Overrated or underrated? The AI scene in the Bay Area.
The AI scene?
Yeah, like, the AI ecosystem in the Bay Area.
Overrated or underrated?
Way overrated.

(35:26):
Way overrated?
Why?
I think logistic regression has become AI now.
Fair.
Yeah. And, uh, our coefficients on the East Coast are just as good as your coefficients on the West Coast, so.
Oh, absolutely not. Ours are better. They're at least, uh, probably more fit and better tanned, so.

(35:47):
They have better weather.
Better weather.
Will AI in medicine be driven more by computer scientists or by clinicians?
I would say neither. Unless our clinicians, uh, we clinicians step up, it's going to be driven by the finance officers.

(36:10):
Yeah.
It's, it's the people who buy the product.
I mean, just like EHR, no one ever got fired for buying Epic, and no one ever probably will get fired for buying GPT-4. So, okay, next question. If you could have dinner with one person, alive or dead, who would it be?
That's a tough one. I would probably go for alive, and I'd probably go for Barack Obama.

(36:34):
I think that that, I think we had, he's a popular choice for that question, uh, so far. I think that Euan, Euan Ashley also said Barack too, yeah. Our first, first episode. Um, also a Stanford professor.
He's also the only president I know that actually wrote, uh, a scholarly JAMA article.
Oh yeah, he had, like, a Science paper and something else.

(36:54):
Yeah, no, he had I think a set of them.
I think he had, like, a double feature.
Yeah.
In both of the journals.
Um, alright, this is our last question, uh, and this one is a little bit out of left field, but I have to ask, thinking about what my dad has told me about this exam: should the U.S. adopt the Joint Entrance Exam for admission to colleges?

(37:18):
And so just to contextualize this for our listeners while you think about that, the Joint Entrance Exam, or the JEE, is this notoriously hard exam that a very large fraction of Indian citizens who are entering college take before they go to school.

(37:38):
So sort of a universal test across, across much of India. Should we have an equivalent of the JEE in the U.S. for college?
With it open to everybody in the world, or just for the U.S.?
Let's say open to everybody in the world.
From an access standpoint, I would say yes. It might put a lot of people on edge and defensive, because when

(38:00):
you have a billion other people competing with 300 million people, just by sheer probability and priors. So, I would actually say we should be creating a national entrance exam, at least at the country level. This whole madness of applying to 40 colleges one-on-one, it's just

(38:23):
like, one, it's inefficient, and it just creates these weird pockets of misaligned incentives where more people apply and everybody's admissions rates look like 1% or less.
Alright.
You fix the numerator, and we keep inflating the denominator, and everybody gives high fives, like, what are we doing?
It's a crime against math.

(38:43):
The other thing about the JEE is that it's, uh, I don't know if this is still the case, but when my dad explained this to me, when he was going through the exam and then into college in India, is that it has much greater weight. It may be actually, like, the sole criterion for admission, at least at the time, into IIT and into many of the other schools.

(39:05):
So that's quite different than the States here as well, where, uh, many schools are moving away from the SAT. But, you know, let's say 10, 15 years ago, when the SAT was near universal, or the ACT, one of the two was near universal in the U.S., even then it was only a small, or one of the main, but one of the, one of the features to determine admission.

(39:26):
So that's a, it's a very different model, right? To sort of use the JEE as a way to rank, uh, everyone and then to decide on what schools you're able to go to.
Alright. So I would sort of say that instead of having an exam, we should have a national admissions system.
I think that's like both other criteria, but it's, it's standardized

(39:48):
at the sort of national level, right?
It's centralized at the national level, but it's not solely a sort of written or technical exam.
I would say, my niece is applying to college now, and there is, like, a common application that you can apply once and get applied to a bunch of different places, but it's not universal.
Exactly, exactly. Like, the match for residency is near universal. The admissions processes are not universal, Andy, right?

(40:09):
So even if there's a common app, the admissions is split over all the schools.
Well, Nigam, I think you passed the lightning round. Congratulations.
Thank you. Thank you.
Alright. So, we'd like to zoom out a little bit and talk about some big-picture stuff, um, with the time that we have left. You've touched on this a little bit already, but I think I'll rephrase how I'll ask the question.

(40:29):
How nervous should we be about the AI ecosystem, given that it's essentially dominated by two large tech companies, both as academics, but also as patients? What should our nervousness level be about that?
I think it should be high. And I would say instead of nervousness, we should approach it with a matter

(40:50):
of concern, in the sense that are we willing to abdicate such a crucial part, or seemingly future crucial part, of our national infrastructure to third parties over which there is no national control? Like, we would not let just two companies run the entire electric grid of the country. The Internet is not run the same way.

(41:12):
And if health care is equally important, why would we cede control in that manner? So, it's not fear, but I think, uh, it is more about national interest and maintaining equitable access and fair play. That we should all be concerned about it.
I guess if we're going up to, like, the federal level, how do we, so, I

(41:32):
mean, it's hard to get things that most people agree on done currently in the current political environment. This to me feels almost like a big particle physics project, where you need a CERN for AI in the U.S. if there's going to be an alternative to the ones owned by big tech. So, like, I agree with the vision. I just stop when I try to think about how we would operationalize that.

(41:57):
So, like, what is the path forward there to realize the vision that you just laid out?
I think we need an external shock to the system. Like when Sputnik flew over the Americas in the sixties, that created the space program that put a man on the moon and brought him back alive. So we need some external shock. Like, the U.S. and a lot of other, our institutions, like the

(42:18):
places you're at and the place I'm at, we hate being second. So we got to tap into that and, you know, maybe we engineer an external event where, like, nationally there's this imperative, like, oh, we're second. Come on, we got to do something.
Yeah, I think spite is always a good motivator, and maybe we can use institutional competitiveness to catalyze some of that.

(42:41):
I think one of the things that strikes me as ironic in this moment is the only reason we have legitimate open-source alternatives is kind of because of Meta/Facebook, uh, they've published Llama and Llama 2. A lot of people have built on top of those models. There are startups that make open-source models, like Mistral and MPT,

(43:02):
and, you know, even the UAE has an open-source large language model.
And these models impress me because they continue to punch above their weight. You know, if you're looking at, you know, accuracy per dollar spent on the model, they punch way above their weight. So how do you see the open-source ecosystem evolving, and how do you, how do you think that folds into health care?

(43:23):
Raj and I are both editors at NEJM AI, and we ourselves have written a lot of GPT-4 papers. You don't see the same level of health care investigations from the open-source models. Is that because the closed-source models are just better, easier to use, all of the above? How can we catalyze the open-source models in health care so that there's a legitimate

(43:44):
alternative to the closed-source ones?
Yeah, that's a great question. Something close to my heart. I think people are not experimenting as much with open source because there are no good platforms on which you can easily experiment. People experiment with GPT because there's ChatGPT, a browser-based interface where you can paste stuff in and do a project.

(44:07):
We don't have that, for the most part, for all of these public models. The engineering lift to get it up and running is significant. So that's one barrier. But I think that barrier can be solved. Like, going back to one of our prior conversations, imagine even 10 health systems, or imagine insurance coming together, three large insurance companies, saying we will fund an

(44:30):
open-source AI foundation model, large language model, for all of your billing, answering the patients, and so on. Because ultimately, all the payments come from insurance. And if we want to contain the total cost of ownership of AI, why would we want to pay the CapEx of having built these large giant models that

(44:53):
somebody else built for whatever reason.
And kudos to Meta for having released Llama, which gives everybody a foundation on which to build a solution that doesn't cost as much. I don't need my patient message response model to also draw me a picture of a unicorn. It can not have that capability, and it's okay, but if it's 10x

(45:16):
cheaper, like, all the better.
I think that's a really good point. Time dilation is a real thing here, because ChatGPT is only a year old. Now it feels like a decade old. And the underlying model was still 3.5. They did some RLHF that you mentioned earlier to make it slightly more pleasant to chat with, but the barrier to entry was zero.
That wasn't the big breakthrough; why it went viral is that there

(45:39):
was a website that you could go to and chat with it, and it just worked. I think that, now that you say that, seems to me to be the missing piece here for the open-source side of things: we need a, uh, and Hugging Face hosts some of these things too, but, um, it's just not at the scale that OpenAI has done for the GPT models. And maybe, maybe that would be a good collective open-source effort, to make something like that that would be easy to experiment with.

(46:03):
Absolutely.
So, we are strong proponents of that.
In fact, we just dropped two models on Hugging Face, at NeurIPS and MLHC, the embedding kinds of models, not the generative ones, not the ones you talk to. Because we need shared experimentation, which we don't have. I mean, you have folks at Harvard, Matthew McDermott, pushing really elegantly on

(46:23):
creating the right infra so that we can all compare and experiment and share results. Like, we gotta do that.
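To make that "engineering lift" concrete: with the Hugging Face transformers library, pulling an open checkpoint and generating text takes only a few lines. The sketch below uses gpt2 purely because it downloads without gating; an open Llama-family model would slot into the same lines, and the prompt is invented:

```python
# A sketch of experimenting with an open model from the Hugging Face hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "gpt2"  # illustrative; swap in any open generative checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "A patient asks why their statin dose changed."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```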
Yep, totally agree.
But Nigam, it occurs to me that it's, I think it's something you just said, you know, a moment before that, which is, why did ChatGPT get so popular, right? And what was it? It was, it wasn't the existence of the model weights, right?

(46:45):
It wasn't the sort of API.
It was this chatbot.
It was this interface. I remember being at NeurIPS in New Orleans last year, and one of my students just comes up to me and says, have you heard of this ChatGPT thing? I was like, no, what's that? Takes out his laptop. And then pretty soon everyone in the hallway there is just interacting

(47:07):
with it, asking it, it kind of breaks for a little bit, right? Times out. And then it just takes over the conference, right? Like, in sort of the, you know, the cocktail party conversations. And it was this universal ability to just interact with it in so many different ways that I think immediately captured so much of our attention. And so, I like the idea of building similarly useful tools

(47:28):
for open, or open models. Uh, OpenAI, but open models, right? Open source.
Lowercase o there.
Lowercase o.
Lowercase o.
Lowercase o, exactly.
But also, like, who would do that, right? So, who would, because it's, it's an engineering challenge, right? To build and support something like that. It's one thing to upload it to Hugging Face, which I think you deserve, everyone deserves a lot of credit for doing things like that.

(47:48):
At the journal, we're very supportive of that. We have a whole article type that's devoted to sort of shared benchmarks and datasets. But it's another thing altogether to kind of support a service that is available to everyone and that becomes kind of a shared experience for tens or hundreds, uh, whatever the number is, millions of people to use and talk about together.

(48:11):
But we have precedent.
We have precedent.
What is the best?
There's PubMed.
So, PubMed.
So, this would be a sort of NCBI or kind of equivalent entity to, to try to organize something, something like this.
It can even start out academic. So, you know, GenBank, uh, there is GEO, Gene Expression Omnibus, that started out as microarray databases at a few academic institutions, which then

(48:35):
sort of got lumped in and got sucked into NIH's national infrastructure. So there's precedent. But I think what needs to happen is that, 20 years ago, there was this national conversation around Centers for Biomedical Computation. Eight or nine of them got funded. And, you know, there was one that Zak had at Harvard, and we had a couple here on the Stanford campus.

(48:58):
Where is that national conversation about language models for research? Like, our previous NLM director, Patty Brennan, I was sort of joking with her that, uh, you know, we should, NLM should just rename itself to Large Library of Medicine, as llm.nih.gov, right?
I mean, yeah.
But we need that conversation.

(49:21):
Yeah, agreed.
Alright, Nigam, our last question for you, and I think it goes pretty well with open-source models, is the following. How will the democratization of data empower patients?
I think patients will start expecting customer service.

(49:43):
One of the things I tell med students and our clinicians is that as a profession, we have to understand customer service. We got away with it for a very long time. Ha, ha, ha. Not having it. But as people are empowered, they will ask questions. And questions that cannot be blown off and should not be blown off.

(50:05):
And I'm one of the firm believers that information, net-net, in the hands of the person affected is a good thing. Yes, it'll be confusing in the beginning, and there will be efforts needed to educate and so on. Knowledge and information is amazingly empowering. So, we will have empowered patients as soon as data liquidity happens.

(50:27):
See, that's another Nigam phrase there, data liquidity. That's, uh, like, such a good way to, to think about it. Do you think, so one of the things that also occurs to me, you know, bridging a couple of things we've been talking about, is LLMs as an interface to patient data. So do you think that LLMs will actually enable patients being able to sometimes talk to their data in ways that they haven't been able to before?

(50:50):
'Cause if someone dumps, like, a spreadsheet of data on your desk, like, maybe that's not the most useful, but if you can say, like, what was my blood pressure average over the last three years, and then actually get the answer back, is that a key enabling thing here?
I believe so. There was actually a demo one of the Stanford students did. I forgot the exact name of that demo. I think it was HealthGPT or something like that.

(51:11):
And the idea being that whatever you can get in your, in this case, just Apple's health app, you can talk to. Now, the model is there, there's an app, and then there's a data pipe. Now, the 21st Century Cures Act will make the data pipe a lot bigger. The model will still be able to handle it. And so, you put the two together, we're not that far away from the first proof

(51:34):
of concept that you can download 10 years of data, maybe including an image, and have a conversation with it.
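A toy sketch of what "talking to your data" could look like under the hood, with invented readings: the structured query does the arithmetic, and the language model's job is the interface on either side of it:

```python
# A toy sketch of answering "what was my average blood pressure over the
# last three years?" from structured records. The readings are invented.
from datetime import date, timedelta
from statistics import mean

readings = [  # (date, systolic, diastolic) pulled from a patient's data pipe
    (date(2022, 3, 1), 128, 82),
    (date(2023, 6, 15), 134, 86),
    (date(2024, 1, 10), 122, 78),
]

cutoff = date.today() - timedelta(days=3 * 365)
recent = [(s, d) for when, s, d in readings if when >= cutoff]

avg_sys = mean(s for s, _ in recent)
avg_dia = mean(d for _, d in recent)
print(f"Average BP over the window: {avg_sys:.0f}/{avg_dia:.0f} mmHg")
# An LLM's role here is the interface: turning the question into this query
# and the number into a plain-language answer, not doing the arithmetic itself.
```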
Does that scare hospitals?
Because you can think about spotting medical errors and, like, litigation and things like that.
There was this survey this morning about, you know, asking hospital leaders, CIOs, and IT folks, like, how do they feel about their comfort

(51:58):
level in complying with or meeting the demands of the 21st Century Cures Act? And only 36% feel they're prepared. And rightfully so; like, the guts of health IT is a great way to time travel right back into, like, I don't know, the late nineties.
Raj, any follow-up questions?

(52:19):
I think maybe I have just one more, which is along the lines of, I think you just said, you're a proponent of information in the patient's hands, which I think Andy and I are probably both pretty, pretty well aligned with you there. Patients today, and physicians as well, but patients today, let's focus on them, are using large language models like ChatGPT to

(52:41):
interact with their health data.
This is already happening, right?
Maybe you could just give us some parting thoughts on what you think about that? And maybe what potential good and not-good uses of these models are for patients interacting with data about their own health.

(53:03):
I think an immediate use is explaining what happened. Like, at a given visit, you know, after the visit is done. Like, I have a medical degree, and I had a health scare in 2021, and after the visit happens and you spend 20 minutes, you come back and then you have, like, five questions. Now, given that I'm on the faculty, I could get those five questions answered by sending emails.

(53:25):
But it would have been nice if that was available to everybody. And most of those questions did not really need the surgeon. They could have been answered by the surgery textbook. I mean, the guy was probably just indulging me as a colleague to respond. So, I think contextualizing what happened, and then next level up is

(53:45):
like the basic, basic, simple stuff: statin dosing, hypertension drugs.
So when you say statin dosing, what do you mean?
What specifically?
This happens, what should I do?
Should I go up?
Should I go down?
The common pediatric ailments.
I mean, you know, our children are now older, but if you have a young kid, I mean, how many times is it that you kind of panic, and then it

(54:08):
turns out to be nothing, and happens to millions of people all the time, right? All the time. So, after what happened comes assurance: being able to say, it's okay. This is not an emergency. And so, the capabilities needed there are not the clinician-replacing diagnostic capability.

(54:29):
But only being able to detect when it is not a problem. But we don't build products like that. That's not how we think about deploying technology. Computer scientists and students end up horse-racing a clinician as opposed to being the gatekeeper saying, I'll only let in things that are really worth your attention.

(54:51):
It's a mindset issue about what we should be building.
I totally agree with that. And I think that's a great place to end. So, Nigam, thanks so much for joining us today on AI Grand Rounds. It's been great talking with you.
Well, thanks for having me.
This was a pleasure.
Thanks, Nigam.
Thanks.
Alright.