Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:03):
You can see that we have this weird medical system where it can do more than any medical system could in the past, but people trust their doctors a whole lot less than in, like, the 1950s.
I, as a historian, did not see large language models coming.
This is a surprise to me, right?
I was very bleak about what the future was going to look like, that it was going to be more reductive, more breaking people into individual pieces.
(00:28):
And now I see a technology that, again, it's not human, I don't want to anthropomorphize it, but that seemingly understands a lot of these contextual factors and the things that make us human and give us meaning in our own sense of disease and our sense in the world and our sense of community and what disease and suffering means. A technology that could really change the trajectory of where I
(00:51):
thought medicine was going, which was a place that, like, in many ways I'm very old-fashioned, right? Like, I reflect the values of medieval physicians and ancient physicians and many modern physicians too, right?
There are things that are standard over time, and I see these technologies as a way to get us back to some of those core things of what a physician has done while not losing many of those advantages that come with big data, that come with collecting information.
(01:28):
Welcome to another episodeof NEJM AI Grand Rounds.
I'm Raj Manrai and I'm here with my co-host and good friend, Andy Beam.
Today, we are really excited tobring you our conversation with Dr.
Adam Rodman.
Adam is an assistant professor of medicine at Harvard Medical School and a practicing physician at the Beth Israel Deaconess Medical Center.
(01:49):
Andy, I think this was a really fun conversation.
You know, Adam is in general pretty fun to talk to.
He's both a historian and a futurist who has this really interesting perspective from looking at and studying the history of medicine and medical decision making, but now really focusing on AI and what large language models can do for diagnosis and treatment recommendations. He published this great paper that really
(02:11):
caught my attention about a year ago on using GPT-4 to diagnose cases from the NEJM Clinicopathologic Conferences.
And we dig into that one, but really just get his perspective generally on where this is all going.
This is really, really fun and full of interesting insights from Adam.
Yeah.
I love this conversation just because Adam is so hard to put in a box.
(02:33):
So, he studied economics as an undergrad, as you mentioned, and is a connoisseur of one of my favorite subjects, which is the history of medical AI.
So, we got to nerd out about expert systems in the history of medical AI.
I'll forgive him for the fact that he went to UNC.
I'll give him a pass on that.
But for a lot of reasons, it was such a fun conversation.
He's so high energy, so full of creativity.
And I think that he hasn't let the health care system
(02:57):
kind of grind the energy out of him.
He still comes to all of his research questions and all of his clinical duties with this sort of sense of wonder and sense of energy and creativity.
And optimism.
Yeah.
And it's hard to maintain thatin today's health care system.
So, it was just a breath of fresh air.
Yeah, we tried our best to keep up with him, and it was just such a fun conversation.
I think you'll hear that come through in the conversation.
(03:18):
I totally agree.
And just, would it have been better if he went to Duke, or was UNC the, can you explain, I think, give some Raleigh, North Carolina context for our listeners.
Yeah, so here's a little North Carolinian context.
So, I went to NC State.
My wife went to UNC, and, uh, for a lot of reasons, NC State folks tend to have a one-sided rivalry with UNC.
(03:39):
I'm not sure they really think much about us.
So, I always bristle a little bit when I'm talking to a Tar Heel.
But even given that, Adam's a great guy and I had a great time talking about the history of medicine with him.
Awesome.
The NEJM AI Grand Rounds podcast is brought to you by Microsoft, Viz.ai, Lyric, and Elevance Health.
(04:01):
We thank them for their support.
And with that, we bring you our conversation with Adam Rodman.
Well, Adam, thank you so much for joining us on AI Grand Rounds.
We're excited to have you.
Thank you very much, Andy and Raj.
Adam, great to see you.
So, this is a question thatwe always get started with.
Could you please tell us about the training procedure for
(04:22):
Adam Rodman's neural network?
How did you get into training?
Yes, exactly.
Wait, I have two, I have two more sub-questions here.
How did you get interested in AI?
And what data and experiences led you to where you are today?
So, we're really talking about pre-training and then maybe a
little reinforcement learning.
Yeah, pre-training and the fine-tuning.
(04:43):
I'm, I've been fine-tuned.
I guess everyone is sort of fine-tuned by their harsh reality with the world, right?
Yes, I think that's fair.
I have a very, I think, unusual path towards this field and this research that we're doing, which is that I approach this as a historian.
So, for almost a decade, I think, one of my big focuses of my research has
(05:06):
been, and I'm sorry, I know you're gonna make fun of me for saying these words, but on epistemology.
So, the approach, like, physicians' approach to knowledge, especially as it pertains to diagnosis.
And then also, secondarily, the concept of nosology.
Again, you're gonna make fun of me for this, but how we, like, what structures we use to define diseases.
And my focus, my research focus, has been really with the era
(05:29):
you'd call modern medicine, but in particular from the late 19th century all the way through the 20th.
So there was a huge focus on artificial intelligence, right?
We've been talking, the oldest quote that I've ever found referencing artificial intelligence comes from 1918, from Bernard Shaw, where he's, like, talking about what a mechanical physician artificial intelligence would look like.
(05:51):
And actually, when he published that in the 30s, he used the term robot doctor, which is funny because the word robot had only been invented two years before.
So these are very, very old ideas.
And if you go back to, like, the 1940s and 50s, there was this, I mean, I think similar to today, there was this expectation that doctors were over, right?
Medicine was soon going to be revolutionized by electronic computers.
(06:14):
And that generation of physicians truly thought that, right?
They were out there proselytizing that computers are coming.
So I was, you know this, but I had written a book and I was, like, working on the proposal for my second book, which is basically about what we're talking about, about the structure of knowledge and how it pertains to diagnosis over thousands of years.
And then GPT, I had played with language models before, if you guys recall some
(06:37):
of the horrific chatbots that were released to the world in the early 2020s.
And I mean, in my field, no one thought that these were going to amount to a whole lot at all.
So I remember I got ChatGPT 3.5 very shortly after OpenAI released it, and I said, well, okay, I have hundreds of manuscripts of historical tests of artificial intelligence.
(06:59):
How does 3.5 measure up?
And shocker, it didn't do well at all.
And then, 4 comes out, and I get it the day it comes out.
And I just run a benchmark, one of my own patients, through it.
And my jaw hits the ground.
Because I realize, right, like, by the historical standards that physicians have traditionally used, like, stretching back to Internist-1, or even further, right, you can
(07:23):
go back to Ledley and Lusted in the 50s.
It, with no specific medical training, was outperforming anything that had come before.
And, basically, I, at that point, was like, I can't be the only person who realizes this.
Hey, Adam, can I hop in?
Yeah, yeah, yeah.
So, I'm gonna, I'll try and one-up you here on vocabulary and appeal to your
(07:43):
historian-philosopher sensibilities.
I think you've started us a little bit in medias res here, in that we actually want to hear about how you got interested in medicine.
What was the young Adam Rodman like?
And what specific things led you to this intersection of medicine and AI?
Oh, young Adam Rodman was a pain in the butt.
Old Adam Rodman is still a pain in the butt.
Yeah.
So, when I was a young physician, even a medical student, I was obsessed with
(08:08):
the idea of why we did things, right?
I was unwilling to accept the world as something that just exists, right?
There must be justifications or reasons.
And this is really what led me into a path.
I studied history in college, but this is what led me into the path of being a historian.
And initially I was interested in therapeutics, right?
Yeah.
As an intern, you're like, well.
(08:28):
Well, why is this the dose of Lasix, which is a loop diuretic?
Why is this what we do?
Is this better than what had come before?
And of course, when you pull at those threads, you often learn that a lot of the things that we hold dear in medicine are built on a foundation of sand.
And the more that you pull at that thread, you discover that there's just so much fundamental uncertainty in our field, some of which is inherent to the
(08:49):
practice of medicine, but the field of medicine does not acknowledge that, right?
There's a lot of what I would call pseudo-confidence.
In the same breath, we'll speak with the same level of confidence about something that we know really well as about something that's basically a coin flip.
And, again, I am fully accepting that, especially now that I'm getting old, there are just limitations to our evidence, and that's fine.
(09:09):
But medicine does a bad job of discussing that uncertainty.
So I, we were talking about podcasting earlier, actually started a podcast when I was a resident called Bedside Rounds that later grew into an academic history project.
It's a very good podcast.
Just gonna, just gonna plug it a little bit.
It's a very, very good podcast, for our listeners.
Okay, Adam, keep going.
(09:30):
Oh, I mean, you know, once you get me talking, I'm just going to keep going.
Yes.
Yeah.
And as Raj pointed out, if you listen to the podcast, you can sort of get an evolution of my thoughts about the nature of medicine and how, like, really our approach to knowledge is baked into that epistemology, right?
How do we know what we know?
How has that changed over time?
(09:51):
And really with these interesting threads that stretch back to the beginning of modern medicine: how does the way we define disease, how does our technology affect what we know and our certainty about medicine?
And this, this touches so many fields, like diagnostics, like randomized controlled trials, and of course informatics.
Yeah, so I was just gonna say, it's interesting that your personality
(10:13):
type seems to be questioning.
You don't take things for granted.
At least in my experience, that's not the classical phenotype of someone who's interested in going into medicine.
So, what was it about being a doctor?
Like, why did you get into medicine in the first place?
We were just talking about this.
I was not a premed.
I studied history.
After college, I joined an economic policy think tank.
And, I hated it.
(10:34):
I just sat at a computer all day.
I wanted to be around people.
So, I made a really rash decision to go into medicine based on knowing very little about the field.
I also should have done a post-bac, in retrospect.
What I did instead is, while working full time at a think tank, I took organic chemistry.
I got a C.
Apparently Tulane School of Medicine made a terrible mistake by admitting me, and
(10:57):
I've been a pain in the butt ever since.
So, no, that's, I wasn't a pain in the butt when I was a medical student.
I had to get to, like, residency to really be a pain in the butt.
Adam, it's fascinating, because I knew a lot about that.
So, you know, for our listeners, Adam and I are close collaborators.
We've been working together for some time, and I knew a lot about that, but I don't think I appreciated exactly that moment when you decided to go to
(11:18):
medical school and what triggered it.
But a lot makes sense now, because I think, because of that arc, you have a very unique take on problems in medicine.
And I think you question the status quo while also respecting a lot of what's come before, and we'll dive into that in the episode.
So, I think I want to transition to some of your work, and I want to start with
(11:41):
your recent series of papers on evaluating large language models for clinical decision making.
So, the paper that first caught my attention, the one you published back, I think, in June 2023 in JAMA.
So, this feels like 100 years ago in AI time, though it's less than a year ago now. It was on evaluating GPT-4
(12:02):
on the NEJM CPCs, the Clinicopathologic Conferences, also known as the Case Records of the Massachusetts General Hospital.
And so, I had read a lot of these cases as a quasi-med student, you know, a Ph.D. student in the HST program, where we were learning about diagnostic reasoning, how experts think about these cases.
(12:22):
And so, I knew the sort of historical significance and also how important they were didactically and how hard they were for many physicians.
And so, this really caught my attention, because the model performed pretty well, relatively out of the box, on these cases.
But I was wondering, maybe you could start with that paper, when you got the idea, what the backstory was
(12:43):
for how you came up with the idea, and maybe also just give our listeners a physician's take on the meaning of a CPC, the formula, and what a CPC is all about.
I'm going to start by going back almost a century.
So, there are two historical trends that intersect in that paper and the idea of CPCs.
So the CPC, the clinicopathological conference, dates
(13:04):
to the early 20th century.
It actually started at Harvard.
One of the many things we're very proud of.
And it was adopted actually from the legal tradition.
So, this comes from a time when doctors are really starting to get interested in what we today would call clinical reasoning or even metacognition, which is not just what is the right answer, but how do we think?
(13:25):
How do we reason?
How does new information change that?
And what are reasonable disagreements?
These are not necessarily new ideas, like, the phrase differential diagnosis dates to the early 19th century, but really formalizing that in a process was.
So, the idea of the CPC, I think it's Richard Cabot's, was that the doctor who had a case, and a case that was a mystery at the time, would discuss
(13:48):
that with discussants who would think out loud for the audience.
And the idea of all of these cases is that it's like a mystery.
It's like a Sherlock Holmes novel.
There's a whodunit at the end.
So, we have a pathologic diagnosis in the original ones.
Obviously now, in the 21st century, they don't all have pathologic diagnoses.
But in the old days, we would walk through, like, a mystery case.
People would say their thoughts, they'd list their suspects, exonerate them, rule
(14:11):
them in, disagree, come to an answer.
It was meant to be an explicitly didactic exercise in reasoning, right?
The point of it is not so much to teach you about a disease, but to teach you how to think about a disease.
So fast forward to the 1950s.
At this point, the New England Journal CPCs, or the Mass General CPCs, I guess that's what it's called, the Case Records of the Massachusetts General Hospital,
(14:32):
but the CPCs had become an institution.
Some other journals had versions, but the New England Journal's was the biggest.
And you have these two brilliant proto-informaticists, Ledley and Lusted, and they are imagining how you would teach a computer to reason, right?
Because prior to this, let's say you look at the works of Keeve Brodman.
(14:53):
He didn't believe that doctors really knew what they were doing when they were reasoning.
He was like, oh, it's all intuitive.
It's all hand waving.
But Ledley and Lusted were like, well, that can't possibly be true.
I mean, look at these CPCs.
And they sat down, and they tried to use what they called at the time von Neumann's theory of games.
So, game theory, right?
Because game theory was only a couple of years old. They used it to model out, in these
(15:14):
individual cases, how people reason, and what they came up with is what you and I today would call probabilistic or Bayesian reasoning, right?
The idea that when they analyzed these cases, doctors didn't say it out loud, but they intuitively had a pretest probability, and then they were looking at each piece of data that came in, and trying to figure out whether that increased or decreased the posterior probability.
(15:37):
And they didn't use the words sensitivity and specificity yet, those had only been adapted to medicine very recently, but that's what they're getting at, right?
That you could have a sort of Bayesian system, and they also imagined an iterative system where new information would train the model, on punch cards, right?
They actually have a bunch of cool pictures in the Science paper with punch cards.
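[Editor's note: To make the Bayesian updating described above concrete, here is a minimal sketch using the odds form of Bayes' theorem. Every number in it, the pretest probability, sensitivity, and specificity, is a hypothetical illustration, not a value from Ledley and Lusted.]

```python
# Odds-form Bayesian update for a single diagnostic test result.
# Hypothetical illustration; numbers are invented for the example.

def update_probability(pretest_prob: float, sensitivity: float,
                       specificity: float, test_positive: bool) -> float:
    """Posterior probability after one test result.

    posterior odds = pretest odds * likelihood ratio, where
    LR+ = sens / (1 - spec) and LR- = (1 - sens) / spec.
    """
    pretest_odds = pretest_prob / (1 - pretest_prob)
    if test_positive:
        lr = sensitivity / (1 - specificity)
    else:
        lr = (1 - sensitivity) / specificity
    posterior_odds = pretest_odds * lr
    return posterior_odds / (1 + posterior_odds)

# Start at a 20% pretest probability; a positive result from a test
# with 85% sensitivity and 90% specificity raises it to about 68%.
print(f"{update_probability(0.20, 0.85, 0.90, test_positive=True):.2f}")
```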
And what they proposed, because this is how they modeled reasoning, is this:
(16:00):
if you built a computerized system that could make medical decisions, what is the best way to test it?
Using these formalized CPCs.
Because they're so well structured.
And basically, since Ledley and Lusted wrote that paper, every single AI system since has used CPCs as the gold standard.
(16:21):
So the most famous one is Internist-1, which was developed in the late 1970s.
Blackjack Myers, Jack Myers, he was the chair of medicine at Pitt and the president of the AMA, and he had an eidetic memory, right, a photographic memory, and he set about basically recreating his mind in a computer. So, when Internist-1 was evaluated in the New England Journal of Medicine,
(16:41):
it did great, right?
It probably outperformed doctors in the early 1980s, but they used the CPCs to evaluate it.
And then, really, since that time, if you look at all of the other commercial products, so later QMR, Isabel, DXplain, every single one of the studies used CPCs.
So, you got a nerdy historian who's like, this seems really impressive.
(17:01):
It seems to grok medicine in a way that, I mean, I've used both Isabel and DXplain, and they don't.
They seem far more like statistical engines.
This thing seems to have some sort of, I'm doing air quotes, insight.
Just to be clear, this thing you're talking about right now is GPT-4.
GPT-4.
Yeah.
So I designed an experiment that would have made Ledley and Lusted happy, right?
(17:23):
So, I used some of these old measures.
I tried to use the best methodology that I could.
And unlike the old studies, so, if you look at, like, the DXplain or Isabel studies, or even the original Internist-1 study, someone was manually inputting all of that; it took like an hour to input a case, manually entering all this data.
In this case, we just put the contextual information in the context
(17:43):
window. In this case, we didn't use the API, we used the chatbot.
And by the standards that have been agreed upon in, like, the differential diagnosis generator field, right out of the box, GPT-4 performed, it tied the best existing system, which of course had been, like, trained over a decade and had tons of information.
And this thing could just do it like that.
(18:04):
Can I hop in here and ask a question?
Because you just made me realize something that I've never thought about before.
So, uh, you know, DXplain, Internist, all of these things were essentially like an idealized Bayesian version of clinical reasoning.
And when you talk about what the ideal physician should do, that's kind of put up as, they should be perfect Bayesians.
(18:25):
They should integrate and condition on any information available.
But like, actually, what I hear you saying is that mimicking the messy process of human reasoning, where we're being intuitive and we're grokking, actually works better in the real world.
So should we give up on that ideal Bayesian doctor?
Like, what should we take away from the fact that we did this bottom-up instead of top-down?
(18:46):
Yes, that's exactly it.
Um, no, we shouldn't. I think, well, clearly there's a role for Bayesian reasoning in certain domains, or even, as my friend Shani Herzig likes to say, Bayesian reasoning is the cherry on top of the sundae of clinical reasoning, but it's not the sundae.
And I mean, we've known this for a long time.
So, a lot of these early informaticists knew this.
So, one of the famous examples is, um, Elstein is doing all of these wonderful
(19:08):
studies, like interview studies, at Harvard in the late 70s, and he's looking at medical students and he's
looking at attendings, and having them talk out loud through cases.
And what he discovers is that both the med students and the attendings show the same thought process, right?
They're all little Sherlock Holmeses, they're asking questions, going after different differential diagnoses. But then a weird thing happens,
(19:29):
which is that the attendings ask like five to 10 questions and get the answer.
And the medical students go on for, like, half an hour and still don't have a very good differential.
So, it really, it's at that point in the late 70s, and this is, like, the era of Kahneman and Tversky, that Elstein realizes there's something else going on.
And nowadays we know, like, it's an organizational idea of reasoning, which is that
(19:51):
there are ways that human beings organize information in their head.
We call it script theory, and basically, from the second we start hearing about a patient, we go down certain routes.
And it starts to develop these schema, so these comparisons of different scripts, to give us an idea, oh, this could be a pulmonary embolism, or this could be metastatic cancer in the lungs, that further drives our questioning.
(20:13):
So yes, I think what is special, and this is cutting to the chase, what is special about language models is that they're a bottom-up model of reasoning that really mimics the expert clinician's System 1.
What they do not do well is any of these sort of deliberate metacognitive strategies, most notably Bayesian.
Like, they can't do Bayesian reasoning.
Of course, Raj and I had a study that showed that their
(20:35):
implicit understanding of Bayesian reasoning is better than humans'.
So, they can't understand Bayesian reasoning, but they can't understand it better than we can't understand it.
Um, but yeah, so that's what's new about this technology.
Also, what's new is that its inputs are textual, like contextual text information, meaning that they can be scaled.
(20:55):
One of the things that I always like to tell people is, look, guys, in the early 1980s, we had computers that, if you inputted it appropriately, could reason better than humans in difficult cases.
Ah, Raj, we talked about problem-knowledge couplers, like what Larry Weed developed, and in the domains in which it worked, it was incredibly impressive, but you would just sit in front of a computer doing, like, yes, no.
If you could structure the information in a certain way. But you know what's
(21:19):
so fascinating about the literature is, if you go back to, I dunno what year this was, maybe it's the 70s, maybe the 60s, Pete Szolovits and Steve Pauker, right, are starting to question, systematically looking at those systems and enumerating the limitations of the sort of purely probabilistic Bayesian approach to automated diagnosis.
So, they have this really good paper, Categorical and Probabilistic Reasoning
(21:42):
in Medical Diagnosis, where they outline the data hungriness, the combinatorial explosion of probabilities that you actually need to estimate to have this purely Bayesian approach to the world.
And then, I think, they also sketch what would probably be pretty intellectually aligned with what you just said about script theory, which is how categorical reasoning, or this kind of fuzzy set of rules, or winnowing of the space of possibilities,
(22:08):
is a critical first step before we then launch into the purely Bayesian approach to updating decision making.
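[Editor's note: A rough sketch of the combinatorial-explosion argument mentioned above, assuming binary findings. The counts are standard parameter-counting, not figures from the Szolovits and Pauker paper.]

```python
# Parameter counts for two probabilistic diagnosis models.
# Hypothetical illustration of the combinatorial-explosion argument.

def full_joint_params(n_findings: int) -> int:
    # A full joint distribution over n binary findings.
    return 2 ** n_findings - 1

def naive_bayes_params(n_findings: int, n_diseases: int) -> int:
    # One P(finding | disease) per finding per disease, plus priors.
    return n_findings * n_diseases + (n_diseases - 1)

for n in (10, 20, 30):
    print(f"{n} findings: full joint {full_joint_params(n):,} "
          f"vs naive Bayes with 100 diseases "
          f"{naive_bayes_params(n, 100):,}")
```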
I think, just to defend the Bayesians, right?
Like, you know, from the decision theorists, right, there is a lot that we learn about, I think, utilities, and how to extract information from the patient and how to extract information from groups of individuals, that we don't really talk about in even the current wave of
(22:30):
excitement about large language models.
So what is a patient's utility?
What is their risk preference?
What is the patient's tolerance?
What are their goals?
I think some of our historians and pioneers of medical decision making have really thought very carefully about that.
And what I personally would love to see is us, maybe not necessarily operationalizing it the same way, but injecting that wisdom and that patient preference, maybe via
(22:53):
large language models, systematically into decision making and into maybe even the medical record, right, Adam?
Maybe there's a path to that.
Yes, you're trying to, you know exactly how I feel about that.
Yes.
Okay, so I think this is a good transition.
So, one thing you did really special in that paper in JAMA, the CPCs paper, where you evaluated GPT-4, is, and you alluded to
(23:17):
this, is that you did something nuanced in your evaluation, right?
So you evaluated the correctness of the differential diagnosis not by a multiple-choice type accuracy.
There is no such thing for those differentials, right?
For the CPCs.
But you used this validated psychometric.
And again, I've started talking like this because I'm spending a lot of
(23:37):
time with Adam these days, but a well-validated psychometric to grade essentially how accurate this model was with respect to previous incarnations.
So that one was the Bond scale, right?
Then, just to transition to another one of your papers, you used another psychometric, which maybe you can tell us about, and also how you see this field evolving.
This is a recent paper in JAMA Internal Medicine led by Steph Cabral and you.
(24:00):
And you guys used what I understand is the R-IDEA scale to evaluate the correctness and the diagnostic reasoning abilities of GPT-4 with respect to human physicians on a common set of cases.
So maybe you could tell us what it takes to build and create a well-validated psychometric, and how you see this field evolving for evaluating large language
(24:24):
models and human reasoning as well.
I was gonna say, tell us what a psychometric is.
Yeah, a psychometric sounds very intense and scary.
It sounds like something from the 1960s, where, what was it, the CIA was doing experiments on people with LSD?
What is that?
MKUltra, right?
So, it sounds like something from that.
And I guess it sort of is, right?
A psychometric is an evaluation that is meant to evaluate something
(24:46):
that goes on in the human mind.
Of course, we, Raj, you and I have joked about this.
We have no way to see into the human mind.
It's very Cartesian.
Maybe I'm the only sentient being here, and you guys are hallucinations that the demons have put in my head.
However, because we can't see into people's minds, we have to rely on external scales.
So, a psychometric is a way of evaluating that, right?
(25:06):
Classic psychometrics may or may not be valid, like the Myers-Briggs test, right?
Or IQ tests.
All of these are purportedly psychometrics to tell us something about what happens on the inside of the head.
So, psychometrics in reasoning have been used for 20, 30, let's say 30 years
(25:26):
in medicine, um, and they've been used actually for a fairly important purpose, though one not exciting from an AI perspective, which is the teaching of clinical reasoning, so medical education.
Clinical reasoning curricula really entered medical schools in the 1980s, right?
Prior to the 1980s, the focus of medical school had really been on knowledge transfer.
(25:47):
You just need to know everything there is to know about diseases.
There's actually a famous ethnographic study from the 60s, Boys in White, where they actually go over this, and they don't teach them anything about how to think, like, you just learn by modeling other people.
In the 1980s, first, there's the advent of the computer, or the personal computer, right?
People are like, well, you don't need to know everything, you can look it up.
(26:07):
So, these are people who are way ahead of themselves, right?
This is, like, the Apple II and the Commodore 64 era, but they're seeing a future where you can look everything up.
And also, knowledge generation and subspecialization had taken off such that no medical student could know it all.
So, the focus in medical school started to switch to how to learn and how to think.
In order to teach people how to think, we needed to have ways
(26:30):
to grade people on how to think.
So, a lot of very smart doctors, really led by cognitive psychologists, started to evaluate expert clinicians as well as medical students on what good thinking actually looked like.
A lot of these are psychological theories that would sound very familiar to anyone who reads, like, Malcolm Gladwell or all the pop psychology books these days.
It wasn't pop psychology back in the 80s.
(26:51):
It was really boring experimental psychology, but that's how it goes.
And the psychometrics began to get used in medical education, and, depending on which psychometric we're looking at, got more and more evidence as time went by.
So when Steph and I wanted to evaluate both the reasoning of humans and the reasoning of computers, we hit the literature.
(27:15):
I already knew the literature, but we looked to see what would be the best tool to do this, and we selected a psychometric called R-IDEA. The R is for revised, and IDEA is a psychometric that looks at the presentation of reasoning as it can be gleaned from a written document.
And we chose this in particular because Verity Schaye and her team at NYU
(27:38):
had done a wonderful study validating this just a couple of years ago, where they used a BERT model to read something like 30, 40,000 resident notes and correlate the strength of the psychometric with the cognitive load and also the quality of the reasoning.
So a psychometric for reasoning, currently and probably from here on out, is the
(28:01):
best way that we have to evaluate human reasoning. In order to establish validity, you have to have face validity, right? So, you need people who are experts to say, well, this makes sense. But then you have to establish validity in populations, which means, in the population you want to study, you need to run it on a lot of people. You need to make sure that, for example, there's a dose response, like, as people gain expertise, it goes up. That you
(28:23):
can see different subgroups, and that there's some degree of both consistency in grading it and internal reliability.
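[Editor's note: One common way to quantify the consistency-in-grading idea mentioned here is inter-rater agreement, for example Cohen's kappa. A minimal sketch follows; the two raters and their scores are invented for illustration.]

```python
# Cohen's kappa: agreement between two raters beyond chance.
# Hypothetical illustration; the ratings below are invented.
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label]
                   for label in set(rater_a) | set(rater_b)) / n ** 2
    return (observed - expected) / (1 - expected)

# Two hypothetical raters scoring ten notes as low/med/high.
a = ["low", "high", "med", "high", "low", "med", "high", "high", "low", "med"]
b = ["low", "high", "med", "med", "low", "med", "high", "high", "low", "high"]
print(f"Cohen's kappa: {cohens_kappa(a, b):.2f}")  # ~0.70
```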
And psychometrics, this is an old field.
The classic one is a Likert scale, right?
The 1 to 5.
Likert scales go back to the 1950s, and it's one of these things that we take for granted, right?
I walk into a bathroom in the airport, and it wants me to rate it on a scale
(28:43):
of 1 to 5, how my experience was.
And what's really funny is you can look at arguments against Likert scales from the 1950s, and they're still the case today, right?
We use these things commonly, but that doesn't mean there aren't big problems in using them to rate things.
These more validated psychometrics are an older tool, but one that, well, as you know, or as I hope you believe, is increasingly important as we now
(29:04):
have language models that can at least mimic the thought processes of humans.
So I don't want to digress, because I want to hear about the results of the study, but I'm struck by the fact that it sounds like we're doing machine psychology.
And I wonder in what ways human psychometrics are informative or misleading for machines.
So, like, this guy Eliezer Yudkowsky
(29:27):
talks about this all the time from the perspective of existential risk.
But he's like, what they're actually doing is they're able to cosplay, or put on masks, and they can simulate any kind of human in any kind of scenario.
And so, while you may be getting insight into the simulation in the particular context in which you're using the large language model, you're getting very little insight into the actual underlying
(29:47):
large language model itself, which may be important when the model is used in cases it wasn't tested for.
So, I wonder how you think about that dichotomy.
So that's great.
So, both of those things are simultaneously true.
And this is what's really crazy.
They're true in humans too.
So I'm going to give you the human example.
So, back in the early 90s, we had this brilliant cognitive psychologist, who's
(30:10):
still alive, named Bordage, who did research on how expert humans reason.
And what he discovered is that when expert doctors reason, they, and this is right when script theory is starting to be developed, but they look at these things called semantic qualifiers.
So, for example, and I had a patient like this last week, I was on service.
If one knee is swollen versus multiple joints are swollen, so monoarticular
(30:34):
versus polyarticular, it really changes how I think about that problem.
If it came on in 12 hours versus two months, it changes how I think about that problem.
We now call these things semantic qualifiers.
So Bordage, when he described this, was looking at the difference between experts and novices.
Now, in medical school, we teach people to use these words from
(30:55):
the first day of medical school to describe their problems, right?
So Bordage was looking at a world where they weren't taught this, right?
These are just emergent abilities.
Now we are teaching people to display the presentation of reasoning.
And what's really cool is Bordage, the smart guy, also was practicing for, like, 40 years, so lots of time to see things change.
And what's interesting is we can see that using the presentation of reasoning,
(31:18):
especially in novice learners, does not actually mean they are reasoning better.
So it's very similar to the problem with the language model, right?
They are cosplaying an expert clinician, but does that mean that they are actually reasoning?
No.
They're not alive.
They're not humans.
What's interesting, and the reason I think it's important, is for two reasons.
One, I think that, well, humans are going to be interacting with these, right?
(31:41):
So the question becomes not, is the model actually thinking in this way, but how does the model cosplaying an expert clinician affect the human that's using it?
And then, number two, for scalable oversight.
So, you use these psychometrics to train the models to evaluate human beings.
How does that work, right?
Can the models actually use this understanding, this cosplay of reasoning,
(32:03):
to evaluate actual human beings?
At the end of the day, though, we end up in an ouroboros, right?
All of us interact with the world via text or speech.
And those are just estimations of what's going on in our head.
So, this is a problem that, this is why I love these technologies, right?
We're talking about robot psychology.
Obviously not the same as human psychology, but these problems
(32:25):
exist for humans as well.
Do you identify with Dr.
Susan Calvin from I, Robot, the robopsychologist in the Isaac Asimov series?
Robopsy-
I, I was gonna say I'd like to become a robot.
Would you be Chief?
Chief?
Yeah.
Chief.
Chief robopsychologist of, of a-
For a second I thought you were asking
if I identified with the Roombas.
(32:45):
Yeah.
Oh, yes.
Yes.
Do you identify with the vacuumcleaner that moves around your house?
You know, my kids, when they draw pictures of our family, they draw me, my wife, the two of them, our dog, and also Roomba.
Amazing.
Amazing.
Alright, Andy, should we go on to the lightning round?
(33:06):
Let's do it.
So you've listened to the podcast before, Adam, so you know how this goes.
So, I'll spare you the introduction.
The first question: which superpower would you rather have for a day?
Invisibility or the ability to fly?
(33:28):
Um, well, first of all, I think anybody who says invisibility is going to be up to no good.
So I'm going to go with the ability to fly.
Well, so that's funny.
'Cause that was my answer, because the ability to fly already exists as a technology.
So, I would want something that gives me a differential advantage, but I guess that means that that's my villain origin story.
Okay.
Yeah, see, that's the difference.
I mean, it would give you a comparative advantage, but I think
(33:48):
it's difficult to use that ethically.
Whereas flying, I mean, I don't think that, what's the unethical use of flying?
Adam, if you weren't in medicine-
It's a good question.
It's a good psychometric.
It's a good psychometric.
Oh yeah, the lightning round of psychometrics, I like it.
Adam, if you weren't in medicine, what job would you be doing?
Oof.
Either a cognitive psychologist or a journalist.
(34:11):
Science journalist.
Yeah, on brand, on brand.
Yeah, I'm pretty predictable.
Okay, so here's your chance to be controversial.
Me?
Which medical specialty could be most easily replaced by a large language model?
Which medical specialty?
Yeah, so that's the, I think I have a unique perspective on who and what can be replaced, because I think it's less of a specialty and more of the
(34:32):
cognitive tasks of that specialty.
So, what we're looking for is a specialty that relies mostly on textual interpretation, with less formal reasoning and certainly no procedures.
So, I think a lot of people want to say things like radiology, well, radiology might be a language model, or dermatology, but I think it's some of these
(34:55):
specialties, so, like, say oncology, right, where it's effectively looking at large amounts of data from clinical trials and then individualizing that to patients.
I don't think that oncologists are going to go away, but I think that a lot of those treatment decisions that they now make are going to be taken over by language models.
So other things like nephrology, maybe even rheumatology, in my field, medicine.
(35:19):
So, anything that relies on a large amount of text input, text output.
And then there are the messy fields, right, where there's more epistemic uncertainty.
So, like, my job, a general internist, the primary care doctor and, uh, inpatient medicine doctor, those are really, really messy.
So, I think it'll take longer to replace us.
Is that a good answer?
(35:40):
It's interesting.
Yeah.
I wondered if telemedicine plus psychiatry was going to be where you would go with that, but I think.
Oh, well, this is great.
Maybe so. Psychiatry is fascinating, right?
Because psychiatry is this field that has a different nosological model than everybody else, right?
So, in psychiatry, diseases are defined purely on a symptom basis.
Whereas the rest of the field is based on pathological anatomy,
(36:01):
there's a pathophysiologic change.
And there's been this really weird trend, right?
Like tertiary syphilis, general paresis of the insane, used to be considered a psychiatric disease until we actually had malaria therapy.
So the Nobel Prize in Medicine in '27 was for giving malaria to people to cure syphilis, but then penicillin came, and then it became a medical disease.
So, like, you peel off things from psychiatry when they have a pathologic
(36:24):
basis, which is really interesting, right?
Because you could say, well, maybe psychiatry will be easier to replace because it's purely a patient-based nosological model.
But also, maybe the squishiness of those borders, the natural grayness of those borders, is going to more inherently require a human being to navigate that
(36:44):
rather than a language model, because you can't tell the language model where bipolar I ends and mere hypomania begins, because they're human creations.
So I don't know.
Adam, I think I know the answer to this question, but I'm probably, I wouldn't be surprised if I'm totally wrong.
Who is your favorite historical doctor?
(37:04):
Who do you think my answer is?
Osler?
No.
No.
Okay, I'm wrong.
I'm wrong.
I got a surprising answer.
Okay.
Do you know who- you're way wrong.
Do you know who Pierre Louis is?
You do, of course.
Tell us about Pierre Louis.
Pierre Louis. Yeah, so, Pierre Louis is, well, for you guys, he's like the first
(37:26):
proto-informatician, proto-epidemiologist.
So, Pierre Louis is, I'd say, in the second generation of modern medicine.
So, he's practicing in the 1820s and 1830s.
And he is, uh, friends with Laplace.
So presumably, we don't know, there's a paucity of primary sources from this, but presumably he's hanging out with these nerdy, you know, maths
(37:49):
guys, and is like, well, there's no reason that I can't do this in people.
So, in the 1820s, he does some studies on, like, the distribution of yellow fever, basically saying, like, look, if I look at a lot of different patients and I collect objective data, I can discover ways, and you have to keep in mind, he has no modern statistical methods, but I can discover ways, just using simple arithmetic, that I can help my current patients.
(38:12):
And then in the late 1820s and early 1830s, he decides to be a pain in the butt.
So again, I love anybody who wants to be a pain in the butt.
In Paris at the time, there was an obsession with leeching.
So there's this famous doctor, Victor Broussais, the vampire of Paris, that was his nickname.
And people were getting, like, 50 leeches at a time, having
(38:33):
liters of their blood removed.
And women had dresses in the style of leeches.
It was like an obsession, especially in Paris.
And Pierre Louis was like, okay, so we're saying this is a great therapy, especially for pneumonia.
Yeah.
What if I actually look at the data? And he had his little notebook where he kept all this data on patients. And he said, look, if bloodletting is so great,
(38:53):
then in my patients with pneumonia, if they got bloodletting early, they should live a lot longer than the people who got the lifesaving therapy late.
And he just sat down, and you can see, his journal still exists, the simple arithmetic that he went through. And he was, like, shocked at the end.
He was like, actually, the people with early bloodletting are dying at a much higher rate than the people who got late bloodletting.
This doesn't make any sense. But Pierre Louis, being a wonderful pain in the butt,
(39:17):
just went out and published this and accepted the large amount of blowback.
And it's a great example.
He didn't really have that much impact at the time, but people were not ready.
Modern statistics didn't exist.
They were like, that's cool that you added up some numbers, man.
Who cares?
Again, it's the 1830s, but he had this small core of dedicated
(39:38):
followers who would go on to invent, like, regular epidemiology.
So, Pierre Louis, the prototypical doctor who's a pain in the butt, combined great patient care with a knowledge of population medicine and statistics, and really brought about the modern world.
Nice.
No need to apologize.
I think the Mark Cuban episode already got us our TV-MA rating.
(39:58):
I think that was a great episode, by the way.
Andy, I want to ask just one wrinkle on that question.
So that is not the answer that I was expecting, and I love it.
And it's a great story in so many ways.
Maybe I could just ask a follow-up lightning round question.
I don't think we've ever done this, but who is your favorite historical fictional doctor?
My favorite historical fictional doctor.
(40:19):
Oof, that's a good one.
Or it doesn't have to be historical.
It could be a contemporary fictional doctor.
From Dr.
House to whoever.
Oh, can I do a Star Trek doctor?
Can I do a Star Trek doctor?
Yes.
And why? Yeah.
My favorite fictional doctor is, oh God, I'm such a nerd.
So I'm going to apologize in advance, but that would be The Doctor.
(40:40):
So, the hologram doctor from Star Trek: Voyager, because he is a computer doctor who learns what it means to be human.
Classic Pinocchio story.
Nice, nice.
To show you that I can also do in-context learning, I'm gonna do a, I'm gonna call an audible here and not ask you the question that I had planned to ask you.
Given that you're a hardcore medical history nerd, I want to ask this
(41:01):
overrated, underrated question.
So, Hippocrates.
Overrated or underrated, as a figure in the history of medicine?
Overrated.
Can you say a little bit more why?
Oh, I mean, so there's a move.
He's got good press.
So, first, Hippocrates wrote only a tiny fraction of the documents.
There was a whole school called the Hippocratic School
(41:22):
that stretched, like, hundreds of years, many of which he was not alive for.
And a lot of the Hippocratic ideas are actually ideas that were popular generally around the Mediterranean.
So, Hippocrates just got really good press, right?
Great ideas in terms of naturalistic medicine, but we have papyri from ancient Egypt that show that they were practicing naturalistic
(41:43):
medicine like 1,500 years before.
So overrated for sure.
Awesome.
I knew you'd have a greatanswer for that one.
Glad I asked it.
Alright, Adam.
You got more?
I'm ready.
We have one, one last one, and I love that I get to ask you this.
What would Michel Foucault think about ChatGPT?
Oh, yeah.
What would Michel Foucault think about ChatGPT?
(42:03):
Foucault would, all the post-structuralists, like Derrida and Baudrillard, would all look at ChatGPT and say, this is what we were talking about, right?
This is the final stage of human cultural evolution, which is computers mimicking humans that are being consumed by computers mimicking humans and being evaluated by computers mimicking humans.
(42:25):
So I think the post-structuralists would have, I mean, Foucault is a very pessimistic man, so he'd have been very pessimistic about it, but also not surprised.
Excellent.
I think, Adam, congratulations.
You survived the lightning round, so we have just a final few questions here.
And these are more big-picture concluding questions.
So, thoughts to leave us with and to leave the listeners with.
(42:48):
So, I've seen you do this personally and I've been very impressed by it, and so I was hoping you could really give us your philosophy and your approach here. There's a lot of medical students and residents now, including many of whom you're training and you interact with, who are very interested in artificial intelligence.
They want to get involved, and they're wondering what the best way is for
(43:10):
them to learn about AI and to start pursuing research, for example, in AI.
So maybe for the med students and the residents, what do you think is the best thing that they can do right now, if they're interested in getting involved in medical AI?
So that's a great question.
And I mean, to hedge a little bit, it depends on their skillset, right?
If this is a, if you have, like, a Ph.D. in computer science before this, it's
(43:34):
going to be, like, basic computer science research. But generally, we are in a time where what we really need in this field is collaborations between the people building the machines and the people who understand what it is to practice medicine.
And in particular, the invisible parts that you can't see, like what's going on inside the human mind and inside patients' minds.
(43:57):
So, my advice for any resident who wants to get into this would be to focus on what you really bring to this, which is your medical knowledge and your medical expertise, and partner, right?
Partner with computer scientists, with labs that are doing this research, to bring your perspective.
I generally disagree with this idea that we need doctors to learn, like, the principles of machine learning.
(44:19):
Like, I use CT scans all the time.
From my perspective, a CT scan is a magic doughnut that a patient comes in and I get a cool image out of.
I don't need to understand the physics of a CT scan in order to better take care of my patients.
I do not think that people need to understand, like, the complex architecture of neural networks in order to understand how to use a language model to take care of their patients.
And from a research perspective, if anyone is listening, what you
(44:40):
are bringing is your understanding of the context of medical care.
Is there anything specifically you would change about medical education to better prepare students for this AI future?
Well, uh, I just so happen to be leading the task force at Harvard to figure this out.
It's a great question, because the fairest answer is no one knows.
At this point, a lot of the, and I don't know if you guys would agree or
(45:03):
disagree with this, but a lot of the things that have come out of language models are surprising and, like, this word is overused, but emergent, right?
Things that are not necessarily programmed in.
And I have an old-person brain that was trained, even, like, you know, the Internet, I had computers pretty early, but I was trained, my brain was trained, in really an analog world.
And the new generations that are coming up used language models in college.
(45:27):
My children had ChatGPT tell them stories this morning.
So they're using language models from the ages of three and five.
So, I think that right now, from a medical education perspective, the best thing that us old people can do is keep an open mind, look at how our learners are using it.
I personally think a lot of the benefits of
(45:47):
language models, and Raj knows that I think this, are, like, unexpected, right?
I don't think the benefits of language models are going to be, oh, look, I have a scribe that can listen to me.
Oh, it can write my notes.
I think a lot of the benefits are going to be, oh, look, all this highly contextual text that we produce can now be understood en masse and give us feedback.
And from a learner perspective, like, oh, look, I can have a personal tutor
(46:08):
who can tell me what I'm doing well and not, and give me personal teaching.
So, just keep an open mind.
Generally, I think we should focus on those things that make us most human.
So, I think communication skills, navigating difficult situations, and reasoning, especially those metacognitive strategies, because, like, I think
(46:29):
the future with language models is not going to be like what the techno-futurists think, oh, what is it?
Martin Shkreli created Dr.
Gupta, right?
Like, if anyone uses that, I assume people are just using it to make fun of it, but it's a terrible idea.
But language models are going to have a huge impact.
It's just going to be a lot weirder than any of us predict right now.
So let me ask the complement to that question.
For a long time, there was this annual contest called the Loebner Prize, where
(46:53):
people would write a chatbot that would try and kind of pass the Turing test.
Every year, they would actually give out another thing called the Most Human Human Award, which is, there were human contestants in this, and whoever seemed the most human would win this.
So.
What should human doctors focus on as AI comes more and more into the clinic and is more involved in clinical practice?
(47:14):
What would be the equivalent of the Most Human Human prize for a physician?
Yeah, and this is a great question, to complement the journal that we all work for, NEJM AI.
Everyone read John Chen's recent piece on the, uh, so, my gut feeling is to say communication skills through difficult situations. But then I read that John
(47:35):
Chen piece, right, where he had a really difficult conversation about a patient with advanced dementia who was going to get a PEG tube, and he thought it was a bad idea. And the conversation with the wife went very poorly, so he decided to recreate that conversation by role-playing. And then he discovered that,
I believe what ChatGPT said was something like, I understand that
(47:55):
your husband's a fighter, but there's many different ways to fight. That might be doing everything, but fighting might also be the bravery to go for comfort at the end of life, or something like that. And I feel the same way that Jonathan did when he had that experience, which is like, wow, this is more human than human.
So, Andy, that's to answer your question, like, I don't know, right?
We've taken for granted that, like, sitting at the bedside holding someone, obviously
(48:16):
only a human being can do that right now, because we don't have robots. But some of this empathetic communication, even if it is just a simulacrum of human caring, seems to be effective.
So I don't know.
And this is from the guy who's trying to do this at Harvard.
Um, but before we head to the last question, I would just like to say that you have won the vocabulary award for AIGR guests so far.
(48:37):
I feel like the level of Latin has been higher and the number of SAT words has been fantastic.
Is it simulacra?
Yeah, that and lots of great SAT words in this episode.
So, so I guess, like, just zooming out, why are you doing this?
What about this whole AI endeavor gives you cause for optimism, and where do you hope it's going?
Well, it's funny that you think I'm optimistic.
(48:58):
Raj, am I optimistic or pessimistic?
Uh, I think you're optimistic.
I kind of agree.
I think you're an optimist.
Yeah, so why am I doing this?
If you look at the grand trends of history, and let's just stretch 200-something years, right?
There has been a trend in medicine since Pierre Louis of breaking
(49:25):
human beings down into data, right?
The word data itself in medicine really only comes from the 1980s.
We were using things like the facts of disease before this.
But to understand humans by breaking them down, which, to be clear, has allowed for great advances in medical care, because we can run advanced clinical trials, because we understand lots of sciences better, we can do amazing things.
(49:47):
We can knock out genes and put a pig kidney in another human being, for God's sake.
Like, that's amazing.
But there have been real costs to this approach as well, in the dehumanization of patients, in the isolation of humans from their medical care, in alienation.
You can see that we have this weird medical system where it can do more than any medical system could in the past, but people trust their doctors
(50:09):
a whole lot less than in, like, the 1950s.
So people truly feel, like, isolated and put apart from their medical system.
I, as a historian, did not see large language models coming.
This is a surprise to me, right?
I was very bleak about what the future was going to look like, that it was going to be more reductive, more breaking people into individual pieces. And now I see a technology
(50:33):
that, again, it's not human, I don't want to anthropomorphize it, but that understands, understands, air quotes, seemingly understands a lot of these contextual factors and the things that make us human and give us meaning, in our own sense of disease and our sense in the world and our sense of community and what disease and suffering means. A technology that
(50:53):
could really change the trajectory of where I thought medicine was going, which was a place that, like, in many ways I'm, I'm very old-fashioned, right?
Like, I reflect the values of medieval physicians and ancient physicians and many modern physicians too, right?
There are things that are standard over time.
And I see these technologies as a way to get us back to some of those core things of what a physician has done, while
(51:17):
not losing many of those advantages that come with big data, that come with collecting information.
So, AI putting the humanity back in medicine, if I was gonna do some summarization there.
Yeah.
And I'm also deeply pessimistic that people are going to use these technologies the wrong way and make our lives much worse.
So that's what, that's why I'm such a crazy person doing all this work,
(51:37):
because they're powerful. And what I see happening right now with LLMs being rolled out, why are we using LLMs to draft the first message to patients?
Isn't, like, communicating with patients one of the most fundamental things that humans do?
And this is what we're using these powerful technologies for?
So, I'm also motivated by, if I can swear, like, a deep fear
(51:57):
that we're gonna fuck this up.
I mean, but isn't that interesting?
Is the reason for that because that's the part of the job that physicians enjoy the least?
Or why is that the point of entry there?
Well, it's not that talking to patients is inherently what physicians enjoy the least.
It's because we've built a system in which physicians have been converted into data entry clerks.
So, because of that, now communicating with patients has been reduced to, like, this.
(52:20):
You know, sitting in your pajamas at the computer doing that.
So, of course, this is America in the 21st century.
So instead of rethinking the system that makes us miserable, we're like, hey, let's make a computer do this important part.
Got it.
Adam, this has been fascinating, and thanks so much for coming on.
It's been, like, a pretty broad-ranging discussion, and I know I certainly learned a ton.
So, thanks again.
(52:41):
Well, thank you guys for having me.
And to think I was trying to, like, limit my vocabulary usage.
Yeah.
Thanks for taking it easy on us.
Yeah.
Yeah.
Oh, Raj knows what I'm like.
Adam, that was, that was amazing.