Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:03):
I like to always begin here with Alan Turing, because I think that when Alan Turing first asked, can machines think? He probably didn't envision a future where a simple voice command to a phone could reveal the best restaurant in, say, Cincinnati, where I am. But last week was restaurant week here,
(00:26):
and I enjoyed a meal that was recommended by AI. And when we really start to think about the advancements of AI from when Turing wrote his seminal paper to today, it's pretty remarkable. We've witnessed chess and Jeopardy champions bow to the intellect
(00:46):
of machines like Deep Blue and Watson. And when Siri was introduced, AI became not just a tool but a companion, learning our routines and, in our case, our taste in music, and even monitoring our health through the devices that so many of us now wear.
In my own kitchen, Alexa actually orchestrates the
(01:08):
daily rhythms. She's the reason that alarms get set or lights get turned off. She's also the reason that gummy bears always seem to make it onto my grocery list. She orchestrates all of my children's impromptu dance parties. And while their musical tastes are still developing, it's really impressive to see how naturally
(01:31):
they interact
with the technology
that when I was much younger felt like
science fiction.
So I guess my point here is that we're not just spectators in this AI evolution. We're participants, especially in medicine.
Today, I'm not here to ask you if
(01:51):
machines can think. Instead, I want to share how I believe AI can be harnessed to revolutionize the way that we practice medicine and how we train our future physicians.
But before we go into all of the transformative effects of AI in medical education, I think
(02:13):
that it's really important
to
level set. We all need a very clear
understanding
of what AI is.
You can think of AI as the art of creating smart machines. It's a little bit like computers trying to play the role of the human mind.
(02:34):
It's not just about following a set of instructions. Instead, AI is machines learning from experience, adapting to new inputs, and performing tasks that have historically been done by humans.
When we're talking today, we're not talking about
robots taking over the world. We're talking about
(02:54):
the type of AI
that can suggest
what type of restaurant you might like to
eat at
or the kind of AI that helps a
doctor
diagnose a disease
faster than ever before.
This type of AI is not just programmed.
It learns,
it grows, and it evolves.
(03:17):
In the context of AI, you may or may not have heard terms like machine learning, deep learning, natural language processing, and, more recently, neural networks.
Each of these is a different type of
technique
in artificial intelligence.
So machine learning has a lot of parallels with the way that we train future pulmonologists or
(03:39):
critical care fellows
or residents.
And so think about it this way: you give a system a whole lot of data. This could be anything from chest radiographs to residency evaluations.
And the system learns to identify
specific patterns and make decisions
just as a resident learns to navigate the
(03:59):
multifaceted
nature of diseases.
This process of learning can happen in four different ways. There are actually more, but these are the four main ways that you'll probably encounter: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Interestingly, several similarities can also be drawn
(04:21):
here between these processes and the way that trainees are rated on ACGME subcompetency milestones.
Supervised learning uses labeled data. It's like having a seasoned physician looking over your shoulder and saying, this here's pulmonary edema, and that's a pneumothorax.
In this context,
(04:42):
the AI is provided with labeled data from which to learn. The AI then takes this data and learns to make its own accurate predictions or diagnoses. With unsupervised learning, however, there are no labels at all,
just raw data. It's like the AI being
(05:03):
on a solo rotation without any labels or direct oversight. The AI has to sift through everything and learn to identify trends or patterns, finding hidden structures. And in this way, the AI can start to uncover insights that maybe even humans can't identify.
(05:26):
Semi-supervised learning is exactly what it sounds like. It's a combination of both. Right? And this can be thought of as the blend of instruction and autonomy that we give our fellows. And finally, we have reinforcement learning. This is like the AI's night on call in the ICU.
It learns in real time through trial and
(05:47):
error, and it receives feedback, similar to the natural consequences of clinical decisions. Every interaction, every outcome, sharpens its predictive power.
So let's take a couple of examples, and I like to use images because I think they're really
(06:08):
easy to understand. AI experts historically love to use images of fruit or, as you'll see in a couple of examples, cats.
So for supervised learning, imagine that we have a database with a bunch of images of apples, and each image
(06:29):
of an apple, each one different, is labeled as "apple." So we train a model on this and we say, look at all of these things; these are all apples. And the model learns. Then, when you actually use the model and show it a novel picture of an apple and ask, what is this? It'll say, it's an apple.
Unsupervised learning is a similar concept. You have a bunch
(06:50):
of pictures of different types of fruit: apples, bananas, peaches. And you expose the AI to all of these, but the AI has to figure out the trends. Right? So it figures out, oh, these things are yellow. These things are round. These things have leaves. These things are a shade of orange,
(07:11):
and it can categorize them. It doesn't know that apples are "apple," but it knows they go in this category. It doesn't know that bananas are "banana," but they go in this category.
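To make those two modes concrete, here's a minimal sketch in Python, assuming scikit-learn and substituting made-up numeric "fruit features" for actual images:

```python
# Minimal sketch of supervised vs. unsupervised learning with scikit-learn.
# The two numbers per fruit (color hue, roundness) stand in for real images.
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

fruit = [[0.95, 0.90], [0.90, 0.95],   # red-ish and round (apples)
         [0.15, 0.20], [0.10, 0.25]]   # yellow-ish and elongated (bananas)
labels = ["apple", "apple", "banana", "banana"]

# Supervised: the labels are the seasoned physician over your shoulder.
clf = LogisticRegression().fit(fruit, labels)
print(clf.predict([[0.92, 0.88]]))     # -> ['apple'] for a novel apple

# Unsupervised: same data, no labels; the model just finds the groupings.
groups = KMeans(n_clusters=2, n_init=10).fit_predict(fruit)
print(groups)                          # e.g. [1 1 0 0]: categories, not names
```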
With reinforcement learning,
it's a little bit different. It's all in
real time. So you train a model on
a bunch of different images, and then you
(07:32):
show it a picture of an apple and ask, what is this? And it probably gets it wrong the first few times. Like here, it says it's a mango, and you provide reinforcement. You can think of this as cookies and zaps. Right? So it says it's a mango. No. And you zap it. Right? Not really, but that's kind of
(07:53):
the idea: reward or punishment. And so the model learns from that. Okay. And so then the next time you show it an apple and ask it, what is this? It says, it's an apple. Yes. It has learned, and this occurs over and over again.
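And here's a toy sketch of that cookies-and-zaps loop in Python. It's just the reward-and-punishment idea, not a full reinforcement learning algorithm; the labels and scores are made up for illustration:

```python
# Toy reward/punishment loop: the "model" keeps a score per answer and
# learns from feedback which answer earns the cookie.
import random
from collections import defaultdict

labels = ["mango", "banana", "apple"]   # the first greedy guess is "mango"
scores = defaultdict(float)             # learned preference per answer

def guess() -> str:
    # Mostly exploit the best-scoring answer; explore 20% of the time.
    if random.random() < 0.2:
        return random.choice(labels)
    return max(labels, key=lambda l: scores[l])

for trial in range(50):                 # show it an apple, over and over
    answer = guess()                    # "what is this?"
    reward = 1.0 if answer == "apple" else -1.0   # cookie or zap
    scores[answer] += reward            # every outcome sharpens the policy

print(guess())                          # -> almost always 'apple' by now
```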
The next,
(08:14):
level of this, if you drill down, is deep learning, and this takes this whole process a step further.
It's a specialized approach within machine learning that uses what we call neural networks. And you can think of these as a series of algorithms modeled after the human brain.
(08:35):
These neural networks are designed to recognize patterns with incredible depth and nuance. They're made up of layers of interconnected nodes, or artificial neurons, although that is about where the similarities with the brain end. And each of these layers processes information and learns from it.
So let's take another example of images,
(08:57):
cats and dogs.
So, imagine we have a stack of photographs,
a mix of cats and dogs, and we
want our AI to sort through these and
be able to recognize which is which and
also identify
any novel picture it encounters.
OK? So here we have a very simplistic neural network, and it has layers of interconnected
(09:18):
nodes. And again, each of these layers is going to be a specialist. One layer may be an expert in teasing out textures, another in discerning shapes. Another could be, I don't know, identifying colors. So each of these layers builds on the work of the previous layer and refines the AI's understanding
(09:38):
until it processes the entire image and can say, oh, it's a cat, with the same confidence that a child would.
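Here's a minimal sketch of such a layered network in Python, assuming Keras; the image size and layer widths are illustrative choices, not the architecture from the slide:

```python
# A tiny layered network for the cat-vs-dog example, built with Keras.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 3)),          # a small RGB photograph
    layers.Conv2D(16, 3, activation="relu"),  # early layer: edges, textures
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),  # deeper layer: shapes, parts
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),    # final call: cat (0) or dog (1)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(photos, labels, epochs=5)  # train on the labeled photo stack
```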
So in pulmonary medicine, for example, deep learning with neural networks can analyze thousands of chest x-rays, learning to detect subtleties and distinguishing between a wide array of conditions, from
(09:59):
pneumonias to pulmonary fibrosis, potentially with even greater accuracy than experienced radiologists.
So in a very short time,
deep learning has become the powerhouse behind the
most cutting edge AI applications that we're seeing
today.
And this takes us to a pivotal moment
(10:21):
in the AI timeline.
The founding of OpenAI in 2015.
So OpenAI's mission has always been to push the boundary of what AI can achieve, and they define this by trying to achieve what's called AGI,
artificial general intelligence.
We can get to that later.
(10:43):
But they focus on building and refining a special series of neural networks called generative pre-trained transformers, or GPTs. Interestingly enough, the transformer architecture was not invented by OpenAI, but by Google. But it's the way OpenAI has applied it that has been
(11:03):
so transformative.
Now, these types of neural networks are not just algorithms. They are vast reservoirs of knowledge, capable of understanding and generating human-like text.
And in late 2022, OpenAI introduced ChatGPT to the world, a tool so advanced that it could converse, answer queries, and even simulate reasoning.
(11:26):
And I'm sure that each of you has at least heard of ChatGPT and has likely interacted with it to some extent.
But if we take a step back,
this reminds me
of a time when I experienced something
completely
new that really shifted my perspective.
(11:49):
This was back in college when Napster first burst onto the scene, and it's dating me, but I didn't at that point really conceptualize the impact that something like Napster would have. Instead, I was really, really excited
about how awesome my summer of 2000
playlist was going to be.
(12:11):
And then everything changed.
The release of Napster was a watershed moment for the music industry. It completely upended traditional business models, paved the way for the rise of digital media, and completely transformed how we consume it. For me, the release of ChatGPT feels
(12:33):
like even more of a seismic shift. This isn't just another step forward in something like natural language processing. It is the kind of leap that disrupts our notion of what AI can do, setting the stage for a new era of human-machine interaction.
And what I'm most excited about
is how this new technology
(12:55):
could transform medical education.
So
what's the first thing that we do when
we encounter such a tool? We play with
it. Right? We test its limits, and we
ask it what the meaning of life is.
So I want to give you an example. I did this too, but because I really, really like numbers, I asked it to write me a proof of the infinitude of primes with every line that
(13:17):
rhymes.
And what amazed me at the time, and mind you, this was last year, was that it did it. And not only was it correct, but it was also witty. "Yes, I think I can, though it may take a clever plan. I'll start by noting
(13:37):
Euclid's proof, which shows that primes aren't just aloof. Assume we have a finite list of primes that none have been missed; multiply them all together, and add 1 just to be clever."
So it goes on, flawlessly rhyming, marrying the rhyme of verse and the rigor of
(13:58):
mathematical proof.
So
I show you this not just because I
think it's, you know, kind of fun, but
I want to use it as an example
to explain how AI can accomplish something like
this.
At its core,
GPT is a language model, and it's designed
to predict
the next word.
(14:19):
At its core, it is just a fancy autocomplete. A user inputs a prompt into GPT, and GPT generates coherent and fluent text that reads like it was written by a person.
GPTs can do this because they are trained on massive datasets of text,
(14:41):
such as the Common Crawl. The Common Crawl is a collection of billions and billions of web pages, from Reddit to Twitter to all sorts of things. Right? It has a vast amount of data, and this allows GPT
to learn the patterns of human language
(15:03):
and predict the next word in a sentence, similar to the way that your phone can based on the text that you tend to send frequently. But what sets GPT apart is that it can generate entire sentences, paragraphs, or even articles that sound as though they were written by
(15:24):
humans.
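If you want to see that "fancy autocomplete" behavior yourself, here's a minimal sketch in Python using the Hugging Face transformers library and the small, open GPT-2 model (a much smaller cousin of ChatGPT):

```python
# Next-word prediction in action with a small open GPT model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "The patient presented with shortness of breath and"
result = generator(prompt, max_new_tokens=20, num_return_sequences=1)
print(result[0]["generated_text"])  # the model keeps predicting the next word
```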
The last year of technological advancements
with generative AI has been an absolute
whirlwind.
OpenAI's GPT series is now up to GPT-4 Turbo, and we're about to get 5. Anthropic's Claude, with Opus, is rivaling that. The open-source
(15:47):
models, like the French Mistral series, Falcon, and Meta's Llama series. Last week, Meta's Llama 3 dropped, and it's absolutely amazing. This goes on and on. These models have progressed at a really, really rapid rate, and it's not just in the area
(16:07):
of text, but also images.
The majority of the slides that I've shown today, and will show to you, have been developed with the aid of AI models such as DALL-E 3, Midjourney, and Stable Diffusion.
This image here, which looks like a professional
headshot of me, was generated
using
a combination of text prompts and real life
(16:30):
photos of myself.
But AI can generate more than text and
images.
Let's consider for a moment the unique qualities
of human speech.
Its ability to convey not just information,
but emotion,
intention,
and personality.
We are now capable of harnessing AI to
(16:51):
replicate these qualities.
In fact, with less than 60 seconds of audio recording, it's possible to train an AI model
on your own voice, enabling it to generate
highly realistic speech in a range of languages
and tones.
(17:11):
Now here's the twist.
For the past few minutes, you haven't been
listening to me.
Or to be more precise, you haven't been
listening to the real me.
The voice you've been hearing is an AI
generated replica of my voice.
So that was something that we made,
(17:32):
and we actually ended up using and testing it, originally by putting it into a chatbot, creating a bot that we could type into and it would just talk. And we actually had it call my mother, and my mother talked to this bot thinking that it was me for
(17:53):
a half an hour. And after that half hour, it wasn't because she figured out that it actually wasn't me. I just felt so badly for Turing testing my mom that we had to stop. But my mom has kind of become our litmus test for all of our new technology. We just basically test it on my mom and see what happens.
Here we go.
(18:16):
So, another example of this: just a couple of months ago, Sora, the most advanced text-to-video model, was released, once again expanding our paradigm of what's possible with AI.
Now I want to start to shift. I want
(18:37):
you to start to think about the possibilities
for AI in medical education.
Imagine a world where medical trainees
can interact with virtual patients,
practice complex procedures
on digital models,
and receive real time feedback
from an AI system.
(18:58):
The traditional models and methods of acquiring knowledge are being transformed, ushering in a new era of medical education that was once only imagined.
So
(19:19):
you may have heard about the GPT models taking and passing Step 1. There have been tons and tons of papers published on this. I particularly like this one because it goes through the different models and shows it kind of longitudinally. Also, this was done in 2023
(19:40):
by a bunch of people from Microsoft and OpenAI. And what's remarkable to me was that, while it didn't achieve perfect scores, of course, GPT-4 did surpass the national average of medical students.
Moreover,
(20:00):
you know, this blew my mind the first time I saw it, because GPT-4 wasn't exposed to a bunch of USMLE-style questions, or even old USMLE questions, during its training. Instead, it learned from vast Internet databases like the Common Crawl, which contains the best and the worst of humanity. Right?
(20:22):
It wasn't fed issue after issue of the New England Journal of Medicine, but it did watch every single episode of House.
So it goes further, though. In this paper, they actually go and look at the questions that GPT got wrong. And when asked to explain its reasoning behind the
(20:44):
incorrect answer, the rationales provided by GPT-4 were given to physicians, and those physicians actually agreed that under certain circumstances, GPT's answer could be correct.
And not too long ago, early in 2023,
(21:04):
the New England Journal of Medicine started putting
out this new podcast. It's called AI Grand
Rounds.
This particular episode is really, really good. Some
of the other ones are not too bad.
At the end of March last year, they did an entire episode on GPT, and this had a bunch of different use cases in medicine. And they particularly focused this interview
(21:25):
with Peter Lee, who's the corporate vice president
of Microsoft.
And he was describing how, in Boston, they put GPT into the hospitals and asked physicians to use it in their day-to-day workflow. And one particular case really stood out to me. There was this one oncologist.
(21:47):
And this oncologist had a patient with late-stage pancreatic cancer. And this patient really wanted surgery and experimental immunotherapy. But the oncologist had determined that this really wasn't the right path, and it wouldn't lead to extending the patient's
(22:08):
life.
In fact, it could have have
negative effects.
But
the physician was having a really, really hard
time,
kind of communicating this to the patient because
the patient was very fixated,
on on this one form of of, treatment.
So, the doctor came back to Lee's group,
and Lee's group was like, hey. Like, why
(22:29):
don't you,
interact with GPT and see see, you know,
what comes out of it?
And so,
the oncologist,
told GPT everything.
And,
GPT said, you know, provided some examples about
how to talk to the patient and explain
to her,
why they weren't gonna go with the surgery.
(22:51):
And it produced great results and great ideas, and the oncologist took these back to the patient. And ultimately, they were able to come to an understanding, and the patient agreed not to go for the experimental treatment.
And that's amazing in and of itself. There have been a lot
(23:11):
of studies showing that patients actually prefer explanations by large language models like GPT. But what was really stunning to me was that, at the end of all of this, Peter Lee reported that the oncologist went back to GPT and thanked it. And GPT responded,
(23:31):
well,
what about you?
How are you holding on?
Are you getting all of the help
that you need?
So I've given you some examples of some pretty cool emergent behaviors that have been exhibited by AI, and specifically by large language models. But it would
(23:54):
be really unbalanced and irresponsible if we didn't also take some time to address the undesirable behaviors which have surfaced.
One such behavior that you're probably
most familiar with would be hallucinations.
And hallucinations are a phenomenon in which a
model generates text that appears to be coherent
(24:16):
and meaningful, but is not grounded in reality. And this is really important because the models produce this information with such confidence. It's a really important thing, I think, for us to think about, especially in medicine, as it can lead to misinformation or misguided
(24:37):
recommendations.
And this could have detrimental effects, right? Not only in physician training and education, but also in patient care.
So let's start with a more benign example.
Early in 2023,
I asked GPT, and this was the 3.5 model, before 4 existed,
(24:57):
for a list of articles on natural language processing in medical education, because I was doing a lit review. And it responded really confidently and gave me a bunch of citations from real medical education journals authored by real people.
So here's what it returned. And I'm looking
through these, and I was like, oh, okay.
(25:18):
And I noticed this one here by Dan Schumacher. Dan Schumacher is a colleague of mine. He works right across the street from me at Cincinnati Children's Hospital. And I'm looking at this, and I'm like, I know Dan really well. We're in a lab together. I know his work. I don't remember seeing this paper.
That's odd.
And then I saw this one by Sanjay
(25:39):
Desai, who leads medical education at the AMA. And again, I know his work, but I don't remember him ever publishing anything about machine learning.
So I was a little bit taken aback, and it turns out that it had responded with references to imaginary papers authored by real
(25:59):
people with relevant expertise. Right?
So did it lie to me?
Let's consider why this happens.
At its core, remember, these large language models are just fancy autocompletes. The training data included information like Dr. Desai's affiliation with the
(26:21):
AMA, and it included information about how the AMA is keenly interested in medical education and precision education. It included Dr. Schumacher's scholarly pursuits in qualitative analysis, which he is known for, and in narrative assessments in medical education.
But remember, these models don't actually
(26:43):
know
anything the way that humans do. They don't
understand truth.
They can't verify the authenticity of the information that they're generating.
When GPT
or any large language model
creates a citation
or generates any output,
it's not checking databases or confirming with PubMed,
(27:07):
right? It's estimating, in this example, what a citation ought to look like based on the patterns it has been trained on. Sometimes it gets it right. Other times it combines real elements, genuine journal titles, real researchers' names, into a citation that sounds plausible but is entirely fictitious.
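This also suggests a guardrail: the verification step the model never performs can be bolted on afterward. Here's a hedged sketch in Python that checks a claimed title against the public Crossref API (PubMed's E-utilities would work similarly); the containment check is deliberately crude:

```python
# Check whether a citation a model produced actually exists.
import requests

def citation_exists(claimed_title: str) -> bool:
    # Ask Crossref for its best bibliographic match to the claimed title.
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": claimed_title, "rows": 1},
        timeout=10,
    )
    items = resp.json()["message"]["items"]
    if not items:
        return False
    top_title = (items[0].get("title") or [""])[0]
    # Crude containment check; a real pipeline would use fuzzy matching.
    return claimed_title.lower() in top_title.lower()

print(citation_exists("Attention Is All You Need"))  # a real paper -> True
```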
(27:28):
This is a major barrier, and we have to be able to navigate it. And ultimately, we have to be able to control for it if we are actually going to apply large language models, generative AI, or any AI to high-stakes, complex environments like patient care and medical education.
But this seems to be something that is
(27:49):
really hard to tackle. So how do we
begin?
I'm going to give you one example of how we have started to approach this.
So one night, I was sitting around with many of my colleagues. We have this kind of monthly big meeting of our lab, and it included
(28:10):
Dan Schumacher. Right? And we were kind of half joking about how AI is padding his already impressive publication record. But this ended up leading to a really serious conversation. Right? How do we harness technology responsibly when it shows traits like hallucinations or, potentially worse, power-seeking behavior, if you can imagine?
(28:31):
So ultimately, we considered scenario-based strategic planning. This is a method that's embraced by visionaries and strategists, from the boardrooms of Shell to the command centers of the Department of Defense. It's designed to navigate the uncharted potential of the future.
And we thought, hey, this approach
(28:53):
is good enough for them; maybe we can apply it here and offer a unique lens to envision the role of AI in medical education in the future. So let me take a minute to describe exactly how this works. Think about it as the art of preparing for multiple tomorrows. It begins with developing a detailed description of multiple possible futures. Right? So you're going to
(29:16):
make a lot of these.
And these aren't just wild guesses. Right? These are informed, structured narratives that explore different future outcomes based on the current data that we have at hand.
From there, the forces that shape our world
are brought together.
So they'll bring in people from science and
agriculture, finance, medicine, etcetera.
(29:38):
And they're brought together and assigned to one of these scenarios. You have a group of these people assigned to scenario 1, a group assigned to scenario 2, and they tease out the opportunities and the risks
that each of them might hold. It's a
collective effort, right, to kind of forecast and
strategize and adapt.
And by examining the results
(30:01):
of each possible future side by side, you
can start to see common threads
across all of them.
And the trends that emerge are then used
to shape policies
and practice and even set research agendas.
Now
we're a bunch of academics and we don't
have access to the same resources as the
(30:23):
Department of Defense, so we tried to do
the next best thing that we could come
up with.
We actually took several different large language models and had them play the various roles in this process.
And what happened was that it resulted in four distinct scenarios of what 2040 might look like in the realm of medical training, with
(30:45):
AI having a very strong role in it.
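For a flavor of how that role-playing worked, here's a minimal sketch in Python, assuming the OpenAI SDK; the roles, prompts, and model name are illustrative, not our actual setup:

```python
# Have a language model play the expert roles in a scenario-planning exercise.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

scenario = "It is 2040 and AI plays a very strong role in medical training."
roles = ["medical educator", "health economist", "ethicist"]

for role in roles:
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": f"You are a {role} in a scenario-planning exercise."},
            {"role": "user",
             "content": f"{scenario} Describe the opportunities and risks you foresee."},
        ],
    )
    print(role, "->", reply.choices[0].message.content[:200])
```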
So I'll kind of share really briefly what
each of these were. The first world, or future, was one of AI harmony. And in this future, AI was embraced across all of society. It enhanced medical education
(31:06):
and health care.
It provided opportunities for personalized learning and personalized health advice. But the only way this was possible was because there was an extreme focus on ethical management, and this was crucial for the responsible use of AI and also
(31:27):
the equal distribution of benefits.
The second world was exactly the opposite. It was one of AI conflict. And in this future, AI had been weaponized, and it led to compromised health care. Trust was eroded. Physician stress was through the roof. Disinformation
(31:48):
was being spread across the world, disrupting medical knowledge, medical education, and health care. And that ended up suppressing critical thinking and diverse perspectives.
The next one was one of ecological balance. And in this future, there was a
(32:09):
very heavy focus on AI's impact on the environment. AI aided informed decisions that were directed at societal benefits, not necessarily the individual's. And this transformed health care to prioritize wellness. However, it also resulted in a lot of
(32:31):
ethical dilemmas about how we align global health and AI initiatives while still balancing individualized care.
The last one that arose was one of existential risk, where uncontrolled AI propagated
(32:51):
and, you know, became an existential threat. And what resulted was a shift away from reliance on technology. Doctors went back to things like pen and paper. Right? Abandoning electronic health records. And this had a significant impact on health care efficiency,
(33:15):
and ethical dilemmas arose for physicians trying to
balance this global crisis with individual patient care.
So
we then analyzed the resulting themes across these worlds, which were much, much more detailed than what I just showed you, with another large language model. And it identified benefits,
(33:36):
such as streamlining health care systems, accelerating research, and enhancing personalized learning, but also risks: misinformation, loss of privacy, the erosion of human expertise. So our exercise here resulted in four distinct but, admittedly, exaggerated futures.
But here's the thing: this
(33:57):
wasn't an attempt to predict anything. Right? We
were just trying to provoke thought and challenge
ourselves to consider the many different ways that
AI might be able to transform medical education.
And ultimately,
this process that we went through helped us create a focused call to action.
And this
(34:17):
urged people in medical education to develop ethical frameworks, foster collaboration, and continuously assess AI's impact.
And we've already seen all of this playing out. It wasn't just us. There have been a lot of papers, not exactly like this, but these kinds of calls to action about
(34:38):
how we can navigate this.
And over the past year,
we've seen a lot of examples of AI
being integrated into medical education.
And so, for example, the AAMC has started to form these task forces. I'm actually part of one: I'm part of the
(34:58):
technology advancement committee for selection. Right? And we're writing guidelines on how institutions can ethically and responsibly integrate AI into the selection process, such as medical school acceptance, the Match, and beyond.
Also, we're seeing examples
in training already
(35:18):
in ways that amplify it rather than diminish it.
And I actually got to take a really deep dive into these types of things, like how AI is being integrated into medical education now, when I was invited to contribute to a supplemental edition of Academic Medicine called The Next Era of Assessment,
(35:39):
using precision education to center on the equitable care of patients. And this just came out. So in this paper, we were asked to reimagine medical education assessment through the lens of AI, and specifically to focus on precision medical education, which is what the AMA is really focused on right now.
(35:59):
And so we used this to explore how current research in AI, especially in machine learning and deep learning, could be used to augment this model and tackle inherent limitations in traditional assessment methods. A lot of the examples were really powerful, and so I picked a few to
(36:21):
share with you right now.
So first, we looked at the category of proactive data collection, because collecting comprehensive, longitudinal data on learners is really important, both for competency assessment and for implementing a vision of precision education.
(36:41):
But it's been really hard historically, right? Because of lack of resources, lack of faculty, and also lack of time. So at NYU, Verity Schaye and her colleagues wrote this paper, which is super cool, and it describes the development of a high-performing machine learning model. And what this thing does is it classifies
(37:03):
resident admission notes into different levels of quality. So, how good is it? Right? And it does this within the clinical environment. So this study actually introduces a
scalable and reasonably objective way to assess
and provide feedback
on clinical reasoning documentation,
(37:23):
a task that has historically been really hard to do because it's time consuming and also really subjective.
It paves the way for a more personalized
and data driven approach to medical education.
And so here we see a step forward where technology and pedagogy are converging.
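To give a feel for the general technique, not NYU's actual model, here's a hedged sketch in Python: train a text classifier on notes that faculty have already rated for quality, then score new notes. The notes and labels here are invented:

```python
# Classify clinical documentation into quality levels from labeled examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

notes = [
    "Dyspnea likely CHF exacerbation given orthopnea and edema; will diurese.",
    "Fever and dysuria; suspect pyelonephritis, cultures sent, starting abx.",
    "Patient sick. Continue current plan.",
    "Admitted. Labs pending.",
]
quality = ["high", "high", "low", "low"]  # faculty ratings used as labels

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(notes, quality)
print(model.predict(["SOB, unclear cause, no reasoning documented."]))  # e.g. ['low']
```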
(37:46):
The next one was predictive outcomes. And in the paper, there are tons of examples; I'm just picking out really cool ones to share with you right now. They're all cool. But here, we wanted to take a look into how trainee performance can impact patient outcomes. Right? So, looking at that relationship,
(38:07):
because that's one of the goals in medical education: to produce competent, compassionate physicians. This is hard to do in today's complex, team-based care environment because attribution of contribution is a hard thing to tease out in individual performance. And then how do you make
(38:27):
decisions about that?
So in this example, this paper was published in Nature, and it describes an approach to assessing surgical performance by analyzing specific gestures during a nerve-sparing robot-assisted radical prostatectomy. How it does this is it breaks
(38:48):
down the surgery into discrete gestures. And what they discovered was that certain gestures correlated with improved patient outcomes. So, utilizing machine learning models, the gestures that were seen to correlate were then used to predict patient outcomes more accurately than the traditional clinical features that
(39:10):
had been used.
And what's really interesting to me is that there's growing evidence linking technical skills to patient outcomes. And this has led certifying bodies, such as the American Board of Surgery, to explore the value of video-based assessment as an adjunct to the existing mechanisms
(39:32):
for board certification.
Another one we looked at was bias in
assessment.
So, you know, while AI may help solve a lot of the challenges in assessment, we also have to be vigilant about bias. Right? Because we know that bias is in textual
(39:52):
data. And we know that these models are trained on vast amounts of textual data that is created by humans, and humans are biased. And so we need to be careful, because biased training data could lead to skewed interpretations or outcomes, and could perpetuate
(40:13):
stereotypes,
discrimination, or other forms of unfair treatment.
And something that's kind of neat, though, is that recent studies have actually used AI to detect and mitigate bias, like this 2023 study. It describes the development of a comprehensive and robust framework called NBIAS
(40:36):
that's able to identify words and phrases in text that may be biased. And so you can imagine running text through something like this before it's chosen to train an AI.
Some of our current work is actually building on this: we are developing a bias-detection
(40:56):
AI agent. And this agent is going to sit on top of our database of narrative assessments of medical students in the clinical year. And it will, in real time, read those narrative assessments and flag potentially biased statements for CCCs to review.
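A hedged sketch of what such a flagging agent can look like in Python, assuming the OpenAI SDK; the prompt wording and model name are illustrative, not our production agent:

```python
# Flag potentially biased language in a narrative assessment for CCC review.
from openai import OpenAI

client = OpenAI()

def flag_bias(narrative: str) -> str:
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": (
                "You review narrative assessments of medical students. "
                "Quote any phrase reflecting potential bias (e.g., gender, "
                "race, or personality emphasized over competency) and say "
                "briefly why. If none, reply 'no flags'."
            )},
            {"role": "user", "content": narrative},
        ],
    )
    return reply.choices[0].message.content

print(flag_bias("She was surprisingly articulate and very pleasant on rounds."))
```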
(41:19):
And so, lastly, I want to talk a little bit about personalized analytics and precision medical education. This is something that I am very passionate about. Individualizing the educational journey to match a learner's needs is a component of many, many educational theories. Right? And it's a core tenet of
(41:41):
precision medical education. But the prevailing paradigm in medical education operates under a one-size-fits-all approach. And there's been tons of discussion in the literature about the potential that AI has to solve this issue.
(42:01):
However, most of the projected benefits and challenges are just speculative. Right? They're perspective pieces. There aren't a lot of actual studies out there, and the ones that do exist are at a very localized level. Like, somebody has ChatGPT on their computer, and they made some clinical scenarios, printed them out, and then somebody used them.
(42:22):
Right? They're they're not,
like, using these platforms at a scale level.
They're not scaling them to large number of
learners or across institutions.
So I'd like to end by sharing some
of the work that we are doing
to address this gap.
So, again, the medical education experience often fails
(42:43):
to address the diverse learning needs of trainees. This results in hidden gaps in learning, which can compromise readiness for residency and future practice, and ultimately impact the quality of patient care.
And this isn't a new phenomenon. Right?
(43:03):
The educational philosopher and psychologist Benjamin Bloom demonstrated in the 1980s that one-on-one tutoring could augment learner performance by two standard deviations, or two sigma, making a below-average student above average and an average student exceptional.
(43:25):
However, offering one-on-one experiences to every single student has historically been impractical due to resource constraints, faculty availability, and financial barriers. Even if we had piles of money lying around, there's just not enough faculty or time to accomplish this.
(43:49):
So, over the last couple of decades, there's been an emergence of and a push toward individualized learning. And this has resulted in a bunch of different frameworks, like the master adaptive learner, competency-based medical education, and now precision medical
(44:10):
education.
So there have been a lot of calls to action to address these barriers and challenges, but not much headway on a scalable solution, because these are all really great ideas but very challenging to actualize.
So in 2023,
we developed Two Sigma, and this is
(44:33):
the prototype version that I'm showing you here. It was developed in an effort to start to address this need. Two Sigma is a generative AI platform that personalizes learning in medical education, with the aim of actually advancing precision medical education.
(44:53):
It utilizes intelligent algorithms for natural human-computer interaction. This particular version did it through text-to-text, but our more advanced version does voice-to-text, and we even have some other multimodal processes right now.
This specific prototype focused on the generation of
clinical
scenarios. So it was designed to simulate real
(45:15):
world interactions and foster clinical decision-making skills. Each session began with the AI introducing a patient case with observable details. The goal was to mirror real-life clinical situations where additional information had to be sought out by the student.
The students then interacted with this AI as
(45:36):
if it were a real patient, making decisions and requesting actions like vitals, IV placement, and diagnostic tests. The AI would respond conversationally as the patient if the student spoke to it as the patient, and it would also return the results of requested actions or tests. Basically, it continuously challenged the student to diagnose based on evolving
(45:56):
information.
Most cases ended upon the identification of a diagnosis, with a couple of cases continuing for management practice.
Following each of the interactions, the AI delivered comprehensive feedback on various competencies, including diagnostic
(46:17):
accuracy, decision making,
efficiency,
cost effectiveness,
etcetera.
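For intuition, here's a minimal sketch in Python of that kind of patient-simulation loop, assuming the OpenAI SDK; the system prompt, case, and model name are illustrative, not the actual Two Sigma implementation:

```python
# A bare-bones simulated-patient chat loop.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": (
    "You are a simulated patient: a 58-year-old with acute dyspnea. Reveal "
    "details only when asked. When the student orders vitals or tests, "
    "return plausible results. Once a diagnosis is reached, give feedback "
    "on diagnostic accuracy, efficiency, and cost-effectiveness."
)}]

while True:
    student = input("Student: ")          # e.g., "Can I get a set of vitals?"
    if student.lower() in {"quit", "done"}:
        break
    history.append({"role": "user", "content": student})
    reply = client.chat.completions.create(model="gpt-4", messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print("Patient:", answer)
```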
And given that the pre-clerkship students at the University of Cincinnati often have very limited opportunities for real-world clinical interactions, they were identified as the ideal candidates for
(46:38):
piloting the Two Sigma platform. So, following rigorous development and AI training, we deployed it to nearly 200 second-year medical students for testing.
And this was funded by the American Medical
Association.
And in just over a month's time, the students generated
(46:59):
almost 2,000 unique sessions, lasting an average of 16 minutes each. And this was exciting because it established proof of concept for leveraging generative AI to create novel tutoring experiences to enhance medical education. We were excited because now we had proved that we could actually
(47:21):
scale this.
And one anecdote that I like to tell, though we clearly have a lot of work to do to see if this pans out, comes from my colleague who did this with me, Dr. Matt Kelleher. At the end of second year, all of the students have to take this in-person
(47:42):
OSCE.
And the results are reviewed in real time, one-on-one with the students.
And Matt does this; he's the director. And he kept noticing that, oddly, this year, unlike any of the other years he had been doing this, students were really hesitant to
(48:05):
order too many labs or tests. They were very concerned about cost-effectiveness, and he couldn't figure out why. And then finally, at the end of all of these interviews, with one or two students left, one of the students said, well, yeah, I didn't want to order too many tests, because Two Sigma kept giving me feedback that when I was ordering
(48:26):
extraneous tests, that wasn't cost-effective, so I've been trying to tone it down. Not proof, not even correlation, but interesting nonetheless.
So we believe, yes, that this initial deployment
represents a significant step forward. We've got a
lot of work to do. And we're in kind of the next phase, where
(48:47):
we are scaling across the curriculum of undergraduate medical education and going into residency and even fellowship. But we have work to do in understanding accuracy and reliability, case demographics, and safety.
So what we've been doing is,
(49:07):
we've expanded the platform beyond just the clinical scenarios, which I explained. We now have question banks that generate NBME-style questions. We have a tutoring function and patient presentation skills. I talked a little bit about our bias agent. We've made something called the supervisor, which will go in and actually pause these sessions at certain points and then
(49:30):
kind of query the student about their clinical decision making. We're also creating a reflective coach that will learn all about the student in real time, understand all of their assessments, and be able to help them debrief in a compassionate way about their interactions with patients.
So we have a lot of these different,
(49:50):
modules that are currently in process. And we're actually trying to scale this in a way that lets us create what's called an agentic AI system. This is something that is becoming very popular in AI circles. And this paper came out from OpenAI about the development of agentic systems. An
(50:11):
agentic system is a system that integrates AI and can be trusted, under certain circumstances, to carry out actions completely autonomously, without the supervision of a human.
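As a toy sketch of that idea in Python, assuming the OpenAI SDK's tool-calling interface: the model is given one tool and decides on its own whether to invoke it. The tool and its wiring are illustrative:

```python
# Minimal agentic pattern: the model may autonomously choose an action.
import json
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "schedule_tutoring",           # hypothetical tool
        "description": "Book a practice session for a student.",
        "parameters": {
            "type": "object",
            "properties": {"topic": {"type": "string"}},
            "required": ["topic"],
        },
    },
}]

messages = [{"role": "user", "content": "I keep missing murmur questions."}]
reply = client.chat.completions.create(model="gpt-4",
                                       messages=messages, tools=tools)
calls = reply.choices[0].message.tool_calls
if calls:  # the model decided, on its own, that an action was warranted
    args = json.loads(calls[0].function.arguments)
    print("Agent action:", calls[0].function.name, args)
```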
So why is all this important?
This is important because
(50:32):
we believe that this type of AI can be a solution to actualizing precision medical education in a meaningful and scalable manner. By doing this, we can empower trainees and students to take hold of their own learning. And by highlighting areas of strength and
(50:53):
opportunity in a safe way that actually aligns with growth objectives, you can offer every learner personalized learning experiences, allowing them to reach their full potential.
We also believe this has implications at the system level. We can improve medical education, aligning assessments with growth and providing new levels of
(51:15):
understanding about the development of complex skills like clinical reasoning, and using more relevant data sources, like conversations, instead of just multiple-choice tests. And in turn, these advancements could positively impact transition points across the medical education continuum, reducing the risk of learning plateaus and making handoffs
(51:37):
more transparent.
And finally, we believe it has the potential
to promote equity.
The AI doesn't know the background of a student. The AI is actually a lot cheaper to run than buying something like UWorld, if we can get it to generate questions. And beyond that, we could use it to detect potential bias in narrative assessments.
(52:01):
And ultimately, we believe that AI could be the tipping point that actualizes precision medical education, ensuring that every physician is exceptional and improving health care outcomes for everybody.
Thank you.