
March 20, 2025 | 25 mins

In this episode, Tom Varghese, MD, FACS, is joined by Haytham Kaafarani, MD, MPH, FACS, and Vahe Panossian, MD, from the Department of Surgery, Massachusetts General Hospital and Harvard Medical School. They discuss the recent article by Drs Kaafarani and Panossian, “Validation of Artificial Intelligence-Based POTTER Calculator in Emergency General Surgery Patients Undergoing Laparotomy: Prospective, Bi-Institutional Study.” This study found that POTTER accurately predicts mortality and postoperative complications, and the superior accuracy, user-friendliness, and interpretability of POTTER make it a useful bedside tool for preoperative counseling.

 

Disclosure Information: Drs Varghese and Panossian have nothing to disclose. Dr Kaafarani receives honoraria payments from UpToDate. The POTTER calculator is available online for free, and Dr Kaafarani has not been compensated for the development or ongoing use of the calculator. 

To earn 0.25 AMA PRA Category 1 Credits™ for this episode of the JACS Operative Word Podcast, click here to register for the course and complete the evaluation. Listeners can earn CME credit for this podcast for up to 2 years after the original air date.

Learn more about the Journal of the American College of Surgeons, a monthly peer-reviewed journal publishing original contributions on all aspects of surgery, including scientific articles, collective reviews, experimental investigations, and more.

#JACSOperativeWord


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:03):
You are listening to The Operative Word,
a podcast brought to you by the Journal of the American College of Surgeons.
I'm Dr Tom Varghese, and throughout the series,
Dr Lillian Erdahl and I will speak with recently published authors about the motivation
behind their latest research and the clinical implications
it has for the practicing surgeon.
The opinions expressed in this podcast

(00:24):
are those of the participants, and not necessarily
those of the American College of Surgeons.
Hello, loyal listeners,
welcome to another episode of The Operative Word,
the podcast of the Journal of the American College of Surgeons.
My name is Tom Varghese, and I'm the host of this particular episode,
and I am unbelievably honored and thrilled to be joined

(00:47):
by two pioneers in the world of surgical research:
first, Dr Haytham Kaafarani, who is a professor of surgery
at Harvard, as well as Dr Vahe Panossian, who is one of his research associates.
I will turn things over to them to formally introduce themselves
as well as tell us about any disclosures for this episode.

(01:10):
Dr Kaafarani, let's start with you.
Go ahead.
Well, Dr Varghese, first, I'm thrilled.
I'm glad. I'm honored to be with you.
You're a leader.
You're somebody I look up to.
Briefly, I'm a professor of surgery at Harvard Medical School.
I'm the trauma medical director and the hospital director of safety
and quality at Mass General Hospital.
And I'm a trauma surgeon by background.

(01:30):
Very excited to be here.
I don't have any specific disclosures related to today's talk,
but I usually declare three things:
that I do contribute to UpToDate, and I get an honorarium from that,
but this is unrelated to this work;
that I served in a national leadership role at the Joint Commission,
and this work is unrelated to my role at the Joint Commission.

(01:52):
And I do have research grants.
None of them are related to this work.
Thanks, Dr Kaafarani.
Dr Panossian, go ahead and introduce yourself.
Thank you, Dr Varghese.
I’m Vahe.
I'm a postdoc research fellow at MGH trauma.
I have no disclosures.
I'm very excited to be here today.
Beautiful. Well, thank you both for joining us.
The article we're going to be discussing today was published

(02:15):
in the October issue of JACS.
And the article specifically
is the “Validation of Artificial Intelligence-Based POTTER
Calculator in Emergency General Surgery Patients Undergoing Laparotomy:
A Prospective, Bi-Institutional Study.”
Dr Panossian is the first
author of this article, and Dr Kaafarani is the senior author of this article.

(02:36):
And they did this on behalf of the POTTER Validation Group.
Dr Kaafarani, let's start with you.
Tell us, what the heck is POTTER?
Let's start there first and probably talk about AI
before we do a deep dive into this article.
Go ahead, Dr Kaafarani.
Yeah, absolutely, happy to.
You know, I'm just going to kind of start by prefacing this:

(02:58):
I chose the name POTTER, which is an abbreviation
of what it stands for and the methodology we used in
AI to do this. But it came from, in 2017,
my daughter was obsessed with Harry Potter, and it was always on my mind,
and I didn't realize it completely backfired on me

(03:18):
because it's an application that anybody can download for free,
and I get zero money out of, on Android or iPhone platforms.
The problem is, if you search POTTER, it will never show up.
There are so many other POTTER applications there,
so you've got to put in “POTTER calculator” for it to show up.
But what it is, in a nutshell: it started as an idea

(03:40):
back in 2016, 2017, the very early days of AI.
I was working on risk prediction, and I was using the classical methods to do that.
And I was at a dinner with a friend who is a professor of AI at MIT,
and we were discussing this, and he says, I've got methods that will
outperform whatever you do in a heartbeat, and it became a challenge.

(04:03):
And then it became a project collaboration between Harvard and MIT.
And what it is, in a nutshell: for POTTER, we used a national database,
actually, the National Surgical Quality Improvement Program data from the entire country.
And we trained an artificial intelligence method
called optimal classification trees, similar to how decision trees are made.

(04:27):
But think about AI.
So it's a reiteration, a continuous reiteration with branching points.
And we asked the question,
with the data from the entire country for emergency surgery:
how can we, in a
nonlinear and interactive fashion, using AI,
use all the different relationships between variables to try
to predict outcomes ahead of time, before the patients have surgery?

(04:50):
In other words, the concept behind
it is really, really interesting, and I had to learn it myself.
The concept is that the presence or absence of certain variables
impacts how much another variable impacts outcome.
Let me give a very, very simple example.
It's a simplification, Tom, for sure.

(05:10):
The question is as follows.
If I have one patient that I'm about to do a colectomy on,
and I say the risk of
complications is X if they have hypertension,
then all the additional risk that comes to this patient, compared with the average person
without hypertension, is just due to the hypertension.
Right.

(05:31):
Now imagine that same patient now has liver cirrhosis and hypertension.
You can imagine that the relative contribution
of hypertension to the complications becomes minuscule,
because liver cirrhosis takes over a lot of the risk.
Now, that is just two variables.
Imagine the same concept blown up to multiple variables at the same time.

(05:53):
And you can see it becomes very difficult for the human mind to just keep track
of all of these interactions between variables.
So that's what AI does.
So we did this project and we created this
calculator, pretty much, that you can download on your phone.
And it asks questions,
and based on your answer,
it takes a different branching point of a different variable

(06:13):
and just keeps going until it predicts the risk of mortality
and the risk of total complications and individual complications for patients.
So that's what POTTER is in a nutshell.
Yeah. No.
So from the inspiration from your daughter,
to Harry Potter, to that being found on a Google search, that's fantastic,
Haytham. That's great.

(06:33):
You gave us that perspective.
But yeah, no, this is fascinating.
And I really applaud you guys for being pioneers in this space.
I mean, obviously now that ChatGPT and other
AI tools are all around us, it's become more part of our normal
day-to-day vocabulary.
But the fact that you stumbled upon this years ahead of all of us,
I applaud you for that.

(06:55):
Let's deep dive into the article.
I mean, so people have heard about the why.
So this essentially was another validation study that you were doing
with a different population.
Dr Panossian, walk us through the study itself.
So from the article, obviously, the study design was that patients undergoing
an emergency exploratory laparotomy for non-trauma indications

(07:17):
at two medical centers between June 2020 and March
2022 were included.
Talk to us about how you were able to structure the analysis,
and then what findings you found from the study.
Right.
So the main reason we did this study, first of all, was that
we had some retrospective validations of POTTER, but

(07:38):
what was remaining was actually having a prospective
data collection and validating it in real life.
So, as you said, we included patients who underwent emergency
ex lap for non-traumatic indications at two centers here in Boston.
And we included around 260 patients.
And we gathered all the variables that POTTER needs to come up with a prediction.

(08:00):
And we also collected the outcomes that the patient had in real life,
which were 30-day mortality, septic shock, pneumonia,
prolonged mechanical ventilation, and bleeding requiring transfusion.
And we put all of the data into POTTER
and checked what POTTER's prediction of those outcomes was.

(08:20):
And then we used the C-statistic,
or the area under the curve, to calculate what POTTER's
performance was in predicting the real-life outcomes.
And what we found was,
it confirmed our biases: POTTER performed excellently.
Mortality had a C-statistic of 0.90.

(08:42):
And for the other outcomes, it ranged between 0.80 and 0.90.
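For listeners who want to see what that validation metric looks like in practice, here is a minimal sketch of computing a C-statistic (area under the ROC curve) from paired predictions and observed outcomes using Python and scikit-learn; the variable names and toy numbers are illustrative only and are not data from the study.

```python
# Minimal sketch: computing a C-statistic (AUC) for a binary outcome.
# The arrays below are illustrative placeholders, not data from the study.
from sklearn.metrics import roc_auc_score

# Observed 30-day mortality for each patient (1 = died, 0 = survived)
observed_mortality = [0, 0, 1, 0, 1, 0, 0, 1]

# Model-predicted probability of mortality for the same patients
predicted_risk = [0.05, 0.10, 0.80, 0.20, 0.65, 0.15, 0.30, 0.90]

# A C-statistic of 0.5 is no better than chance; 1.0 is perfect discrimination.
c_statistic = roc_auc_score(observed_mortality, predicted_risk)
print(f"C-statistic: {c_statistic:.2f}")
```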
That's incredible.
Talk to us about the logistics.
I mean, I want to take a step back.
Because obviously, to do something like this,
when you're doing it as a prospective study,
you have to do a workflow integration that doesn't interfere
with your day-to-day work.

(09:03):
Like, talk to us about how you set the study up to be able to run this.
Like, I'm fascinated by this, because I'm sure our listeners,
probably a lot of them, are also thinking, oh,
AI is out there, how the heck do we do this?
But talk to us, Dr Panossian, about how you went about the workflow.
Sure. So, to ensure that we had quality
data collection in a prospective manner,

(09:25):
we included all the team members in the trauma research program,
and we also screened day to day who was going into an emergency ex lap,
identified them, and checked what their preoperative variables
and labs were, collected all of those, and followed them up
for 30 days, so we had their in-hospital outcomes.
So essentially you were shadowing the teams at all times.

(09:49):
I mean, you also have to have a surveillance network out there to perform
a study like this, correct? Very much.
Yes. I'm going to say, you know, Vahe and the rest of the team
members were joining the sign-outs for
the trauma teams in two hospitals for two years.
Wow.
I just, I mean, but I think that the positive about this is,

(10:10):
this is where, like, I was drawn to the study.
And, you know, as a disclosure, of course, Haytham and I have been colleagues
for many, many years.
And so whenever a study comes from this group, immediately
my eyes are drawn to it. But the fascinating aspect
I thought about is that I love this concept of,
I like to call them, you know, there's a term in economics called

(10:31):
field experiments.
You know, and what that is, is real-world studies, real-world applications.
Obviously, it's cumbersome to set up and everything, but this is kind
of, to me, a great example of essentially a surgical field experiment.
Wouldn't you say?
I would say that.
I mean, we had enough data to be confident

(10:52):
with the performance of POTTER from the national database.
But then we were faced, Tom,
with two questions when this was presented in other forums,
whether big surgical meetings or smaller forums.
And the two questions were:
how does that compare to the surgeon's gestalt of predicting risk?

(11:14):
And the second question: well, this is nice,
but you know, you did it off of a database, blah blah blah.
How does it perform in real life, when the rubber hits the road?
So over the last three years,
these are the two areas we focused on.
We're happy to talk about the gestalt one, I know that's also of a lot of interest,
but the question is, if we actually do the prediction upfront,

(11:37):
like we're not doing it in a retrospective way, we just do the prediction upfront.
We follow them for 30 days and see what happens.
Can we find the same things? And we did.
Yeah.
So it's a field experiment, as you said. Yeah.
Yeah.
I mean, like I said, to me, that was where my mind was immediately
drawn, like this is a phenomenal opportunity, because, you know,
all of us have been attacked for, you know, using large retrospective databases.

(11:59):
And of course, if you get enough
variables, you're potentially able, statistically, to prove anything you want.
But the thing I really love about this study is that prospective
nature. Dr Panossian, a couple of questions for you.
So, I'm just going to read a sentence
from the paper, and I just want you to react back to this.
In the paper, it said, “the primary advantage

(12:21):
of using a prospective design in this study was to validate the model
in a real-world setting with data the model had never encountered before.
This prospective approach also ensures the model's robustness against...”
you called this “data drift,”
“which occurs when a model depends on variables with properties that may change over time.”
Can you reflect back on that?

(12:42):
I mean, I think that's probably a term that a lot of people
aren't familiar with, “data drift,” but kind of talk to us about
why it's critically necessary to do
exactly what your group did by doing this prospective study.
Go ahead. Right.
So the first reason, we're dealing with the patients
that we collected for this study and the actual design of POTTER.

(13:05):
And it was essential to make POTTER objective on data
that it's never seen before.
And it had to come up with the predictions in real life,
and we compared them with what it was designed on
initially, the ACS NSQIP database.
And, since the POTTER algorithm was designed,

(13:27):
we have made some iterations, updating it with the ACS NSQIP data.
But those data variables could change over time,
sort of with the patient population, and in a regional manner.
Each region has different patient characteristics.
So really, we're interested in knowing how POTTER would react in

(13:49):
an academic center, a referral center, where at
baseline the patients come in sicker, more critical.
And I'm also curious about Dr Kaafarani's thoughts on this,
and the data drift that happens.
Yes, absolutely.
I mean, the data drift definition, if you want,
Tom, in a very statistically purist way,

(14:12):
is when the statistical properties of data are changing over time,
which can make the models, if you want, less accurate.
It can occur in AI, in machine learning,
but it also happens in the classical methods.
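The episode does not describe how POTTER is actually monitored for drift, but as a generic illustration of the idea, the sketch below compares the distribution of a single, hypothetical input variable between an older training-era sample and newer data using a two-sample Kolmogorov-Smirnov test; a small p-value suggests the variable's statistical properties have shifted.

```python
# Generic illustration of checking one variable for data drift.
# This is not POTTER's actual monitoring process; the data are simulated.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical preoperative creatinine values from the original training era
creatinine_training_era = rng.normal(loc=1.0, scale=0.30, size=5000)

# The same variable in a more recent cohort whose distribution has shifted
creatinine_recent_era = rng.normal(loc=1.2, scale=0.35, size=500)

# Two-sample Kolmogorov-Smirnov test: a small p-value flags a distribution change
statistic, p_value = ks_2samp(creatinine_training_era, creatinine_recent_era)
print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.4f}")
```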
We do this. But the idea is the following.
When POTTER was built, back in the first study, the initial project,

(14:35):
the way we did it, you know, unlike this
paper, where we have 361 prospective patients,
we actually had about 3 to 4 million patients.
Out of them, about 500,000 were emergency surgery patients.
But what we did, we divided that data set into what
we call the derivation and the validation.

(14:55):
Right.
So that was the main validation, meaning we trained it.
We trained it on about 80% of the data.
And then we said, okay, now you have the algorithm, you trained on it.
Can you predict the other 20%?
But there's an inherent bias in that, meaning, well,
this is how the dataset itself was created.
So, if we trained it on it, it is going to predict very well on it too, right?
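As a rough illustration of the derivation/validation idea Dr Kaafarani describes, the sketch below splits a dataset 80/20, fits a tree-based model on the derivation portion, and scores it on the held-out 20%. A standard scikit-learn decision tree stands in for POTTER's optimal classification trees, and the file name, features, and outcome column are hypothetical.

```python
# Sketch of an 80/20 derivation/validation split, as described in the episode.
# A standard decision tree stands in for POTTER's optimal classification trees;
# the file name, features, and outcome column are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

df = pd.read_csv("emergency_surgery_cohort.csv")  # hypothetical dataset
X = df[["age", "hypertension", "cirrhosis", "preop_creatinine"]]
y = df["mortality_30d"]

# Derivation set (80%) for training, validation set (20%) held out
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y
)

model = DecisionTreeClassifier(max_depth=4, random_state=0)
model.fit(X_train, y_train)

# Discrimination on data the model has not seen during training
val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"Held-out validation C-statistic: {val_auc:.2f}")
```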

(15:19):
Yeah.
Now, what Vahe was trying to say is, you know, with time
and with a change in the nature of the data set, we said,
now we're going to create a completely different data set.
It has nothing to do with NSQIP.
And we're going to see if POTTER still performs to its promise.
And that's what it did, even though the number of patients was much
less; it took us two years in two centers, 361 patients.

(15:43):
But it proved that on a dataset
it's never been trained on, it's still performing well.
Yeah. You're right.
And I think that that's the fascinating aspect.
I think, as people are trying to wrap their minds around AI applications,
that, you know, that's the unbelievable potential.
You know, rather than doing it the traditional way of,

(16:03):
you know, a retrospective analysis, you know, we do a regression
analysis, and then we have just a static risk calculator.
I mean, I think, and correct me if I'm wrong, the AI application,
what takes this to the next level, is it's almost adaptive.
That is, whatever real-world information is coming in,
the risk assessment is performing robustly.

(16:24):
Correct?
You got it exactly right.
It's adaptive. It's reiterative.
And POTTER has one more additional characteristic
which is really important, Tom,
and if you'll allow me, maybe, to point that out.
Yeah.
Go for it.
It's reiterative and adaptive, meaning it's continuously learning from its mistakes.
That's what the adaptive means: it's like, the more mistakes it sees

(16:46):
and knows about, the more it will keep correcting.
That's one.
Two, the reiterative part, which is really important.
Meaning, I mean, you know, you can download the app and see that
when you answer a question, it really analyzes
the entire database that it was trained on to give you what the next question is.
So if you answer a question, let's say, is the patient

(17:06):
in the hospital, is the patient intubated,
not intubated?
The yes or no dictates what the next set of trees is going to be.
And then the next question is, does the patient,
you know, have a prior history of something?
If the answer is yes, it goes into, it totally reiterates,
just reanalyzes this again, and does it.
Now, the third characteristic, which is really important,

(17:27):
and I do want to take the opportunity to point it out
because not all AI is this way, is that POTTER is transparent.
And that is a very, very big deal. Why?
Well, because a lot of the AI methods are what we call black box, meaning,
it's almost like a religion:
you just say, I'm going to give you the data,
you do your magic and you give me the output,

(17:49):
and you just need to believe in it, right?
POTTER does not do that.
With POTTER, because of the optimal classification
trees method, you can follow the logic of that AI.
Why is this important?
Because one of the things that a lot of people are pointing out day
in, day out, is that AI has the risk

(18:10):
and the potential of bias.
If your database is biased and has disparities,
let's say in the care that we provide in this country to people, minorities,
whether African Americans, whether Hispanics, you name it,
then,
if you train on a biased database, you can incorporate

(18:30):
that bias into your algorithm without ever knowing it.
Correct.
That's why I'm a big advocate of transparent AI: because you don't want this to happen,
because you will pretty much consolidate
bias into algorithms that we are trying to use to tell people how to deliver care.

(18:50):
So transparency is really an important characteristic.
POTTER does that. You can follow its logic.
You will see why it's doing what it's doing. It tells you:
the reason I'm doing this is because
those are the characteristics I took into consideration, one after the other,
and that's the number of patients
who died out of the total number who had the same characteristics,
and this is why I'm giving you this prediction. So you can follow its logic.
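To make the "follow its logic" point concrete, here is a small sketch that trains a transparent tree and prints the branching rules it learned, so every prediction can be traced question by question. An ordinary scikit-learn decision tree is used only as a stand-in for POTTER's optimal classification trees, and the yes/no features and toy labels are hypothetical.

```python
# Sketch: a transparent tree whose decision logic can be read directly.
# An ordinary CART tree stands in for POTTER's optimal classification trees;
# the features and toy data are hypothetical.
from sklearn.tree import DecisionTreeClassifier, export_text

features = ["intubated", "cirrhosis", "hypertension"]

# Toy cohort: each row is one patient's yes/no answers; label = 30-day mortality
X = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
]
y = [1, 1, 1, 0, 0, 1]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the branching questions and the outcome at each leaf, analogous to
# following POTTER's logic from one answer to the next question.
print(export_text(tree, feature_names=features))
```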

(19:11):
And I think that's really, really important,
so we don't create problems
down the line.
Amazing, amazing, amazing perspective.
Well, towards the tail end of this interview, let's focus on two things.
The first thing is, I want to read two sentences from your paper,
and then really talk about what's next.
So from the introduction of your manuscript,

(19:33):
you started out by saying, “Postoperative complications occur in 15%
of the 19 million surgeries performed yearly in the United States, resulting
in significant morbidity and mortality and additional health care costs exceeding
$31 billion per year.” I mean, that's just a
massive amount in terms of impact.
“Predicting postoperative outcomes is critical for appropriate counseling.”

(19:56):
I mean, that's the reason why we did this study,
“as well as resource allocation and benchmarking of quality of care.”
And the question I really have is about that benchmarking,
because the conclusion of this manuscript was really eye-opening,
when you said that this is the first prospective study,
obviously, showing that the AI-powered surgical risk calculator POTTER

(20:16):
accurately predicts postoperative outcomes.
But really, it's more about,
you know, this smartphone-based application,
like, how do we make it widely available?
Right.
Like, so the question really is, what's next for your research group?
Are you planning on doing this in other, you know, prospective studies? Or,

(20:38):
maybe another question to Dr Kaafarani is,
what are the health care policy implications for something like this?
I'll start with Dr Panossian first, and then we'll go to Dr Kaafarani.
Go ahead, Vahe.
From my end, I think the next two steps for POTTER
would be updating it with new data from the ACS NSQIP,
making it more robust with the more recent data.

(20:59):
And also,
having a comparison with non-operative patients.
So the very nature of ACS NSQIP is that all patients had surgery, right?
So when we're interacting with POTTER, we're interacting
with a patient sample who actually had the surgery.
So we cannot really decide, should I do the ex lap or not.

(21:20):
If the patient asks, what are my chances of dying if I don't do the ex lap?
Well, I don't know.
POTTER cannot answer that.
So, I think in its current version, it can't answer that.
In its current version, it kind of can't, but I think it would be interesting,
validating it in a patient
population that didn't have surgery, who were managed non-operatively,

(21:40):
and seeing how it performs.
That's incredible.
Dr Kaafarani? Yeah.
I mean, Vahe is very insightful in this. I mean, that's one of the downsides,
actually, of POTTER: it's been only trained on patients we operated on.
So when people try to use it to say, should we operate or should we not
operate, it really was not designed in that fashion.

(22:05):
So my advice usually is, only use POTTER if you already decided to operate,
you know, because, like, for example, the highest prediction
of mortality in POTTER is 73%.
And people sometimes try to use it for patients
who, clearly, we should operate on because they're going to die either way.
And they're like, what, 73%?

(22:26):
This is a 95% mortality rate.
Well, that's because POTTER was not trained on patients
we never operated on because they're too high risk.
So that's one.
But to go back to your question, where do I see this in the big picture of health policy?
I mean, I have a small, small, very tiny dream, and it's very convoluted.

(22:46):
What I'd like us to see, there's one problem you're
dealing with on a very daily basis in your own leadership roles,
Tom, which is, you bring data to your teams
and you say, well, your risk of infection is much higher than
your compatriots' or the national average, you know, can you tell me what's going on?
And everybody's response is, well, my patients are different, right?

(23:10):
My patients are high-risk.
I get it, and you know what? They're not completely wrong.
I mean, we do know the surgeons
who take the high-risk patients.
And it's not fair if we, you know, hold them to the same standards,
because it's almost a disincentive for them
to take care of the sicker patients who probably need the surgery the most.

(23:31):
So, the risk adjustment is always the problematic component,
and every time we try to measure quality at a very big level,
we do a pretty good job with a lot of the methods
we have, whether it's Vizient through administrative databases.
You know, I don't think it's as good.
But NSQIP, for example, has a very robust risk-adjustment model.

(23:53):
But they're almost like, you know,
the equivalent of the Priuses of the car world.
And can we get a Ferrari to do better risk adjustment?
And I think that's what AI can do.
We started dabbling, from a research point of view: can you use AI to benchmark,
to better risk adjust, where you are actually taking into consideration

(24:15):
these interactions between all the variables that are not visible to the human eye.
And we've published a paper showing that you can conceptually do that.
There's no question.
We proved the concept.
But I think we're still in the very early days
of using AI and optimal classification trees for benchmarking.

(24:35):
That's what I'd like to see it used for.
I'd like us to use AI so that we can better
compare apples to apples when we're benchmarking quality of care
across hospitals, across systems, and across individuals.
Well,
I can think of no better way to wrap up this episode.
I mean, it's really a call to action for all of us.

(24:55):
Opportunities for all of us as surgeon leaders to engage.
And, optimistically, I think the future is very, very exciting.
Dr Kaafarani, Dr Panossian, thank you for joining us today on The Operative Word,
the podcast of the Journal of the American College of Surgeons.
Thank you for listening

(25:15):
to the Journal of the American College of Surgeons Operative Word Podcast.
If you enjoyed today's episode, spread the word on social media
by using the hashtag #JACSOperativeWord.
Subscribe to The Operative Word wherever podcasts are available, or listen
on the American College of Surgeons website at FACS.org/podcast.