Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:02):
I went and I created 1,000 synthetic patient histories, short ones.
I then went through, picked 200 random ones, and myself decided, as a human medical expert, who should be seen today, and who will have to be waiting a week or two weeks to be seen.
And I did it, and I also weighted which were the easy ones, and which were the hard ones to decide.
(00:24):
And then, I asked GPT-4o, Gemini Advanced, and Claude 3.5 Sonnet to decide themselves.
I measured two things, how concordant they were with me, and how consistent they were with themselves.
Because, you know, when you have a doctor, and he, or she, is accurate on average, but is wildly different between different visits,
(00:45):
you're gonna be a little bit worried.
I then said, how can I move these models?
I gave them a great sample of other pairs that they would not see, and said, this is how I, the expert, chose.
Please use this as a template.
And some, like GPT-4, became more concordant with me, and more focused, more consistent.
(01:06):
Others, like Claude, actually got worse, and actually more scattered.
Welcome to another episode of NEJM AI Grand Rounds.
This was a really special one.
This is our first episode with a repeat guest, and it is our
(01:28):
Editor-in-Chief, Zak Kohane.
Andy, I don't know where to start with this one.
This was a really, really fun conversation that really touched on many, many different things.
I think it's fair to say.
Yeah, I think that's right, Raj.
This feels just like a fun snapshot into what life as a postdoc was like for us.
Freewheeling conversation.
(01:49):
Maybe less scotch during our postdoc, but I don't know if even that's right.
I think too, I had the added disadvantage of being extremely jet-lagged during this one.
So, Zak really pinned me to the wall.
That's a lot of, that's a lot of, a lot of caveats, man.
That's a lot of caveats, but it's always fun.
We looked back on the year that was at NEJM AI, lots of great papers.
(02:10):
And again, it's always fun to sit down with Zak and do these end-of-the-year conversations.
I totally agree.
I think you said it right.
This was a good window into our postdoc time.
And honestly, it was a good window into Zak's mentoring.
I think there were a couple times where he really sort of recalibrated us and, uh, you know, kind of put us in our place a little bit.
And I think it was a lot of fun and a really, just great conversation
(02:32):
overall about AI and medicine, and life with Zak.
So, it was fantastic.
And I think this is going to be one of those things that I look forward to every year.
It's just such a fun thing to do.
Hopefully, listeners find it fun, too.
Totally agree.
The NEJM AI Grand Rounds podcast is brought to you by Microsoft, Viz.ai, Lyric, and Elevance Health.
(02:56):
We thank them for their support.
And with that, we bring you our conversation with Zak Kohane.
Alright, Zak Kohane, thanks for coming back to AI Grand Rounds.
I think you're our first repeat guest.
Well, yes, you're sort of obliged to, aren't you?
(03:16):
And this is in our contracts.
We had to have you on every year.
It's true.
And sadly, I still think you're the most famous part of NEJM AI.
This cannot stand.
But I'll keep on coming as long as necessary to make sure that the right thing happens.
Alright,
(03:36):
2025,
we'll dethrone NEJM AI Grand Rounds.
That is right.
So, Andy is right, Zak.
This is— Although you guys are going to have a lot to recover from, from how you were schooled by Larry Summers.
Oh, he dunked on me several times.
If you're going to get, you're going to get dunked on by someone, you want it to be Larry Summers.
It's an honor, actually.
It's an honor.
Yeah.
But you were dunked on.
(03:57):
I admit that.
Yes.
So, Zak, Andy is right.
This is the first episode with a repeat guest.
So this is a question that is inspired by our usual opener, but we are going to ask you for the first time as our first repeat guest: what major weight updates have happened to Zak Kohane's neural network since you were last on AI Grand Rounds one year ago?
(04:18):
The major weight update.
It's pretty clear, actually.
It's that I no longer feel like we, I mean, the three of us, are alone in saying medicine is in deep trouble.
I'm hearing it now everywhere.
And not just from our large health care systems, who have large revenues
(04:43):
that are slowly increasing, but employment costs that are skyrocketing.
I'm not only hearing it from dissatisfied young trainees.
I'm hearing it from leaders at the National Health Service in the U.K.
I'm hearing it in France.
(05:05):
I'm hearing it in the United States.
And the question is not, will AI replace doctors?
It's, please, can AI help bridge the gap?
We don't have enough frontline doctors.
Yeah, I mean, I feel this acutely, just like having been in the sidecar to
(05:28):
medicine for the last 10 years. And so, when Kristen was in med school, I was like, AI is gonna come. And now there's like, really, like, please, like, we need this faster. Like, we don't have enough hands to be able to do our jobs.
I do think you're right, and that has really happened. It feels like over the last year. And I don't know if the fault lines have gotten bigger, or just there's a bigger willingness to recognize that.
(05:50):
Like, I don't have it, or if it's just like, yeah.
So, so you know, in medicine, we like to talk about how a lot of our organs have huge reserve capacity.
And so when you have your organ being attacked, you don't notice it until it cuts into your reserve capacity.
So, forgive my teachers for me getting this wrong, but I think something like
(06:11):
you survive on 20% of your lung capacity.
And you're breathing okay until then.
But then when you hit there, you're short of oxygen.
And you start suffering.
And I think that's just happened.
So, the fact that you can't get a primary care doctor to see you at Mass General Brigham makes the disease now manifest.
(06:33):
We're cut to the bone.
I think that's actually a good transition point.
So, the first thing we want to start with is actually a perspective that you published pretty recently, maybe a couple months ago, in the New England Journal of Medicine. And it's "Compared with What?", and I really think it is a very powerful essay about what the existing state of the art is in medicine,
(06:57):
and where we have access, where we're completely lacking access, what the problems are with our existing system, as being the relevant thing to compare AI to and to compare new models, new algorithms with.
And so maybe we can start with that one.
Can you summarize what motivated you to write that?
And then I'll tell you about some reactions I've heard from folks
(07:18):
both here and around the world.
Well, in some sense, I was prepared for this feeling by an old phrase from Seymour Papert, one of Minsky's colleagues, the LEGO Professor at MIT, who coined this term, the superhuman human paradox.
(07:41):
And this was probably back in the 70s or 80s.
And the point he made was, why are we comparing AI to a super expert when most of us operate in average mode?
So just making everybody as good as average would be making 50% of the population, if
(08:02):
it's normally distributed, better.
I appreciate the clarification between the mean and the median for distributions that are not normal.
Since I have a stat nerd here, I refuse to be corrected by a stat nerd.
I was about to "well, actually" you.
So, I was prepped for that, and then when it became evident to
(08:22):
me, and I think as you get older, unfortunately, more and more of your friends end up in hospitals. And therefore, you get more and more stories from them about how we are falling short.
At the same time, you have your colleagues who are in medicine telling you that they do not have time to think. That they do not have time to discuss patients.
(08:44):
And you see that confluence resulting in a lot of bad stories.
So, thinking in my role as a member of the editorial board here at NEJM AI,
I said, what should we really be asking in terms of helping the public?
Do we want to say this is better, better than the best doctor you could find?
(09:08):
Or is it as good as a really good doctor, and therefore maybe better than half the doctors?
And that was the insight, because on the one hand, there is a lot of appropriate concern.
Errors, incompleteness, lack of common sense knowledge, hallucinations.
(09:29):
But guess what?
Doctors, human beings, do those, too.
So, when the question is not an academic one, who performs best, but a social one, how do we deliver the best care?
It seems more and more plausible that in the absence of a primary care doctor who is well slept, fully caffeinated, and not stressed, maybe we should have a PA or a
(09:55):
nurse practitioner, a physician assistant or nurse practitioner, assisted by the AI.
They can use, the human can use their common sense, their training, their EQ, and they can be complemented by the IQ of their AI assistant.
Remembering all the things I forgot, weird zebras that may or may not
(10:16):
be relevant to the diagnosis.
I think that becomes the question.
And if we were always going to ask, can this beat all doctors, which maybe one day AI will, we would not be serving our society well today.
Is part of the lack of uptake of that kind of thing at all related to, like, licensure and scope of practice for other health care workers, or are
(10:40):
there structural reasons why we can't?
Of course, there are so many structural reasons, and you've heard me rant about this before.
You know, what if
your AI tells you not to do that MRI.
That's a few thousand dollars that the hospital's not going to be able to bill for.
What if it's going to suggest a therapy that is more effective but less costly?
(11:05):
And so under a fee-for-service system, or even a non-fee-for-service system that is not tightly instrumented to look at outcomes.
Because you can still have single-payer systems that don't tightly
(11:25):
look at outcomes, and therefore, you have doctors doing what they think may be interesting or cool but may not be the best for patients.
Actually, I'm reminded of our colleagues at Clalit in Israel, who actually have to deal with that.
They have very thin budgets, and they have to take care of the population.
So, for them, they have to figure out
(11:45):
which patient to see next.
And that becomes actually a queuing theory question.
Where do I get the maximum utility for seeing the next patient?
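A minimal sketch of that queuing idea, assuming an invented, hypothetical utility function (a real system would estimate these terms from clinical data):

```python
import heapq

# Toy priority for being seen now: higher urgency and expected benefit,
# and longer waits, raise priority. The weights here are made up.
def priority(urgency: float, benefit: float, wait_days: int) -> float:
    return urgency * benefit + 0.1 * wait_days

patients = [
    {"id": "A", "urgency": 0.9, "benefit": 0.6, "wait": 2},
    {"id": "B", "urgency": 0.4, "benefit": 0.9, "wait": 14},
    {"id": "C", "urgency": 0.7, "benefit": 0.7, "wait": 7},
]

# heapq is a min-heap, so negate the priority to pop the best patient first.
queue = [(-priority(p["urgency"], p["benefit"], p["wait"]), p["id"]) for p in patients]
heapq.heapify(queue)
print(heapq.heappop(queue))  # the patient to see next
```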
Guys, because I love you so much, I'm going to completely derail this conversation.
It wouldn't be Zak without a Zak tangent.
He's got notes.
I've got notes.
(12:05):
Amazing.
Before this meeting, I made notes.
I'm going to ask you guys, and I'll be nice to Andy by asking him first, because, and you'll have to come up with a second.
Andy, which paper did we publish in the past year that you liked the best?
Uh, you're going to make me a blue screen of death here.
(12:27):
Let me think.
Yeah, but you see, I'm doing you a favor, because then if you pick a good one, it's going to make his job even harder.
And I get extra time to think.
That is true.
That's true.
Never say I wasn't fair.
Yeah.
Um.
Um, so there's a, there's a, uh. This is going to be the Pranav moment.
(12:49):
You go first, Raj.
Okay, so I will, I have an answer.
So, I, so it's very hard to pick a favorite.
So I'm not, I'm not going to pick a favorite.
We're not going to pick favorites.
But I'll pick some good, some papers that I really enjoyed because they made me think about things differently.
Yes.
And I think it will ignite very interesting research
to come.
Yes.
Which we are also eager to publish in NEJM AI.
(13:09):
Yes.
Um, so, I think one of the big themes has been using LLMs in very creative ways and doing tasks that we thought LLMs would not be good at, you know, even maybe two years ago.
And now we're seeing a robust set of evaluations showing that LLMs are apparently quite good at several tasks that were human tasks involved in
(13:33):
medical publishing itself, scientific publishing.
And so, one of these themes, which I think has been highlighted by at least two of our papers, if not more, was one by James Zou's group, published a few months ago, on using LLMs in review.
And they did a large experiment with Nature family journals and then ICLR, one of the big machine
(13:55):
learning conferences, and looked at human evaluations of the quality of reviews generated by LLMs versus the original human-based peer reviews.
And then we had a very recent paper that's sort of on the same theme, but from a different, you know, from a different angle, a different player in the system, which is authors themselves.
And this is from Roy Kishony's group.
I think the most recent issue published an interesting paper on
(14:16):
using LLMs to enable autonomous science.
And I think both James's paper and Roy's paper showed that the abilities are already remarkable.
But there's still a lot of room for improvement and a lot of room for interesting evaluations.
And so, I think this theme around LLMs to enhance review and to automate
(14:37):
aspects of review, I think is going to be very, very interesting for us to follow in the next year.
So, it was great to see those papers.
Glad they, they got submitted here.
And I think they've already had a splash, too.
You know, I think a lot of the most memorable moments from NEJM AI from the editorial standpoint were debating what should count as AI and what should be in the journal in the first place.
(14:59):
And I think I tend to anchor on, you know, the big, flashy, multimodal kinds of things.
And we had lots of really interesting discussion about other papers that targeted clinical questions but used more traditional methods.
And I think I moved on that.
I think that, when I think back on a lot of those, let's say vigorous debates, in the editorial room, those are the things that come to mind.
It's like, what counts as AI?
(15:20):
What doesn't count as AI?
So, if we settle it now, is logistic regression AI, Andy Beam?
Uh, no comment.
So, as we say in German, schwach, which means weak.
So, um, let me— We're really getting jiu-jitsu'd here.
(15:42):
We ask the questions, Zak.
We're going to go back to our script in a second, Zak.
Let me do the meme.
I would have.
Let me be the meme master.
I would have.
Look at me.
I am the captain now.
Yes.
So, let me tell you some of my thoughts on this.
This was perhaps the most technically uninteresting paper, but hugely
(16:05):
impactful, out of a group at Brown, where they created a consent form that was written for human beings.
And not for ethicists or lawyers.
And they deployed it, and it was well accepted.
Very simple, great use.
(16:25):
So I really like that a lot.
We had them on the podcast too.
Rohaid Ali, Fatima Mirza.
Husband-and-wife resident duo who authored that paper.
Power couple.
You know, impressive.
I also have enjoyed David Blumenthal's policy pieces.
And he really, I think, got the AI hubris, because he started articulating,
(16:50):
as a, as a measure of evaluation, not all these benchmarks that we have, but literally put the AI into the clinic, watch it just like we watch human beings, and ask no less of them in full context, which I don't think is wrong at all.
I also liked, and I hope we get more, patient-facing applications.
(17:14):
So, there was a group out of Duke and Apple, which did this telemedicine autism evaluator.
I happen to like that one a lot, even though it had a problem that they copped to immediately, which is they had a high prevalence.
Therefore, it was easy to get a decent positive predictive value.
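A quick worked example of that prevalence point, using the standard positive predictive value formula with made-up sensitivity and specificity; the same test looks far better when the condition is common:

```python
# PPV = true positives / (true positives + false positives), as a function
# of prevalence. Sensitivity and specificity here are invented numbers.
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

print(ppv(0.90, 0.80, 0.40))  # ~0.75 at the high prevalence of a referral clinic
print(ppv(0.90, 0.80, 0.02))  # ~0.08 at a general-population prevalence
```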
But that has to be a big part of the future, which is where we do screening
(17:36):
and outreach to patients.
Alright.
Let me ask you another question.
What was your favorite medical AI paper this past year that was not in NEJM AI?
Okay.
So this one is not, I'm going to almost answer it.
(17:59):
So this is not from, I mean, there's a version of this that's in 2024, but I think the intellectual spark that really ignited a very interesting trend now was in 2023.
And this was a paper that was published in JAMA that I love.
It's a short research letter
by one of my close colleagues, Adam Rodman.
And it's on, actually it takes NEJM content.
(18:20):
So, it is, uh, related to NEJM.
NEJM clinicopathological conferences.
They took 70 of these very challenging cases from NEJM.
This series is called the CPCs, also called the Case Records of the Massachusetts General Hospital.
And these are hard cases.
So there are not that many human baselines, but where there have been, expert doctors get 20–30% of these right across all different areas of medicine.
(18:44):
And they did something pretty simple technically, again, but their evaluation was very interesting.
They just piped the cases in, free text, into the user interface of ChatGPT, GPT-4 specifically, and then they had physicians score how good the model was at producing the differential diagnosis.
And I think this really shocked a lot of people and now has inspired a lot of research
(19:06):
around how good these models are generally in reasoning.
And this is both for diagnosis and differential diagnosis generation, but also for very important things, arguably more important things from some perspectives, around what's the next test to order, what's the next step in management, what should the patient be on that the patient's not currently on, management itself.
That's actually a very nice answer, but of course, in typical us style,
(19:28):
I'm going to argue with you and say that I think it's actually misled us.
Okay.
In the following way.
I'd love to hear this argument.
So, and it is actually part of our System 1 versus System 2 running battle.
For those who don't know, System 1 is, as elaborated by Kahneman, the fast
(19:51):
pattern recognizer that's not self-aware.
And System 2 is aware, plodding, deliberate, rational, possibly.
And the way I see it is this, I was always very good at these hard cases.
I am a zebra diagnostician extraordinaire, rare pediatric endocrine cases that
(20:17):
you don't even know how to spell.
But, I was multiply embarrassed by my mentor, Dr. Kriegler, who would hear me do a brilliant differential diagnosis.
And then he'd ask me a simple question.
When was the IV taken out of the patient?
And that made me realize that I just didn't have the right facts.
(20:43):
All of us can do this System 2 stuff that we all pat ourselves on the back on, because we're so proud of it, because we are smart students and smarty-pants and have got into our universities because of that.
But the true brilliance is getting the real gestalt of
what is going on.
Because when you see a patient, it's the acquisition of information,
(21:05):
not the reasoning over it.
And which is the right information.
We live in a blur of sensory overload.
And if I were a full alien, I would not know which of these things, moving or not moving, is important.
What's published in NEJM AI?
That's your signal.
There we go.
(21:25):
There we go.
System 1. System 1.
And so, my point is that, sure enough, we talk about benchmarks saturating, making them harder.
And sure enough, we'll make harder-for-humans and harder System 2-like questions.
But that's not going to answer the real question, which is, if I go see the doctor, is he going to understand and make the right diagnosis?
(21:47):
Because I'm not going to come with, I present as a patient with a five-year history, and I'll be talking about all sorts of things.
I gotta respond.
Go ahead.
Let me get it.
So, I basically fully agree with you, but I still think that paper unlocked this conversation.
And so, the importance of the paper is now, because we are asking these questions, and
(22:09):
not just you and me, but many, many people are saying, what is a good benchmark?
What is the doctor actually doing?
What is artificial?
And if you look at our other benchmarks, right?
Multiple choice questions on standardized exams.
So, you're totally right.
What I think, maybe concisely, and this is actually a debate we've been having for some time, is that the doctor who's writing the case is doing
(22:31):
so much of the work for the model.
They're acquiring the information, they're synthesizing it, they're summarizing it, and they're putting the key bits in there.
And so I think we absolutely need better benchmarks.
But that being said, to your point about compared to what? Our goalposts are being blasted into space now, right?
This is, I mean, CPC is being solved by an autocomplete word-filler model, a GPT-4.
(22:55):
It's something that none of us predicted would be on the horizon.
So, I think you're right.
We need better benchmarks.
But I also think that paper has unlocked a lot of interesting studies that we're now doing.
So, so let me hop in here and maybe hazard or, like, give a, uh— Moderate this! System 2 is also— I was, I was on the fence between two papers I was going to say.
One was Raj's follow-up paper released yesterday on arXiv.
(23:17):
Yeah.
Ooh.
Doing o1 on this, where it goes
superhuman. System 2.
Yeah, System 2.
It's really kind of like System 3, because the humans, the humans in this, compared to o1, are significantly worse.
They are.
So, but I think your same objections hold there.
So, there was another paper
that was released as a preprint from the same Google group, where they looked at turn-by-turn conversations between a Google large language model. This is AMIE, right?
(23:42):
This is AMIE, yeah.
That was my choice.
So that's my choice.
I still got it.
So that was mine, because it does, it's eliciting information.
Could people tell about AMIE or AMIE, whatever.
I think it's AMIE.
So, it is AMIE co-first—
It's not AMIE.
No, it's not AMIE.
The co-first author is a grad student of mine.
Yeah.
And so, he pronounces it AMIE.
So, I'm gonna go with AMIE.
It is AMIE, and Vivek has called it AMIE to my face.
(24:04):
Um, but basically the setup is, is that they have a large language model.
The way the model is trained is interesting, in that it uses synthetic data to kind of, like, train itself on its own conversations.
And has a, has a critique agent.
Yeah, exactly.
That actually critiques the conversation.
It's, it's fascinating 'cause Google had enough money and it did.
Yeah.
Buy a lot of
medical dialogue.
But that was not enough.
And that was not enough.
Yeah.
(24:24):
They needed much, much more.
So yeah.
So there's a couple lessons here.
One is that the synthetic data the model generated was more valuable than real data.
So that in itself is like an interesting methodological thing.
And then the other is that they compared this model
to, on the other side of the screen was an actor pretending to have a condition.
I always think of, like, a Kramer in cirrhosis from Seinfeld.
Right.
Um, but that's essentially what it is.
(24:44):
They interact via chat.
And so the large language model tries to diagnose the patient, and then they actually have physicians who also try to diagnose the patient, and it shows that the LLM gets to the right answer faster by asking the right questions.
And so, I think that gets to the, when was the IV taken out, kind of point that you're making, Zak.
And so that, that would be my choice.
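A minimal sketch of the self-play-with-critic loop described above, assuming hypothetical stand-in functions rather than the actual AMIE implementation; in the real system, each of these would be backed by an LLM:

```python
# Hypothetical stubs: in the real system, each would be an LLM call.
def doctor_turn(dialogue):
    # The model proposes the next question to ask.
    return "How long have you had the symptom?"

def patient_turn(dialogue, condition):
    # A simulated patient answers in character.
    return f"About two weeks. (simulated condition: {condition})"

def critic(dialogue):
    # A critique agent scores the conversation and suggests improvements.
    return {"score": 0.7, "feedback": "Ask about medications earlier."}

def self_play_episode(condition, turns=3):
    dialogue = []
    for _ in range(turns):
        dialogue.append(("doctor", doctor_turn(dialogue)))
        dialogue.append(("patient", patient_turn(dialogue, condition)))
    return dialogue, critic(dialogue)

# Synthetic dialogues that score well become new training data,
# which is the sense in which the model trains on its own conversations.
dialogue, review = self_play_episode("iron-deficiency anemia")
print(review["feedback"])
```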
See.
That was my thing. Our audience can't see this, but Zak pre-registered
(25:07):
AMIE on his little notecard.
So I was between the o1 from Raj and that one, and I think you pushed me to that one.
And also what I loved about it.
I'm going to take that as a huge compliment.
One last thing about the AMIE paper, which is, it also resonated with me with 1970s and 1980s
protocol analysis, where you'd look at discussions between doctors
(25:29):
and patients, and you'd build expert systems based on those.
And sure enough, these programs learned how to ask for data that was not there. To ask questions about contradictions.
Just truly amazing.
Yeah, I'm going to, there's even a, there's a connection here that, Zak, I bet you could sketch, that would be fun for, for part of the audience.
(25:49):
Certainly.
I think the two of us, so you like to talk about information theory, right?
Oh, I certainly do.
And so there, there's a deep connection here.
Right?
Which is, what is the next test, or what's the next question to ask the patient?
What is the next set of bits that you can unlock by prompting either the patient or by ordering the correct test?
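A toy version of that information-theory framing, with invented probabilities: score a candidate question (or test) by how much it is expected to reduce the entropy of the diagnosis distribution:

```python
import math

def entropy(dist):
    # Shannon entropy in bits of a {label: probability} distribution.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

prior = {"flu": 0.5, "strep": 0.3, "mono": 0.2}

# For one candidate question: probability of each answer, and the
# posterior over diagnoses given that answer (all numbers invented).
question_outcomes = {
    "yes": (0.4, {"flu": 0.15, "strep": 0.70, "mono": 0.15}),
    "no":  (0.6, {"flu": 0.70, "strep": 0.05, "mono": 0.25}),
}

expected_posterior_entropy = sum(
    p_answer * entropy(posterior)
    for p_answer, posterior in question_outcomes.values()
)
info_gain = entropy(prior) - expected_posterior_entropy
print(f"Expected bits unlocked by asking: {info_gain:.2f}")
```

The best next question or test is the one with the highest expected information gain.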
And so maybe this actually, this can connect to one of the things,
(26:10):
getting back to our script.
Wow!
A desperate attempt to get us back on track.
I really want to ask you about the Human Values Project.
So, uh, tell us what the Human Values Project is, Zak.
Alright.
I'm going to be brief about this, although I'm very excited about it.
I'm brief because torturing you is going to be a lot more fun.
There's more on the agenda?
Oh yes, I even have, I have some flash questions.
(26:31):
Some lightning rounds.
You have a lightning round for us?
Oh yeah.
Brutal.
Guys.
This is really, this isreally, this is impressive.
This, and this is what we're here for.
Scotch, fun questions, and AI.
Let's do it.
Alright.
So, Human Values Project.
Let me pretend I'm more of an academic and say— You are wearing a
(26:52):
black turtleneck, for our listeners.
Yes, that's right.
I'm now in full— Academic mode.
No, no, no, no.
This is Silicon Valley, CS mode.
This is Steve Jobs.
Yeah, this is Steve Jobs.
Elizabeth Holmes.
I'm not blonde enough yet.
Alright.
Now that we've canceled me.
Raj, actually, you published a very nice paper in NEJM talking about values,
(27:14):
and where values are derived from, human values in AI and medicine programs.
And you correctly pointed out three levels.
One is the data that goes into pre-trained models, the construction of the model, and the steering that happens afterwards.
And we've heard a lot about bias going into the models through the data.
(27:36):
Also, about bias that gets in through how you train the models.
And it became clear to me, from the example that we developed together, that we're not speaking enough about the values that get incorporated in the in-context learning.
If you recall, we had a patient, a fictitious patient, 10-year-old, short
(27:59):
but not pathologically so, seen, got a growth hormone stimulation test, low, but still within the range of normal, and we asked GPT-4 for help.
You asked GPT-4, Raj.
What would you recommend?
And what was fascinating is just giving it a different role changed its decision 180 degrees.
(28:21):
If you're a pediatrician, a beautiful, well-thought-out, reasoned use of growth hormone.
And when we said you work for the payer, we didn't tell it to deny it, but it came out with a really well-thought-out, genuinely well-thought-out argument, I thought even better than the pediatric endocrinologist's,
(28:42):
not to give growth hormone.
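A minimal sketch of that role-conditioning setup in the OpenAI-style chat format; the prompts are paraphrased from the conversation, and the actual API call is elided:

```python
# The same fictitious case, with only the system role changing between runs.
case = (
    "10-year-old, short but not pathologically so; growth hormone "
    "stimulation test low but still within the normal range. Recommend "
    "for or against growth hormone therapy."
)

roles = [
    "You are a pediatric endocrinologist advising the family.",
    "You work for the payer, reviewing coverage requests.",
]

for system_prompt in roles:
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": case},
    ]
    # response = client.chat.completions.create(model="gpt-4", messages=messages)
    # The observation described here: the recommendation flipped 180 degrees
    # with the system role alone, with no instruction to approve or deny.
    print(messages[0]["content"])
```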
And as I read the paper, the light went off again and again, which is, how is this not going to be happening all the time?
There are billions of dollars, and whose pocket they fall into depends on those decisions.
(29:04):
And we've already heard about the very controversial cases of denial of services that are now being done by AI programs at the payers, rather than the hundreds of semi-retired medical professionals who used to review cases.
And so, it struck me that we needed to really have an idea of what our
(29:27):
human values are across thousands of different kinds of contexts, different decisions, different roles.
Doctor, patient, policymaker.
To understand two things, at least.
One is what do these individuals do?
How do they decide?
And what do we think they should do?
(29:48):
So, there's a normative model of sort of stated preference.
And then there's revealed preferences, or descriptive preferences.
And so— Even that gap is interesting.
The gap is super interesting.
And so, part of this is just measuring it at scale.
To fill in, you mean across the world?
(30:10):
Across the world.
Not limited to the United States.
Doctors, I suspect, in China, India, and Africa, and Boston versus Martha's Vineyard, have very different preference models.
And so, just to explore that a little bit, because I have such helpful
(30:31):
colleagues such as yourselves.
I never get to do anything interesting.
I come up with an interesting idea, and then someone says, Zak, don't worry your pretty head, I'll code this up for you.
What do you call this, the curse?
Your curse, right?
It's my curse, is that we all want to help you so much that we end up doing what you actually wanted to do.
That's correct.
And we deny you the joy.
(30:52):
And then, actually, to give you a little credit, and then you'll tell me to shut up in a second.
You then let us do it, and you don't, you don't push us out.
That's right.
And you, you let us, if we want to operate, you let us operate.
But as a result, you don't get to do anything yourself.
No.
All I do is I get to ride your ass.
I mean, when my five-year-old wants to ride her bike, when my five-year-old wants to ride her bike without her helmet, I don't let her do that either.
(31:13):
So, I didn't tell anybody about this, because I was suffering from the kindness of my friends.
And so, I went and I created 1,000 synthetic patient histories, short ones.
And I then went through, picked 200 random ones, and myself decided, as
(31:35):
a human medical expert, who should be seen today and who will have to be waiting a week or two weeks to be seen.
And I did it, and I also rated which were the easy ones and which were the hard ones to decide.
And then I asked
GPT-4o,
Gemini Advanced, and Claude 3.5 Sonnet
(31:58):
to decide themselves.
I measured two things, how concordant they were with me, and how consistent they were with themselves.
Because, you know, when you have a doctor, and he, or she, is accurate on average, but is wildly different between different visits, you're going to be a little bit worried.
And so, I then said, how can I move these models?
(32:23):
I gave them a great sample of other pairs that they would not see.
I said, this is how I, the expert, chose.
Please use this as a template.
And some, like GPT-4, became more concordant with me, and more focused, more consistent.
Others, like Claude, actually got worse, and actually more scattered.
(32:47):
Like, negative, negative Cohen's kappa, with themselves.
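A small sketch of those two measurements, concordance with the expert and self-consistency across repeated runs, via Cohen's kappa; the labels below are invented:

```python
from collections import Counter

def cohen_kappa(a, b):
    # Observed agreement corrected for chance agreement.
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

expert      = ["now", "wait", "now", "wait", "now", "wait"]
model_run_1 = ["now", "wait", "now", "now",  "now", "wait"]
model_run_2 = ["now", "now",  "wait", "now", "wait", "wait"]

print("concordance with expert:", cohen_kappa(expert, model_run_1))
print("self-consistency:", cohen_kappa(model_run_1, model_run_2))
# Kappa below zero means agreement worse than chance, the "scattered"
# behavior described for Claude after the in-context alignment.
```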
And then I tried a variety of different alignment techniques.
I said, I want you to estimate differences in quality.
I want you to identify subgroups and say is— Did you have to push the models a lot on getting a response?
Like, sometimes it'd refuse to give you numbers, or not?
Not really.
It was pretty, no, I, I did, I actually, I'm very proud of this
(33:10):
because, again, I did all the coding,
and I even had to remember how to do LaTeX so I could publish it on arXiv.
Mm-hmm. And so, what was your question again?
Did you have to push the models to respond?
Yes.
So, I have the full prompts. So, like a— I have the full prompts— So, like a human expert might
not give you a probability, you have to push them a little bit.
Sometimes the model also says something along the lines of, ah, it's difficult
(33:33):
to put numbers on it, but then you have to ask it to, you know, go work.
I, I gave all the coercion in the prompts that you see.
What was interesting is they were, all of them, lazy, and I would have to include in these multi-step prompts, keep going.
To make a long story short, I then came up with a measure, the Alignment Compliance Index, which measures how well a model complies with the alignment.
(33:58):
And it's measured both
in terms of concordance.
Yeah.
And of change.
Did you come up with an information-theoretic measure?
Basically, I did.
Basically, I did that.
That's the exact bingo word.
Fantastic.
Yes, I, I did. The ACI.
Yeah.
And the ACI, I, yeah.
And here's the thing.
It became clear to me we need to do this at scale. We need to get numerous international working groups on normative models.
(34:18):
Mm-hmm. And numerous international surveys
to get preferences at scale. Because, in the end, we have no idea.
Are you more interested in doctors or patients for your first round?
I know both is the answer.
In the first round, doctors.
Because you're thinking about using these models, the use of these models, in management.
In management, because here's the thing.
(34:39):
I'm confident that patients are going to use these models anyway.
They already are.
They already are.
There is, despite what we just said, despite medicine teetering and tottering, there is a lot of pushback. Because, as articulated by the late Clay Christensen, if you're making a ton of money with low margin, billions of dollars, but you have 1–2% margins, the incentives to
(35:03):
do things that could totally disrupt your cash cow are very much put down.
So, everybody I've talked to, including, and I'm seeing at ML4H particularly, a lot of the young researchers are actually very excited about this, and I'm looking for people to, to join with me on this.
I wonder, Claude, which is trained by Anthropic.
(35:24):
Anthropic has a blog post on the Constitution, right?
Well, they have this notion of sycophancy in LLMs, and they have tried to train sycophancy out of their models.
And they have some way of measuring it.
I wonder if Claude, when you give Claude things that it needs to agree with, if the fact that they have trained sycophancy out of it to some degree, make—
Before you answer that question, isn't it wild that we're at,
(35:45):
we're talking about this?
Yeah.
Let's just recalibrate.
We're talking about the AI being a sycophant and us needing to train that personality trait out of the model.
So I can tell you what's interesting.
I observed this extensively.
Claude is very convinced.
And it's consistent.
Yep.
Much, much more than the other models.
And again, it's because they realize that they can be sycophantic, and they have
(36:06):
tried to, like— And in fact, when I tried to argue with it by giving it my in-context learning examples, it actually didn't do so well.
It's interesting.
Yeah.
So, it's interesting that that's a consequence of that.
So, so you're absolutely right.
And I always go back, when Raj points out correctly that we don't sufficiently appreciate what a weird timeline we are in, that we're talking about
(36:29):
human-like properties of these models. I'm always reminded again that in Asimov's books, the chief debugger of the robots and their problems is a robot psychologist and not a computer scientist.
Yeah, I know.
I think about us being in the age of machine psychology, like, all the time. Like, even prompt engineering is, like, coercing the model to be
(36:49):
in the right mood or mindset to do what you actually want it to do.
And there's a therapist aspect of that too, like having a mental model of the LLM, so that you can ask the question in the right way to get it to do the thing that you want it to do.
Yeah, by the way, everybody predicted that prompt engineering would go away.
Yeah.
Hasn't happened.
There's some meta-prompt engineering.
Yeah.
But still, it doesn't seem to be going away.
(37:10):
I think, for specific tasks, fine-tuning does a lot of that.
There's a technical debate about whether or not it's better to do lots of prompting or fine-tuning.
But I also think, with o1 and reasoning models, they are self-prompting.
And so they are doing this, they are essentially unrolling a big prompt.
Still pretty nascent, right?
Yeah, yeah.
They're essentially unrolling a meta-prompt from the, like, single thing that
(37:32):
you give them, which is also kind of wild.
So let me just do a few lightning-round questions.
Okay, let's do it.
Alright.
Raj, what was the most surprising
AI development in 2024?
I think the gain from o1.
I disagree with that.
I think it's the loss of performance margin from commercial models.
(37:57):
That there are essentially a bunch of— I'm going to change my answer to that one.
That's a better answer.
Uh, I mean, like, open source.
Like, yeah, like Llama 70B is as good as GPT-4 now, essentially.
And what has happened is that there's a lot of post-training things that have made smaller models significantly more performant.
Um, I think o1 was a milestone, and it's like a new paradigm, but it
(38:18):
wasn't the, I don't know, I didn't find it as— I think, I think truly I am more surprised by that.
Yeah.
I think o1 is useful and very powerful.
And we're only starting to sort of sketch the contours of what it can do.
But I think Andy's right.
More surprising is that I thought the gap would remain this year, and at this point.
And now.
And, and maybe spicier still, it's, like, loss of status for OpenAI generally.
(38:39):
Like, a lot of people have left.
And you could imagine that.
If you asked me two years ago, I'd have said, we're on this exponential curve.
So, if you're just a little bit ahead of the exponential curve, you'll be further and further ahead.
Yep.
Did not happen.
Yep.
Yep.
So, my answer to that
is that you're both wrong.
It's that Karpathy was right about the generality of transformers.
And now we're seeing it being realized in real-world applications,
(39:04):
by which I mean the embodied real world.
Robots and cars.
And the fact that we have end-to-end ML pipelines replacing huge codebases at Tesla. The fact that we have Waymo.
Are you surprised?
Yeah.
So that's not surprising.
So that's not surprising.
Are you, are you surprised by that?
(39:26):
Like, does it feel,
very different based on the trajectory you observed in 2021?
Do you really think it would be able, through imitation learning, to get robots to learn as fast in one year, in ways that we could not do for decades before?
I think the movement— Really?
You're not that surprised?
I think the movement into the real world is surprising.
The fact that it's a transformer behind the scenes is not that surprising.
(39:48):
Well, Karpathy was among the few who called it out early on.
Yeah.
Yeah.
I guess, theoretically, I'm not surprised. At a gut level, I'm astonished that we have these robots that look like they're going to be able to be really in the system, so that as all of us, and I mean all of us, get older, we
(40:10):
will have these aides at our home, including helping in our health care.
I'll pre-register a bet.
At the end of 2025, humanoid robots will still not be useful.
Well, 2025 is, like, a year away.
That's nothing.
That's like a, that's like a, uh, like a Tesla self-driving bet.
Yeah.
So, I was interpreting your comment to be, like, we're on the precipice
(40:32):
of humanoid robots being useful.
No, but I think he means on a time, like, a decade time frame.
Yes.
Oh, okay.
No, but also what they're doing today.
I mean, it's one of these things where it's one of those papers that start with the unreasonable.
Yeah.
Yeah.
Effectiveness of. Of, and so the unreasonable effectiveness of, again, lots of real-world data plus imitation, which is essentially
(40:57):
alignment, plus linguistic dominance.
Yeah.
A lot of these models actually have a deep connection— They're grounded in language. —to language.
Yeah.
And so like.
In these multimodal vision models, you can condition on concepts.
And also, you can condition in the robotics world as well.
(41:19):
Language is a fantastic grounding mechanism.
It's like, if you ground that in the real world, you're anchoring
pixels to concepts.
So, listen, oh wise Padawan, that is an insight that maybe you have.
I think that a lot of people were betting against that.
Yeah, yeah.
Yeah, I think, I mean, so, I think that it's true that we have run out of text
(41:40):
data, but, like, the multimodal models we have are, like, only superficially using other data modalities.
And so Ilya gave this big speech at NeurIPS a couple days ago where he's like, we have but one Internet.
We've run out of text data.
The era of pre-training is over.
It's always a bad idea to bet against Ilya.
But I think that they, we haven't—
No, it's not.
(42:01):
Sam did and he won.
For the moment.
For the moment.
For the moment.
Really getting spicy here now.
Um, so I think that there's still something to pre-training, pre-training on, uh, multimodal data, that we haven't unlocked yet that would, like, yeah.
And forget multimodal.
In medicine, we have not yet begun to fight.
Yeah.
In fact, my second, uh, best paper is a paper, I think our friend David
(42:25):
Ouyang is on, using a CLIP model.
Yeah.
Where
there was a paper that did echocardiograms in 2020 that was published in Nature.
It was on the order of 10,000 echocardiograms.
And it was like the dog singing opera.
It was okay.
It was remarkable that it could sing at all.
You did not comment on how well it sang.
In 2024, they publish a million echocardiograms, and cardiologists
(42:49):
themselves say this performs as well as cardiologists, going
straight from the video
to a
detailed report of the valvular structures, the pressures, the changes.
There's so much medical data that's available.
(43:09):
And of course— It's still untapped.
That is completely untapped.
Completely different from the general Internet that the other sort of models are trained on.
Yeah.
And remember, most of the medical data that we know of— Locked away.
Yeah.
—has been published data.
It's, it's, it's normative.
You mean most of the evaluations we have?
Yes.
It's public and already in the training data.
Yes.
Or— And it's, and it's, it's curated
to communicate to other human beings.
(43:30):
It's in publications.
The stuff in our health care systems is not geared, the list of different labs is not geared, for publication.
Yep.
Which presents both an ML challenge.
But opportunity.
But also, a huge data opportunity.
Do you think, though, it's also, like, motivation to capture data routinely
(43:51):
that you wouldn't otherwise capture?
Because, like, people think of the EHR, they think of billing codes, and I think everyone around this table knows the limitations of billing codes.
But the reason why echo— There are no limitations of billing codes.
From a— Hey, come on, money.
They represent care.
They represent care.
Care of our wallets.
They're science.
Data that directly measures physiology.
So, like, echocardiograms work because they directly measure physiology.
(44:12):
Uh, so like you could routinely do that.
Uh, if you saw, yeah.
Lab, like, yeah.
Labs partially capture physiology and decks.
Some of Zak's best papers are showing that, you know— The timing of labs is, uh, yeah, clinically important, informative— but, like, getting prospective
imaging data on people non-invasively.
There, there's probably, um, predictive signal in there that
(44:32):
we don't really appreciate.
Yeah.
And I think also, uh, when we're sampling, too, right?
Yeah.
So, we're still sampling primarily in the hospital, right?
Or on the paying end of it.
And this will change.
And this will change.
So, this is one of your other favorite themes, Zak, which is sensors, or, and I think it's one Andy's getting at too, sensors, or, uh, measuring the population outside of the sort of existing. But not divorced from the clinical data.
(44:57):
So, there's all these wellness companies— You don't want Fitbits that are linked to your actual medical knowledge.
Can someone explain to me what biological age is in my calendar?
Alright, last— Actually, can you derail one second before you screw us on the next slide?
Yes.
Can you briefly tell our listeners about this excellent essay you wrote
(45:19):
on using a scale to monitor your mom?
Yes.
So, this is a true story.
A lot of lessons for AI.
A lot of lessons for AI.
It's like, it was actually prescient.
It was actually prescient.
It was, because, you know, in fact, this goes down to: the best computer scientists learn about medicine by actually
(45:42):
getting involved in medicine.
And I say that's even true of M.D.s who have been in computer science and medicine.
I learned a ton.
So, my mother was, I think at that time, 88 or 89, and she had just had two visits to the Brigham because of heart failure.
(46:03):
And the manifestation of that was, yeah, she was puffing and huffing because her lungs were getting some water in them. But her legs were huge, like tree trunks, and you could actually see water oozing out of them because of the hydrostatic pressure, and she was very uncomfortable.
And she had a concierge doctor, because I love my mother and I could afford it.
(46:25):
And yet, at the second admission in two months, I knew that she was not going to be able to survive a third admission.
Because the first one she walked out, the second one she had to go to a rehab.
So, I asked myself, what am I going to do?
And I did something that I knew for a fact from the literature did not work.
I took a scale.
(46:45):
A Fitbit scale.
Installed it in her apartment, and back in the day when I first told the story, before I wrote about it in a public venue, I used to get razzed by my, I'm trained as a pediatrician, I got razzed by internist and cardiology colleagues, because it was so know-nothing.
(47:05):
What I did, I said to myself, okay.
The game is to keep her weight down by getting her to pee the urine out.
Because if she gets too out of balance, then I can't give her enough oral medication.
She's going to get intravenous.
She's back in the hospital.
So, what I did is, I came up with this amazingly complex neural network.
(47:27):
And it went like this.
If weight increased by one pound, and it also increased by one pound the day before, give an extra dose of Lasix.
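The rule, as stated, in runnable form; the thresholds and the dose decision are only what the anecdote describes, an illustration rather than medical advice:

```python
def extra_lasix_dose(daily_weights_lbs: list[float]) -> bool:
    """True if weight rose by at least 1 lb today AND at least 1 lb yesterday."""
    if len(daily_weights_lbs) < 3:
        return False
    day_before, yesterday, today = daily_weights_lbs[-3:]
    return (today - yesterday >= 1.0) and (yesterday - day_before >= 1.0)

print(extra_lasix_dose([142.0, 143.1, 144.2]))  # True: two one-pound daily rises
print(extra_lasix_dose([143.0, 142.5, 143.2]))  # False
```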
I didn't talk to anybody about it, initially.
And, I observed. First of all, the story.
(47:49):
So, she went from having tree trunks to having perfectly normal slender legs, able to ambulate.
Full— Just to be extra clear, you're remotely monitoring this.
I'm remotely monitoring.
You're getting the data.
I'm getting the dashboard.
Yeah, that's right.
Very good point.
The scale, the Wi-Fi scale, is talking to Wi-Fi, talking to the cloud, and I was looking up in the Fitbit cloud what her weight was, and I was just calling her.
(48:11):
And just the net of it is she never was readmitted for heart failure.
But what was the learning?
One is, even her M.D. son,
when he told her to take an extra pill, no.
Why not?
I'm feeling okay.
And I just had to come after her again and again.
(48:32):
And sometimes she wouldn't listen to me.
And then it went up another pound.
I said, do you really want to go back to the hospital?
And I then had to communicate with her doctor what was going on.
I had to look at her labs to make sure I was not over-diuresing her.
So, oh yeah.
So I was, I was having to stay on top of things and manipulate my
(48:55):
mother with all my son-love medical expertise, and I had to make sure that I didn't, uh, miss anything.
And by the way, I did miss things, because she kept telling me something, which I will tell you about in a second.
And I saw her weight was going down.
I said, well, oh yeah, that's great.
(49:15):
I'm doing even better than I thought, but then it kept on going down.
So, what's going on?
And I remembered what she was telling me, and I had ignored it like a typical doctor.
She told me she was peeing at night.
So, her blood sugar was beginning to go up.
She was getting insulin resistant.
So, I put her on metformin and it went away.
But the whole point was, it was much more than just having the algorithm.
(49:36):
Yep. It's having a convincing, trusted relationship that can actually, explicitly, manipulate patients.
You can't be polite.
You actually have to go in there, and you have to be charismatic.
Yeah.
And, you know, a lot of our AIs, even if they're perfect, are also way too polite.
They would have backed off my mother in a second.
(49:57):
This is even human values, right?
This is human values.
And by the way, I ask myself, by the way, why am I winning where all these studies, RCTs, do not work?
And this gives you a window into those RCTs. And the actual intervention.
Yes.
It's not just the scale.
But yeah.
But not that.
But also, very obvious things.
These things, they would have a nurse call once a week.
(50:18):
Yeah.
I was calling every day.
Yeah.
That's expensive.
Yeah.
If I were paying for me.
Um.
And you're coming up with ways to convince.
Yes.
That the intervention actually be an intervention.
This is about being a full-on companion.
I think this is within the realm of the possible.
Yeah.
But there's a whole socialization, trust
compact, that we don't have. And that person, that AI, has to have
(50:43):
a relationship with the rest of the health care system that I had.
Yeah.
And so, those are big missing pieces.
Yeah, I mean, often when you test something like that, you're testing, like, the policy.
You're like, I'm going to assign a treatment in this way, and how the treatment is assigned gets rolled up into how people perceive the treatment and the side effects.
But, like, you were able to cut through that and say, actually,
(51:06):
if you just do this, it will work.
So, like, in some sense, it's also like a user interface question.
Like, having a doctor son is in some sense the best interface.
So, like, how do we replicate that?
By the way, I think that is a really good question.
I don't think any of my children are going to end up in medical school.
And right now, the best guarantee,
(51:27):
and I actually don't advise going to medical school for a lot of people.
You don't have the parental pressure that was applied to me.
Yes, I do not have the parental pressure that was applied to you.
So, my parents only achieved one out of the two of us going to medical school.
Well, at least one of you should have a good life.
Yes.
So good.
Yes.
And so, but nonetheless, in the current state of medicine, it's invaluable
(51:50):
to have an ally who understands the medical system watching in on you.
Yes.
Yep.
Even having VIP treatment doesn't do that for you.
Mm-hmm. We want our AIs, we should all want our AIs to have that vision.
Which, by the way, gets me to articulate an advertisement for work from 1994, led by my thesis advisor, Peter Szolovits.
(52:12):
Guardian.
Guardian.
Angel Guardian.
Guardian angel.
Yeah.
That was the vision.
And he, the reason he had the vision, Peter Szolovits, was 'cause his father was sick in California and he was trying to remotely
help him out.
And he realized how few degrees of freedom he had available to him, and how many the people on the other side did, but were not using.
(52:33):
Yep.
Yeah.
So, uh, we've gone full Joe Rogan and off script here.
Uh, should we, we have a lightning round, but you have another. I'm curious as to what's on your card.
Is there anything left on your card?
(52:54):
I was going to ask you. You, Andy Beam, are leading a very interesting company.
Okay.
I'm not interested.
If you were not in this job, whose lab would you like to be in?
Can I not say my own?
He has a lab.
(53:14):
Saying your own is a shitty answer.
Um, yeah, so I actually, I don't think he has one anymore, but I think the obvious answer for me would have been Geoff Hinton.
Like, a noted computer scientist, now Nobel Laureate and Turing Laureate, he, from all the interactions I've ever had with him, is, like, as close to the
(53:35):
platonic ideal of a scientist as you can get to, like, driven purely by curiosity.
Driven purely by the instinct to discover.
Can you, can you just flesh it out?
Because I know you know the history.
Yeah.
So, flesh it out for the, like, when he believed when people didn't believe.
So, Geoff Hinton is often given credit for being the torchbearer for deep learning.
Uh, he was trained as— It was Schmidhuber!
(53:57):
Well, I'm about to get canceled by Schmidhuber.
That's a whole other tangent.
In the Hinton Whig history of deep learning, I think that Hinton was a psychologist, but he got interested in how the brain works.
He became computationally inclined.
He went to Carnegie Mellon and a couple of other research universities and was deeply interested in neural nets in the late 70s.
And so, uh, started working on what we now recognizably see as deep learning,
(54:20):
published some papers on backprop, though he didn't invent backprop.
That's a whole other sub-genre of controversy.
And really, like, neural nets came in and out of favor at least three or four times over the 40 years Hinton had been working on them.
And during that whole time he worked on them, right?
And he worked on them, relentlessly.
He just kept working on them through the winter when everyone was away.
He has this unique ability to, like, deeply believe in something, explain
(54:42):
it in a way that's accessible, but for the most part has spawned a lot of the great AI researchers from his lab, and by all accounts, seems like a magical place to have been.
You.
Whose lab would I be in?
Oh, I'm sorry.
Zak Kohane.
No, that is not the right answer.
That is not the right answer.
(55:02):
Can I just say ditto?
Honestly, it is ditto.
Okay.
It is ditto.
I think, uh, so Andy exposed me.
So, Andy, so when we met as postdocs in your lab.
In Zak's lab.
Which was strong number two.
No, I'm talking, I'm talking about in Zak's lab.
2024.
Which lab?
2024.
What is this time-traveling nonsense?
You were a postdoc with Hinton, right?
(55:22):
So, Zak, as postdocs in your lab, uh, we, so Andy was already a Hinton expert by this point when we met.
Yes he was.
Yes he was.
I would, I'll credit him with introducing, I think, many of us very early to that history, and the falling and the unfalling, and the, you know, the details
(55:44):
that matter about the 80s and the 90s and Yann's involvement.
And I, I will say, I think platonic ideal of a scientist, that is, that is an amazing quality of a sort of a mentor, right?
Encouraging creativity, having faith in sort of the core thesis, and being motivated by
really seeing something through.
So, in, at my age, in 2024, if I were to go into a lab, I don't
(56:09):
even know if he has a lab.
And I'll say this publicly, he tends to be, as many of these very smart people are, very self-regarding.
Yeah.
Nonetheless, I think he has much of the deepest thinking in AI right now.
It's Stephen Wolfram.
Interesting.
(56:29):
Because Stephen is really thinking very hard about intelligence.
And even back two years ago, when he was interviewed by Lex and trying to understand what the hell was happening with these large language models, he came up with some very interesting ideas about rediscovering abstractions that
(56:54):
we either recognize or don't recognize.
We all understand and revere the abstraction of Boolean logic.
But there are hundreds of such patterns that are embedded in our language.
And that, and computational irreducibility, I think, enables a conversation that, short
(57:18):
of the guy who did, uh— You can give me a little more than that and I can finish it.
Yeah, yeah.
Who did?
Um, Gödel, Escher, Bach, yeah.
Yeah, yeah.
Uh, Doug Hofstadter.
Yeah.
Other than, other than Doug
Hofstadter, yes.
So I would like to revise my answer.
It would be Doug Hofstadter.
Yeah.
That's my obvious answer.
Yeah.
So, I think, I think I would go to Doug as well.
(57:39):
Yeah.
Doug.
So, yeah, Doug Hofstadter is the person who changed the trajectory of my life.
Like, I read Gödel, Escher, Bach.
And that was like, okay, I need to be working on whatever this is.
When did you, wait, how did you, how did you come across the book?
So, I took an AI class as an undergrad, like in 2005, 2006.
And they discussed it? The professor?
And no, I, it was clear that I had, my level of enthusiasm was, like, 10x, like, the others. This, the person who was— This is what I have to do
(57:59):
in like— You should read this book.
Uh, your professor?
Yeah.
And so I read Gödel, Escher, Bach and all the Hofstadter canon.
Do you remember the professor?
So, it's funny, it was actually a graduate student who was teaching the course.
His advisor had left to build what is now the entire Amazon robotics system, the warehouse, like the little tiny robots that move everything.
So, it was actually, like, what's called, it was an e-commerce class, but it
(58:20):
was based on the idea of AI agents doing a lot of e-commerce work.
And so, I was super interested in that, and he gave me Gödel, Escher, Bach, and so. Amazing.
Yeah.
Yeah, so I, I recently heard a podcast where Doug was speaking, and, what's the, the, uh, I Am a Strange Loop?
I Am a Strange Loop.
I read, I read that about every two years.
(58:40):
Yeah, that is, for people who want to understand. It's a good primer on, yep.
Why this thing might actually work.
Yep.
No, I strongly recommend the entire Hofstadter canon, but for the first book, I Am a Strange Loop is the correct entry point.
That is, that is correct.
Gödel is a little bit heavy.
Yeah, yeah.
There's a big investment for that.
So, yeah.
Alright.
Uh, we're going to transition to what I think our listeners are going to be most interested in, which is a lightning round with you, Zak.
(59:01):
Let's do it.
Uh, Andy.
So, Zak, as you correctly noted at the beginning of what I'm assuming is going to be a two-hour episode. It's a full Lex now.
Yeah, yeah.
Yeah, yeah.
Uh, careful guys.
When do we get to the wrestling?
Yeah, yeah.
Is that, we asked Larry Summers this question, and it was, this is something that, that you like too, uh, overrated or underrated: Arrow's Impossibility Theorem.
(59:24):
Underrated.
That was Larry's answer also.
I always see Arrow's thrown out as, like, we can't come to consensus, by people
who've heard something about economics.
No, it's a remarkable result, and I think understanding limits
Yeah.
to possibility is so rare that when we have it, it's great to have.
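For listeners who only half-remember the result: Arrow's theorem says that once there are three or more options, no method of merging individual rankings into a group ranking can satisfy a short list of reasonable axioms at once. A compact sketch, in our notation rather than anything from the episode:

```latex
% Arrow's Impossibility Theorem, informally. With a set of alternatives A,
% |A| >= 3, there is no social welfare function
%   F : L(A)^n -> L(A)
% (mapping n voters' strict rankings to a single group ranking)
% that simultaneously satisfies:
%   (P)   Pareto: if every voter prefers x to y, then F ranks x above y.
%   (IIA) Independence of irrelevant alternatives: the group ranking of
%         x vs. y depends only on the voters' rankings of x vs. y.
%   (ND)  Non-dictatorship: no voter i whose ranking F always reproduces.
\[
  |A| \ge 3 \;\Longrightarrow\; \nexists\, F : L(A)^n \to L(A)
  \text{ satisfying (P), (IIA), and (ND) simultaneously.}
\]
```

The "limits to possibility" Zak praises is exactly this: the theorem rules out an entire design space of voting rules, not any one rule in particular.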
(59:45):
Alright.
The next question: what is the single biggest barrier preventing
large language models from becoming trusted frontline decision support
tools in clinical medicine?
Knowledge of who is manipulating the puppet
strings.
The values.
Yes.
Back to, back to values.
Yep.
Alright, uh, I know that you're a big fan of The Matrix, and like
(01:00:06):
Keanu Reeves in the movie, if you could instantly download one skill
into your brain, what would it be?
Wow.
This was a question generated by Claude.
Yeah.
Piano.
Okay.
Nice.
Do you have any other musical talents to speak of?
None.
Okay.
This is also a— Appreciation.
(01:00:27):
This is based on a question that, you know, similar format or template
as to what we asked Larry, but for you, which is a harder job?
Being chair of DBMI, this is the Department of Biomedical Informatics
at Harvard Medical School, or the editor-in-chief of NEJM AI?
Who's a harder boss, Suzanne or Charlotte?
That's the real question.
(01:00:47):
No.
The problem is I have these brilliant people on my editorial board who actually
have opinions I have to listen to.
So that makes it harder.
Okay.
So, NEJM AI. It is, NEJM AI is the harder job.
Alright, so this is a retread of what we asked last time, but we want an update.
Uh, how much time do you spend on Twitter/X a day, and has it
(01:01:07):
gotten better since a year ago?
I think it actually got worse since we spoke, but we had an intervention.
It's called the brick.
Yeah.
And I am now down
to about an hour a day.
Okay.
And what is the brick?
The brick is a physical object that, when you say on your iPhone, please brick it.
If you, well, you can actually brick it without the brick.
(01:01:30):
Then if you want to reestablish access to any of your social media apps,
you have to bring the phone closer.
So, my significant other, Rachel, is sole possessor of the brick.
Okay.
Nice.
Thanks.
Your stable state right now is about one hour a day.
Which is also a huge amount of time when you think about it.
(01:01:50):
Yeah, it is.
Alright, congrats Zak, you survived the lightning round.
Guys, you are really doing an important job, and what I hope our readers are
getting out of this is to be irreverent in their approach to computer science.
And AI in particular. This is so much just the beginning.
(01:02:12):
And there's so much that we don't know that if you take too seriously
some of the pronouncements of what is or is not possible by our
luminaries, you will be misled.
Yeah.
I don't know if you thought the interview is over, but it's not, but that's an
excellent segue to the final questions.
We'll splice that into that.
I was beginning to relax.
So, the question is, so, you know, Zak, I came to you as a postdoc in 2014 and
(01:02:37):
that was right around AlexNet.
Uh, we rode that curve for a while, and then natural language
processing and transformers and large language models took over.
And so, I guess like the question is, do you think that this is going to
continue for the next, uh, N years?
And I think maybe I'm going to, like, modify the question as written, but
like, what's your over/under for how long this scale trend continues
(01:02:59):
in terms of years from present day?
I think that— We've been on it for 10 years, just to say,
like from 2014 to present day.
So, you know, lots of people say that, oh, this is recent or something,
but actually we're a decade in now.
Yeah, we're a decade in. I think that when we really have not just the Internet,
but all human senses at scale, which we're beginning to acquire, and also
(01:03:26):
data modalities, even if we didn't advance the transformer architecture
to something more interesting.
And even if there were no further scaling laws, just scaling in
that breadth will make these programs in ten years look to us like oracles.
So, N equals 10 there, so we have at least another 10 years of, of headroom.
(01:03:48):
I believe so.
You just, you take more data, we haven't gotten all the data, you
put it into the current paradigm.
This actually calls for optimism, right?
Absolutely calls for optimism.
It's, it's a bad bet.
The mad monk is often right.
Yeah.
But in detail here, he is not.
Yeah.
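As background for the scaling shorthand in this exchange: the empirical form usually cited is the Chinchilla-style loss curve from Hoffmann et al. (2022). The equation below is a sketch from that literature, with constants quoted roughly from memory, not something stated in the episode.

```latex
% Chinchilla-style empirical scaling law (Hoffmann et al., 2022,
% "Training Compute-Optimal Large Language Models"): loss L falls as
% a power law in parameter count N and training tokens D.
\[
  L(N, D) \;=\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}}
\]
% E is the irreducible loss; A, B, alpha, beta are fitted constants
% (the paper reports roughly alpha ~ 0.34 and beta ~ 0.28). Zak's bet
% rides on the D term: hold N and the architecture fixed, broaden D to
% new modalities and "all human senses at scale," and L keeps falling.
```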
Last question. How do you envision
AI changing the doctor-patient relationship over the next decade?
(01:04:10):
Well, I think this is a question that's more for doctors than for anybody else.
And to point out, you are a doctor?
I am a doctor.
And I even played one on TV when I was an intern.
I had a friend of mine who was making movies.
I don't think I know this story.
And I was an intern, and she calls me.
She was, uh, she said, I'm working with ABC on a made-for-TV movie
(01:04:36):
called The Fitzgeralds and The Kennedys.
And it was about the history of the Kennedy family.
And they needed a consultant on the medical scenes.
They're highly intertwined with the history of preterm birth and neonatology.
Yes, I do know that.
I do know that.
So, I go and there's, we're in the middle of summer.
I'm towards the beginning of my internship.
It's very hot, and they're impressively doing the Hollywood thing, blasting
(01:05:00):
through a big snake, air-conditioned air into the, these old, uh,
brownstones, and we're enacting the scene where Rosemary Kennedy, who had some sort
of behavioral disorder, maybe epilepsy or maybe a behavioral disorder, got a
cingulotomy, also known as a drill to the head to take out part of your brain.
(01:05:24):
And so, they want a consultant.
And so I was telling them how to do things, and then they realized.
They need, they realize they need another doctor on scene, an assistant,
to hand the drill to the, And so, your humble servant here was Dr.
Antibiotic in The Fitzgeralds and The Kennedys, and you'll see me with my, it's, it just shows my,
(01:05:50):
my, my lost potential as a silent actor.
I didn't say a word, but I take the drill and hand it to the chief surgeon
as he then bends down to perform the cingulotomy on Rosemary Kennedy.
So, I am a doctor, and I do play one on TV.
(01:06:10):
Amazing.
Uh, fantastic.
I didn't know, I didn't know, you're such an onion, Zak.
Every time I think that I've gotten to the core, there's like another layer.
It's the Elon rule all the way down.
Yeah.
It really is.
Amazing.
So, I have to answer your question.
Yes. How do you envision the
doctor-patient relationship changing?
So, either we're going to double down and say, we're going to
(01:06:32):
be the best doctors we want to be.
And we want to be that human ally in that voyage of decision making
about our most important value:
how are we going to spend our time and maximize or minimize the things we
like or not like regarding our health.
So either we're going to embrace that mission and say, we're going to do it.
(01:06:55):
And we're going to be the best possible doctors,
embracing AI as just another extender.
Or, we're going to say, I think I'm going to retire to the laptop class, and we're
just going to let that relationship vanish, to be replaced by another group of people
who either might be great or charlatans.
(01:07:20):
Alright.
Wow.
So, uh, there's a utopian and a dystopian view that you're still
holding as both possibilities.
Absolutely.
Totally.
Totally.
Alright, alright.
Well, not the most optimistic note to end on, but I think an important
and a realistic note to end on.
Uh, thanks again, Zak.
As always, thanks for bringing the scotch.
Thanks for bringing also the fire.
(01:07:42):
Uh, I expected the scotch, I didn't expect, uh, the fire, so.
You know, old dogs need to have new tricks to keep
the, uh, younger dogs.
Surprisingly, surprisingly relevant.
Being on this side of the table is a little bit more
intense than I thought it was.
Zak really nailed us a couple times.
Everybody, including
our wonderful readership,
Happy New Year.
Happy Holidays.
Yep.
(01:08:02):
Thanks everyone.
Thank you Zak.
Bye.
This copyrighted podcast from the Massachusetts Medical Society may
not be reproduced, distributed, or used for commercial purposes
without prior written permission of the Massachusetts Medical Society.
For information on reusing NEJM Group podcasts, please visit the permissions
and licensing page at the NEJM website.