Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:02):
Microsoft Research came out of a discussion between Bill Gates and Nathan Myhrvold.
He was the Chief Science Advisor to Bill Gates.
Rick Rashid was pursued as the Founding Director.
I think that was 1991.
We were brought in as the first AI group.
There was a specific call-out to gains and contributions we had made with
(00:28):
probabilistic or Bayesian inference.
And that approach resonated deeply with Nathan and with Bill, per what Nathan was telling us at the time.
When we came up, we were probably within the first 10 people, three of us, at the lab.
There was a group that came over from the IBM Research Center in Natural Language Processing. There was a small compilers team at the time.
(00:51):
Microsoft Research grew very quickly in its first few years.
We spent a tremendous amount of time recruiting, and it was interesting, I remember back then, to even think about:
What would it be like to build a lab?
We established certain principles. For example, this amazing principle that there would never be any controls on a researcher's decision to publish anything
(01:15):
they thought was important to publish.
There would not be a tower of reviews on that process.
And that has stood the test of time.
That really has given the labs quite a different feel than you'd have in maybe a traditional corporate R&D environment.
The lab grew to comprise all important areas of
(01:35):
computer science over time.
And people at the lab would do a mix of applications as well as foundational work.
Welcome to another episode of NEJM AI Grand Rounds.
I'm Raj Manrai and I'm here with my co-host, Andy Beam.
We have an amazing conversation to share today, and it's with Dr.
(01:56):
Eric Horvitz, who is the Chief Scientific Officer of Microsoft.
Eric told us about his work at Microsoft, which goes back several decades, including how he got started at the company all the way to his work on AI today.
Andy, I know this word is overused, but I think Eric is truly unique in being both a card-carrying M.D. and a Ph.D. computer scientist who works not
(02:16):
just at the intersection of the two fields, but who's made fundamental advances to both fields on their own.
As one example, he wrote the classic machine learning paper on a Bayesian approach for identifying spam email.
And he's leading efforts at Microsoft now to understand the strengths and limitations of large language models like GPT-4 in medical diagnosis.
We had a really wide-ranging conversation spanning medical AI, decision theory, and
(02:40):
computer science, and it was a lot of fun.
He's also done really interesting things like the 100-year study on AI, where he gave a donation to Stanford to help them study AI over a century time horizon.
Those are the kind of long-term, forward-thinking projects that have been the hallmark of Eric's career.
He's also on something called PCAST, which is a White House-level
(03:01):
organization that advises the President.
And one of their key mandates is advising the White House on sensible AI policy and sensible AI regulation.
The number of areas that he has touched in computer science, medicine, and public policy is very impressive.
The NEJM AI Grand Rounds podcast is brought to you by Microsoft,
(03:23):
Viz.ai, Lyric, and Elevance Health.
We thank them for their support.
And with that, we bring you our conversation with Eric Horvitz.
So, Eric, thanks for joining us on AI Grand Rounds today.
It's great to be here.
Thanks for having me.
Eric, great to have you on AI Grand Rounds.
So this is a question that we ask all of our guests.
(03:45):
Could you walk us through the training procedure for your own neural network?
How did you get interested in AI?
And what data and experiences led you to where you are today?
It's an interesting reflection for meto go back and think about all that.
I have always beeninterested in mechanism.
I've sought explanations for thingsin the world, how do they work.
And that led me to doing abiophysics degree as an undergrad.
(04:09):
I was excited about learning aboutphysics and biology, chemistry, how
they come together in various ways.
And during that time, I ended up getting more and more curious about this interesting black box we call
nervous systems; human nervous systems, in particular, intrigued me.
What the heck was going on, and who were we?
(04:30):
And that led me to a neurobiologylab. And I did a deep dive and, you
know, when you're an undergradlooking at neurobiology, you
get placed into very specificsegments of research and questions.
And I was becoming expert at lookingat single neurons, unit activity.
Building microelectrodes, pulling them, asthey say, and sticking individual neurons.
(04:53):
And in dark rooms, I'd watch theclicks and clacks of single neurons.
And that raised my curiosity even higher.
How on earth did even a trillion or100 trillion of these little
units generate this fluid consciousexperience that people have?
And then when I was applying to grad schools, I pursued a Ph.D.
(05:16):
in neurobiology and thought I would add an M.D. That would give me more depth and the future possibility of working more with the grand challenge of understanding human minds.
And by the time I got to Stanford to do an M.D.-Ph.D., I was already so intrigued by the modeling approach, the AI approach to getting around the mysteries and
(05:37):
complexities of dealing with many, many neurons and trying to build a larger understanding from the activity there.
And so, I was taking lots of courses on main campus, grad courses in AI.
And before long, I moved my Ph.D. over into AI at the foundations, at the principles level.
And then later came back to Ted Shortliffe's lab doing AI in medicine.
(05:59):
That seemed like a nice unification, even though I stayed on more on the principles side, probability theory, decision theory, and so on, of principles of bounded rationality.
But I found an interesting direction, which was how do you apply formal methods, bringing decision theory and probability theory into what was then the dominant paradigm of AI in medicine, rule-based expert systems, as they were called.
(06:20):
There was so much excitement about those systems, but they didn't seem to handle uncertainty very well.
And by pushing into probability theory and trying to bring that back, I would say, to AI, AI research ended up pushing me into understanding better what we meant by bounded rationality. Systems that can't possibly do it all.
They can't possibly finish thinking when it's a hard problem, like a trauma
(06:44):
care problem under time pressure.
What do they do?
And coming up with formal methods for how they would back off and become bounded rational in a way that was justifiable became kind of a passion of mine.
And that led me into broader areas of AI research.
Can I, can I hop inhere and ask a question?
Cause I'm super intrigued by the sort ofscientific origin of your interest in AI.
(07:06):
So, you were trying to understand, from a neuroscience perspective, how intelligence arises. I wonder if you see any parallels in what's happening now with trying to do interpretability of large language models or other big, complex models. There's a movement right now called mechanistic interpretability, which is essentially applying, like, neuroscience principles to large language models.
(07:27):
How promising do you think that is inlight of your experience in neurobiology?
Well, popping up alevel, it's interesting.
So here I was making, in some ways,a crisp decision at a point in my
career that I would not be pursuingthe complexity per the black box
of nervous systems, heading intothe world of Bayesian networks
and more generally probabilisticgraphical models where you knew the
(07:49):
semantics, you knew the procedures,you knew every arc and node.
Getting into machine learning, itgot a little bit more complex and
black boxy, but still understandable.
And as we got into the world of
neural models, neural network models.
And now today it's interesting to comefull circle and say, well, Eric, you
couldn't escape, you're back in thisworld again, where you're grappling
(08:11):
with complexity of the form we can'tunderstand and explain very well.
Some of the most interestingbehaviors that we're seeing now,
some people refer to them as theemergent behaviors of abstraction,
generalization, and composition,that we see, for example, with GPT-4.
So here we are back again.
And now I find myself,
(08:32):
and recently, working on a paper that's coming out at ICLR, you know, where we're thinking, looking at what happens, looking at the actual activations of single neurons and patterns of neurons when a system can't answer a problem versus when it can.
And I had to, like, giggle to myself, saying, you know, Eric, there's probably going to be no easy way into the world of deeply understanding
(08:55):
mind, or even our small versions of flickers of minds we're seeing in these large-scale language models now.
And so tools like the ones you referred to just now, mechanistic analyses and explainability, understandability, I think will be more and more important.
It's almost like here we are in
2024, if you ask me back in 1989during grad school years, what
(09:23):
would I even be doing in 2024, letalone what it looked like. The
Jetsons in that futuristic world.
This idea that we'd be back again,and I didn't escape my pursuit of
clarity and crisp understandings withthe complexity we've now encountered.
And also the incredible capabilitieswe're seeing at the same time.
I would probably have been shocked to learn about this back in 1989.
(09:47):
Yeah.
I thought that, hearing about your early days, the symmetry just popped out to me from where you started to where you ended up. Everything old is new again.
So I don't know if that is exciting to you or frustrating that this problem has followed you over the many years, but it seems like you are kind of back to where you started. I guess I would say that as we pursued the path that I was
(10:08):
on, we'll call it for now probabilistic and decision theoretic systems, where we would, you know, handle the representations.
Bayesian networks, influence diagrams, foundations of utility theory in medical AI systems, for example.
medical AI systems, for example.
Not to geek out the listenershere, but you know, the idea of
doing cost benefit analysis underuncertainty, doing diagnosis with
(10:30):
methods we understand quite well.
And by the way, those methods have reached a point where they're quite mature, and we should not forego looking at them very carefully in their applications to health care while we're in the superheated time around neural language models.
But that to the side, there is asense of excitement, to answer your
question, that it didn't seem we weregoing to get to anywhere deeply close to
(10:57):
what we see in, like, vertebrate nervoussystems on the path that I was on.
We have methods that will do tremendously well in being competent collaborators when it comes to health care. Sort of more precision in our diagnosis and our therapies with those methods.
But now, with what we're seeing now, I'mnot going to say that we're necessarily
(11:21):
getting closer to how minds work.
But let me just say, I see a path tobeing surprised about large scale systems
and interactions with tremendous amountsof training data and certain principles
by which we train models to discovernew, surprising capabilities that I
(11:41):
didn't see on the path that I was on.
So, I'd say I'm surprised in that way. And delighted.
Yeah, it's certainly easier toexperiment on a virtual brain than
it is on a real one when you'retrying to understand how it works.
I guess too, I'd like tounderstand a little bit more about
the M.D. side of your training.
So, I think clearly you have aninterest in intelligence and AI.
It sounded like the medical training wasin service of those scientific goals.
(12:05):
But did you do residency, for example,and sort of how has the clinician
side of you informed your career?
Yeah, when I first came to mydoctoral work, my M.D., Ph.D. program, like
other medical, first- and second-yearmedical students, I dove in with my
class, you know, a full vibrant member ofmy four-person cadaver team in anatomy,
(12:27):
neuro, you know, physiology classes.
But at the same time, I was like stayingup late at night and spending all my
day also going from my anatomy class,smelling of phenol into my AI class work.
And people looking at me like,what's that odor on your body?
Where are you coming from, man?
In my experience, that odor is not thatuncommon in computer science classes.
(12:48):
So maybe you weren't that obvious.Maybe a different odor, but anyway
but back, back to the question.
I stopped out after like acouple clerkships to really
go full bore my doctoral work.
I got so into it, like so into it.
Most people in the world stillknow me as a researcher in AI,
and many are surprised with it.
Wait a minute, you have an M.D.?
(13:10):
And the story with that is that I got more and more interested in clinical
medicine through my AI applications andtalking to physicians, the experts on
teams and so on, getting more immersedinto that world from the point of
view of trying to build systems todo, to be competent and to be helpful
on those challenge problems.
I finished my Ph.D. work. In the meantime,a colleague and I started a startup
(13:35):
company that was actually taking someof our Ph.D. work and actually making
clinical tools on this new thing thatwas getting more powerful called a PC.
That company evolved into a larger-scale company where we started taking our medical principles of Bayesian networks and decision making to other areas. To United Airlines for
(13:56):
jet engine diagnosis, and to NASA for space shuttle monitoring, and so on.
And right around this time, Microsoft, this strange company to the north of the Bay Area, reached out through Nathan Myhrvold, a close associate of Bill Gates.
And we heard news that we're trying tostart a research team, and we're really
interested in the research you'redoing, and in your company, and so on.
(14:20):
Of course, we were stunnedthat, you know, why would this
company that makes Windows 3.1,
and a word processor, and a spreadsheetbe interested in even a research
group and let alone AI technologies.
That's a whole other story aboutwhere Microsoft Research came from
and our work there when we came, cameup. But we ended up being acquired and
(14:41):
I promised my two colleagues I wouldstay no longer than six months.
That was 31 years ago.
And when we made that deal,I hadn't finished my rotations yet.
My clinical clerkships. I wasso passionate about my Ph.D. work.
And I told my two colleagues, Jack Breese and David Heckerman, hey, look, you go up immediately.
(15:02):
I really want to just, I've gottenmore interested in clinical medicine.
I want to just experience thewhole, the whole. I want to do a
deep dive into, into this area.
I had like 10 months to go to finish upeverything, given my earlier clerkships.
And I just stopped, droppedeverything, got completely focused.
And I remember even like the tensionbetween like writing, finishing up
(15:23):
writing journal articles based onyour dissertation with, like, holding
a clamp during surgery, duringpediatric surgery and thinking,
oh, do I really want to be here?
But I started really getting into itas an, as maybe a more mature person.
Like, I started to love clinical medicine.
I wanted to excel in every rotationthat I did, you know, get those
top-notch recommendation letters and,you know, recommendation for house
(15:46):
staff when you come back and all that.
But in the end, I did a very rarething for most M.D. training programs.
On my last day of my last rotation,for my commitment, it was, I remember
it was September and the sun wassetting on a kind of cool, fall day.
I got into my car, went to my lockerfirst, I mean, got my, all my stuff, and
(16:08):
I drove away without doing a residency.
And I always felt like, oh, like it'sreally difficult to do that kind of thing.
But I made my decisionswhere I'd focus my attention.
But per your comment, the clinical experience has proved to be extremely valuable. Especially when it comes to, and I don't mean to,
(16:30):
I know both of you don't have your M.D., but you're doing lots of work in AI right now and AI in health care.
When it comes to talking to AIscientists who are largely in their
labs, like, they have no concept ofsome of the actual real-world problems.
And, you know, why is it so hardto like translate this really
cool technology into practice?
(16:50):
Having been there and understanding daily life and the workflows gives you a different appreciation for, like, how hard it is to really innovate and to bring innovations into the clinical realm to make a big difference.
I totally agree.
And I was fortunate.
So, I did a Ph.D., not an M.D., but duringthe Ph.D., it was kind of a medically
(17:11):
themed Ph.D. in this sort of unique programbetween Harvard and MIT called HST,
where we take a good chunk of the firsttwo years of preclinical coursework.
And then,
even more importantly, spend twosummers in the clinic, taking
histories and physicals, roundingwith the teams, presenting cases.
And I think, I really, now I'm lookingback on this, if and when we are
(17:33):
clinically relevant, I attribute somuch of it to sort of that time that
was spent just learning about howdecisions are made, and working with
clinicians, building real collaborations.
And so, you know, you havethis, I think, as Andy
said, very unique background.
You're a card-carrying M.D. anda serious computer scientist.
(17:53):
I think it's a pretty rare breed.
I think it's, it's interesting tothink, especially in GPT-4 era, post
GPT-4 era, what education for quantsand AI researchers to help them be
clinically relevant will look like.
I think that's a good point to actuallytransition to some of your work, Eric.
So, we want to dig into your workat Microsoft and your research there.
(18:17):
And I think you mentioned thisa few moments ago, you, as I
understand it, you helped to startMicrosoft Research. And you've been
at Microsoft for a few decades now.
So maybe we could start with tellingus about the founding of Microsoft
Research and some of your work onfoundations, and applications of AI, and
maybe human AI interaction, as well assome of the themes that you work on.
(18:41):
Yeah. So Microsoft Research came out of a discussion, from my understanding, between Bill Gates and Nathan Myhrvold, who was, you might call him, the CTO at the time. I don't think he was given that title at the time, but he was, you know, Chief Science Advisor to Bill Gates. And they have a beautiful set of slides, which I think are available, which was like, why is a research lab important?
(19:04):
What would be the nature of Microsoft's research center?
And so on.
That was remarkably interesting.
They were remarkably important slides and thoughts for framing Microsoft Research.
Rick Rashid was pursued as the Founding Director.
I think that was 1991. When we, our team, was approached by Nathan Myhrvold,
(19:26):
who was doing, you know, passionate recruiting of core initial teams, we were brought in as the first AI group.
There was a specific call-out to gains and contributions we had made with probabilistic or Bayesian
inference. And that approach resonated deeply with Nathan and with Bill,
(19:49):
per what Nathan was telling us at the time. When we came up, we were probably within the first 10 people, three of us,
at the lab. There was a group that had come over from the IBM Research Center in Natural Language Processing. There was a small compilers team at the time.
Microsoft Research grew very quickly in its first few years.
(20:11):
We spent a tremendous amount of time recruiting, and it was interesting, I remember back then, to even think about:
What would it be like to build a lab?
We established certain principles. For example, this amazing principle that there would never be any controls on a researcher's decision to publish anything they thought was important to publish.
(20:32):
There would not be a tower of reviews on that process.
And that has stood the test of time.
That really has given the labs quite a different feel than you'd have in, you know, maybe a traditional corporate R&D environment.
The lab grew to comprise all important
(20:54):
areas of computer science over time.
And people at the lab would do a mix of applications, as well as foundational work, combinations thereof.
On our team, we started out being called the Decision Theory Group because we were so excited about that at the time, as a foundation, including probability theory.
(21:15):
Eric, I'd love to really dig into decision theory, but I suspect that many of our listeners actually don't have too much familiarity with decision theory.
So could I just interrupt you for just a moment to give maybe our clinician listeners, and other folks who aren't as familiar with decision theory, an overview of what it is and just your definition of decision theory.
(21:36):
So, there's decision theory and there's decision analysis, which is the engineering, real-world incarnation of how you apply decision theory to real-world problems.
It goes back to the idea that the foundations of modern approaches to what are ideal decisions lie in probability theory.
(22:00):
The axioms of probability, and on top of which, the axioms of utility, or utility theory. And that's another set of assertions about preferences, about what's good and bad in the world. Like, what are desires under uncertainty.
(22:25):
a set of outcomes under uncertainty.
You have a set of actionsthat one might take.
There are different likelihoodsof different outcomes happening
following each of the actionsthat you might commit to.
These are irrevocable commitments,decisions in the world.
And the idea is, how do you discoverthe best action to take when there
(22:49):
are cost benefit tradeoffs, orthe outcomes are quite different,
potentially great uncertainty.
And so, there are processes by where youframe the decision problem. Which is,
what am I trying to do here exactly?
What are the, you know, what are the goals?
What are the key possibilities?
What are the actions possible?
(23:10):
What's the disease process in this case, in a particular case, for example?
What are all possible outcomes of each action that might be taken?
And then for each outcome, even before you get there, you can sort of try to really push on a patient's preferences.
Like, what does this mean for the patient?
If I do this kind of prostate therapy, surgery, radiation,
(23:33):
what are the tradeoffs?
And then how do I choose the best action?
Especially given that there'll be uncertainties in what happens, because you can't know for sure.
So, decision analysis is the engineering approach to taking those principles of decision science or decision theory and bringing them to life.
And it typically involves notions of estimating probabilities and computing
(23:57):
expected values to come up with the best decision, which is typically the best expectation given the uncertainties.
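For listeners who want to see the arithmetic, here is a minimal sketch, in Python, of the expected-utility calculation described above. The actions, outcome probabilities, and utilities are invented placeholders for illustration, not numbers from any real analysis discussed in the conversation.

```python
# Minimal sketch of a decision-analytic calculation: pick the action with
# the highest expected utility, given outcome probabilities and elicited
# patient utilities. All numbers are illustrative placeholders.

actions = {
    "surgery":           {"cure": 0.80, "complication": 0.15, "no_change": 0.05},
    "radiation":         {"cure": 0.65, "complication": 0.05, "no_change": 0.30},
    "watchful_waiting":  {"cure": 0.20, "complication": 0.01, "no_change": 0.79},
}

# Elicited patient utilities for each outcome, on a 0-1 scale.
utilities = {"cure": 1.0, "complication": 0.3, "no_change": 0.6}

def expected_utility(outcome_probs, utilities):
    """Expected utility = sum over outcomes of P(outcome) * U(outcome)."""
    return sum(p * utilities[outcome] for outcome, p in outcome_probs.items())

for action, outcome_probs in actions.items():
    print(f"{action}: EU = {expected_utility(outcome_probs, utilities):.3f}")

best = max(actions, key=lambda a: expected_utility(actions[a], utilities))
print("Recommended action (maximum expected utility):", best)
```

The pattern scales to any number of actions and outcomes; the hard parts in practice are estimating the probabilities and eliciting the utilities, which is exactly where the conversation suggests AI systems might help.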
A lot of us do this kind of thing qualitatively in the clinic.
Just by looking at literature, thinking through the key actions and outcomes we care about, and then talking with the patient to understand
(24:18):
preferences, and looking at best practices and protocols, and making a call.
Typically, it should be the patient's decision, of course.
But sometimes you have hard problems that you really need to work out on paper, with paper and pencil.
And when it comes to bringing AIto bear or to leverage harder
(24:39):
AI technologies, we want to have systemsthat can give us estimates of the
probabilities of the outcomes, and thenunderstand how to encode preferences
of patients, for example. And thenpropagate them through to tell us what's
the expected value, expected utilityof each action that I might take, and
let's pick the best one. But make surethat that really resonates with what
(25:02):
the patient has in mind for preferences.
As we get into talking about large scalelanguage models, one of the interesting
challenges is, what's the role of thesemodels when it comes to working in a
decision theoretic or decision analyticway? Which is the classic best practice
or approach for hard decisions andmaking calls on the best action to take.
(25:26):
And that's going to get, that gets intothese questions about can these large
scale language models really give usprobabilities, for example, that are
credible, that are well calibrated.
So, questions are coming up at the,you might say, at the intersection of
traditional best approaches for how youharness AI to do hard decision problems
and what these systems now can offer usin terms of their powers we're seeing,
(25:48):
which are remarkable in themselves.
So, I think when I first seriously got intodecision analysis and decision theory in
grad school, I was just blown away thatthere was so much thought and that it was
several decades old when I was startingto go through this. And how to formally
reason, over diagnoses, and over patientutilities, formally elicit utilities,
(26:13):
and then come up with rationaldecisions and also collect information
in a very principled way, right?
Even what tests to order, what utilities need to be
measured, things like this.
And then I was struck, so I rememberdiscussing this with Zak, who's
my Ph.D. advisor and, you know,our Editor-in-Chief of NEJM AI.
I remember discussing with him,I was, like, why isn't all of
(26:36):
medicine like this right now?
Like, why aren't clinicians formallyeliciting utilities and estimating
probabilities and applying Bayesrule and doing this and that?
And you know, it started thissort of multiyear discussion that
has of course, not resolved,but is extremely entertaining.
I think also informative about the sort ofdifference between some of the theoretical
(26:59):
frameworks and then the practical demandson the clinician, who has a very limited
amount of time and is maybe informallydoing some of these things, but not
approaching with all this machinery.
And so, I have to, you know, Ithink when we think about the sort
of population health level guidelinesthat are set by some of the national
organizations, I think they do take veryformal decision theoretic approaches
(27:23):
to making clinical recommendations.
But for this sort of individual clinicianseeing the patient, it seems that
large language models, like GPT-4 and itscousins, may actually solve some of these
fundamental problems that have seemedvery elusive, right? Or at least
offer a path to solving some of theseproblems around both the provision of
(27:44):
information, although of course thereare many limitations, as well as the
kind of interaction with an individualto collect utilities in a formal way.
Do you see decision theory potentiallyhaving some type of revival with
the advent of large language models?
You know, we have work to do.
I should say that my team got access to GPT-4, an early
(28:07):
raw version of it,
as part of our responsibility to do safety studies for Microsoft.
And that's where this work that led to the paper called "Sparks of AGI" came from.
We started looking at these models.
And of course, you know, you can imagine I would be diving in with all sorts of hard challenge problems in health care.
And I even pushed the systems back in those days
(28:28):
just on the topic of your question. Can a system be told, you know, the chain of, I'll use these fancy terms now, chain-of-thought reasoning, or other methods, other prompting methods, to think with Bayes rule? To explicitly use probabilistic reasoning and tell me what it was doing on a worksheet?
Could it then do decision theory?
(28:50):
Could it do what we call hypothetico-deductive reasoning in a loop, where we look at the symptoms?
It's a phrase that's been used over the years for: look at some initial signs and symptoms, formulate a differential diagnosis, a list of diseases by likelihood. Based on that, compute, you mentioned this earlier, you know,
(29:10):
the expected value of new information.
What's the value of, and sort by, the expected return of collecting new information, given the cost of making that collection, and then getting the new information and going in a loop, a hypothetico-deductive loop.
Could I drive GPT-4 to do this kind of thing?
And I did some early experiments on this.
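Here is a hedged sketch of one turn of that loop: hold a differential, then decide whether a test is worth ordering by computing the expected value of information. The diseases, likelihoods, utilities, and test cost are all invented for illustration.

```python
# One turn of a hypothetico-deductive loop: is the test worth ordering?
# Expected value of information = EU(best action after seeing the result)
# minus EU(best action now). All numbers are invented placeholders.

prior = {"disease_A": 0.3, "disease_B": 0.7}            # differential diagnosis
utility = {                                             # U(action, true disease)
    ("treat_A", "disease_A"): 0.9, ("treat_A", "disease_B"): 0.4,
    ("treat_B", "disease_A"): 0.3, ("treat_B", "disease_B"): 0.95,
}
likelihood = {"disease_A": {"positive": 0.85, "negative": 0.15},
              "disease_B": {"positive": 0.10, "negative": 0.90}}
test_cost = 0.02   # expressed on the same utility scale

def best_expected_utility(belief):
    """Expected utility of the best action under a belief over diseases."""
    actions = {a for a, _ in utility}
    return max(sum(belief[d] * utility[(a, d)] for d in belief) for a in actions)

def posterior(belief, result):
    """Bayes rule: P(d | result) is proportional to P(result | d) * P(d)."""
    unnorm = {d: likelihood[d][result] * belief[d] for d in belief}
    z = sum(unnorm.values())
    return {d: p / z for d, p in unnorm.items()}

eu_now = best_expected_utility(prior)
eu_with_test = sum(
    sum(likelihood[d][r] * prior[d] for d in prior)        # P(result)
    * best_expected_utility(posterior(prior, r))           # best EU given result
    for r in ["positive", "negative"]
)
evi = eu_with_test - eu_now
print(f"EU acting now:                 {eu_now:.3f}")
print(f"EU after observing the test:   {eu_with_test:.3f}")
print(f"Net value of information:      {evi - test_cost:.3f}")
print("Order the test" if evi > test_cost else "Act now without the test")
```

Looping this, collecting the highest-value information, updating the differential, and re-ranking, is the structure the conversation describes trying to elicit from GPT-4.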
(29:31):
It was so interesting to even push hard on getting the systems to openly show their knowledge of probabilistic reasoning, because they've read about it.
And they've learned about that kind of thing.
And to, you know, keep track of prior probabilities and posterior probabilities, you can imagine these classic chest pain examples.
(29:51):
Forty-five-year-old white male, no history of cardiac illness, comes in clutching his chest, pain that he's never had before in his chest.
I'd like to also try to put it in her chest, because we're going to look at gender issues, and statistics, and so on. But having the system write down its reasoning from the point of view of Bayesian updating, Bayes rule, and so on.
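As a tiny worked example of the kind of Bayesian updating being asked for on the worksheet, here is the calculation in code. The pretest probability and likelihoods are invented placeholders, not clinical estimates.

```python
# A small worked example of Bayesian updating on a single finding.
# Numbers are illustrative placeholders, not clinical values.

def bayes_update(prior, p_finding_given_disease, p_finding_given_no_disease):
    """Posterior probability of disease after observing one finding."""
    numerator = p_finding_given_disease * prior
    denominator = numerator + p_finding_given_no_disease * (1 - prior)
    return numerator / denominator

prior_mi = 0.05                        # pretest probability of MI in this patient
posterior = bayes_update(prior_mi,
                         p_finding_given_disease=0.70,     # P(finding | MI)
                         p_finding_given_no_disease=0.10)  # P(finding | no MI)
print(f"Posterior P(MI | finding) = {posterior:.2f}")       # about 0.27

# Findings can be chained by treating each posterior as the next prior,
# assuming the findings are conditionally independent given the disease.
posterior2 = bayes_update(posterior, 0.80, 0.20)             # e.g., an ECG finding
print(f"Posterior after a second finding = {posterior2:.2f}")
```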
(30:14):
And the system, you can see it struggled.
You can see the steam, huffing and puffing, but it was making progress in this space.
It wasn't always correct in some of the calculations.
We know that's not the strength of these systems. But to me, I'll use the word again, spark. There was the
(30:34):
spark of the prospect that someday these systems could explicitly be tuned and trained, and call other tools, for example, when they needed to do calculations that they couldn't do themselves.
Preference assessments, to become fabulous companions as decision analytic consults.
And so, I'm hopeful. You can tell that our team, we're still exploring.
(30:56):
I'm personally still exploring the possibilities.
I'll ask questions at times, you know, to even OpenAI: Hey, can we have these systems better calibrate their probabilities about X or Y?
And can we get into the log probs and see how we can sort of do some research to make these systems more competent at Bayesian diagnosis?
(31:18):
I think, let's listen to thispodcast 15 years from now.
I'm guessing that we'll see an incrediblesynthesis of what we call traditional
Bayesian inference and decisiontheory, decision analytic consultation,
and large scale language models.
I mean, even now there are somebasic things we can do that are just
(31:39):
no brainers. Like, framingyour decision problem.
What am I not thinking about?
What more options might I have?
People often say in a decision analytic problem, whether it be in health care, or finance, or in public policy, that a new option or a new consideration of an outcome can dominate the whole analysis.
(32:01):
So, on that front, if we believe that,and I think we're seeing signs that
large scale language models reallycan be, let's use the phrase, mind
expanding for humans, for what they cando with their, I'll use another word,
with their polymathic skills, theirability to compose and synthesize, and
(32:23):
bring in real information, and ideas, anddistinctions that one was not thinking
about when they first started sketchingout even a traditional decision tree.
So yes, I think there'llbe lots of touch points.
So, you have a term that I think I'veseen you use before instead of the
history of present illness, right?
(32:44):
It's the history of future illness, right?
Yes.
Which is this sort of simulated possiblefuture states. Which is both a wonderful
potential educational tool, but I thinkalso a new way for a physician to
consider maybe things that they haven'tconsidered, that don't necessarily
align with their differential so far.
So that's super interesting.
So, you know, you mentioned also thatyour group has done work on evaluating
(33:06):
GPT-4 on medical challenge problems.
I think you had a preprint.
Maybe it's amazing also just howfast this field moves, right?
How the progress has movedfrom January of last year to
now. But I think it was a preprintin the first half of last year on
GPT-4 on medical challenge problems.
You had found that GPT-4 performedvery well on USMLE-style questions.
(33:30):
I think you had exceeded the sortof state of the art at the time.
And then, perhaps, you know, I thought that paper was very interesting. And perhaps even more interesting to me was your group's work just a couple months ago on prompting, and how effective something which seems simple, but which, we now have a very rich evidence base to show, is extremely important for eliciting
(33:54):
the behavior out of these models that we want for a given task: how chaining together a few different techniques into this kind of meta-technique that you call Medprompt really unlocked certain capabilities from this generalist model, GPT-4, on challenging medical questions.
And so, you know, I've also noticed with
(34:15):
GPT-4 users and other language model users, there seem to be very strong kind of bubble effects, where some people are using the model every day for hours a day.
They almost have a relationship with it, with the AI.
And other folks have used it once or twice, maybe tried to use it as kind of a Google device to look themselves up, were not impressed, and didn't really use it.
But anyone who's used it a lot knows that prompting really matters, and how
(34:38):
you sort of craft your interaction with the model really matters.
So, from that standpoint, it's notsurprising. But I think what's surprising
when you start to learn aboutit is what you actually do.
And so it's some things likesaying, think step-by-step, right?
Which you call chain of thought, or Ithink for another one of the models, it's
take a deep breath. Take a deep breath.
That's my favorite.
Yeah, so I, I literally, you know,Andy and I are both fathers
(35:00):
of young, of young children.
And it's literally the sameadvice that we give on a new task
that our daughters are learning.
I think my six-year-old was jumpingin the pool, and I was like, take
a deep breath and think abouthow you're going to do this.
And it's literally what seems towork with some of these models.
So maybe we
could just take just a couple minutesand then I think we want to jump into
sort of your efforts on responsible AI.But maybe just on that Medprompt
(35:25):
paper, could you just tell us about, maybe briefly about the background of the
paper and then what your findings were.And then maybe also just like signal
where you think this field is moving.
So what those results meanfor what's coming next.
Yeah, so the background on that paper is, of course, that the teams under
(35:48):
my office, the Microsoft Research teams we're very close with, have been looking, you know, since we've been playing with large language models, at the power of, in some ways, how we communicate with these models.
It's so interesting, given how theywork, to understand how powerful it
(36:09):
is to set them up, to be generalizing,or synthesizing, or abstracting.
Composing based on subtleties inlanguage of how you describe who you
are, your role and what it is thatyou want, and how you characterize the
nature of what would be a goodanswer or output in this whole dialogue
(36:33):
as you get, as the dialogue continues.
I think Greg Brockman from OpenAIhas put it in a very pithy way.
He said, a surprising amount of AIresearch is getting large language
models to be in the right mood.
And saying the right prompt andcoaxing them to sort of be in the
right headspace is where a lot of theengineering effort actually gets applied.
It's interesting.
(36:54):
English is the hottestnew programming language.
And it's, it's amazinghow powerful language is.
We're discovering the incredible foundations of how concepts are embedded
in how language is used, and the meaningof words, and how they're strung together
and then encoded, and then representedand then reasoned with by these models.
It's, I'd say, yeah, theright mood or the right mode.
(37:17):
Even saying things like, one way thatI've used GPT-4 lately is as an expert editor.
So, I like to write my own stuff.
I don't like to have the system generatelike the content for me, but I don't
mind having the system and actually,you know, sort of enjoy, and I'm very
thankful when something's an importantdocument that I've, I've written or a
few paragraphs that I want to do a postto take a look, you know, to have GPT-4
(37:41):
take a, take a look at what I've done.
And the way I do this kind of human-AIcollaboration session is to tell
GPT-4, you are just such a talented,
insightful editor in how youcan take my material and really
find places, precision places whereyou might want to refine it, and
it can even be better than it is.
(38:02):
Put those comments into angle brackets.
Don't touch my text directly, butgive me expert editorial remarks
because you're just so great at this.
Really, really thanks.
Thanks so much.
You know, the expression of this camaraderie and appreciation for the kind of capabilities the system has, I haven't done a formal study of this.
But I find myself doing that not just because I'm anthropomorphizing, but because
(38:26):
I want to squeeze top-notch performance out of these systems, in some ways per Greg Brockman's comment.
It's putting the system in the right mode slash mood.
And I think we'll learn more about this over time. But what's interesting in the Medprompt work, as we call it, Medprompt, is, and you said it, stringing together or composing.
(38:47):
We're at a point now where we have notions of few-shot prompting or learning.
Different ways of doing that, randomly or based on similarity metrics.
Then there's the idea of using a chain of thought, like reason about your steps.
Then there's the notions of ensembling,
(39:08):
and shuffling, and ways to combine different answers all in a single prompt to come up with the best answer, looking for consistency.
And so, what we did with Medprompt is we took several of these known methods and did a very careful layering of them. And then going backwards, ablation, ablating them to see what each component was now adding to the power or the accuracy
(39:29):
of the results we were getting, for example, on MedQA, a very challenging medical challenge problem benchmark.
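As a rough sketch of the layering being described, and not the actual Medprompt implementation, the code below combines similarity-based few-shot selection, few-shot examples that carry model-generated chains of thought, choice shuffling, and a majority vote over the ensemble. The `embed` and `ask_model` functions are hypothetical stand-ins for a real embedding call and LLM client.

```python
# Hedged sketch of a Medprompt-style composition of prompting techniques.
# `embed` and `ask_model` are hypothetical stand-ins, not real APIs.

import random
from collections import Counter

def embed(text: str) -> list[float]:
    raise NotImplementedError("stand-in for a real text-embedding call")

def ask_model(prompt: str) -> str:
    raise NotImplementedError("stand-in for a real LLM completion call")

def similarity(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))   # dot product as a simple metric

def select_few_shot(question, library, k=5):
    """Choose the k library examples most similar to the test question."""
    q = embed(question)
    return sorted(library, key=lambda ex: similarity(q, embed(ex["question"])),
                  reverse=True)[:k]

def medprompt_answer(question, choices, library, n_ensemble=5):
    examples = select_few_shot(question, library)
    votes = Counter()
    for _ in range(n_ensemble):
        shuffled = random.sample(choices, k=len(choices))    # choice shuffling
        prompt = "\n\n".join(
            # each example carries a previously generated chain of thought
            f"Q: {ex['question']}\nReasoning: {ex['chain_of_thought']}\n"
            f"Answer: {ex['answer']}" for ex in examples
        )
        prompt += (f"\n\nQ: {question}\nOptions: {', '.join(shuffled)}\n"
                   "Reason step by step, then state the final answer.")
        votes[ask_model(prompt).strip()] += 1
    return votes.most_common(1)[0][0]                         # majority vote
```

The ablations mentioned above would correspond to switching off one layer at a time, fixed rather than similarity-based examples, no chain of thought, a single sample instead of the shuffled ensemble, and measuring the accuracy change.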
The Medprompt work itself was kindof a fun experience in that it came out
of what we called the Medprompt Marathon.
I think I talked a little bit about thispublicly where we just said, hey, we have
a bunch of smart people. Let's gettogether and we're going to have teams.
(39:53):
And we're going to really go for it with some dedicated resources; we can get fast answers. Sometimes having a cluster so you can really cycle fast helps you think and be creative.
It was
interesting to see how far we got with that, to be the top scores on not just the medical challenge problems, but we went to this interesting
(40:13):
large-scale set of benchmarks called MMLU, which has challenge problems in philosophy and law and electrical engineering, psychology and accounting.
And we said, wow, this Medprompt thing with these layers seems to be pretty general purpose.
Well, we'll still call it Medprompt
(40:34):
because that's its roots, but what it's kind of teaching us, you know, is, in 2024, what are some best-practice approaches to talking to these models?
Some of the magic that we discovered with the Medprompt work was to actually use the model itself to do chain of
(40:54):
thought and to generate the few-shot reasoning, you know, write your own chains of thought as examples.
It seemed to be as good as or better than what humans could do with creating sample chains of thought as examples.
That was pretty exciting to us.
And it suggests, and this is coming to the fore now in multiple research projects, that these models can play multiple roles at different phases
(41:17):
of problem solving and prompting.
You know, and this in some ways frames work now on these multiagent solutions like AutoGen, where it's really the same basic language model, but you're giving it different roles as a programming construct and telling it, you're the critiquer, you're the generator, you're going to, you know, check in with the human,
(41:39):
and so on, and building sets of agents to take on various aspects of the problem solving.
But one more thing about the Medprompt effort. You know, we were seeing a lot of special-case, expert-driven models going on, where we explicitly, in our initial paper in the first half of last
(42:00):
year, our original work on the USMLE challenge problems, said, we're not going to try hard.
We want to basically show you how powerful these language models are by just talking generally and asking for answers to these questions the way anybody
might ask, without doing any hard work.
And that was the magic ofwhat we were demonstrating.
(42:23):
You know, then we saw some competinggroups saying, hey, we're doing a lot
better than you, we worked really hardat this, we brought in 30, 35 experts.
And we said, you know, we were likesitting on a chaise lounge, smoking a cigar,
and that was the point. But youknow, let's, we'll lean in now.
We'll, we'll do a little marathonand we'll just explore the space of
what you can squeeze out of thesemodels by talking to them properly.
(42:47):
And that's some of the background there.
What's your intuition forhow much farther we can go,
Eric?
So, if you, I think yougot up to 90% there.
Do you think just prompting on GPT-4can push us another, another 5%?
Is there something to unlock in the model that we haven't unlocked in prompt space?
You know, I'm sure there is, but the question is, what's
(43:10):
the, what are the margins of return with effort right now?
Yeah.
I think when it comes to these large benchmarks, like MMLU, all the medical portions of it, or MedQA, at some point some of that remaining headroom is not going to be better thinking, but we discover, oh, that's really
(43:31):
a bad question, or those answers were inconsistent in the benchmark.
So we'd find some froth that's really not going to get better by being more brilliant, but by debugging the actual
benchmark itself.
Nice.
So I'd like to transition to your work on responsible AI.
(43:51):
So you've had both a private sector hat and a government sector hat in this effort.
I'd like to sort of touch on both of those.
So, continuing with your work at Microsoft,I don't know whether or not this counts
as Microsoft or not, but you've stoodup this 100-year study of AI.
So I'd love to one, hearabout the genesis of that.
What we hope to learn from a100-year study of AI and
(44:13):
what you hope comes out of that.
Yeah, the 100-year study on AI, bythe way, it's, we call it that
just to, it's kind of catchy, but theendowment to Stanford that our family
did to stand up the 100-year study onAI is to have a report written with
proactive guidance to government, civilsociety, the public, academia, every five
(44:39):
years for as long as Stanford exists.
So, as John Hennessy told me,we can guarantee this will go
on as long as Stanford exists.
At the time, he thought that would be
pretty long time.
So it's probably, hopefullyit's more than a 100 years.
The background, the genesis of that study, was that I was president of the Association for the Advancement of AI. This is the
(45:02):
largest society of professional and scientific AI researchers in the world,
AAAI. We're having the main conference coming up in a couple
of weeks that I'll be at in Vancouver.
But when I was president, in 2008 and 2009, it was a time when AI was just coming of age.
(45:23):
Even with applications like thereadmissions work we were doing
at the time, for example, in realhospitals and testing things out.
I made the theme of mypresidency, AI in the open world.
And when I gave my presidential lecture,which is still online, it was all
about, like, we have to sort of thinkthrough the principles and mechanisms.
It gets back to bounded rationality,but how can we design our systems
(45:43):
to do well in the open, in thescruffy, uncertain open world
and still be robust and reliable?
And I talked about differentways to do that and so on.
For the last part of my talk, I said we also have to start thinking about AI, people, and society, and its influences in the open world.
And some people thought that was, like, way over the top in 2008, but I called
(46:05):
together, and announced at that lecture, a study that would go on for several months and led to a meeting at Asilomar, for symbolic reasons. We went to Asilomar, called the Presidential Panel on Long-Term AI Futures.
It was just this greatstudy with top notch people.
You can go online and read about whowas there and so on back in, in 2008 and
(46:27):
2009 because it spanned those two years.
And in 2014, it was five years later,and I said, we should do this again.
That was so useful, but things arechanging so quickly, even back then.
How can we do this again?
And of course, as a computer scientist,we think about induction, N gets N plus
one, and how can we do this forever, every five years?
(46:50):
You just have to establish the base case and then – Exactly!
Just watch out for recursion.
But anyway, so we, webasically went to Stanford.
The development office thoughtwe were a little bit out of the
box and called the president.
And they were all kind of wishy washy.
I don't know if you can do this or not,and we can't guarantee this will go forever.
John Hennessy, who I've known for many years back when he was teaching,
(47:12):
when I was at grad school, said to me, Eric, this is a great idea.
Let's just do this.
You know, it's funny cause then you justmaybe seven or eight years later when
we were standing up, and I was helpingon the advisory board for the human
centered AI, HAI program at Stanford,we had a big opening dinner, and John
was at my table. And he looked at meand he said, remember back a few years
(47:33):
ago we all thought this was crazy?
Well, I guess it was just afew years ahead of its time.
But anyway, that study is nowgoing into its third report. I
recently mentioned, God, fiveyears seemed like an appropriate,
you know, base cycle back in 2013.
When we stood this up, maybewe should speed this thing up.
I mean, in terms ofthe recurrence on that.
(47:53):
If you go online and go to the 100Year Study's site, you can go to my
initial document, what's called theFraming Memo, where I listed 18 issues,
challenges, and opportunities that Ithought would stand the test of time.
Like 100s of years.
So, maybe folks, your listeners can goout there and take a look at those.
We'll go in and check on those annually.
(48:15):
And see where we arewith these things, right?
Cause they're all kind of, like, I thought, really raised questions about possible futures, you know, in 2013, and I think they still do. Anyway,
in part, it turns out that that study, and the news about it, came back to Microsoft.
(48:36):
And people started talkingabout it at Microsoft.
I mean, discussion in email, Satya andBrad Smith mentioned we were doing this.
And we also talked aboutwhat's Microsoft's point of
view on responsibility in AI.
Satya called together a meeting and westarted thinking together about what were
our principles about AI as a company.
(48:57):
And this work led to what wenow call Microsoft's AI Principles.
There's six of them, and the standing up of what is called the Aether Committee, which stands for AI, Ethics, and Effects in Engineering and Research, which ended up defining and scoping out our earliest approaches to what the company would be doing about our responsibilities when it came to the development and
(49:20):
fielding of AI technologies.
Before we get you to the lightning round, I want to ask about your role on the President's Council of Advisors on Science and Technology, or PCAST for short.
So this is not an AI-specific council, if I understand correctly, but a large focus of PCAST recently has been on AI, and you've helped
(49:41):
advise the Biden administration on the executive order they released.
So I guess, to the extent that you're able to answer this: How is the federal government thinking about AI and AI regulation?
So the President's Council of Advisors on Science and Technology, PCAST, has just been a fabulous experience for me in terms of the colleagues
(50:02):
and collegiality and the various projects and working groups that I've been involved with or contributed to.
One of the projects people out in, I'll call it, NEJM podcast space would find very interesting is we finished, and made public after about a year and a half of work, our report on patient
safety, calling for a transformational effort in patient safety with a
(50:27):
set of recommendations to the president.
So we do a variety of projects.
I'm just coming off of co-chairinga project called Cyber
Physical Resilience for the Nation.
That was just posted last week.
And you'll see that there's quite a bit going on with opportunity in that space.
Yes, AI lit up the WhiteHouse with interest.
(50:49):
It lit up the Office of Science and Technology Policy, which is the housing organization for PCAST.
There's been a great deal of engagement.
The PCAST group briefed the presidentdirectly more than once on AI.
I've been very much involved withthose briefings, and we were engaged
at multiple levels, including levels ofcollaboration that had influence on
(51:14):
the president's executive order on AI.
There's both concern for theneed to regulate and to guide,
but perhaps even more so,
interest in making sure thatAmerica and the world more globally
harnesses AI for great benefit.
So those are coming together right now.
(51:34):
And in some ways, there's a sense that if we could appropriately regulate the technology when it comes to safety, for example, and rights, and some specific concerns like, let's say, biosecurity-related issues, we're freer to innovate and to get the most on the positive side of AI innovation at the frontiers.
(51:56):
I'm going to try and be a realpodcast host here and tie this back
together to something you mentionedearlier, which is decision theory.
So, you will see people on both sidessaying that there's infinite benefit
from AI, but also infinite harm.
And so it's almost kind of this weirdPascal's wager situation where doing
a sensible cost benefit calculationis almost impossible by definition.
(52:20):
So how do you actually reason aboutcost benefit of AI, if not in the
short term, then over the longterm, when you have these kind of
infinities that crop up according topeople on both sides of this debate?
Well, I think many people who areworking in this space aren't necessarily
pinning their needles to infinities.
(52:40):
They're somewhere more in the middle, looking at, for example, the downsides as potential rough edges versus the end of the world. It comes down to, you know, our democracy works in that we and our elected officials and advisors
get to work together in a collaborative manner to think through the various
get to work together in a collaborativemanner to think through the various
(53:03):
inputs and possibilities to study them.
It takes wise committees.
And I say wise, because you can'tnecessarily converge on the ideal
answer, but you can do kind of adecision analysis and say, look, we're
seeing these kinds of benefits. Hereare some concerns. Maybe we can monitor
them over time and learn about them.
Maybe we could check back.
(53:24):
You can do some
limited previews.We can do red teaming.
What are some techniques that wouldprovide some fail-safe dimensions to AI
technologies and how they're being used?
What are the key concerns from thepoint of view of different stakeholders?
For example, if you ask me, whatare the biggest concerns with AI
(53:45):
going forward?
The two challenges that come to mind for me are, first, the use of AI to flood the world with synthetic media, including nefarious uses of the technology to persuade, to impersonate, to generate propaganda.
These are, I think, significant threats to democracies.
(54:06):
There was a recent example of a fake Biden robocall in the primary in New Hampshire, where they cloned his voice and had a very convincing robocall telling people not to vote in the primary.
Right.
That kind of thing.
And of course we've tracked that very closely, and folks aren't just sitting on their hands.
Our teams and other teams have beenvery interested in, for example,
(54:29):
media provenance technologies,watermarking technologies.
These were mentioned
front and center in the executive order. Some of these technologies were based, you know, on work from my team and came out of our groups at Microsoft.
And are now being shared in larger coalitions like the Coalition for Content Provenance and Authenticity, C2PA.
(54:50):
C2PA.org
if you want to read about that work.
So, in the context of the electionscoming up around the world over the
next 12 to 24 months, there's lots ofinteresting work going on right now
among multiple organizationsas to how to grapple with
the prospect that AI could be usedto disrupt. The other area, the
(55:12):
other area I'm interested inand concerned about is biosecurity.
I mean, look, AI-powered protein design and bioscience more generally is going to be game-changing for health care and for just even understanding the foundations of biological systems.
It also can put new powers in the hands of malevolent actors to generate new toxins,
(55:36):
for example, or gain-of-function research.
We need to basically stay on topof that and come up with the right
kinds of approaches to, for example,screening DNA synthesis in new ways
and updating those screens over time.
So, there are mitigations and directionsfor research on mitigations that are
possible on all fronts of concernthat we need to make investments in.
(55:58):
At the same time, we have to really lean into, maybe with some courage,
exploring the upside to not be paralyzed.
Clearly, this technology is goingto change so many aspects of life.
I often say I don't want to scare people,but to me it makes me excited that, you
know, 500 years from now, the next 25years will be recognizable as a named
(56:23):
period of time because of AI advances.
It's up to us to guide that technology.
We can't shut it down or stop it. It'spart of our natural curiosity, our
science, but so are the guardrails, andso is our democratic world's approach to
grappling with the tradeoffs and doing,back to your comment, even qualitative
(56:46):
decision analyses. Like, what we wantto try, what makes the most sense, what
will be the best thing for populationsof people, for different stakeholders.
Alright.
So, I think that's a good transition to the lightning round.
Lightning round is where we're goingto ask you a series of questions, maybe
serious, maybe goofy, maybe funny.
We'll let you decide whether or notyou think they're serious or goofy.
(57:09):
And the goal for the lightning round.
Okay.
So, the first question is Microsoft hasa lot of big AI for science initiatives.
And as you just mentioned, drugdevelopment is one of the most
important applications for AI.
So the question is, can you see a daywhere Microsoft ever makes a drug?
We would make drugs, but not produce them.
(57:29):
In other words, our folks wouldbe pursuing new approaches to
antibiotics and pharmacology aspart of their AI for science work.
But the idea of making thedrug and distributing it
would come through partners.
We're kind of a platform and, and, you know, research organization.
And, um, I see us not competingwith pharma, but, supercharging,
(57:52):
helping pharma to be supercharged.
We should have a drug, you know.
Yeah.
Yeah.
Anyway.
Yeah.
Good question, though.
Alright.
Here's our next lightning round question.
Eric, if you could roll the clockforward five years, what are Microsoft
Research's major contributionsto the field of biomedicine?
(58:15):
I would say that our contributions willbe new tools that enable drugs to be
discovered and tested in simulation.
And the other direction will be, we willsee advances that will ideally bridge
wet labs and the in-silico world bygiving in silico techniques the ability
(58:37):
to call and design experiments thatthey need to push forward on advancing.
So is that like an automated lab? Is that sort of like a full-stack AI-automated lab?
Full-stack automated science with humans in the loop, of course. But there's been a really interesting long-term
(58:57):
prospect going back to our decision science conversation: expected value of information.
Can AI systems really use their curiosity to drive experimentation to collapse uncertainty by looking for the information they need through experimentation and guiding that process? To me, that would be just a beautiful supercharging of science more generally.
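The expected value of information Eric mentions here is a standard quantity in decision theory: how much, in expectation, a new observation or experiment would improve the best decision you can make. Below is a minimal sketch, with made-up numbers and a hypothetical compound-screening choice, of the simplest version of that idea, the expected value of perfect information.

```python
# A minimal, illustrative sketch (not from the conversation) of the expected
# value of perfect information (EVPI) for a simple decision under uncertainty.
# All names and numbers here are made up for illustration.

def expected_utility(action, belief, utility):
    """Expected utility of one action under the current belief over states."""
    return sum(p * utility[action][state] for state, p in belief.items())

def evpi(belief, utility, actions):
    """How much knowing the true state before acting would improve
    the best achievable expected utility."""
    # Best we can do now, committing to a single action under uncertainty.
    best_now = max(expected_utility(a, belief, utility) for a in actions)
    # Best we could do if an experiment first revealed the true state.
    best_with_info = sum(
        p * max(utility[a][state] for a in actions)
        for state, p in belief.items()
    )
    return best_with_info - best_now

# Toy decision: synthesize a candidate compound or skip it.
actions = ["synthesize", "skip"]
belief = {"binds_target": 0.3, "inert": 0.7}            # prior over outcomes
utility = {
    "synthesize": {"binds_target": 100.0, "inert": -20.0},
    "skip":       {"binds_target": 0.0,   "inert": 0.0},
}

print(f"EVPI: {evpi(belief, utility, actions):.1f}")
# An experiment is worth running only if its cost is below this value.
```

An experiment-designing agent of the kind Eric describes would, roughly speaking, keep proposing the observations whose expected informational value exceeds their cost.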
(59:21):
Got it. So you've obviously been thinking about AI for a long time, and I think what we've learned today, too, is that you're nothing if not a true Bayesian. So, going back 20 years from today, what about your world model of AI has been updated the most in light of evidence that you've gotten over the last 20 years?
(59:42):
So what beliefs that you held 20 years ago needed the most updating today?
My answer to that question would be that the biggest updates I've received happened in the last year and a half over large language models and what I consider the magic that they're showing, which suggests to me that what's going on with unexpected surprises and capabilities might somehow be related to
(01:00:08):
the surprises we see in the magic that comes from large-scale tangles of neurons.
Will doctors still be responsible for documentation in five years, or will generative models like ChatGPT have taken over that task?
I certainly hope the latter will be true. I still want physicians to be in the driver's seat.
(01:00:29):
I want to celebrate the primacy of their agency, and I believe that human touch and human connection will be even more important with the rising sea of automation.
So I think you have survived the lightning round, Eric. Congratulations.
So, I don't know if you've noticed, but I've been trying to dance around the "Sparks of AGI" paper, because I wanted to save it for the end.
(01:00:53):
So now I'd like to revisit that.
So you and, you know, Sébastien Bubeck and other collaborators at Microsoft wrote this paper with a provocative title called "Sparks of AGI," where it was really like the first in-depth look at GPT-4 and its capabilities.
So, I guess AGI has a certain connotation and comes with certain
(01:01:15):
implications for better or worse.
I've always said that if you say generalist artificial intelligence, no one cares. But if you say AGI, people start to really want to debate you.
So why did you feel that GPT-4 had hit, or was worthy of, this, you know, somewhat hallowed term in the history of AI? We used to talk about narrow AI versus AGI.
(01:01:36):
Like, what about your interactions with GPT-4 made you think that it was ready to be called AGI?
So many people in AI research, serious researchers, have looked at the term AGI, which came along kind of late '90s, early 2000s, with a raised eyebrow, because we always thought we were doing AGI.
(01:01:59):
In other words, we always thought we were pursuing principles of general intelligence.
That's the idea with AI research.
In fact, Herb Simon's, I think it was 1959, project with Allen Newell and others was called General Problem Solver. That was the original GPS in my readings.
And so we said, okay, okay, we get it.
(01:02:20):
Some people, many of whom came from outside the serious, I'd say, central AI research community, said, you know what?
I see successes with narrow AI. Like that, Eric, that readmission system or that system that can predict which patients will get C. difficile, you know, in 48 hours.
That's narrow.
Look at humans, what they can do, all these abilities.
(01:02:43):
We call that more general intelligence and we should be pursuing that.
Now it's not that AI research wasn't pursuing that deep down. It was just that we were having some successes on the narrow front.
So we were kind of like, yeah, we get it.
Okay.
Now, AGI also became associated with kind of doomsday scenarios of, you know, visions of Terminator.
(01:03:03):
And this is what the rise of AGI would mean.
But if you go back to the early definitions of AGI by people that were using that term, they were calling out as general a set of abilities as people have.
And we actually searched around as we were playing with GPT-4 as part of our studies; Sébastien and I in particular were trying to figure out how to frame the work.
(01:03:24):
We wrote that first section together about, like, how do you frame this work? Do you bring AGI in as a concept?
And we talk in the first section of that paper about the history of the use of the term, just like you're asking me now. Why we thought, you know what, let's use that term, because it really does mesh with the initial intentions of that phrase.
(01:03:45):
And we don't need to go down the path of changing it to generalist AI or something like that, because it has been defined quite nicely in different ways that are quite similar to what we're getting at now.
Now, I felt it was important in the title to not say first contact with artificial general intelligence, or it's here now.
Because my perspective was we were seeing glimmers, like true sparks,
(01:04:08):
like a snapping arc in places.
And at times that, like, bowled us over. Like, wait a minute.
Like, what's going on here?
This is really impressive.
And it does have a number ofthe capabilities that people
have talked about for years.
Let's just come out of the shadows of experimentation with the
(01:04:28):
system, show the examples as to why we're seeing these sparks, and look at these 72 sparks and their cross-links.
Now, for me, the sparks included, I have examples in what's called the AI anthology online of some of the examples I played with early on, but even the fact that these systems could see when they weren't trained on imagery,
(01:04:53):
like I was drawing faces in, you know, ASCII, and talking to the system, and, you know, yeah, I see a face there.
And I would change it into a little lunar lander, just by editing. And the system would say, well, I see a face, I think, still. I said, no, that's a lunar lander.
Oh, I'm so sorry.
Why did you say face?
Well, you know, faces are so common.
(01:05:14):
And so these systems were almost like having dialogues, even about things that they weren't necessarily trained to understand, learned through language associations over time. I mean, over the training corpora.
So I can go on and on about this, each spark that came to our attention, but
(01:05:35):
the biggest word for me was polymathic.
I'll use that word again. Like, the ability of the system to weave with fluidity across different disciplines and combine things together and do synthesis really was the main sparking arc for me in coming up with that title.
So, I agree.
I think if you don't view AGI in a quasi-religious kind of way, and you
(01:05:58):
just look at it in the strict technical definition of the word, it seems very hard for me to argue that it is not a general kind of intelligence.
I guess so.
Let me just say, so we were basically, in some ways you might say, we were claiming that term back. We were calling it back from the quasi-religious and saying, let's get serious about this.
This is computer science. Yup.
And I guess, but so then there's still the qualifier in the title, which is sparks.
(01:06:22):
So, I guess the question that I'd like to ask you is, when do the sparks hit kindling and become a full roaring fire?
How far are we away from that?
It's hard to know.
Do you think it willhappen in your lifetime?
I think it's safe to say from my point of view and what I've
(01:06:42):
been seeing that the next paper like this won't use the term sparks. It'll be a different construct to capture a more comprehensive set of capabilities we'll be seeing.
So I expect surprises that are exciting in my lifetime. And this is what my career's been all about.
(01:07:04):
We're getting to this point now where, as I said, we're coming full circle back to the deep interest in what neural nets were doing, what was at the foundations of our own cognitive substrate, and I think we'll be learning a lot from these systems that has implications for that, as well as learning more generally about principles or the physics of intelligence.
(01:07:28):
Alright, Eric.
Well, I think that's a great place to end it.
I think we've come full circle.
So thanks so much for joining us on AI Grand Rounds.
Well, thank you.
It's been a pleasure.
And thanks for all the great questions and conversation.
Thanks so much, Eric.
That was great.