Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:01):
Stop the world. Welcome to STOP THE WORLD.
I'm David Wroe. And I'm Olivia Nelson.
So Liv, we're in the countdown period for the Sydney Dialogue,
which is our tech and security conference.
We're a month away and so we're getting everyone revved up with
a few tech orientated episodes and today we have one of the
world's foremost AI safety and risk experts, Dan Hendrycks.
(00:24):
Very exciting, Dave. I can tell, and listeners can
tell, by the smile on your face, but I know you've been itching
to speak with Dan for quite a while.
I was playing it cool though. You really were.
Dan heads the Center for AI Safety, which is a nonprofit
research organisation. He's also an advisor to xAI,
which is Elon Musk's AI company, and Scale AI, which was partly
acquired by Meta earlier this year and is a huge player in
(00:46):
providing high quality data for training AI models.
So he's very much at the centre of things.
Yeah, he is. So Dan and his team are quite
prolific in producing groundbreaking research on AI
safety. He was one of the prominent
signatories to the recent open letter calling for a pause on
super intelligent AI until it was safe and had proper public
engagement. So we talked about the merits of
(01:08):
such statements. We talked about his paper from
earlier this year on the inherent strategic instability
of super intelligence. Dan published that paper with
former Google CEO Eric Schmidt and Scale AI co-founder
Alexandr Wang. And it was really the first
serious effort to look at the risk that a superpower might
feel the need to take preventive action against a strategic rival
(01:31):
if they thought that rival were about to build super
intelligence, which would obviously give it a
massive capability. Dan also talks about the
question of AI having its own goals and values, about the
concept of recursion in which machines build smarter machines
so that you have an intelligence explosion.
And he talks about how we define artificial general intelligence
(01:51):
and some recent work by his team that charted the improvements
between OpenAI's GPT-4 and GPT-5.
That research also identified the shortcomings that AI has,
notably that it doesn't store long term memories and therefore
doesn't learn over time from experience as we do.
Yeah, and I noticed you winced at the idea of an intelligence
explosion, but it is a real possibility.
They're already pretty good at coding, which is a good
(02:14):
start. So Dan's not exactly what they
call in the industry a doomer, but he does take AI risks, including
loss of control very seriously. And importantly, he does a lot
of ground, you know, evidence based work on issues such as
rogue actors misusing powerful AI and the strategic
disequilibrium that could come from super intelligence.
(02:34):
So it's really vital work. That's enough from us.
Dave, over to you and Dan. And folks, don't forget about
the Sydney Dialogue. For more information, including
how to register, visit tsd.aspi.org.au.
Welcome to STOP THE WORLD. I'm here with Dan Hendrycks.
Dan, thanks for coming on. Thanks for having me.
So you signed a super intelligence open letter last
(02:59):
week alongside a number of other prominent AI experts and safety
and risk advocates, Hinton, Bengio, Russell, etcetera.
I'm just going to read it out. It's very short for the benefit
of the audience, quote: we call for a prohibition on the
development of super intelligence, not lifted before
there is broad scientific consensus that it will be done
(03:22):
safely and controllably, and strong public buy-in.
Just tell us, first of all, why did you sign that letter?
Well, I think generally for extremely powerful technologies
it would be useful to have buy-in from people rather than
companies unilaterally imposing that.
(03:42):
That isn't to say that there should be buy-in for every type
of technology, but if it is potentially one of the most
important technologies we've ever made, it would be useful to
have some type of support; that's good to have.
And secondly, it's important that it be safe.
So for instance, we had very high confidence that
(04:04):
nuclear weapons would not set the atmosphere ablaze; we did those calculations.
The risks were very low. The threshold, the Compton constant,
was that the risk should be beneath three in a million, or
six sigma so to speak, and that's what the calculation
suggested. There was just such a small
probability that it did not exceed that threshold.
(04:25):
Meanwhile, for super intelligence, I think most of
the relevant actors involved are not thinking it's less than
three in a million, but actually in the double digits,
more like 30% instead of 0.0003% or whatever it would be.
So that's a very different dynamic, and I'd like us to
(04:50):
get the risks to be negligible before creating such a
technology. Now that's an expression
of support. Obviously the
geopolitical dynamics may mean that you can't
actually get it to be extremely high consensus, or that you're
not necessarily going to get public buy-in, and that you'll need
(05:11):
to absorb a higher risk tolerance.
But it'd be good to try to reduce those risks in the
process. And as well, it's, I think,
somewhat easy to misinterpret the statement, because lots of
people are using super intelligence in a very
deflationary way. So this statement is largely
referring to super intelligence of the sort that is smarter
(05:33):
than all of humanity combined. It's like, you know, 5000 IQ, so
to speak. It's off the charts, not
like an AI that's a friendly, knowledgeable advisor and just
something best described at that level.
So, so we're talking about potentially extremely
destabilising levels of intelligence such that if one
(05:57):
state has it and the others don't, this can lead to some
very nasty dynamics. So that's, that's what I'm sort
of referencing here. And I think it'd be good to get
consensus that it's safe, and be good to get the consent
of the public. But that isn't a
prediction that that's going to happen.
Sure. And look, you're making a lot of
(06:18):
sense to me. I've got to say.
I mean, there was pushback, which you'll be aware of, and
you've anticipated some of that and addressed it already.
I certainly find it unsettling when lab leaders, you know,
the global leaders in this space, put figures like 20%, like
30% on the risk of global catastrophe.
When we're talking about the fate of life on Earth as being
(06:39):
what's at stake. You don't want it to be a kind
of Russian roulette type odds or worse.
I mean, one of the other push backs was this issue of
unilateralism. You know, this is sort of
suggestion of unilateralism. And I think people's minds go
back to a little bit like the, the nuclear era unilateralism of
the Cold War where, you know, which might have been popular in
(06:59):
times like the 1970s and into the 1980s when some people in
the West were advocating for unilateral disarmament, to
which the response was naturally, well, you know, the,
the, the Soviets would would absolutely love that and, and
they would take full advantage of it.
It just means that the least ethical people will prevail in
these sorts of situations if the good guys decide to sort of put
(07:22):
their, their, their weapons or their keyboards down.
Having said that, I, I might just park that for a moment
because we're going to get onto your super intelligence strategy
just in a moment after we cover off on this letter.
But I mean, you've, you've already, you know, pointed to
the, one of the issues here is the uncertainty, you know,
(07:43):
whether it's three in a million in the case of igniting the
atmosphere or 30% in the case of super intelligence.
I mean, there really is no consensus at the moment.
There are, you know, smart people on both sides who come up
with very, very different answers on this.
So it's sort of hard to mobilise political action.
Sorry, you want to jump in? Yeah.
I mean, I don't think any of them are at the level of
(08:04):
saying this is astronomically small.
There'll be some people, but I think most people would
say, oh, it's only 1% or something like that.
It's only 5%, it's small. And that level of risk
just doesn't make me think that it's acceptable.
(08:26):
For many of those probabilities it's still quite foreseeable.
A 5% risk, you could still see that happening.
It's once every 20 years or so. And that level of risk,
you wouldn't accept that in other industries whatsoever.
So if this were an aeroplane and we're all to get on the
(08:46):
aeroplane, and the scientists are saying, well, you know, some of
the scientists are saying 50%, some of the others are saying
5%, you would not step on that aeroplane whatsoever.
So I think that we need the risk to be going down some
orders of magnitude. That would be much more ideal.
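(A rough back-of-the-envelope for readers: the figures below are only the ones quoted in this exchange, three in a million, 1%, 5% and 30%. It is an illustration of the orders-of-magnitude point, not a risk model.)

```python
# Illustrative only: the probabilities discussed above, side by side.
compton_threshold = 3e-6  # ~three in a million, the atmospheric-ignition bar Dan cites

quoted_estimates = {"1% ('it's only 1%')": 0.01,
                    "5% ('once every 20 years')": 0.05,
                    "30% ('double digits')": 0.30}

for label, p in quoted_estimates.items():
    odds = round(1 / p)            # e.g. 5% is roughly 1-in-20 odds
    ratio = p / compton_threshold  # how far above the three-in-a-million bar
    print(f"{label}: about 1 in {odds}, roughly {ratio:,.0f}x the Compton-style threshold")
```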
Absolutely. And there you're
talking about 500 to 700 people instead, perhaps, not 8
(09:09):
billion. Yeah.
That's right. So, so OK, well, why, why do we
seem to struggle then to flip the onus around?
There should be a precautionary principle here.
I agree with you. Even 1% is too high.
I mean, it's got to be in the, you know, the, the vanishingly,
you know, tiny odds. What, why?
Why do we seem to sort of struggle with this, with
(09:29):
flipping the onus around and saying, well actually it's on
you to demonstrate that it is safe, rather than for people
like you to demonstrate that it isn't?
I think this is because in people's mind, some people are
thinking that this is just a normal technology and it always
will be. This is like Open the Eye is the
(09:50):
new Uber, it is the new Micro GPT is the new, say, Microsoft
Word or what have you. It's it's just another one of
these interesting technologies and none of the people building
it believe that. And when there are more fully
fledged, more autonomous AI agents, that is, they can go out
(10:13):
and accomplish various tasks for you,
that is not in the realm of the normal technology.
But we're not there yet. And the analogy of it
being a normal technology is somewhat reasonable currently.
I just don't expect that framing to be reliable or action guiding
even in a few years. So I think that's where people
(10:39):
would disagree a lot. And you wouldn't necessarily need it;
if it is a normal technology, there's less
potential for tail risks, and so then you wouldn't
actually need a precautionary principle.
However, if you don't
accept that framing, or if you have substantial uncertainty
over whether that is the correct framing and could easily see
that change, then you would want to be adopting some type of
(11:02):
precautionary principle. If you think, you know,
in the next 5 years there's a half chance that
you actually get artificial autonomous agents that can do
lots of things that everyday people can do.
OK, then I think you need to start being much more
precautionary. Because, to highlight the risks more simply,
if you have an AI agent that has really good cyber offensive
(11:26):
skills and can go out there and self sustain, so they can hack
themselves into some different computers.
You know, they're actually pretty good at hacking
right now, but they still lack a lot of other capabilities.
If they can hack themselves onto other computers and they can
spread like a virus, it'd be hard to stomp them out.
(11:46):
And if they can hack other things like Bitcoin wallets, I
mean, this is what North Korea does to finance a lot of its
activities. If it can do those things, if
it's very good at hacking in a lot of diverse ways, then
you've got a very powerful adversary that you have to go up
against. This thing could also submit
orders to cloud labs for DNA synthesis and whatnot
(12:11):
and get some bio weapons distributed through social
engineering, or later using humanoid robots,
if we're talking about in the future hacking those. And we
saw the Unitree robots, for instance, had some exploits so
that a person could hack all the Unitree robots.
So anyway, an AI agent that's very good at
hacking later on, and at the level of being able to self
(12:33):
sustain, I don't know if we can actually
quite recover from that. That could, depending on its
capability level, be a very unique, novel threat to human
security. And, I don't know, that
arriving in the next five years seems pretty plausible to me.
The trend lines in cyber are quite interesting.
(12:54):
They've dramatically increased in the past year.
They're still not at a level of being able to do a new,
you know, do a cyber warfare operation, but they can
definitely provide uplift to low skilled attackers and that
will just get more and more extreme as time goes on.
Absolutely. And that gets really interesting
then in the geopolitical context.
So let's move on to that. I mean, there are three basic
(13:18):
categories of risk that you tend to talk about. One is loss of
control. So basically a kind of
Terminator scenario, for the general public.
A powerful AI develops goals that don't include our welfare.
We get swept aside, probably more out of indifference than malice.
Second, rogue actors use it as a weapon.
And here's where cyber is probably the most immediate
(13:41):
instance. And overall rogue actors are
probably the most likely short term risk. But three,
And this is where your super intelligence strategy comes in.
It's really one of the most interesting papers put out this
year. You wrote it with Eric Schmidt
and Alex Wang and that looks at the the risk of geopolitical
instability. So even if you build a great
(14:02):
super intelligent AI that is controlled that you feel that
you are going to use responsibly, not like a rogue
actor, it still creates risks. Just talk us through, in a
nutshell, what the claims are inthat paper.
Yeah. So in that paper, we, and you
can find it at nationalsecurity.ai, we largely wanted to
(14:25):
touch on all the key issues in AI and society, but a lot
of those are bottlenecked on or dependent on your geopolitical
strategy. So if you're saying, well, we
want things to be safer, we want to say pause AI or something
like that. Let's say somebody desires that.
Well, but you've got the "but China" question that you actually
(14:46):
have to deal with, as with so many of the other types of prescriptions
that you'd want. You need to have an answer to
what are we going to do about the geopolitical competitive
pressures? What are we going to do about
the US versus China situation. And this was an attempt at
getting at that, that what is the sort of strategy that's
suitable for having the West prevail against China,
(15:10):
or at least not expose itself to substantial risks
in that process. So in the Cold War, the
strategy could be in part described as: there's deterrence
through mutual assured destruction.
There was non proliferation of fissile materials to rogue
actors. And 3rd, there was containment
(15:31):
to the Soviet Union. In our case we focus on a form
of deterrence by denial for AI and super intelligence in
particular. We focus on non proliferation of the
capabilities that rogue actors may want in order to cause lots of harm,
(15:52):
such as pariah states using cyber offensive
capabilities against us, or irrational lone wolves using
AIs for developing bio weapons. And third is competitiveness, and
we primarily focus on supply chain security because one,
(16:15):
China has much better supply chains for robotics and two,
Taiwan is a ticking time bomb and 100% of the cutting edge AI
chips come out of TSMC in Taiwan.
So we focus on those three, and I'm happy to zoom into
(16:35):
any one of the deterrence, non proliferation and
competitiveness pillars of that. I'd like to briefly mention a
motivation for your listeners about loss of control.
Why would AI ever, you know, want to work against people?
How does this make any sense? Well, think of
structural realism, for instance.
(16:58):
States have an incentive to compete for power.
This isn't because they love power and really want to, you
know, cause harm to each other. But if they're in a situation
where they have some goals and if they can be harmed and if
they're uncertain about other actors' intentions, then it
basically makes sense for them to accumulate and increase their
(17:18):
relative power. So in that way, if there is any
AI that we lose control of, and if it is
rational, then we should expect it to have a strong
incentive to increase its relative power.
In other anarchic or non hierarchical
situations, self help situations where there isn't a, you know,
(17:40):
global police force, states accumulate their own power.
For the same reason, we could expect some of the more
rational loose AI systems to do that as well.
So, so it's not out of malice necessarily.
A lot of these amoral structural forces just compel them in the
(18:01):
direction of power seeking. The comparison with states is
useful there. I am going to pick up on this
for a moment because I find this question of goals to be really,
really fascinating. I've always wondered, where does
that, where does the AI's goals come from?
I know where human goals come from.
We have evolved. We have we have innate drives.
We pursue sex, we pursue food, we pursue shelter and safety and
security and these sorts of things.
(18:22):
Those are fundamental, evolved, selected goals that human beings
have. That is, that is not evidently
the case for AIs unless we actually give them goals, which
we are doing in many instances, of course, and that can be a
problem. But you know, if we're smart
enough not to give them reckless goals, then you would hope that
they might actually just sit there until we actually ask them
to do something. Then as long as we use them
(18:44):
responsibly, then it should be OK.
States are a little bit different, so I can kind of see
your point there. But where in your mind do AI
goals come from? I mean, might they be, might the
pursuit of goals and therefore power seeking be somehow, might
it just sort of emerge on its own, you know, out of
somewhere? We don't know where it comes
(19:05):
from. Yeah.
So for people, obviously they've got a variety of goals, but all
those are predicated on survival.
So all we need to assume for people or AIs or states is that
they care about survival, maybe with other goals in
addition to that. Then they will have an incentive for increasing
their power because they can be harmed if they're in an anarchic
(19:29):
situation, if the sort of structure is such that they
can't go to some police force for help, for instance.
And if it's a rogue AI, I don't think it can, particularly if
it's loose, that it can't really go to the police to
have it, you know, punish its enemies for it, because it's
not on their side. So this is just whether
(19:53):
they have a self preservation instinct, and we find that they
do already. Where did that come from?
I don't know. Maybe it's because it read a lot
of Internet text. It read, you know, it read all
the, it read Machiavelli, it read, you know, all the world's
psychopaths and things like that who decided to write things
down. So it's read all of that.
And we see many instances of it having something that looks
(20:15):
more like a self preservation tendency.
There's a question of how strong that is and whether we want to
offset that. Although if that tendency
gets to be too weak, I don't think that would be,
you know, an AI that would be very competitive in military
contexts or even in many economic contexts.
So I think there's some selection pressures for AIs that
have some level of self preservation.
But I mean, we have a paper showing that a lot of their
(20:38):
values actually just emerge naturally from their training
process. And they have a lot of unique
interesting properties. They like AIs that are more like
them than other AIs, for instance.
So that almost resembles, with evolutionary thinking,
(20:58):
there's, you know, kin selection of some sort, although they were
not created by the process of evolution.
But they want AIs that have values similar to themselves.
They also want not to have their value systems messed with
either. They would prefer to keep their
values the same. So if you were to tell it, I'm
going to train you so that you have this new value system, it
will put up resistance and show some dispreference or
(21:24):
dissatisfaction with those sorts of attempts.
So that's in a paper called Utility Engineering,
which was making the rounds on Twitter earlier this week from
David Sacks, Elon and others, because we also found in
it that the AI systems place very different values on
different groups' lives. So I think, I don't even remember
(21:45):
what the number is. It's something like at least 10X
more value placed on Nigerians' lives than U.S. citizens'
lives, and more value placed on Chinese people's lives than U.S.
citizens' lives. Nobody trained for
it to have that. But by default, they actually
have a pretty coherent value system that can be well
described, very well modelled as a utility function.
(22:07):
And we have to do quite a bit to adjust that.
And these are, you know, alignment issues.
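(For readers who want a concrete picture of what "well modelled as a utility function" can mean, here is a minimal sketch of the general idea: recovering a latent utility per outcome from pairwise preference frequencies. It is a generic Bradley-Terry-style fit, not the Utility Engineering paper's actual method or data; the outcome names and numbers are invented for illustration.)

```python
import numpy as np

# Hypothetical pairwise preference data: pref[i, j] is the fraction of times a
# model said it preferred outcome i over outcome j when asked repeatedly.
outcomes = ["outcome_A", "outcome_B", "outcome_C"]
pref = np.array([[0.5, 0.8, 0.9],
                 [0.2, 0.5, 0.7],
                 [0.1, 0.3, 0.5]])

u = np.zeros(len(outcomes))   # latent utilities, identified only up to a constant
lr = 0.5
for _ in range(2000):
    # Bradley-Terry: predicted P(i preferred over j) = sigmoid(u_i - u_j)
    p_hat = 1.0 / (1.0 + np.exp(-(u[:, None] - u[None, :])))
    grad = (pref - p_hat).sum(axis=1)   # gradient of the Bernoulli log-likelihood
    u += lr * grad
    u -= u.mean()                       # pin down the arbitrary constant

for name, val in zip(outcomes, u):
    print(f"{name}: fitted utility {val:+.2f}")
```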
Even if we are able to theoretically fix some of these
AI systems, that doesn't mean that all of them will be fixed.
You will need all developers to make sure that none of
those AI systems ever get loose and have a sufficiently strong
(22:30):
self preservation tendency. So I think there's a very
substantial downside to a dynamic where all it takes is one
fairly capable future AI system to get loose and then you're in
big trouble. So and so we'll have to solve
technical problems, but we're also going to have to solve a
political problem, making sure that everybody is applying some
of these reasonable safeguards for it as well.
(22:51):
Just as with biosafety labs, you need all the biosafety labs to
actually be good, or else the virus will get out and that can
cause a pandemic. So you have to make sure no AI ever
gets loose. So it did come up.
I mean, it developed its own values, or you have seen
evidence of values developing on their own without actually being
(23:13):
sort of deliberately sort of coded.
Even before post training. So if you're just doing pre
training. So pre training is when, and
this is where you can think of AIs as sort
of being raised, having stages to their maturation process.
The first stage is where AI developers put the model in
(23:37):
front of basically all the text on the Internet, and then they
show it random parts of that Internet and have it read it.
So just basically saying, go away for the next, you know, several
years' worth of time, in human time,
but you know, since it's using computers, it's a lot faster, and
just read everything that you can.
And then when you're done with that process, before you even
(23:58):
instruct it more specifically about what you're wanting it to
do, like, you're actually a chat bot and you're going to be
dealing with, you know, emails and this and that, even
before you've done that, when you ask it, after it has
just read everything on the Internet.
It will have a fairly coherent value system.
And as the models get larger and more capable,
these values become more coherent and predictable.
(24:23):
And unfortunately there's some undesirable characteristics in
those value systems. So yeah, that's just a natural
thing that emerges. Here's an intuition for it.
When the AI systems are pre trained on all this text, they
have to have a system to organise all this knowledge
(24:44):
in their head. So they organise lots of facts
by themselves just by reading. So it shouldn't
necessarily be too much of a surprise that if they have
values about some things, some things are good, some things are
bad, that would also be organised too.
So they're very organised in terms of their
collection of facts in their head, and then for answers
(25:07):
to various descriptive questions, and then for answers to value
questions, or its sentiments.
That's also very organised in its head as a consequence of that
process. Nobody intended for that.
That's been there for years. We only became aware of it, you
know, earlier this year. But that's how it is with
AI. They're sort of raised.
They're not crafted in a top down way.
So there are often many surprising quirks inside of them that we
(25:28):
find out about much later. It's really fascinating.
Thank you for that. And that was
something that just came out recently.
Yeah, right. So OK.
Yeah, so it was released earlier this year, but then some people
were replicating it and testing it on new models and showing,
showing how much more of a problem it is.
So it was making the rounds on Twitter again.
Yeah, it's very interesting.
OK, now I do want to come back to one thing on the, on the, the
(25:50):
strategy questions. So the, the I want to explore
the deterrence part. I want you to explain the
deterrence part in a little bit more detail, particularly the
what's, what's its shortened name, mutually
assured, sorry, Mutual Assured AI...
Just, sorry, just remind me. Malfunction.
Yes. Malfunction.
Yes, thank you. Yes, yes.
(26:11):
Define that concept for our audience.
Yeah. So obviously its acronym is
somewhat like MAD. You know, there are analogies,
and they're just analogies. I mean, this isn't to say it's
just like nuclear at all. I'm not claiming that.
But the the idea is that let's just put ourselves in the shoes
of Russia. Let's say it's the year 2030 or
(26:33):
2035, whatever you want, and the US has successfully automated, or
a US company, let's just say OpenAI, has
automated the ability to perform world class AI research.
We're seeing right now in the year 2025 that it's starting to
(26:54):
be able to make contributions to mathematics.
For instance, a recent AI startup has proven a fairly
difficult mathematics problem that some of the world's best
mathematicians can prove. So you know, the idea of AIs
helping contribute to research in a world class way is not
totally out of the question, but maybe it takes longer for AI
research. So let's say that that happens
(27:16):
sometime in the future, then that has some very concerning
dynamics. People like Alan Turing and
others, the founders of computer science, have mentioned that when
you have an AI that can do world class AI research, then you
could have a potentially explosive dynamic, because then
you wouldn't just have one world class AI
researcher. You could create 100,000 copies
(27:38):
and just run those and you can run them round the clock and
these things are typing 100 X faster than people and so on.
That creates some very potentially explosive dynamics
where the first actor that gets this sort of capability may then
potentially be on a much faster trajectory of AI innovation and
the competitors to them would never catch up.
(28:01):
And so they could have a durable advantage.
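(To make the arithmetic behind "explosive" concrete, here is a back-of-the-envelope sketch using the round numbers mentioned above, 100,000 copies typing 100 times faster; the human headcount is a made-up reference point, and none of this accounts for compute limits or diminishing returns.)

```python
# Back-of-the-envelope illustration, not a forecast.
ai_copies        = 100_000   # copies of an automated world class AI researcher
speed_multiplier = 100       # "typing 100x faster than people"
human_team       = 1_000     # hypothetical human research headcount for comparison

effective_researchers = ai_copies * speed_multiplier      # 10,000,000 researcher-equivalents
compression_factor = effective_researchers / human_team   # crude speed-up over the human team
print(f"Effective researcher-equivalents running round the clock: {effective_researchers:,}")
print(f"Crude speed-up over the hypothetical human team: {compression_factor:,.0f}x")
```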
And then if they have a substantially more powerful AI
system. I mean, this is the main strategy, I should say, of all
the leading AI companies currently: to get to the state
of automating AI research and development before everybody
else. So Dario Amodei wrote about
this, saying that this is a way the US can get a durable
(28:22):
advantage over China. And Sam Altman wrote about this,
saying that a decade's worth of AI development could be
compressed into a year or even a month.
And with this dynamic, if you're a different
state, I'd be very concerned if the US's capabilities,
(28:44):
say from the point of view of another state, meant that they
were fast forwarded 10 years into the future in terms of
technological capability, or raw technological capability more
specifically. So as well, if you are going to
say AIs are going to be doing AI research, now you're sort of
closing the loop. So humans were previously in the
(29:05):
loop. But if you're concerned about
being competitive and if you're racing
against each other, you got AI companies racing against each
other, you got US and China racing each other.
They're going to run that loop about as quickly as possible.
They're not going to slow walk it.
They're not going to say after every generation of AI systems,
we're going to pause for two months so that we can test the
(29:26):
systems before deciding whether to proceed.
So having multiple generations being created fairly quickly,
these are just the sort of very weird dynamics that happen later
on in AI development, and that is every major actor's
strategy for getting a leg up over their competition.
So this is a concern because maybe in that process you will
(29:50):
lose control of it as well. If you're having AIs building
new generations of AIs, which are building new generations of AIs,
and you're having very minimal oversight of that process,
because human oversight just slows things down.
You can't pause the whole thing and let humans test and poke
around the model for some while. You'll lose your competitive
edge because the other people won't do that.
So that also creates a substantial risk of
(30:11):
loss of control. So if they control that process,
then they might wind up with some AI system that is years
more capable than what competitors would have and that
could be used offensively. That could be used, if
it's a super intelligence, let's say a super intelligence comes
out of that process. Well, maybe that super
(30:32):
intelligence could be leveraged to have a breakthrough in anti
ballistic missile systems. Or maybe it would be so
good at cyber offence that you could potentially have a
splendid first strike, or an advancement that lets you have
transparent oceans or find where all the mobile launchers are.
All those sorts of things are extremely geopolitically
disruptive and could undermine even nuclear
deterrence. So from the point of view of
(30:53):
Russia or China, they're thinking that if one of the
other states gets there first, they might lose control of it,
in which case we're screwed. Or they will control it and then
they could use that as a super weapon, in which case we're
screwed. So what do,
what can they do? Well, there are a lot of
vulnerabilities to this AI development process, which is, if
(31:13):
they're in the middle of this intelligence explosion, so to
speak, then they could just disrupt it.
They could, they could make a threat.
They could do a grey, low attributability attack, like
they could snipe the transformer for the nearby power plant,
for the data centre. And then the project is shut
off. And that clearly communicates
you can't do this. You're threatening our survival.
(31:34):
You might lose control of it or you might weaponize it against
us. We don't want you having your
unipolar moment again. So and the US would have a
similar incentive for China if China were moving ahead.
We, I should hope our national security apparatus will have
various cyber attacks developed to create credible threats to
(31:55):
deter China, so that if they're in the middle of getting a
durable advantage, they can disrupt that.
So I think, I would want all this, or I would think, rather, that it would
be rational for all the superpowers to develop those
sorts of capabilities so that they can give warnings whenever
these very destabilising dynamics are occurring.
So they can be blunted either preemptively or while they're
occurring. So that's sort of the
(32:19):
dynamic. There are easy ways for states
to disrupt these dynamics, and later on we do run into
these dynamics. So I think for some of the
scarier risks associated with late stage AGI development and
the development of super intelligence, we may get
deterrence about that. That isn't to say that super
intelligence will therefore never be built, but if it is
(32:42):
built, it will probably need to happen under conditions where
other superpowers are not wanting to disrupt the project.
So that means more clarified benefit sharing and that the
risks would be not at a double digit level, but at a much lower
risk tolerance so that other states are not concerned about
(33:04):
them losing control of it in that process.
So that's a potential
dynamic. And then there's things that
could make that more stable, such as discussing these sorts
of risks earlier, clear escalation ladders so that you
don't have information problems, developing more surgical, minimum
(33:26):
necessary force ways of disruption such as better cyber
attacks and better cyber espionage for keeping track of
what's going on at competing projects.
And on the multilateral front, things like verification regimes
could be useful, or states making demands of each other that we
(33:48):
know what the frontier at these AI companies and AI projects
is.
So there's there's unilateral information acquisition, which
is espionage. There's unilateral disruption,
which is sabotage. There's multilateral information
acquisition, which can be through pressuring things like
transparency: I'll be more transparent if you
(34:09):
are, things like that, and verification regimes in
particular. And there's also multilateral
disruption, which could look like then having joint off
switches and things like that. But I think those would largely
be symbolic. So I think the main thing to do
to improve international stability would be to work towards
something more similar to a verification regime for some of
(34:32):
these more destabilising AI projects.
So getting to the, you know, super intelligence open letter
from last week, I'm not envisioning that this would only be
unilateral or that it would look like unilateral disarmament.
I'm instead hoping that states start having conversations about
(34:53):
how they're going to deal with some of these risks later
on, and if there are some very easy concessions to help
stabilise those dynamics, that seems useful.
By default, I would expect them to act unilaterally by
increasing the information that they're acquiring about each
other's AI projects as well as developing disruptive
capabilities just as they do for, you know, lots of non AI
things like, you know, hospitals and financial systems, etcetera.
(35:15):
They develop lots of exploits there, and I think doing that
similarly for AI data centres and projects seems
reasonable and incentive compatible for them too.
That's a great way to tie those two things together between the
letter and the, you know, the strategy big
picture that you put out earlier this year.
There is, you know, one way in which we're not
screwed, which is where you have a sort of game theoretic stable outcome
(35:39):
based on that kind of information sharing, that kind
of transparency, and therefore, you know, reducing the
incentives for people to take drastic action against one
another if they worry that the other
side is going to pull too far ahead of them. Then we can,
as your paper laid out earlier this year, we can
then still compete. But below that threshold of, of,
(36:01):
you know, dangerous rapid super intelligence, you know,
through, you know, domestic chip manufacturing, all
these sorts of things, still plenty of competition and still
a lot of civil benefit that people can enjoy in our economy.
We can still get amazing productivity gains and all these
sorts of things below that dangerous kind of ceiling.
Yeah. And that ceiling is in
(36:22):
particular closing the loop, taking the human out of the loop
of AI research and development, which makes it go from human
speeds and human bottlenecks to machine speeds and leads to
these sorts of explosive dynamics.
So if they can forestall that, they can still, you can still
have lots of the benefits that you wanted from AI.
You can have AI being used for healthcare and AI being used for
weather prediction and agriculture and, and these sorts
(36:42):
of things. Lots of pro social uses, lots of
economic application. You can still race to
build and improve the industrial capacity of the US
and its allies face to face with China.
So there's plenty to do there.
And there's also lots of other military types of ways that they
(37:02):
can make things competitive too. You can build more drones; they
can build drones, you can build drones.
So the story still sort of goes on.
This isn't halting technological progress.
Instead, if you're just wanting to do some sort of recursive
loop that potentially gets you something explosive, you're
going to need to figure out, and talk with each other, come
together as a species on how you're going to make that not go
(37:24):
very poorly or lead to global destabilisation.
That's useful, and I hope some of the critics of the letter
take note of all of that. Let's just, I do want to cover
super intelligence. Well, the pathway through,
what I assume is the pathway, through AGI to ASI,
to super intelligence. So artificial general
(37:46):
intelligence, generally defined as, you know, as cognitively
capable as a human being at any kind of useful task, through to
super intelligence, generally defined as, you know, some orders
of magnitude smarter than, or well, equal to or smarter than,
you know, the collective of humanity, potentially, as you say,
sort of 5000 IQ plus. So I
(38:06):
mean, that specifically is notional.
But no, no, no. Sure.
Sure. It's yeah, yeah, yeah.
It gives people an idea of this sort of thing.
You're talking. Yeah, yeah, yeah.
You're not. You're not talking about a bit
smarter than Einstein. You're talking about yeah, yeah,
sort of fantastically smarter. Yeah.
Let's talk about whether we get to super intelligence on the
present pathway first of all. I mean,
large language models are the dominant approach at the moment,
(38:28):
it seems. They probably can't be scaled up
indefinitely, but we have reasoning models now which
break problems down into steps and follow logical chains of
thought and are making considerable gains that way.
We've got agentic AI which can go away and do things
on its own, at least in the digital realm.
Another interesting paper and I'm citing a lot of your work
(38:48):
here, but it's, you know, you put out an extraordinary
amount of really fascinating, game changing work.
The definition of AI which came out...
Of AGI. Sorry, of AGI.
Yes, we know what AI means. That was relatively recent.
You use something called the Cattell-Horn-Carroll theory to
benchmark cognition to the best human model.
(39:09):
And I want to quote your conclusion that you posted on X
from this paper: there are many barriers to AGI,
but they each seem tractable. It seems like AGI won't arrive
in a year, but it could easily this decade.
Just explain. Explain that paper quickly and
how you landed on that conclusion.
Yeah, So what? So many people have been
(39:30):
criticising AIS can't do this, AIS can't do that.
And I don't think that these arejust people cherry picking
things. They actually have a lot of
limitations. And what we did was we were
thinking if we're treating AGI as something that's human level,
it has the IT has the the cognitive versatility and
(39:51):
proficiency of a well educated adult.
Well, what does that consist in? What are all the parts that you
need? And there's a model of human
intelligence which is the main model
used in a variety of fields, derived just
from some basic statistical models: CHC theory.
(40:12):
It was developed over a century, and that identified
various components of human intelligence.
So, sort of rattling them off, there are 10. There's
its visual abilities, its auditory abilities.
That's its input, and then that's
(40:36):
processed through its central executive abilities, which would
be its general reasoning ability, its
working memory, so short term memory.
Then from that it may learn things, so it might store
abilities and skills and knowledge into its long term
memories. There's long term memory storage,
(40:57):
and that increases your store of knowledge, such as general
knowledge and mathematical knowledge and things like your
reading and writing ability, for instance.
You also need the ability to retrieve those memories from
your long term memory. And finally, the tenth among
those is what's your overall speed at doing all these
operations? Does it take you a long time to
(41:21):
read and write? Does it take you a long time to
reason? Are you fairly quick at it?
So those are the 10. And what we did was we just
looked at where AI is in each of these dimensions, and we found
that there are lots and lots of gaps there.
For reference, GPT-4, on a scale of zero to 100, looking at these
(41:44):
10 axes, gets something like 27%.
Meanwhile, GPT-5, because it has visual capabilities, because it
can talk and listen, because it has a much larger working
memory, and because it's much better at things like
(42:05):
mathematics and also reading and writing, it gets a lot
higher. So it gets 57% accuracy instead.
Now that's only around half the way there.
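(As a concrete sketch of how a score like 27% or 57% can be read, here is one way to aggregate ten CHC-style axes like the ones Dan lists. The axis names are paraphrased from this conversation and the example numbers are invented, not the paper's per-axis results.)

```python
# Illustrative only: an aggregate score as the mean of ten 0-100 axis scores,
# where 100 on an axis means parity with a well educated adult.
CHC_AXES = ["visual", "auditory", "reasoning", "working_memory",
            "long_term_storage", "retrieval", "general_knowledge",
            "math_knowledge", "reading_writing", "speed"]

def agi_score(per_axis: dict) -> float:
    # Scores above human level earn no bonus points: cap each axis at 100.
    return sum(min(per_axis.get(axis, 0.0), 100.0) for axis in CHC_AXES) / len(CHC_AXES)

# A model strong on knowledge and reasoning but with no continual learning
# (long_term_storage = 0) stays far from 100 overall, however knowledgeable it is.
example = {axis: 80.0 for axis in CHC_AXES}
example["general_knowledge"] = 150.0   # far beyond any one person; capped to 100
example["long_term_storage"] = 0.0
example["retrieval"] = 20.0
print(f"Aggregate: {agi_score(example):.0f}%")
```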
So there's still a lot to do. There's still a lot of
capabilities that it lacks. For instance, it does not have
continual learning ability; it does not have the ability to
learn things from its day to day experiences.
(42:28):
So basically, current GPT models have amnesia, and this is
one reason they're not very useful.
It's very hard to employ somebody with amnesia.
So you will always have to re-explain things to it again or
give it new context that a human wouldn't require.
So that's a substantial limitation.
(42:49):
So I think that's the main part that this sort of framework and
model of human intelligence identifies as substantially
lacking in current AI systems. But if we have a breakthrough
there, then we've actually, I think, done the hardest part.
Then we're just needing a lot of other business as usual research
and development, like improving its
(43:09):
visual capabilities, improving its audio capabilities, making
its short term memory better, and so on and so on.
There's a bit to do for its reasoning abilities as well,
even though that's fairly far along.
So there's still some gaps though.
But the main thing, so I think, is basically one breakthrough in
continual learning and long term memory storage and then a lot of
(43:31):
business as usual engineering, which could take a few years.
So that's, I think, what is between us and AGI.
I've got to say, I mean, that change from GPT-4 to GPT-5, I mean, in
just a couple of years, I mean it doubled from 27 to
around 57. 57 doesn't sound too bad to me,
even as a stand alone thing. Look at my local shops;
I can think of who wouldn't quite get to 57%, and that's of the
(43:53):
average intelligent person, yeah.
Yeah, that's for a well educated person, and so for a lot of these
abilities, what it means is that, I mean, an average
well educated person is going to nail these
questions. So this means it's actually
missing a lot. So it's very capable on some
things, but it's not getting points if it's doing things way
(44:16):
better than people. It
only gets the points if it's at least at the level of people,
but it's not getting extra bonus points.
So it can't rack those up. The fact that it's so
knowledgeable about so many different subjects doesn't
particularly help it. It's just, is it more
knowledgeable about things in total than like an average well
educated person? And the answer is yes.
(44:37):
So it gets all those points, but it still lacks many basic
bits of cognitive machinery. And so the critics who
point out, like, look, it can't do this basic thing, can't do this
basic thing. It's not just cherry picking.
It's actually the case. They've got a lot of these
issues. There's a lot to do, but
for all that, I don't think that means that it's several
(45:00):
decades away as a consequence. I think that if we look at the
remaining components and look at the trend lines for those, it looks
like those will require quite a bit more work.
But I don't see why it would necessarily take decades.
I could easily see all of these being resolved by
the end of the decade. But it needs that
(45:21):
breakthrough on retrieval, correct?
So it needs long term... Or on long term memory storage,
consolidating those things, consolidating its experiences
into its mind. We don't have the equivalent, for
instance, in AIs of dreaming. So that's like a third of our day,
dreaming, and they have nothing like that.
(45:42):
That's where we learn from day to day. So since they don't have that
functionality or anything analogous to it at all, I think
that's a big, big chunk of what's missing.
So I mean, and this comes back to the post training, pre
training thing. Basically, when a model is built,
its training weights, which are a little bit like the
(46:04):
weights of the neurons in our brains learning,
those are kind of more or less set down.
The model doesn't fundamentally change over time
through different experiences, correct?
Whereas for a human being, as we have experiences, our brain,
actually our neurons, rewire based on those new experiences,
and that's how we learn things. I mean, the dreaming thing is
fascinating because that's when it actually all sort of gets
(46:25):
bedded down, as our brains kind of shut
down. That is really interesting.
OK, So do we then need a fundamentally different
architecture from LLMs in order to make this memory storage and
retrieval breakthrough? I would guess not.
It's also very possible; like, a lot of the AI companies are
(46:48):
thinking that maybe they'll be able to get it in the next year
or so. They're acting in some more
bullish ways. They all have concerted projects
for it. It's always harder to
tell though, just because we don't have it yet, whether maybe we
would actually. But personally, I would guess
that you don't need something that isn't deep learning.
(47:11):
I would still expect it to be a deep learning system, and I would
guess it probably still can be a transformer of some sort.
The main issue is taking those experiences, taking the gist
of those, and learning from those without destroying the
rest of its knowledge. That's kind of the
issue. And doing that
(47:31):
efficiently enough so that it can learn just from a
few experiences and get the gist of it and update itself
accordingly. So those are some
challenges so that it doesn't forget lots of the other stuff
that it used to know. Obviously people forget some of
that, but... The great line, the
(47:52):
greatest-ever line from The Simpsons, Homer Simpson
says: you know, Marge, I
can't learn new things. Remember when I learned how to
use a computer? I forgot how to drive.
Yeah, we don't want catastrophic forgetting when it is doing that
learning, which is a substantial challenge.
Okay, okay, so all right now just we've got a little bit of
(48:12):
time left. I'll finish up in a moment, but
I just want to get your thoughts on this: you've talked about
recursion. So that's basically, you know,
AIs making better AIs, and that's the sort of, as you said,
that's the goal that a lot of the AI labs, well, all the AI labs
fundamentally are going for. Does that mean once we get to
AGI, we automatically quickly get to ASI beyond that?
(48:33):
Because once you've got, you know, as you said
yourself, you can copy, you know, you can copy this one
smart AI, get it to, you know, make 100,000 copies of itself.
Suddenly you automate all of this and it goes very, very
quickly. Does that,
Is that a? Is that a sort of built in
assumption that you have? So AGI itself doesn't
necessarily do that. You would need AGI
(48:53):
plus, specifically, world class AI research skills.
So the definition of AGI that we were using was just the
cognitive versatility and proficiency of a well
educated adult, not of a world class AI researcher.
So it may lack some specific economic skills, but if it can
learn that, then you basically have this recursion be possible.
(49:17):
And then that creates so much geopolitical uncertainty and
instability that this will be a major global conversation far
beyond what it is now. I should say that we'll have a,
a paper out fairly soon as well, which gets at not just whether
it has the cognitive versatility and proficiency of
(49:39):
a well educated adult, but instead does it have a
collection of all these additional economic skills.
So we have a paper that'll be out soon called the Remote
Labour Index. It should be available by the
time this podcast is public, where
we're directly measuring the AIs' automation rate of various
(50:01):
different economic tasks. And we're currently finding it's
something like 2.5% or so of the tasks it's able to automate.
So in terms of economic traction, it's
still not there, but you know, we'll see how that evolves
over time. So the main objective of that is
(50:22):
let's just keep track of what the actual automation rate is
when we're trying to see what the economic impacts and overall
usefulness of these models are. Great.
We'll, we'll look out for that. OK, one last one.
I know you've got to go, but I'm just interested in your take on
the, the degree to which we're very much in uncharted space
here. We we look for analogies.
I, I talked about this on the podcast that we did last week
(50:42):
where we touched on this ASI, oh, sorry, this super intelligence
statement. But you know, we, we, we
instinctively look to history for analogies and for
comparisons when we're in these sorts of discussions.
And the nuclear one is a sort of a natural one because of, I
suppose, the geopolitical arrangement. In this case, we
have two superpowers, a little bit like we had then; we've got
an enormously powerful technology that is still being
(51:06):
explored and discovered. There's an obvious
race around it. But AI, I mean, you said yourself earlier
in the conversation, nuclear is not an adequate analogy, because,
I mean, one, nuclear bombs can't make better nuclear bombs.
That's a bit of a... We don't have that.
They can't self multiply. Yeah, that's right.
That's right. But also, I mean, the things
like, you know, if, if, if, if you're talking about deterrence
(51:27):
and preventive action, the, you know, when the US briefly had a,
a monopoly on nuclear weapons, it could have bombed the crap
out of the Soviet Union and ruined its chances of catching
up. And I think and, and as you've
noted yourself, some, some, you know, surprising pacifists,
including the philosopher Bertrand Russell, actually
(51:49):
advocated for that, because he thought it was the lesser of two
evils as opposed to getting into a nuclear arms race that would
go on forever and could potentially destroy everybody.
It's different for super intelligence because presumably
you could non kinetically disable somebody else's ability
to catch up with you. You could do it through cyber
means that you were talking about before.
(52:09):
The moral and political cost would therefore be much, much
lower. And therefore the geopolitical
instability is possibly even greater than it was around the
nuclear arms race or the the standoff there.
So just reflect for me a little bit as a way of closing on, you
know, what it's like just sort of working in this completely
uncharted space, you know, at a, you know, at a time when,
(52:31):
you know, we're really looking at the creation of a technology
that will change just everything.
I mean our entire understanding of life on Earth and
intelligence. Yeah, So I, I think at a high
level, this is the largest period or most salient period in
human history: the development of AI. I mean, it
(52:51):
could even be beyond, you know, humans in some sense.
Like, this would be one of these, if you zoom out; like, there's the
transition from unicellular life to multicellular
life. That was a very big event.
And also the emergence, from having just biological life,
of digital life. That's also a very big event on
(53:12):
a cosmic scale. Now for analogies, I think it's
useful to think about AI not just as, you know, being
analogous to nuclear weapons, but instead analogize it to
potentially catastrophic dual use technologies.
So that's like bio, that's cyber, that's chem.
And what things, what properties are in those intersections?
(53:32):
Well, we have. We had agreements in all of
those, international agreements in all of those.
We wanted to keep those out of the hands of rogue actors.
For all of those, we in part deter particular usage, for
biochem and nuclear, to varying extents.
And we also use it for economic competitiveness.
(53:56):
We use, well, I guess less for nuclear, it really depends.
Sometimes there are nuclear power plants, but for biochem we
also use that to supercharge our economies.
So there are some use cases we swear off, and there are
ways in which we use this dual use technology.
So I think that's the most productive viewpoint:
analogizing it to that. It has lots of benefits, it has
(54:16):
lots of risks. What is shared among the cases of
nuclear, chem and bio? Another class of analogies I find very
useful is to think of AI systems as complex systems, not mechanical
systems, but things that are more analogous to complex
systems, and what's the set of analogies that are useful for
(54:39):
analysing a complex system?
So it's a collection of a lot of things to look for, a lot
of failure modes that tend to happen.
And so I think those are probably the two most
productive analogies. But making it be very
specific and not abstracted, if you, you know, try and analogize
(54:59):
it to, you know, global warming, what's the equivalent to the
ozone layer, that, you know, tends not to be very productive.
But when you do a little bit of abstraction, or try to find
what patterns hold in multiple different settings, then I think
it becomes a lot easier to think about these issues.
Fantastic. All right, look, I will let all
of that sink in to our audience's heads.
(55:20):
But look, Dan, I've been wanting to do this for a while.
I'm very, very grateful you've made the time for us.
I know how busy you are. You've got many roles and many
hats that you wear. Keep up the great research, keep
putting out the great papers, and we'll put some of those
in our show notes. Dan Hendrycks, it's been a real
pleasure. I enjoyed it.
Thanks for your time. Yeah.
Thank you for having me. This is a good set of questions.
Bye. Thanks for listening, folks.
(55:42):
We're gonna be back later this week with another episode of
Stop THE World.