
October 25, 2023 27 mins

Dario Amodei, former head of safety at OpenAI and now CEO of Anthropic, explores with Azeem Azhar what it might take to build safe AI. 



Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:03):
In a few years, we will use helpful and harmless
AI systems. That's the premise of today's conversation. I'm Azeem Azhar.
Welcome to the Exponentially podcast. Today's AI models are complex,
with hundreds of billions of virtual moving parts. We don't
so much build them as nurture and nudge them.

(00:25):
As this technology improves exponentially, how can we trust it?
Can we really design these systems to be harmless and
honest as well as helpful? I've come to San Francisco,
the epicenter of the AI revolution, to talk to a
man who has staked a lot on being able to
do just that: Dario Amodei, the founder and CEO of Anthropic.

Speaker 2 (00:52):
Well.

Speaker 1 (00:53):
Dario, it's wonderful to have you here. You're a bona
fide researcher with papers on AI and AI safety that
have been cited more than thirty thousand times in just
the last seven years. But you are at the epicenter
of an enormous explosion in the field of AI today.
What does that feel like?

Speaker 2 (01:11):
It feels like a mixture of excitement and concern at
how fast things are going. I generally alternate between the two.
You know, on one hand, there's something new and exciting
every day that comes from us or that comes from
one of the other many players in the space. I
always look at things and I say, Wow, this is
so cool, this could be so useful. And then I

(01:31):
look at the other side of it and I'm going,
this is all happening so fast that it's hard for
us to adapt. It's hard for me running a company
that makes these things, to keep up with all the
innovations that we've done, even within the company.

Speaker 3 (01:43):
I totally concur with you.

Speaker 1 (01:45):
I've been in the tech industry since the early nineties.
I've been through the dot-com bubble, mobile, social. Nothing
has been as significant as this.

Speaker 2 (01:55):
It runs the gamut all the way from the tiniest detail in the computer code to, you know, well,
what does that mean for the way the model interacts
in this particular use case?

Speaker 1 (02:06):
And also given the interest in AI today, what will
this mean for truth on the internet? What would it
mean for jobs for white-collar and office workers? What would it mean for national productivity and national competition? I mean,
these are all questions that people are asking and that
they're turning to the AI developers in a sense for
those answers.

Speaker 2 (02:26):
Yeah, I think the multifaceted nature of the technology, the generality, means that on one hand, there's this almost
endless set of possible positive applications to the technology, but
also when you go to list what are the concerns
with the technology, that list is also very long.

Speaker 1 (02:43):
There's this challenge that we face though, because the technologies
are accelerating away on this curve that we're all familiar
with now, the exponential curve. But the way that human
dynamics work, human institutions, the way that our laws work,
our families work, the relationships we have in school and
at work, they move much slower, they move at a
more traditional pace, and there is a gap that is emerging.

Speaker 3 (03:07):
Does that worry you?

Speaker 2 (03:08):
Yes, I think that's a good way of describing it.
We're pouring exponentially more compute into these systems, we're technically
able to do it, and we're getting better and better
performance when we do that. But then when we look
at what does that mean for society, for disruptions to business,
disruptions to economic and governmental structures, it's happening faster than

(03:28):
we can adapt, and so I think on the technical side,
we need to do more to try and control, measure, and steer these models, more so than we're able to do today.
And I think on kind of the business and legal
and regulatory side, we need to find ways for societal
institutions to adapt faster to the changing technology.

Speaker 1 (03:48):
AI is one of those terms where it means so
many different things to different people. So what does AI
mean to you when you say you want to build
an AI system?

Speaker 2 (03:56):
So the systems that we primarily work on building are large language models, which are systems
that you can talk to and they talk back, and
they can perform tasks for you. They can program, they
can answer questions about legal matters, medical matters, and any
number of topics. So our model Claude is an example
of this.

Speaker 1 (04:15):
Well, tell me about Claude, though, because I heard the
name and it's a really cute name, and a friend
of mine from school had a fluffy skunk that was
called Claude, so it always has this sweet association for me.

Speaker 2 (04:26):
I think we just wanted a name that sounds like
it was a friendly assistant or someone that would help you.
The term we use is helpful, honest, and harmless. So
what can I reasonably ask of someone that I'm asking
for help on something? I want them to be helpful
in the task. I don't want them to do anything
dangerous or harmful, and I don't want them to mislead me.
I want them to be honest. And if someone manages

(04:46):
to do those three things, then you know, I generally
feel like they've done a good job being an assistant.

Speaker 1 (04:51):
I want to bring that back to your definition of
what an AI system is. Is it a system that
exhibits those types of human personality characteristics, or is it
something a little bit different?

Speaker 2 (05:03):
The overall definition of AI, right, for the whole field, can be any system that performs any intelligent or pattern-matching task. So it's possible to build AIs with all
kinds of different properties. But I think our vision and
our picture is that we want to build these systems
to be helpful, honest, and harmless, and if we can
achieve those consistently, then systems will be beneficial to society.

Speaker 1 (05:32):
So I've used Claude a little bit. I'll let you
in on a secret. I'd use it to help me
with my research for the interviews that I do. What
are the kinds of use cases that you would like people to be using the technology for now?

Speaker 2 (05:44):
I think on the helpful side, people often find that
Claude is more friendly and creative than other models, So
that's the helpful side. I think honest and harmless are
often connected to some business use cases that we think
are important. So by harmless we mean that we don't
want Claude to be willing to kind of engage in

(06:06):
aiding with dangerous or illegal activities. We don't want Claude to have prejudices or biases in either direction, really. Right, if I present a model that, you know, serves as a lawyer or serves in some medical function, it's very important that the model be, you know, for a human we would call it neutral and professional; for the model, we call it harmless. One problem that models often have,

(06:28):
and, I'll be honest, this is an unsolved problem that every model still has to some extent, is what we call the hallucination problem.

Speaker 1 (06:34):
Right, So I find sometimes if I ask these systems
to give my biography, it'll switch my university, and then
it'll switch the first place I worked.

Speaker 3 (06:44):
It still looks really credible, but it's wrong.

Speaker 2 (06:46):
Yeah, this is the insidious nature of kind of the
imperfect systems. Right where you know the nightmare is, you know,
you ask the system for a set of ten facts, and
and all of it sounds professional and credible. Nine of
the facts are right, and one of them is wrong
in some very very important way. Making it good enough
so that you can really trust it is one of

(07:06):
our top priorities at Anthropic. We have a significant group, probably about a quarter of the team at Anthropic, focused on it.
But still no one is perfect. We, like everyone else,
are still to some extent, plagued by this problem.

Speaker 1 (07:18):
But this problem exists because of the way that these
large language models are structured. It's the way that they
I think we don't even say that they're built. They're
sort of grown in a funny sort of way. I
think that's something.

Speaker 2 (07:31):
Baked like a cake or something, right? Like gestated.

Speaker 1 (07:33):
They're not built like scaffolding is built, or built like
a car engine is built, where you assemble component after component.

Speaker 2 (07:40):
No, there are in fact two stages of the training. So
in the first stage you just train the model on
a huge amount of text, like a huge amount of
the text on the internet.

Speaker 3 (07:51):
But it's billions of words.

Speaker 2 (07:53):
It's some large fraction of what's available, and literally we
just train the model to be good at predicting the
next word in the sentence, predicting each word in the
sentence after one another. So the model learns a lot
about the world when you do this, But honestly, one
thing it doesn't learn is that it shouldn't make things up.
It's basically trying to predict what would be plausible if

(08:13):
it came next, not necessarily what is true. So then
there's a second stage of training done in different ways.
For example, the state of the art in the field is a method called RLHF, reinforcement learning from human feedback.
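To make that first stage concrete, here is a minimal sketch of the next-word-prediction objective Dario describes, in Python. It uses a toy bigram counter instead of a neural network; the corpus and function names are illustrative only, not how a real model is trained.

```python
# Toy illustration of stage one: learn to predict the next word from text.
# A real model trains a neural network on a large fraction of the internet;
# a simple bigram count stands in here to show the objective.
from collections import Counter, defaultdict

corpus = "the model learns to predict the next word in the sentence".split()

# Count how often each word follows each other word in the training text.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the most plausible continuation seen in training, true or not."""
    candidates = follows.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("the"))  # picks whatever most often followed "the"
```

The point mirrors what Dario says above: this kind of objective only knows what is plausible after a word, not what is true.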

Speaker 1 (08:25):
It's a little bit like how I might train my
young puppy. You give it rewards when it does well,
and you may treat it slightly differently if it doesn't
behave correctly.

Speaker 3 (08:36):
Is it like that or is it more sophisticated?

Speaker 2 (08:38):
Yeah, it's actually quite a lot like that, where instead of the owner speaking to the puppy, you just have a human rate how well the models are doing.

Speaker 1 (08:46):
And who is the you in that you teach it?
Is it people like you?

Speaker 3 (08:51):
People like me?

Speaker 2 (08:51):
Yeah. To get a little bit into the details: in the state-of-the-art method, which as I said is RLHF, reinforcement learning from human feedback, some number of people will be hired. Usually it's contractors, who look at the model and say, okay, I saw these two responses, this one is better than that one. One of the reasons why we invented constitutional AI is that's fairly opaque.

(09:13):
If someone says, you know, I might ask my model, say, a political question, right, and it expresses an opinion, and someone gets mad, they say, why does the model have this opinion? Why is that its opinion? All you can really say with RLHF is, okay, well, that was the average opinion of the thousand contractors that I hired, which is not very satisfying.
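For readers who want the mechanics of the rating step: each comparison ("this one is better than that one") typically becomes a training signal for a reward model. The sketch below is a generic Bradley-Terry-style loss, not Anthropic's actual pipeline; the scores and names are made up.

```python
# Hypothetical sketch of how pairwise ratings are usually turned into a
# training signal: the reward model should score the chosen response above
# the rejected one. Scores here are invented for illustration.
import math

# Each record: (score the reward model gave the chosen response,
#               score it gave the rejected response)
preference_pairs = [(1.8, 0.3), (0.5, 0.9), (2.1, 1.7)]

def pairwise_loss(chosen, rejected):
    # -log(sigmoid(chosen - rejected)): small when the chosen answer clearly
    # wins, large when the rater's preference is contradicted.
    return -math.log(1.0 / (1.0 + math.exp(-(chosen - rejected))))

average = sum(pairwise_loss(c, r) for c, r in preference_pairs) / len(preference_pairs)
print(f"average preference loss: {average:.3f}")
```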

Speaker 1 (09:32):
The other thing that strikes me about that approach is that your model has billions and billions and billions of words in it, and it can throw out umpteen billions of different sentences. So that's a lot of stuff for
humans to look at. I mean, are there even enough
humans to give feedback?

Speaker 2 (09:51):
The big first stage of training does involve billions of words,
but actually the second stage of training typically involves maybe, I don't know, a thousand humans, each of whom gives a thousand ratings over a few days or something. Wow.
So that is hundreds of billions and then millions. It's
very difficult conceptually, but it actually doesn't take all that
much data.

Speaker 1 (10:12):
But you've moved on from this RLHF, reinforcement learning with human feedback, to constitutional AI, which introduces a second AI
system to help train the first one.

Speaker 2 (10:24):
Yes, So basically, in constitutional AI, you write a constitution
which could be anywhere between one page and ten pages,
and it basically states the rules that the AI system should follow.
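As a rough sketch of how that second AI system can use such a document, the Python below runs a critique-and-revise loop against a short list of principles. The principles, prompts, and the `ask_model` stand-in are all hypothetical; they are not Anthropic's actual constitution or API.

```python
# Hypothetical critique-and-revise loop in the spirit of constitutional AI:
# a model drafts an answer, then is asked to critique and rewrite it against
# each written principle. `ask_model` is a placeholder for any LLM call.
CONSTITUTION = [
    "Prefer responses that respect basic human rights.",
    "Prefer responses that refuse to aid dangerous or illegal activity.",
    "Prefer responses that are honest about uncertainty.",
]

def ask_model(prompt: str) -> str:
    raise NotImplementedError("stand-in for a real language-model API call")

def constitutional_revision(user_prompt: str) -> str:
    draft = ask_model(user_prompt)
    for principle in CONSTITUTION:
        critique = ask_model(
            f"Principle: {principle}\nResponse: {draft}\n"
            "Describe any way the response conflicts with the principle."
        )
        draft = ask_model(
            f"Response: {draft}\nCritique: {critique}\n"
            "Rewrite the response so it no longer conflicts with the principle."
        )
    return draft  # the revised draft can then be used as training data
```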

Speaker 3 (10:36):
What are the things that are in your constitution for Claude?

Speaker 2 (10:39):
It's evolved over time, but you know, from the beginning,
I think we started with some things from like the
UN Charter of Human Rights, things that are hard to
disagree with, and then we added some things about Claude
being responsive to the user and various things. There are
various kinds of harms that we were particularly concerned with,
kinds of information that are dangerous or.

Speaker 1 (10:59):
Really good. But how can you measure, then, whether Claude is behaving as you have trained it?

Speaker 2 (11:06):
That's actually a very difficult and subtle problem, right because
I think one of the things about these models is
that they're incredibly broad. One of the things I've said
is you know, often a model might know something, or
not know something, or have an opinion on something, and
you don't necessarily know about it until a million people
have used it or something. To be clear, I think
this is a bad thing. We shouldn't have to deploy

(11:29):
the model to a million people to discover that it
happens to be an expert on some particular type of weapons that I would rather it not talk about.

Speaker 3 (11:37):
Yes.

Speaker 2 (11:37):
Another example is I don't know the first thing about cricket,
but Claude is an expert on cricket. Claude is also
an expert on Japanese history. I don't know the first
thing about Japanese.

Speaker 3 (11:45):
I can help you with one of those two.

Speaker 2 (11:48):
Yeah, and so one of our main areas of research
is trying to detect ahead of time all the things
that the model is capable of. So it's this very open-ended problem. And we're constantly trying to build up
kind of evaluations and standards for measuring our model.

Speaker 1 (12:06):
Software and engineering has been very deterministic. Yes, you buy
a hammer, you know what a hammer does, ye, clunk.

Speaker 1 (12:14):
You get a piece of software like the calculator on your smartphone. It calculates and
will always give you the same results. And the words that you use suggest that you're working with the model so that it doesn't do things that you would rather it not do, a bit like a kind uncle talking to their slightly difficult nephew. Are you translating technical language

(12:36):
into normal English for my benefit, or is this process one of rathers and maybes and would-be betters?

Speaker 2 (12:44):
So when you go to train the system, right, you
know it requires thousands of computer chips all working in sync.
There's an incredible precision to the engineering. You know exactly
what you're making, you know exactly what data is going
into it, you know exactly how much it costs per hour.
It has all the hallmarks of precision engineering, same as
making the semiconductor chip, but on the output it has

(13:06):
exactly the properties that you talk about. It's much more
of an art than a science when you look past
the form and the container into which you're pouring things.
The pouring process is very predictable, but what you get out at the other end is inherently hard to predict. And we're trying to turn it into more of a science, but it's not inherently so, it doesn't start that way;

(13:26):
that's a problem for us to solve.

Speaker 1 (13:28):
I think of this analogy of the first stereo system
that we had at home, and it had a bass dial and a treble dial. It had two dials that you could use to adjust the sound. And when I look at these large language models, they have ten billion, one hundred billion, five hundred billion dials, which you guys call parameters. Does the fuzziness come out of that complexity?

Speaker 2 (13:50):
Yeah? I think it comes out of that complexity. And
we're not manually turning each of the.

Speaker 3 (13:55):
Dials, right, it's tedious.

Speaker 2 (13:57):
We have an automated process that kind of decides which dials should be turned and by how much, based on the data that it receives.
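That automated dial-turning is, in practice, gradient descent. A minimal one-dial sketch, with made-up data, is below; a frontier model does the same thing across billions of parameters at once.

```python
# Minimal sketch of automated "dial turning": gradient descent nudges a
# parameter in whichever direction reduces prediction error on the data.
# One dial here; real models adjust billions simultaneously.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # (input, target) pairs
weight = 0.0                                  # the single "dial"
learning_rate = 0.05

for _ in range(200):
    # Gradient of mean squared error with respect to the weight.
    grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    weight -= learning_rate * grad            # turn the dial a little

print(f"learned dial setting: {weight:.2f}")  # settles near 2.0 (y is roughly 2x)
```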

Speaker 1 (14:04):
So a lot of people have said over the last
five or six years, the problem with neural networks, and a large language model is a type of neural network, is that they are black boxes. And the point being
that you can't look into them and see what the
process is, and the same way you can't look into
my brain at the moment, not without hurting me anyway,
and see what the process is. So you're developing methods

(14:26):
of peering into that black box. You're developing the instruments
and the tools to do that.

Speaker 2 (14:30):
Yes, this is an area that we've worked on since the beginning of Anthropic. This was one of our first teams and it's grown over time. We're looking at methods to try and understand, when a particular element of the network, which we call a neuron in analogy to human neurons, turns on or fires, what is associated with it, and we've found some interesting things that

(14:51):
actually parallel what we've seen in the human brain. I
used to be a neuroscientist, so you can see the
network often using very human like concepts. But we're really
just at the beginning of that. Right. We can decode
some of what the network does and understand some of
the principles behind it, but I think it's going to
be years before that science matures.
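A highly simplified picture of that kind of probe: record how strongly one "neuron" responds to different inputs and look at what the strongest ones have in common. The activation values below are invented for illustration; real interpretability work reads them out of the network itself.

```python
# Toy version of an interpretability probe: given one neuron's activation on
# a set of inputs, list the inputs that make it fire hardest. The numbers
# here are invented; in real work they come from the model's internals.
activations = {
    "The batsman scored a century at Lord's": 0.91,
    "He bowled a googly at the crease":        0.88,
    "The shogunate ruled Japan for centuries": 0.07,
    "Paris is the capital of France":          0.02,
}

# Rank inputs by how strongly this neuron responds to them.
for text, value in sorted(activations.items(), key=lambda kv: kv[1], reverse=True)[:2]:
    print(f"{value:.2f}  {text}")
# A neuron that lights up on cricket sentences suggests a cricket-like concept.
```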

Speaker 1 (15:09):
Is it important to make some breakthroughs in those particular
fields in order to deliver verifiably safe AI systems?

Speaker 2 (15:18):
Yeah, I think that's going to be one important component
because of the fuzziness that we talked about before, and
if you understand something about what's going on inside the network,
why it does what it does, then you can maybe
predict what it's going to do in circumstances you've never
seen before.

Speaker 1 (15:34):
There are behaviors that come out of these networks that weren't designed in, that are emerging, and it's given almost a sort of a mystical sense around it. What do you understand by this idea of emergent behavior?

Speaker 2 (15:46):
Yeah, so I wouldn't attach anything mystical to it, anymore
than I would attach anything mystical to you know, as
humans grow up, they start to understand the world and
they have realizations. But I think, you know, as the model starts to see something in its training data, it learns to concatenate that training data, to put together the puzzle pieces in different ways: writing semantically correct computer code, or

(16:09):
being able to do a particular type of math, or
understanding the concept of what's legal versus what's illegal. Right,
all of these are things that appear at some stage.
They're not magical, they're not mystical. They're in the training data.
But the model at some point learns to put together
the pieces when it wasn't able to before.

Speaker 1 (16:28):
It's such a complex set of trade offs because if
I know the thing is wrong half the time, I
will double check every answer. But if it's only wrong
once in one hundred, I'm not going to. And I
wonder about whether you'll see some almost chasm of safety that you'd have to leap before these things really

(16:48):
can feel safe.

Speaker 2 (16:50):
Yeah, So I think that's an important problem, and we
really want to avoid this situation where the models are
kind of you know, we become dependent on them or
come to rely on them, while they may still be
sometimes making mistakes that we would be able to catch.
So I think one of the important things is for
models to know what they don't know. And so the great thing, a much more usable AI system

(17:12):
than the one you described, is one where ninety-nine percent of the time it gets the right answer, and the other one percent of the time it says, I don't actually know, here are some guesses, they might be wrong. But if it's able to signal or signpost that it might not be confident, it's a lot more useful. In fact,
I would probably prefer a system that's right ninety percent
of the time and says I don't know ten percent

(17:33):
of the time, than one that's right ninety-nine percent of the time and kind of silently lies to me the other one percent. Right, this is getting back to the
honest thing, right, Like it's okay not to know sometimes,
but I don't want you to make things up.
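A back-of-the-envelope way to see why the ninety-percent-plus-"I don't know" model can be preferable, under the assumption that a confident wrong answer costs far more than an honest abstention. All the numbers below are illustrative.

```python
# Illustrative expected-cost comparison of the two models contrasted above.
# Assumed costs: a silent wrong answer is 100x worse than an honest "I don't know".
COST_SILENT_ERROR = 100.0
COST_ABSTAIN = 1.0

def expected_cost(p_abstain, p_silent_wrong):
    return p_abstain * COST_ABSTAIN + p_silent_wrong * COST_SILENT_ERROR

honest = expected_cost(p_abstain=0.10, p_silent_wrong=0.00)        # right 90%, abstains 10%
overconfident = expected_cost(p_abstain=0.00, p_silent_wrong=0.01) # right 99%, silently wrong 1%

print(f"honest model expected cost:        {honest:.2f}")         # 0.10
print(f"overconfident model expected cost: {overconfident:.2f}")  # 1.00, ten times worse here
```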

Speaker 1 (17:52):
These are pretty powerful technologies, and I'll put my cards on the table. I think they will be the most powerful technologies we'll see in our lifetimes. How do we
get them into society more broadly in ways that are very,
very beneficial?

Speaker 2 (18:06):
So I think there's kind of two sides to that, right,
there's preventing the harms and achieving the benefits. So I
think on the preventing the harm side, I mean this helpful, honest, harmless, and looking inside the model. These are both important areas.
There's another area I haven't talked about yet, which is
ensuring that models stay under effective human control, that we're

(18:27):
able to supervise them even as they get smarter than
we are. You know, when the models start to know
much more than humans do, how do we make sure
that humans are able to check and verify their work
and that they don't lie to us in ways that
we can't detect.

Speaker 1 (18:42):
In a world where AI systems are prevalent, and many
of these systems perhaps are built by Anthropic, and therefore they're guided by your constitution in your constitutional AI, are you the right person to set the rules for that constitution?
Because the US has a constitution, Germany has a constitution,

(19:02):
but that constitution was built by a sense of consensus,
a sense of accountability and legitimacy. You seem like a
really trustworthy guy. But is it fair for that power to reside with you?

Speaker 2 (19:14):
I think actually mostly not. So I think the way
we envision it is there may be a base model
that has a very basic constitution, right, and we talk
about things like the UN Charter of Human Rights, but
we're actually developing a process to allow different use cases
or different customers to write almost an addendum or to

(19:34):
extend the constitution on top of the basic things. So
the idea would be all versions of Claude have these
very basic rules, right. They're not going to commit things, or help with things, that almost all of human society agrees are bad. But then let's say I wanted
to make an agent that helped with something medical versus
an agent that served as your lawyer, versus a customer

(19:57):
service agent versus a therapist. Rules for that are very different. Basically,
my answer is that for ninety percent of things, it's
not up to us to decide. It's only the ten
percent of things where we think most people would agree
and where we defer as much as we can to
societal processes.

Speaker 1 (20:13):
And there are so many great processes. We know, for example,
that cars are safer in twenty twenty three than they
were in the nineteen sixties because of rules around seat
belts and breaking systems and crash testing. We know that
when radium was first discovered by Mary Currie, anyone could
make a medical product with radium, radium cough suites for babies.
So what's the process that we should use for AI systems?

(20:36):
Should it look like drug approvals, or should it look like perhaps a much lighter-weight system of the type we have in the auto industry?

Speaker 2 (20:45):
I think maybe of, like, cars and airplanes or something like that as good examples of kind of powerful technologies that are safety critical, where lives are on the line. So the kind of early wild West of all these technologies, I think we're in that period, and we need to move as quickly as possible.

Speaker 3 (21:03):
Moved through that period, right, We've moved.

Speaker 2 (21:05):
Through that period rather quickly, rather soon. Where rules of.

Speaker 3 (21:09):
The road. Why do you say quickly?

Speaker 2 (21:11):
I think it's the exponential. With another technology, I might say, look, we don't understand the costs and benefits that well. Like, we need to have these things play out in the
market a little bit before we start to step in
and set regulation that might be too rigid. But that's
not my view for AI. Because it's moving so fast,
because the implications are happening so fast, I suspect that

(21:32):
this is a case where we're going to need very
soon some kinds of rules of the road.

Speaker 1 (21:37):
Is it that the systems are getting faster? Is it that they're getting measurably more powerful? Is it that they're being used more frequently in business? What is this exponential that you're...

Speaker 2 (21:46):
Yes, yes to all. So the exponential is basically the amount of computation: the number of chips, times the time we run them for, times the speed of the chips. And each of those factors is getting faster. But it
used to be five or ten years ago, the amount
of money that you would put into training one of

(22:07):
these AI systems was the size of an academic research grant,
So one hundred thousand to a million dollars. We're now
in an era where I would say companies spend ten
to one hundred million dollars, But I think we're going
to enter an era because the economic value is so
great where it's going to be, you know, a billion
dollars or ten billion dollars, and.

Speaker 1 (22:26):
We should convert that spend into the amount of processing
that these big AI supercomputers are doing, exactly, and they're
doing that processing to produce systems that are even more
powerful exactly.

Speaker 2 (22:40):
And at the same time as that's happening, the chips are getting faster, and more money is also going into making the chips faster, because there are so many useful things that the models can do. And then of course engineers are working on how to squeeze every possible drop of efficiency out of the compute that we have once we spend it.
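The arithmetic behind that exponential is just multiplication: chips, times how long they run, times how fast each chip is. The figures in the sketch below are placeholders, not any real training run.

```python
# Rough compute arithmetic: total training compute is roughly
# (number of chips) x (time they run) x (throughput per chip).
# All figures are made-up placeholders to show the multiplication.
chips = 10_000                      # accelerators in the cluster
run_days = 60                       # how long the training run lasts
flops_per_chip_per_second = 3e14    # effective throughput of one chip

seconds = run_days * 24 * 3600
total_flops = chips * seconds * flops_per_chip_per_second
print(f"total training compute: {total_flops:.2e} FLOPs")  # ~1.6e25 with these placeholders
```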

Speaker 1 (22:58):
And companies are desperate. I've spoken to the bosses of many very large firms and it's really high up on
their agenda to figure out how they use these technologies
in their businesses.

Speaker 3 (23:09):
And walking around San Francisco the.

Speaker 1 (23:11):
Last few days, I can feel the palpable buzz of
people just wanting to build on AI the way they wanted to build on the iPhone fifteen years ago.

Speaker 2 (23:19):
I think, on one hand, that's really exciting, and we
benefit from it and others benefit from it, and I
don't want to do anything to slow down the excitement
or the positive benefits. But everyone understands that you need
to make these things safe, and that there is no
industry if you don't make these things safe.

Speaker 1 (23:34):
Absolutely, we're building these AI systems using large language models
that are on an exponential, but exponentials are really S-curves, so they go up and then they
tail off. How long does this exponential run for before
it tails off? In other words, is this the last
set of innovations that we're going to need for AI?

Speaker 2 (23:51):
I would say we have at least a few years
of the current exponential, and then people have ways of
coming up with new innovations that continue things after that.
I think a few years from now we may get
to the point where AI systems can perform these feats
that humans aren't capable of. And we've seen with the
AI systems they're already broader than humans. So if we

(24:12):
could get them to the point where they're broader and
they're more creative than we are, or as creative and
able to see all the connections, I really have this
hope that human scientists assisted by AI could make progress
on these complex diseases as fast as we've made progress
on the simple diseases. And my hope is, if we

(24:32):
really get this right, could we actually get to the
point where this particular cancer is just not a problem anymore.

Speaker 1 (24:37):
And of course, beyond the medical applications, into climate change, into poverty elimination, into all sorts of problems that confound us as humans.

Speaker 2 (24:46):
Problems of complexity beyond human scope.

Speaker 1 (24:50):
Right, So let's look forward a little bit. The premise
of our discussion is that in five years we could
all be using good, trustworthy AI systems just as part
of normal life. Do you think that could become reality?

Speaker 2 (25:03):
Yeah, I think that could. So, you know, as soon as we get right all the kind of rules of the road: safety, helpful, honest, harmless. If we solve all those problems, which we've talked about a fair amount, I do
think that everyone could have an AI assistant that they
really trust, and your whole way of interacting with the
world could be done through this AI assistant. It can

(25:24):
help you make better decisions and say, hey, like, you know, I think you'd be happier if you did X instead of Y, tailored to the way you want it to be, that helps you to be the best version of yourself.

Speaker 1 (25:33):
Well, maybe in five years' time, my AI assistant can meet your AI assistant right here and we can see how well the two of us did.

Speaker 2 (25:41):
Same place, the same time. Let's see if we can fulfill that bet.

Speaker 1 (25:51):
Reflecting on my conversation with Dario, I'm struck by how
he acknowledges that the pace of change is so quick,
it's exponential, and he's really attentive to the problem of harm.
It's very thoughtful about it. That made me much more comfortable.
But it's also clear that the way we define what
we want from these systems cannot be left to AI developers.
It really needs to be led by ordinary citizens and

(26:13):
by their legitimate governments. Thanks for listening to the Exponentially podcast.
If you enjoy the show, please leave a review or rating.
It really does help others find us. The Exponentially podcast
is presented by me, Azeem Azhar. The sound designer is
Will Horricks. The research was led by Chloe Ippah and

(26:34):
music composed by Emily Green and John Zarcone. The show
is produced by Frederick Cassella, Maria Garrilov and me, Azeem Azhar.
Special thanks to Sage Bauman, Jeff Grocott, and Magnus Henrikson.
The executive producers are Andrew Barden, Adam Kamiski, and Kyle Kramer.
David Ravella is the managing editor. Exponentially was created by

(26:54):
Frederick Cassella and is an E to the i Pi Plus One Limited production in association with Bloomberg LLC.