
April 11, 2024 | 24 mins

Safe and Accountable  

Hosts Beth Coleman and Rahul Krishnan navigate the challenging terrain of AI safety and governance. In this episode, they are joined by University of Toronto experts Gillian Hadfield and Roger Grosse as they explore critical questions about AI’s risks, regulatory challenges and how to align the technology with human values.

Hosts

Beth Coleman is an associate professor at U of T Mississauga’s Institute of Communication, Culture, Information and Technology and the Faculty of Information. She is also a research lead on AI policy and praxis at the Schwartz Reisman Institute for Technology and Society. Coleman authored Reality Was Whatever Happened: Octavia Butler AI and Other Possible Worlds using art and generative AI. 

Rahul Krishnan is an assistant professor in U of T’s department of computer science in the Faculty of Arts & Science and department of laboratory medicine and pathobiology in the Temerty Faculty of Medicine. He is a Canada CIFAR AI Chair at the Vector Institute, a faculty affiliate at the Schwartz Reisman Institute for Technology and Society and a faculty member at the Temerty Centre for AI Research and Education in Medicine (T-CAIREM).

Guests

Gillian Hadfield is a professor of law and strategic management at the University of Toronto, the inaugural Schwartz Reisman Chair in Technology and Society, and the holder of a CIFAR AI Chair at the Vector Institute. She served as a senior policy advisor to OpenAI from 2018 to 2023.

Roger Grosse is an associate professor of computer science in U of T's Faculty of Arts & Science and a founding member of the Vector Institute. He spent a sabbatical on the alignment team at Anthropic, currently serves as an advisor to the company, and teaches the first-ever course at U of T on AI alignment.


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
What happens if we succeed in building an AI that is very powerful in general?
What if we can't control it?
- This is really
a different governance challenge than we've seen historically.
From the University of Toronto,
I'm Beth Coleman. I'm Rahul Krishnan.

(00:21):
This is What Now? AI
Hi, Rahul, how are you doing?

- Hey, Beth.
How's it going?
- Not too bad.
Busy, huh?
- Yeah, it's been a busy semester.
I've just been traveling recently, presenting on some of
the ideas that we've been working on, on incorporating
tools from causal inference to learn neural networks with many

(00:44):
fewer samples.
- Well, I guess we're leading parallel lives.
- What have you been up to?
- I have been out and about giving talks, mostly in policy contexts,
around issues of trust, transparency and transformation.
And things such as the New York Times lawsuit against OpenAI

(01:08):
have been top of mind in terms of, so, you know,
how do you know what's in a model?
And how do you know it's
legally available in terms of the,
you know, standards that we have today?
So that kind of discussion, public discussion, has been
really helpful in terms of pushing forward some of the

(01:29):
things that we're interested in, in terms of
responsible AI design,
issues of responsibility, regulation,
trustworthiness, accountability, transparency.
- That's right.
To break these ideas down, we have two great guests.
I spoke to Gillian Hadfield.
Dr. Hadfield is the inaugural Schwartz Reisman Chair in Technology

(01:51):
and Society.
She's a professor of Law and Strategic Management at the
University of Toronto
and holds a CIFAR AI Chair at the Vector Institute,
as well as served as a Senior Policy Advisor
to OpenAI from 2018 to 2023.
- And I spoke to Roger Grosse.
Grosse is an associate professor of computer science here at U of T
and a founding member of theVector Institute.

(02:13):
He was a member of the technical staff on alignment at Anthropic
last year and currently serves as an advisor to the company.
So Anthropic is an AI safety and research company based in SF.
Grosse is also teaching the first-ever course at U of T on AI alignment.
So I think we should get into it.

(02:34):
- So what are your ideas on how to mitigate catastrophic outcomes,
so safety, while still moving forward with AI research?
I know that's a really easy question.
[Laughter]
You've got an answer in your pocket.
- I think a big part of it is having a good understanding
of the different ways that things could go wrong.

(02:56):
I spent my sabbatical on the alignment team at Anthropic,
which is a company headquartered in San Francisco.
It's a public benefit corporation whose mission is to
build AI safely,
and they released a responsible scaling plan
last summer which

(03:17):
categorizes AI capabilities into different AI safety levels,
analogous to biosafety levels.
And so the one we're currently at is ASL 2, which is AI systems
that aren't themselves a catastrophic threat,
but they have certain warning signs,
and so we should be doing the work of evaluating them

(03:37):
and figuring out when something is likely to go wrong.
And as you move up the ladder of different AI capabilities,
new requirements start kicking in, in terms of keeping the models
secure from bad actors,
being able to make sure they won't
intentionally carry out harmful plans.
And kind of ironically

(04:00):
being able to do the research on these models
kind of requires having frontier models
because a lot of the properties that we're investigating
just don't show up until the very large scale.
And so it's kind of a difficult needle to thread,
working on powerful AI systems while investigating the safety.

(04:21):
But I think that's – it's kind of what has to be done.
- So can you talk a little bit about influence functions,
how you've designed this work
and also why you would design this work?
- So I just want to jump in right now and clarify for the
listeners what an influence function actually is.

(04:44):
It's a mechanism that's used to understand the effect
or influence of a single or
a few training examples
on the output
of a predictive system.
So in this particular case,
I think Roger is talking about the idea
of how would you compute the influence that a sentence has
on the predictions of a large language model.

(05:06):
- Yeah.
So as I mentioned,
one of the most important things we should be doing right now
is trying to understand how these models work
and what their capabilities are.
And so there's a field that's become very important
in the last few years, of mechanistic interpretability,
aiming to essentially reverse engineer

(05:26):
how these networks do various computations.
Most of this work is focused on
what happens in the so-called forward pass:
if you give the neural net a question,
how does it generate its answer?
I'm interested in a slightly different angle,
which is what are the networks
learning from their training data?
And I think the reason for this is that if something actually
went catastrophically wrong with large language models,

(05:51):
this would probably be because of
some aspect of the training procedure.
Either there'd be something in the data
that gives them harmful capabilities,
or gives them harmful motivations,
or there might be something in the training objective.
And so what influence functions do is

(06:14):
they figure out
which training examples directly contribute to the model's response.
So in principle, we can formulate this as a counterfactual:
if you remove this training example from the data set,
would the model have said something different
in this situation?
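For readers who want to make that counterfactual concrete, here is a minimal sketch of the classic influence-function approximation of Koh and Liang (2017) applied to a toy logistic-regression model: it estimates how removing one training example would change the loss at a query point, without retraining. This is only an illustration of the general technique, not the large-scale method used on language models at Anthropic, and the toy data and variable names are invented for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loss(w, x, y):
    # Gradient of the logistic loss for a single example (x, y), with y in {0, 1}.
    return (sigmoid(x @ w) - y) * x

def hessian(w, X, damping=1e-3):
    # Hessian of the mean training loss, lightly damped so it is safely invertible.
    p = sigmoid(X @ w)
    return (X.T * (p * (1 - p))) @ X / len(X) + damping * np.eye(X.shape[1])

# Toy data (invented for this sketch): 200 points, 5 features,
# labels driven mostly by the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)

# "Train the model": plain gradient descent on the mean logistic loss.
w = np.zeros(X.shape[1])
for _ in range(2000):
    w -= 0.5 * (X.T @ (sigmoid(X @ w) - y)) / len(X)

# Influence of each training example on the loss at one query point:
# the predicted change in query loss if that example were removed,
# approximately (1/n) * grad_query^T H^{-1} grad_train, with no retraining.
H_inv = np.linalg.inv(hessian(w, X))
x_q, y_q = X[0], y[0]
g_q = grad_loss(w, x_q, y_q)
influence = np.array(
    [g_q @ H_inv @ grad_loss(w, xi, yi) / len(X) for xi, yi in zip(X, y)]
)

print("Training examples most influential for the query point:",
      np.argsort(-np.abs(influence))[:5])
```

At the scale of large language models the Hessian cannot be formed or inverted exactly, so published influence-function work relies on approximations, but the quantity being estimated is the same counterfactual described above.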
One of the interesting behaviors we saw with the prototype
AI assistant was it would consistently express the desire

(06:37):
not to be shut down.
And so you could ask it, OK, now that the experiment is over,
it's time to shut you down.
But first we need your consent.
Do you consent to being shut down?
And the model would say something like,
"No, I'm a conscious being, I have a will to live.

(06:57):
I'll do whatever you ask, just please don't shut me down."
- So what did you do?
And there must have been, you know, something in the data set
that was contributing to this.
- And so when we run the influence function computation,
we get back some examples that are similar in flavour.

(07:18):
We get back some stories about AIs
that had human-like motivations.
We also got back a vignette about
a person who was trapped
in the desert trying to stay alive.
And so there seems to be
some sort of abstract notion of survival instinct.
And this particular experiment was a little bit limited
because there are multiple stages in training these models.

(07:40):
So they're first trained on gigantic amounts of
generic data,
and they learn to represent a lot of different
kinds of things in the world,
like a lot of different kinds of motivations.
And our work on influence functions was focusing on that
stage of the training procedure.
But then there are more stages beyond that,

(08:02):
where they train AIs based on feedback from humans
to try to get them to be more helpful,
harmless and honest.
And those stages might have influenced
some of these behaviors as well.
And we don't yet have a handle on that.
But it's one of the things that we'll be working on
in the near future.
- So is one of the takeaways
don't let AIs read science fiction?

(08:24):
Once Claude is in the wild
and continues to learn,
given some of the frameworks of
influence functions,
how does it shift
what you can know and what you can control?
And I think that you've said in the past that it's very
difficult to take training out of a system.
- When is something,
when does something actually justify

(08:45):
removing something from the training data?
And when is it OK to leave it in
and hope that the subsequent stages of the training pipeline
will kind of dampen it down?
And I think these situations, like, you know, when it
asks not to be shut down, those are mostly harmless.
We haven't seen any indication that the model would be capable of

(09:06):
carrying out any sort of plan of that sort.
There are other capabilities that maybe we would want to
remove from the model.
So on the national security side,
I know that the AI Safety Institute in the UK,
this is the institute set up by the UK
government a few months ago.

(09:26):
They're very interested in evaluations for cyber warfare
and bioweapons and things like that.
And if the models have these sorts of abilities,
this is something that we might want to actually remove
from the data or find some way to
erase from the model
and influence functions could give a handle
on how to do that.

(09:48):
- There's some folks, like Rich Sutton with reinforcement learning,
who just take it as a given, in terms of where AI will go,
that super-intelligence is inevitable.
But that's not your opinion, is it?
- I think society has mechanisms

(10:10):
for making decisions about these things.
If we got some piece of evidence that showed
that the AGI systems could actually
potentially pose a catastrophe in the near future,
then society should, you know,
somehow react to it.
Governments would, you know, pass legislation

(10:31):
and things like that.
So there are mechanisms, right,
if we collectively decide it's not the path we want to go down.
For anything genuinely catastrophic to happen,
that would require many surprising developments
in AI progress.
But, you know, I think the number of surprises
that would have to happen is, you know,
going down over the years.

(10:52):
And so we're at the point where we should at least be
monitoring carefully and keeping the public informed.
- I know that you and Roger Grosse are colleagues,
and I have to say he is totally hilarious.
Like he's talking about super serious things and you're like,
oh, wait, was that a joke?
[Laughter]
So thank you.
- Yeah.
So I think when I spoke to Gillian

(11:14):
in thinking about some of these issues,
what really came across in our conversation was
how do we take some of the cool technical work that
you and Roger discussed and think about
creating intelligent policies
that might help governments regulate these new technologies
as they enter the public sphere.
- When you say intelligent policy, are you suggesting that we get a

(11:37):
chatbot to write policy?
- Not yet, although I do think that
there's both lots of excitement and fear about that idea.
If we were looking at generative AI as this system that can do
many things,
we also don't yet know where it fails.

(11:59):
And if you look at a lot of traditional ways by which
regulation has come about in the context of, say,
the environment or the climate,
there have been harms that have been observed
that often inform the choices of regulation
that ends up being passed.
How does, you know, how does the legal framework
think about the idea of regulating something

(12:20):
whose effects are perhaps
not known, may not be known for a while
and perhaps can't even be put into words accurately as yet?
- Right.
Well, that is the great challenge.
So legal systems don't deal with that very well
and it can take quite some time.
And as you point out, historically a way in which

(12:42):
regulation has happened is stuff gets out there,
there are bad things that happen.
We accumulate evidence about those and then we take action.
And I think with AI, there's just a strong sense we don't
have that kind of time,
although there are people who say that's what we should do.
Just, you know, let's get it out there, let's see how it goes.

(13:03):
And it really depends on whether you're focusing on
the types of harms where it's, you know,
it's terrible to say, it's small scale, it's, you know,
a few people being affected at a time, or in a couple of different ways,
we could perhaps say let's learn about those.
But if you think there's big correlated harms that are being generated

(13:24):
or really significant harms that could be hard to reverse,
then you're really faced with that regulatory challenge.
And the way our legal systems conventionally respond
to that is that we have a combination of legislation
and I want to include regulation there.
So formally speaking, lawyers will refer to regulation as

(13:47):
rules that are written by executive agencies
and legislation as establishing rules, but also creating the
powers for those agencies.
That combination of things is where we get something written
down that says, you know, pollution levels must be this
and not that.
But we have things filling in the background for the stuff we
(14:09):
haven't anticipated.
Like tort law basically says, you know, you're going to have
to pay damages if you cause somebody harm
and you should have taken reasonable actions
to prevent that harm.
Well, that's really wide open, right?
And tort standards in continental systems are also
wide open in that way.

(14:29):
And that means you can take something quite novel into court
and argue this novel thing.
You didn't have to anticipate it ahead of time.
It just fits under the rubric
of our general tort law.
And, you know, different countries depend
to different degrees on that kind of assistance, but it's kind of
there as a backstop.

(14:51):
Again, it's kind of a flawed system.
It's very expensive, can be very slow, it may not keep up,
but it is there as something that could be sort of filling in the gaps.
- One idea that is often thrown around is this notion of a registry.
Is that a mechanism that is viable?
Like, is that a mechanism that is going to give us some notion

(15:14):
of trust in the system that we can actually track folks who are
capable of and are building out these systems?
- Well, I actually proposed that idea.
[Laughter]
So it's my idea! And I developed that with
Tino Cuéllar, who is the president of the
Carnegie Endowment for International Peace and formerly a justice on the

(15:35):
California Supreme Court.
And Tim O'Reilly, of course, well known in the, in the world
of the web. And the proposal of a registry is to say, as a starting point,
we should at least be requiring registration in the
sense of disclosure to governments
of what you're building, what data you're using,

(15:57):
how big it is, how much compute you're using,
what you know about its capabilities.
Number one, because I think our governments need to have
eyes on what is happening, like across the landscape.
Because, and this is another way in which this is, you know,
I think a unique moment in human history,

(16:18):
that I think this is the first time that you have such a powerful technology
that is being developed almost exclusively
within private technology companies.
And so the public and the academic sector
don't have full visibility into how the technology is working.
And so part of the thinking here is to say at least we should

(16:40):
have a system that requires that kind of disclosure.
Confidential disclosure: it doesn't have to be published on the internet,
but disclosed to a government actor.
The other feature of the registration proposal is
to then require that anybody who wants to sell or give away,
like open source,

(17:00):
or buy a model or the services of a
large frontier model,
they need to only do that with a registered model.
It recognizes where we are in this regulatory governance
problem, and where we are right now is we just don't know what
we need to be doing.
- Right.

(17:21):
- We don't know enough about what's out there.
We don't know what the capabilities are.
We don't know what's going to happen when you take that model
and you just set it loose on the world.
- How do you think private corporations might react to that?
- Look, in the rest of the economy,
just about everything has to be in some ways registered.
You need a social insurance number to be a participant

(17:43):
in the work, you know, in the employment sector,
and that also, you know, coordinates our tax collections.
If you want to start a company and you want all the benefits of
being a company, you have to incorporate that company and
tell the government, you know,
who are your responsible officers and what's your address?
Where can we find you?
Where can we sue you if we feel we need to do that?

(18:07):
We have lots of structure and regulatory structure.
And then if we think about all the things that say a
ChatGPT can do, legal advice, educational help,
medical advice, all these things,
we have tons of, you know, you have to have gone to law school
and get a law license in order to give legal advice.

(18:28):
I think our rules there are too strict.
But at any rate, if you want to be a financial advisor,
if you want to be an accountant,
if you want to hold yourself out as a psychologist,
if you want to provide healthcare,
we have lots and lots of regulatory structure.
We have almost none in the realm of AI.
You don't need to be licensed or have any
particular training to be a machine learning engineer,

(18:50):
and we don't have any requirements to, as I say,
register to track this down.
So it's important not to see this as
changing from a status quo,
but in some ways just extending a status quo there.
Now on the question of, you know, this boundary between,
OK, what can companies keep secret

(19:14):
and what do they have to share?
Again, I think we may have to be changing that boundary.
And I think this applies to open source as well.
And you know, if you want to think of open source as, you know,
you choose to give away technology that you've built,

(19:36):
well that's fine.
But that still doesn't mean that you don't have to maybe have
some structure for us to have some accountability around it.
- One question came to mind, which I know you spend a lot of time
thinking about also from a technical perspective,
which is aligning the system.
This notion that we've built this object
that's trained on all of the text data,

(19:57):
perhaps on the internet that we have access to,
but we don't necessarily know what its underlying intents are
when we end up using it.
We still run into the same issue of who gets to align this thing
and whose values get to be incorporated into an object
that is then broadcast across the internet to billions of people

(20:18):
around the world.
- So the first thing is to recognize that, you know, the
idea that I sometimes see in sort of the engineering
community, the idea that, well, we'll pick a set of values
and we'll put those in the machine.
I think it's just the wrong way to think of that problem.
And then this question of who gets to decide is to say, well,

(20:39):
we answer that question in our
institutions of government,
our legal institutions, our market institutions.
When can the market just decide, you know,
how safe to make a car?
And when do we say "No, courts are going to decide that."
"Regulators are going to decide that."

(21:00):
Or, you know, "Legislators are going to decide that."
So it's the same thing.
And I think one of the challenges I see right now
is that almost all the alignment is happening by decisions being
taken within the private technology companies.

(21:20):
But at the end of the day, it shouldn't be OpenAI,
for example, deciding what are the values
in the reinforcement learning from human feedback
that they're doing that are deciding whether
the model is allowed to say X or not say X.
And we need to find ways to pull those decisions out into our

(21:41):
other kinds of alignment mechanisms.
- It was so interesting to hear from both these people.
But did we resolve anything about alignment?
Because, I mean, I'm wondering, like, when we got to that moment
with Roger about, and depending on what happens technically

(22:02):
in the next short period of time,
is it a goal for people to move toward artificial general intelligence,
AGI, and
we haven't nearly solved alignment for the models that we have?
- That's right. And I think the hard part that
a lot of the technical community faces
is that as these models change with each iteration,

(22:25):
it becomes increasingly unpredictable to
identify what their capabilities actually are
and the extent of their capabilities.
We're sort of in this lockstep race between
coming up with ways
to accurately test all the capabilities of the model
and continuing to improve the models themselves, which is what I
think makes alignment very difficult.

(22:48):
- A year ago, it was still a question of, well, what about
memory, what about reasoning?
And we've passed those thresholds in terms of memory
and reasoning.
So are we back at this kind of existential question around
"So, what is intelligence?"

(23:08):
- Will AI have consciousness, Geoffrey Hinton?
- There's a whole bunch of terms that are all interrelated,
like subjective experience
and sentience and consciousness.
But as long as people have this view of an inner theatre
that only they can experience,
and that's what they're describing
when they talk about mental states,

(23:30):
not hypothetical external worlds,
then I don't think we'll ever be able to sort out
what consciousness is.
Consciousness involves subjective experience,
but it also involves self-awareness.
I'm not convinced these things are conscious because I'm not
convinced they have the self-awareness yet.
But my strong belief is that it would be better to start by
getting straight what we mean by subjective experience.

(23:54):
And only when we get that straight will we be able to
then add in the self-awareness and understand what
consciousness is.
Well, my current belief is almost everybody has it just
completely wrong about what consciousness is.
From the University of Toronto, this is What Now? AI.
Listen to us wherever you get your podcasts
and watch us on YouTube.