Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Joshua Rubin (00:06):
Welcome, and thank you for joining us today on AI Explained, on managing the risks of generative AI. I'm Josh Rubin, Principal AI Scientist at Fiddler AI, and I'll be your host today. I've been working in the responsible AI space for about five years now, building tools and algorithms to help companies instrument their AI applications.
(00:27):
We have a very special guest today on AI Explained, and that's Kathy Baxter, Principal Architect of Ethical AI at Salesforce. Welcome, Kathy. Would you like to give a little self-introduction?
Kathy Baxter (00:39):
Hi, thank you so much for having me today. I'm Principal Architect of Responsible AI and Tech at Salesforce, and I've been at Salesforce since 2015. Our team is part of the larger Office of Ethical and Humane Use. And in addition to my work with Salesforce, I am also a
(01:01):
visiting AI fellow with NIST. I'm on the board of EqualAI, a nonprofit, and also on the advisory council for Singapore's ethical use of AI and data, in their nonprofit AI Verify Foundation. So I like to do lots of things with governments and nonprofits outside
(01:23):
of my work with Salesforce.
Joshua Rubin (01:26):
Very cool. So to open this up, I thought we might just talk about what ethical AI or responsible AI means. Maybe you could talk a little bit about some of the harms that you think about, some of the risks that organizations deal with, just at a general level.
Kathy Baxter (01:44):
Yeah, absolutely.
I think generative AI, and the broad availability of ChatGPT, really brought the potential risks of AI into everybody's consciousness.
(02:04):
These are not risks that are unique or brand new; they've existed in the world of predictive AI, but they're amplified with generative AI. One of the concerns is accuracy, particularly in a B2B scenario. Salesforce, like Fiddler, is B2B.
(02:25):
We have enterprises that demand that whatever we create and offer has got to be accurate. Nobody wants their web engine to hallucinate answers, but in a B2B scenario it's particularly concerning. So you have to make sure that
(02:46):
the AI is giving you accurate answers, that they're safe, that they've been assessed for bias and toxicity. It can never be completely bias-free or toxicity-free, but you've got to do a lot of work to make it as safe as possible. And it needs to be reliable, robust to security
(03:11):
violations, and sustainable. We are in the middle of massive climate change, and although there's a lot of potential for AI to help us solve some problems, we are burning through a lot of carbon and a lot of water every time we train
(03:33):
a model or generate new content. So, to be responsible, we have to make sure that the AI we are building or implementing is accurate, safe, transparent, empowering of humans rather than automating everybody out of jobs, and sustainable.
Joshua Rubin (03:53):
Yeah, that's great. We try to maintain a list of incidents of one kind or another that you might hear about in the news, of various organizations having problems with responsible AI, whether from a lack of observability, carelessness, or insufficient controls around their models.
(04:18):
The environmental impact is a really interesting dimension that I think is becoming increasingly important, obviously, with data centers full of generative AI models. But we also think a lot about the things companies are concerned about: reputational risk,
(04:39):
actual harm to individuals, or revenue left on the table because they didn't realize the world changed in a way that caused a model to wander into some region where it was making inefficient predictions, somewhere it might be extrapolating, or where its training data was insufficient, or maybe there's just some concept drift
(05:01):
in the world we weren't prepared for. There are lots of different categories, and in some ways those are the things companies care about, but that doesn't even get to the ethics question of whether these tools are serving all of their human stakeholders in the most appropriate ways.
So I think one of your main focuses is, you
(05:23):
know, how we create this conversation between companies and government agencies and civil society in a way that gets all the stakeholders to the table. Do you want to talk a little bit about some of your efforts in trying to shape policy and come up with best practices
(05:46):
around this, and why that's important?
Kathy Baxter (05:48):
Absolutely.
I'm incredibly proud of the work that I've been fortunate to do with NIST on the AI Risk Management Framework. I can't stress enough the importance of public-private collaboration. There's been a lot of criticism over time of the government
(06:10):
not understanding this technology, but we're a long way from testimony where somebody asks how a social media company monetizes its users or their views. And so we have to
(06:30):
think about how we engage as partners. It's impossible for any government agency to know how each company is building this technology, because it's kept secret.
It's not something that a lot of companies are going
(06:52):
to make publicly available.
And so trying to keep up with recommendations and guidelines, much less regulations, as this technology is emerging so quickly is virtually impossible. So you need that public-private collaboration to be able to talk about
(07:12):
how we can ensure that this technology really is beneficial for everyone, and how we can create guidance. Because right now, we don't quite know what it means to be safe. In the worlds of medicine or food,
(07:32):
there are thresholds for how much of a certain heavy metal or chemical is allowed before it becomes toxic to a human.
But we don't know how much bias or how much toxicity can be in a training dataset or a model before it becomes too problematic, and more
(07:56):
work needs to be done to fix that.
So we really need to be able to come together.
One of the things that I'm excited about is the US AI Safety Institute and the work that they will be doing. They're going to be bringing together private industry, academia, nonprofits, as well as government, to have different working groups to
(08:20):
collaborate on these different questions.
And so that's going to be really exciting to see.
Joshua Rubin (08:27):
Yeah.
From the technology point of view, one challenge about this stuff from my perspective is that when you're dealing with the predictive modeling, discriminative modeling world, it's a little easier to measure things.
(08:49):
You tend to have labels that are closely associated with some sort of ground truth information, or you can generate those labels. You can go ask humans to assess the performance of the model in a way that makes it a little easier to measure that the model is behaving the way it was intended to behave, and that it doesn't
(09:11):
bias in ways that aren't helpful to its objective.
I think one thing that we're seeing from the technology perspective is this convergence around evaluations and suites of evaluation tools for generative models. So, collecting things like TruthfulQA, or you go onto Hugging Face and
(09:35):
there are these leaderboards now of various kinds of things: carefully curated eval datasets with objectives for generative models that at least give you some sort of benchmark for comparing one model against another.
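To make that concrete, here is a minimal, hypothetical sketch of that kind of benchmark comparison: two stand-in generation functions scored on a tiny curated eval set with exact-match accuracy. The function names and toy dataset are assumptions for illustration; real leaderboards use much larger datasets and richer metrics.

```python
# Minimal sketch: compare two generative models on a small curated eval set.
# `model_a_generate` / `model_b_generate` are hypothetical stand-ins for whatever
# inference call your stack exposes; exact-match accuracy is deliberately crude.

eval_set = [
    {"prompt": "What is the chemical symbol for gold?", "reference": "Au"},
    {"prompt": "In what year did Apollo 11 land on the Moon?", "reference": "1969"},
]

def exact_match_accuracy(generate_fn, dataset):
    hits = 0
    for example in dataset:
        answer = generate_fn(example["prompt"])
        hits += int(example["reference"].lower() in answer.lower())
    return hits / len(dataset)

# Example usage (once real generate functions exist):
# print("Model A:", exact_match_accuracy(model_a_generate, eval_set))
# print("Model B:", exact_match_accuracy(model_b_generate, eval_set))
```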
I don't know if we know yet, from that, like your FDA analogy, where the threshold of harm is, right?
(10:02):
And for that matter, a function of machine learning is, in some abstract sense, to bias, right? It's to be opinionated. Not in the negative connotation, but clearly there's a point where a model is behaving in a way that is not
(10:22):
appropriate by human standards, and it's clear that the models need to reflect that. So I think there are good efforts in terms of building eval datasets and test benches, for at least starting to measure these things and being quantitative.
I want to go back to what you were talking about.
(10:45):
One of the things you hear about is the EU or the White House producing guidance on how we're going to make sure we've got a close handle on this rapidly evolving world. And I have to admit that sometimes I kind of roll my eyes, right?
(11:06):
It seems like part of what makes tech fun, what makes AI fun, is that you don't know who's going to invent something surprising and new and dramatically different tomorrow morning, right?
So I guess I'd like to hear from you about how you think about developing best practices that are
(11:29):
sufficiently specific, that have enough teeth but aren't overly rigid, so that they can actually apply to future use cases we haven't thought of yet.
Kathy Baxter (11:44):
Yeah, that's a great question. One of the things that I really like about the NIST AI RMF, and I'll be the first to admit that I'm biased because I helped contribute to it, is that it is about process. It doesn't draw lines and say anything on this side is bad, anything on this side is good.
(12:06):
It's all about the process and how you should be building this soup to nuts, from concept all the way to launch, and then post-launch coming back again. And that really is critical, because if you happen to have good, harmless outcomes
(12:29):
from bad process, that's nothing but sheer luck. You can have amazing process and still have harmful outcomes, but it's a lot less likely. And so having a process that everyone follows, and then being transparent about it, is really important for trust.
(12:52):
You need to know: how are you collecting the data that trains your model? How are you evaluating that model? What are the safeguards that you have in place to look for things like bias and toxicity? How are you testing for data exfiltration?
(13:14):
How are you testing for data leakage, and then blocking it? All of those are really tough questions.
There is no solid right answer of "everybody must do X," but at least by asking everybody to follow certain processes and then communicate what you're doing,
(13:35):
communicate what you're finding, it will drastically increase trust.
And then a rising tide lifts all boats: by being able to learn from each other, we all get better in our practice.
We shouldn't treat responsible AI as a market differentiator.
(13:57):
All of us should want the products that we use, regardless of whether it's a company that we work at or not. If it's a product we use, we should always want it to be safe and responsible for everyone.
Joshua Rubin (14:10):
Yeah, so to follow on there: when we were chatting the other day, one topic that came up that I thought was interesting was organizations and how they're structured to start to adapt to policies and best practices, some of these things that you've described in terms of following these robust procedures. One
(14:35):
thing that I've seen at times is that you'll have a part of an organization that's responsible for governance and compliance, and responsible AI becomes part of that, and it becomes a sort of adversarial relationship with model developers and product developers
(14:55):
who sometimes feel like there's some other team hoisting a bunch of extra steps on them that may be totally separate from their OKRs or their immediate objectives.
Do you have any feelings about how organizations can start to internalize some of these rules in a little bit more
(15:16):
of an integrated fashion?
Kathy Baxter (15:21):
Yeah, we really need to have the concept of an AI safety culture. I recently published a brief blog post about this. In regulated industries, you're more likely to see a safety culture.
Patrick Hudson is an internationally known safety expert, and he
(15:48):
published a safety ladder, or maturity model, for organizations, which is very similar to the ethical AI maturity model that I published a few years back. He identified five aspects of a safety culture that
I believe are also very relevant to an ethical AI culture, with
(16:10):
leadership being the first and foremost, the most important aspect. But there also needs to be a level of accountability. Throughout an entire organization, you may have dozens or hundreds of people that are each responsible for different elements of the safety of a system.
(16:35):
It could be building an airplane, it could be drug testing and manufacturing, or it could be building a large language model. When it comes to AI in particular, you end up with this responsibility gap when something goes wrong.
Who is to blame?
(16:55):
There are probably many people that could have some part in that system. And so you end up with not being able to hold anyone accountable when something happens.
Dr. Missy Cummings, she's a former fighter jet pilot, and she's worked with
(17:15):
the Department of Transportation. She's talked about the need for a chief AI test pilot, like they have in the airline industry.
All of these people are working together to make sure that a plane is safe and all of this work has been done, but at the end of the day,
(17:36):
there's one person that takes that plane out for a flight and then signs to attest that this plane is safe and flightworthy.
And so she advocates, particularly for AI systems with safety implications like self-driving cars, that there should be a chief AI test pilot.
(17:57):
So one person does that final test and signs off to attest that this model, this system, this AI app is trustworthy, that it's safe for use.
So having that end-to-end safety culture and accountability
is incredibly important.
Joshua Rubin (18:19):
That's interesting.
One thing: we work a fair amount with financial services companies, large banks, and in those heavily regulated industries there is already some guidance, right? You have things like SR 11-7 from the Fed,
which I think in a lot of scenarios feels burdensome in those
(18:40):
organizations. But at the same time, to go back a little bit to providing the right level of guidance, it lays down a framework that's, I'm going to say, specific in a general sense. It asks questions like: for
(19:02):
this particular model application, first of all, what are the risks involved in this model? Is it doing something fairly trivial, like categorizing emails, or is it doing something
with all sorts of complicated dynamics, something that can physically harm humans or at least affect
(19:23):
their financial futures in ways that are interwoven with demographics in awkward ways?
So it asks the model developers to put forth procedures for identifying what are some changes in the
(19:45):
world that could affect your model, right? Like, how could the economic climate change, and how might that affect the performance of your model?
How are you going to measure that, right? So there are general categories of: here are some things you should think about.
Define a procedure by which you're going to track
(20:10):
change in the world, change in the model's performance, shifts in the way it's cutting across demographics in ways that might amplify pre-existing bias in the record.
Do you think differently about AI tools that are designed to be
(20:35):
used within organizations versus those that are consumer-facing?
Kathy Baxter (20:42):
In enterprise, or within a company, I think you can have a lot more transparency, a lot more understanding about what's actually happening. I think a lot of companies are very concerned about showing too much to consumers, either because it's difficult to understand.
(21:05):
There's the whole difference between explainability and interpretability. I can tell you what's happening under the hood for this AI model, but it means absolutely nothing to you, versus interpretability, being able to understand that, say, if you are of a certain gender or race, you're less likely to be recommended for this job. You don't have to understand what's happening under the hood.
(21:27):
But understanding the relationships, the cause-and-effect kind of analysis, is very important. So how do you communicate what's working, what's happening, and the consequences? That can also have real legal implications as well. So companies are much more likely to have, behind the scenes,
(21:52):
a lot more model quality assessment and monitoring that's just for their eyes only.
If you're in a regulated industry, that probably is also going to be visible to auditors. So having these kinds of analyses available to provide to regulators at
(22:13):
any point in time, or to an auditor that comes in, and being able to do that at the click of a button, is important.
We've seen a lot with model cards, or, as IBM calls them, datasheets: being able to provide a document that lists out the training data, what tests you've done, known biases, intended uses, unintended uses, and
(22:40):
those types of things. You can find a lot of those on Hugging Face, and they can be very helpful, particularly for people in procurement who are thinking about purchasing a product, being able to ask for those model cards and understand: what's happening here?
(23:02):
Is there any known bias? What checks have you done? And if you don't get sufficient answers, then come back and ask for that before completing a purchase.
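As an illustration of the kind of document Kathy describes, here is a minimal sketch of a model card as a plain data structure. The fields mirror the items she lists (training data, tests, known biases, intended and unintended uses); the class name and example values are hypothetical, not any vendor's schema.

```python
# Minimal sketch: a model card as a plain data structure.
# Field names mirror the items discussed above; all values are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    model_name: str
    training_data: list[str]
    evaluations: dict[str, float]          # test name -> score
    known_biases: list[str]
    intended_uses: list[str]
    unintended_uses: list[str] = field(default_factory=list)

card = ModelCard(
    model_name="example-support-summarizer",
    training_data=["licensed customer-support corpus (hypothetical)"],
    evaluations={"toxicity_rate": 0.01, "factual_consistency": 0.92},
    known_biases=["over-represents US English phrasing"],
    intended_uses=["drafting case summaries for human review"],
    unintended_uses=["fully automated replies without human review"],
)
print(card.model_name, card.known_biases)
```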
Joshua Rubin (23:14):
So one thing that just sparked in my mind: there's a difference between what the laws say and guidelines and best practices. Do you imagine a world where the government is responsible for an auditing function, or do you see that more as a sort
(23:36):
of internal best practice, some internal process that organizations have to follow or will want to follow, maybe for lack of a better
(23:58):
descriptor, as a kind of legal defensibility?
I think in a lot of ways there's been this historical lack of guidance, but companies know that if they do something egregious, they can get sued for it, right?
It's not just reputational harm, which is huge, but there are also legal
(24:18):
repercussions if you can't demonstrate that you had some set of best practices defined and that you followed them in a historical sense.
In conversations I've had with lawyers, that's been the gist of their recommendation: the laws are fuzzy, and the best thing
(24:39):
you can do is define a plan and be able to show that you followed it over time.
So I guess I'm curious where you come down in terms of the balance between a centralized government org versus this kind of internal due diligence process that's
(25:02):
just considered to be a best practice.
Kathy Baxter (25:05):
Well, there are a number of departments in the government that have made it clear that new laws are not needed to ensure that AI is being used fairly. So, for example, the EEOC, the Equal Employment Opportunity Commission, has said that whether it's a human or an AI, if you make
(25:27):
a biased hiring decision against a protected class, we're coming after you.
So I think there are a number of departments where there are areas that are already regulated, and the FDA is one of them. There are a number of AIs that are considered medical devices.
(25:49):
It's not a device in a hardware sense, it's still an algorithm, but it's regulated in the same way. So I think we already see some sector-specific enforcement of AI where we haven't necessarily needed new laws; we could just apply what already exists.
(26:11):
I do expect that additional ones are going to be created. And that sector-specific approach is going to be really important, because you've got those experts in transportation, food, medicine, and employment.
They really know these areas, these domains, and so it would be much
(26:36):
easier for them to develop regulations about what is fair.
I think in other, broader circumstances, it can be more difficult. That doesn't mean it's impossible. But it means being able to understand, again, what is necessary
(26:58):
from a transparency standpoint?
How can we know if what you are doing is safe and fair?
We talk a lot internally about detecting signals of veracity with generative AI. In some cases, when content is being created, it's pretty clear whether or not
(27:20):
something is a right or wrong answer.
So in customer service, when somebody asks, my router isn't working, how do I fix it? There is a very specific set of steps that you may need to take. And so the AI can generate that correct summarization, or not.
(27:41):
And so there are ways that you might be able to cite sources: where's the knowledge article that you pulled this information from, if you're doing RAG, if you're grounding the model in a dataset?
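A minimal sketch of that grounding-with-citations pattern, assuming a hypothetical retrieve function (search over knowledge articles) and a hypothetical llm_complete call; the point is simply that the answer carries the IDs of the articles it was grounded in.

```python
# Minimal sketch: retrieval-augmented generation that cites its sources.
# `retrieve` and `llm_complete` are hypothetical stand-ins for a real
# search index and model client.

def answer_with_citations(question, retrieve, llm_complete, top_k=3):
    articles = retrieve(question, top_k=top_k)  # e.g. [{"id": "KB-123", "text": "..."}, ...]
    context = "\n\n".join(f"[{a['id']}] {a['text']}" for a in articles)
    prompt = (
        "Answer the question using only the knowledge articles below, "
        "and cite the article IDs you relied on.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return {"answer": llm_complete(prompt), "sources": [a["id"] for a in articles]}
```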
In other cases, it's incredibly subjective.
In our Marketing Cloud product, with subject line generation, there could be dozens
(28:06):
of different ways that an AI could write a subject line for a marketing campaign that could all technically be right, but some might be better than others. Some might be closer to your company's voice and tone. Some might be better at generating a sense of FOMO than others.
(28:29):
And then it's really a subjective assessment. So how do you give signals to that end user to help them choose which one of a dozen different generated subject lines is the best one for them to try? Or maybe they want to do A/B testing:
(28:49):
what are the top three that I want to do a test on?
And that really takes, again, domain expertise in order to come up with rules of thumb, guidelines as to: this is
(29:10):
good, this is safe, this is not.
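One way to picture those signals is a small ranking step over the generated candidates. This is a hypothetical sketch, not a Salesforce implementation: the scoring functions (brand voice, FOMO, safety) and the weights are placeholders that in practice would be classifiers, heuristics, or human ratings.

```python
# Minimal sketch: rank generated subject lines and surface the top three for A/B testing.
# The scorer callables and weights are hypothetical placeholders.

def top_candidates_for_ab_test(candidates, brand_voice_score, fomo_score, safety_score,
                               n=3, safety_threshold=0.9):
    scored = []
    for line in candidates:
        if safety_score(line) < safety_threshold:   # drop anything flagged as unsafe
            continue
        score = 0.6 * brand_voice_score(line) + 0.4 * fomo_score(line)
        scored.append((score, line))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [line for _, line in scored[:n]]         # e.g. the top three to try
```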
Joshua Rubin (29:15):
Yeah, that's really interesting. What did you call it? Signals of veracity, I think, is a really great phrase. I hadn't heard that before.
Coming at this from the tooling and instrumentation perspective, one thing that we talk a lot about on
(29:36):
the Fiddler side is trying to gather signals of one kind or another. We sometimes call them feedback controls.
And this kind of gets back to this question of how generative AI is a little bit different than some of the predictive or discriminative models we've dealt with in the past,
(29:59):
and the lack of the obvious label, right? To your point, there may be a thousand different right answers that are right in different shades of right, and being able to quantify that helps you control the behavior of your model.
And you'd certainly want to know if there are some topics for which
(30:20):
your generative model comes up with really great answers and other topics for which
it comes up with mediocre answers, right?
Because that speaks to model performance.
It speaks to something that could be improved, ways in which the model might be underserving certain users, certain stakeholders.
And of course, just to bring it back to
(30:41):
the fairness and bias thing: it's hard to know the ways in which there will be interplays between things like controlled attributes, things like race, gender, demographics, and
(31:06):
things like topics, right?
I'm going to wander off on a little tangent, but one thing that we were talking about a couple of months ago that I think would make a really fun research topic is just exploring how different uses of dialect, different
(31:27):
dialects of subpopulations, even within American English, could lead to different model outcomes.
Like I said, if I had all the time in the world, I would love to spend some time researching that, because you can see from all this work that's coming out now on adversarial attacks on generative AI how sensitive the model can be to very subtle
(31:51):
differences in wording or terminology.
So, to zoom out again: having tooling that is capable of providing feedback in whatever way is appropriate for the application, whether that's some sort of human-labeled thing, whether
(32:14):
that's a thumbs up or a thumbs down on a chatbot that a human can click, or whether that's other models scoring the model in question for how effective it is at meeting its task.
Having a framework that brings all of those things together, and tools that allow you to gather those signals and aggregate them in a
(32:36):
way that lets you actually root-cause and isolate problems, seems to us like a really key set of design objectives for tooling.
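A minimal sketch of that aggregation step, with hypothetical feedback records that combine a human thumbs up/down and a model-based quality score, grouped by topic so weak areas stand out.

```python
# Minimal sketch: aggregate feedback signals per topic to isolate weak areas.
# The records and field names are hypothetical.
from collections import defaultdict

feedback = [
    {"topic": "billing", "thumbs_up": 1, "quality_score": 0.91},
    {"topic": "billing", "thumbs_up": 0, "quality_score": 0.55},
    {"topic": "router_setup", "thumbs_up": 1, "quality_score": 0.88},
]

def summarize_by_topic(records):
    buckets = defaultdict(list)
    for record in records:
        buckets[record["topic"]].append(record)
    return {
        topic: {
            "approval_rate": sum(r["thumbs_up"] for r in rows) / len(rows),
            "avg_quality": sum(r["quality_score"] for r in rows) / len(rows),
            "count": len(rows),
        }
        for topic, rows in buckets.items()
    }

print(summarize_by_topic(feedback))
```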
Kathy Baxter (32:48):
Yeah, you've touched on a bunch of issues there. I was really fortunate to work for years with Greg Bennett, who is a linguist, and we've talked about these issues many, many times.
Very early on, it was making sure that chatbots could understand people
(33:09):
who spoke non-standard grammars.
I'm originally from the South. When I first moved to California, I had a really thick Southern accent. My grammar was not the same as other people's in California. And if I were engaging with a chatbot, I would have
(33:33):
to code-switch. Or African American Vernacular English, AAVE:
there have been a lot of reports and articles that show that people have to change how they speak in order for their smart home assistant to be able to understand them. So this is a huge issue.
(33:56):
We talk about this a lot in terms of getting training datasets in different languages. So at Salesforce, we have a trust layer. Our end user could be a customer support agent, a sales rep, a marketing campaign creator; they submit a prompt, and then it goes through a number of checks and detectors.
(34:20):
It checks for PII, masks it, strips it out, does a toxicity check, goes into the LLM, comes back out, and does a number of other checks.
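For illustration, here is a minimal sketch of that kind of pre- and post-processing pipeline around an LLM call. The detector functions (mask_pii, toxicity_score) and the model client (llm_complete) are hypothetical placeholders, not the actual Salesforce trust layer.

```python
# Minimal sketch of a trust-layer style wrapper around an LLM call.
# mask_pii, toxicity_score, and llm_complete are hypothetical placeholders.

def trusted_completion(prompt, mask_pii, toxicity_score, llm_complete,
                       toxicity_threshold=0.5):
    masked_prompt, pii_map = mask_pii(prompt)            # strip/mask PII before the model sees it
    if toxicity_score(masked_prompt) > toxicity_threshold:
        return {"blocked": True, "reason": "toxic prompt"}

    output = llm_complete(masked_prompt)                 # call the LLM on the masked prompt

    if toxicity_score(output) > toxicity_threshold:      # post-generation checks
        return {"blocked": True, "reason": "toxic output"}
    return {"blocked": False, "output": output, "pii_map": pii_map}
```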
Toxicity is incredibly contextual. It varies by language, by region, by dialect.
(34:44):
Humans are immensely, horribly creative at coming up with new racial slurs and insults, so when you say that your model works in Spanish, it's not good enough to just get a training dataset of toxic words and phrases from
(35:11):
people who speak Spanish in the U.S.
You need to get training datasets from Spain and Argentina and Mexico, and it's incredibly expensive and time-consuming. But if you don't do that, then you could have a chatbot that is completely generative AI,
(35:35):
not just rule-based, not menu-based, but natural language processing and generation, and it could generate very insulting things because you didn't have a dataset that understood what was offensive or harmful for that language in that region.
(35:56):
So we have to think a lot beyond our own little box of where I live, my lived experience, my language, my values, and have those datasets include people from all of those backgrounds participating in this, and not just
(36:19):
as something bolted on at the end: we're going to do adversarial testing.
So let's find a really diverse group of people to come in
and try to break this thing.
Usually that's red teaming in security, but from an ethical standpoint, you can try to make it say and do offensive, terrible things.
Ideally, you want to have a really diverse team from the very beginning,
(36:44):
thinking about what are all of the terrible ways this product could be used and cause harm.
So you're thinking from the beginning, from a set of values, from many different people's points of view: how do we make sure that the AI is working to
(37:07):
support those values, and that you are putting all the protections in place to block the harmful ways that it might be used.
If you don't do that, if you're only doing adversarial testing at the very end, thank god at least you're doing that, but it ends up just being Band-Aids, where you find that your AI is doing all these really terrible
(37:30):
things, and you're trying to put Band-Aids on each one of them, as opposed to starting from the beginning and having a truly inclusive, safe product from conception.
Joshua Rubin (37:43):
Yeah.
Yeah.
I think it's a big undertaking that you're describing. It's really important, and from a technology perspective I'm actually optimistic in a way, because for years and years we had these, well, think about detecting toxicity, right? It was not unusual for an LGBTQ friend on Facebook
(38:06):
to have screen-grabbed something where a legitimate, not harmful conversation they were having with a friend or a peer had been flagged by some over-rigid, simplistic AI for being toxic because of
(38:30):
what that model had been trained on.
And the model was just not capable of understanding those nuances.
Yeah.
So, I mean, it's not, it's not tosay that, uh, you know, there's some
problem that's magically fixed, butyou know, do hold some hope that.
You know, with these really exciting largemodels, you know, with the right human
oversight, the, uh, the pliability isthere for the models to finally be able to
(38:55):
make more nuanced and helpful judgments.
But of course, as you describe, it really does take the whole village of people coming at things from different perspectives. So, yeah. There's a project there, but I think there's some light that's visible.
(39:16):
Kathy Baxter (39:16):
Yeah, there have been a lot of women, especially women of color, that have been at the forefront of trying to raise these alarms. So of course, Joy Buolamwini with Coded Bias.
It's on Netflix, and every time somebody is like, oh my God, have you seen The Social Dilemma? I'm like, oh my God, have you seen Coded Bias? Please watch that one.
(39:37):
And Dr. Hanani, a few years ago, published a warning, published a paper about image models using CSAM, or child sexual abuse material, types of content to train image models. Stanford recently published a report about that, but she
(40:02):
was raising these issues years ago.
And so you have all of these women that have been at the forefront raising these alarms. And of course the Stochastic Parrots paper, from Emily Bender, Meg Mitchell, and Timnit Gebru, long before Geoffrey Hinton said, this is bad.
(40:28):
There are risks here. They had published a paper saying, here are all the things we need to be thinking about, here are all the potential harms.
And so we have to make sure that we are uplifting those voices, that we are paying attention to them. I don't believe that we are moving toward a place
(40:48):
where AGI is coming to kill us.
We know how to do AI safely. We know what needs to be done, if we can pull on our big-kid pants and everybody makes sure that they are building AI responsibly, that we have that transparency, that we have that discipline. That goes for companies that are procuring this technology as well as end consumers.
(41:13):
When you decide what product you're going to use, who are you going to give your data to? Who are you going to give your attention, your money to?
If we all make those really hard choices and say, I'm not going to use this product because I know that they're not doing things that are very nice, or I don't know what they're doing, I don't know where they got their training data, I don't
(41:36):
know how they're honestly using my data, then the market really can have an impact.
It is not enough; it is not sufficient, we have seen that. But it can have an impact, and so it takes everybody paying attention when these issues are raised, and then taking action on them,
(42:01):
not just being horrified by the Netflix documentary you just saw, but actually taking action.
Joshua Rubin (42:10):
Yeah, to go back to your FDA analogy, I think when we have nutrition labels on these things, it'll certainly help consumers understand what they're getting and what's in the sausage.
Kathy Baxter (42:26):
I know that whenever I see the calorie count on a menu, I look at it.
Joshua Rubin (42:31):
Yeah
Kathy Baxter (42:31):
And it has changed my choices, where I think, oh, I'm going to have the salad, it's going to be nice and healthy, and then I see it's like 1,200 calories. I'm like, god, you've got to be kidding me.
And so, providing people, empowering people with the information:
(42:51):
people can make better choices when you give them the information, and sometimes you've got to force companies or organizations to give that information.
Joshua Rubin (43:03):
So, we should probably cut over to the Q&A. I feel like I've dominated, but one question that is super interesting to me is about AI literacy among the general public.
Because, on the thing about the nutrition label, I think most consumers of AI would be surprised how many nutrition labels would be on every
(43:28):
part of the applications they use.
I mean, I think most people would be surprised to know that there were two different AI models selecting products for them to see on an e-commerce site: one that makes a coarse selection, and then one that orders the items on the screen to optimally, sort of,
(43:50):
pique their interest with something that they're most likely to click on.
I wonder what you think about general AI literacy going together with that. Otherwise, I think people will just suddenly be shocked when there are 10 model cards on the first page of their Amazon
(44:13):
order or something like that, right?
Kathy Baxter (44:15):
Yeah, again, going back to explainability versus interpretability, you need to make sure that you're communicating the right level of information at the right time to the right person.
And so model cards are not going to make sense for every consumer before they start using a product, but understanding
(44:39):
why did this marketing company make these recommendations to me? So, clicking on that little "i."
A couple of years ago, we published recommendations for responsible AI marketing. And we recommend to our customers that they prioritize,
(45:02):
and when they're labeling their data, that they differentiate between zero-party, first-party, and third-party data.
So, for those that might not be familiar, just very quickly: zero-party data, that's the data that I give you. I tell my favorite coffee spot my birthday, because I want to get that free cup of coffee on my birthday.
So I'm giving you that information.
You can trust that.
Or I fill out a form and I tell you, maybe it's a makeup site, my skin tone, my skin issues, my preferences.
First-party data is the behavioral data: what I search for, what I click on, what I bought. And then third-party data, that's usually inferences that you make off of what you
(45:47):
are predicting about somebody, or maybe you purchase it from a big data broker. It's much less reliable, not quite as trustworthy, and the user likely did not consent for you to make those guesses or to have that data sold about them.
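A minimal sketch of labeling customer attributes by provenance the way Kathy describes, so downstream personalization can prefer zero- and first-party signals. The example records and trust weights are hypothetical.

```python
# Minimal sketch: tag customer attributes by provenance and prefer the most trustworthy.
# The example records and trust weights are hypothetical.
from dataclasses import dataclass

@dataclass
class CustomerAttribute:
    name: str
    value: str
    provenance: str   # "zero_party" | "first_party" | "third_party"

TRUST_WEIGHT = {"zero_party": 1.0, "first_party": 0.8, "third_party": 0.3}

attributes = [
    CustomerAttribute("birthday", "1990-06-01", "zero_party"),           # customer told us directly
    CustomerAttribute("recent_search", "running shoes", "first_party"),  # observed behavior
    CustomerAttribute("income_bracket", "high", "third_party"),          # purchased or inferred
]

for attr in sorted(attributes, key=lambda a: TRUST_WEIGHT[a.provenance], reverse=True):
    print(attr.name, attr.provenance, TRUST_WEIGHT[attr.provenance])
```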
And so when you see that we are making these recommendations
(46:10):
for you based on what you told us about your preferences, based on past
purchases, and then being able to edit it.
Maybe I purchased this thing because it was for my coworker who just had a baby. I'm not going to be buying any more of those things; please don't keep giving me recommendations for baby items.
(46:31):
Let me delete that from the algorithm so you don't keep recommending those things. So, being able to communicate, again, the right level of information at the right time, and empowering users: you can create a much more accurate AI system that is going to get more engagement from your consumers.
(46:54):
By empowering them, they can trust you, they get a better
experience, and you get better ROI.
It just makes sense all the way around.
Joshua Rubin (47:03):
So I think we just answered one of the popular questions, which is: how should companies include human-in-the-loop practices to make sure their AI stays compliant and helpful to their end users? I don't know if there's anything else you want to add there, or we could jump to a different one.
Kathy Baxter (47:15):
Yes. So, in a moment I'm going to start populating a whole bunch of links in the Q&A for folks, and one of them is a link to a piece that one of our user researchers, Holly Prouty, published in December on "human at the helm" and how critical that empowerment is.
(47:37):
A lot of regulations have been proposed to have a human in the loop, a human
that makes the final recommendation.
GDPR actually requires that a human make the final decision, or rather that an AI cannot automate any decision with legal or similarly significant effects, so you can't use
(48:01):
an AI to automatically decide who to hire, who to fire, who to give some other benefit to.
A human has to make that final decision, but it doesn't mean anything if you don't empower the human to know: is this an accurate decision?
(48:21):
Is it a fair decision? Otherwise the human is just a rubber stamp, and you may be complying with the letter of the law but not with the spirit of the law.
So I will put a link to that in the Q&A in just a moment.
Joshua Rubin (48:43):
One thing I do want to say about the human-in-the-loop stuff: one of my favorite things to work on at Fiddler is when we have a customer who is thinking not just about the AI model but about the application, and that gives you the opportunity to include things like explainability, or other kinds of guidance like you've been describing, that can help the human interpret the model's prediction, right?
(49:07):
I sometimes think that when we think about the model-focused AI problem as organizations, we kind of miss that maybe the right unit for thinking about these things is the application level, where you can have all this adjacent instrumentation and supplementation to the model that can help give the human a little bit more.
(49:30):
I think if model development teams owned more of the application rather than just the specific model, it would maybe empower them to apply more of these principles of giving more diagnostic information in a stakeholder-appropriate way.
Okay, so here's another popular question:
(49:55):
for companies that are starting out with generative AI, what ethical frameworks should they consider following? And are they different based on vertical and company size, like startup versus enterprise?
Kathy Baxter (50:07):
That is a fantastic question. And, sorry, while I'm putting in links, I just put in a couple of links there. So I put in a link to our guidelines for responsible generative AI, and those apply to any company of any size
(50:34):
or any organization of any size. I also put in a link to our ethical AI maturity model that I mentioned earlier; I just want to make sure I did put that in there. And so, depending on the size of the company is how much
(50:59):
you can do at each of those stages.
You may end up having only one person in your organization that is responsible for providing guidance and expertise on how to do a risk assessment. Or you may be able to have an entire centralized team, or you may be large
(51:22):
enough that not only do you have a centralized team, but you also have experts that are embedded in different parts of your organization. So how far along you are in that maturity stage, and also the size of the company, shapes how you distribute the expertise that you have internally.
(51:42):
Right now, there aren't a lot of people with responsible AI expertise and experience. I'm seeing more and more people that are graduating from programs that do have an emphasis on that. And so they're coming straight out of school with a lot of enthusiasm, but they might not have a lot of experience.
(52:06):
And one of the things that I have found in this role is that it really takes a lot of skill to be able to call people in, not call people out. This is one of the discussions that we have at Salesforce a lot. It's really easy to call somebody out and tell them they're using non-inclusive
(52:29):
language, or that the idea they have is unethical because it's going to harm this other group; they just don't know anything about that other group or their lived experience, and so they would have no idea that what they just proposed would be harmful for them.
If you are viewed as the ethics police, nobody's talking to you.
(52:50):
People look for ways to work around you.
And so you have to be a true partner, one who is committed to the success of the teams that you are working with.
And they know that you are there to help them to create an even better
product than they could on their own.
And that really does take some amount of experience in having
(53:17):
those kinds of conversations so that it doesn't feel alienating, it feels inclusive. You're drawing the person in to up-level and create something that's even better.
So, what I recommend in those cases, if you do have trouble hiring experts from outside: you can train people internally.
(53:43):
My background is as a user experience researcher. I worked for two decades as a UX researcher and co-authored a couple of books. I have found, and again, showing my bias, that people with a user experience research background may also have an ethics background, like a research ethics background.
(54:04):
They have experience fighting for the user or fighting for the customer and really understanding that customer's context and point of view. So those can be some amazing individuals: if you can provide additional training for them in tech ethics or AI ethics, they can
(54:28):
be really powerful in this role. And there are some training programs out there now, so if you're not able to hire externally, then you can promote from within and have individuals within the company take on this role.
Joshua Rubin (54:45):
It does feel like some of the best ethical AI efforts end up just fundamentally being cross-functional.
Yes.
It takes the technical knowledge, and it takes the understanding of the end user, the product design experience, and oftentimes that comes from two or three or
(55:07):
four different places in an organization, which I think is interesting.
It really does often circle back to how you get more stakeholders' voices heard as part of the process of constructing an application.
Kathy Baxter (55:24):
Yeah, that kind of touches on the last question, I believe, and I apologize if I mispronounce your name, Emad. One of the main things that I do in my role is I am that bridge. I connect all the different parts.
(55:45):
I work with product and engineering, research scientists, user experience researchers and designers, legal, privacy, government affairs, all of us together. I'm that glue to make sure that everybody who is responsible for their part of governance is working together, on the same page.
(56:08):
I feel like that's one of the biggest values that I
bring when I work with teams.
And so we have an Ethical Use Advisory Council, and we have representatives from every one of those roles in the company that are part of the council. We have both executives as well as frontline employees, and we have
(56:31):
external experts that we bring in as part of our oversight as well. And so I recommend that every company have an internal governance council that is representative of a broad range of roles as well as demographics, lived experiences, and expertise, both internally and externally.
Joshua Rubin (56:59):
Very nice. I think we're pretty much at time, so maybe we wrap here. Thanks a ton, Kathy; this was a really interesting conversation. Hope everybody out there has a great day. Thanks.
Kathy Baxter (57:16):
Awesome.
Thanks, everybody.
Bye bye.