Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:03):
Within five years, open source AI will have raised the
GDP in the world's poorest countries. That is the premise
for today's conversation. I'm Azeem Azhar. Welcome to the Exponentially podcast.
Recent reports from the banks and consultancies have suggested that
new breakthroughs in AI, particularly generative AI, could add trillions
(00:27):
of dollars to global GDP. But the most advanced of
these models are built in the West, on expensive supercomputers
and trained using English language data sets. So how can
the global South, with its young populations, tighter finances and
shakier infrastructure, share in this productivity boom? Today's guest
(00:48):
is Emad Mostaque, a Jordanian-born Bangladeshi immigrant to
Britain who's the founder and CEO of Stability AI. It's
a firm that has accelerated to a billion dollar valuation
in less than three years. Emad has been vocal about
the potential that open source AI offers to the poorest
in the world, but there have been serious criticisms leveled
(01:09):
against him, most prominently in a recent story by Forbes magazine.
These include questions about taking credit for the company's technology,
about some of the partnerships the firm is meant to
have and also governance practices at Stability AI, where Mostaque
remains CEO. Emad has publicly rejected these as gross mischaracterizations,
(01:29):
and a debate over the story and others critical of
Stability AI continues to play out on social media and elsewhere.
But as long as investors who have poured hundreds of
millions of dollars into Stability AI continue to stick by
Emad, and as long as the firm continues to
innovate and bring out new AI models, he will remain
a powerful force in the expanding AI universe. What is
(01:58):
generative AI and how is it different from all of
the AI systems that we saw previously.
Speaker 2 (02:05):
Generative AI is a new type of AI that started
in twenty seventeen. There was a seminal paper called Attention
Is All You Need, because not all data is the same.
You pay attention to what's important. So classical AI was
built on this concept of big data. So you had
all this data of Facebook and Google and they used
it to sell you coconut shampoo, but it couldn't go
outside its boundaries. So it's a very logical, the
(02:27):
future is like the past, stable kind of environment. This
new type of AI said: pay attention to the important
parts of the data to compress it. So people listening
to this will see: they might take away a few points;
they're not going to remember our entire conversation. That's what the
human mind does. You've got the very logical part that
can memorize stuff, and you've got the part that builds principles, stories,
frameworks for understanding.
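The "pay attention to what's important" mechanism described here, scaled dot-product attention from the Attention Is All You Need paper, can be sketched in a few lines of plain Python (a toy illustration with made-up numbers, not any production model's code):

```python
import math

def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Weight each value by how relevant its key is to the query.

    A dot-product score per item, softmaxed into weights, then a
    weighted average of the values: attend more to what matters.
    """
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query))
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights

# A query that closely matches the second key gets the most weight.
query = [1.0, 0.0]
keys = [[0.0, 1.0], [1.0, 0.0], [0.1, 0.1]]
values = [[10.0], [20.0], [30.0]]
out, weights = attention(query, keys, values)
```

The output is a blend of all the values, but dominated by the one whose key best matches the query, which is the compression idea in miniature.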
Speaker 1 (02:49):
And this paper, Attention is All You Need, in a
sense analogized that for a computer system. Exactly.
Speaker 2 (02:56):
It was the first one that said, this is how
you show it at scale, and let's simplify it down
to a problem of better data and bigger computers. So
using gigantic supercomputers, you can take these big data sets
of text, images and others and compress them down
to just a few gigabytes of file that learns principles,
not facts.
Speaker 3 (03:16):
And so this was the missing.
Speaker 2 (03:17):
Piece in AI, and that's why using these systems feels
actually quite surprisingly human.
Speaker 1 (03:23):
Surprisingly human. But how do we get to the generative
part of all of that.
Speaker 2 (03:26):
The generative part is that you put a prompt in
or some words in, and then it gives you something back.
It generates the outputs and the outputs are not always
even the same because it has principles as a base.
Speaker 1 (03:39):
So in the same sense that if you and I
meet one day, the way the conversation plays out could
be quite different to the next time we meet, because
we have principles of socialization and of behavior and of
how well we know each other, and those get applied
in real time at that moment each time we shake hands.
Speaker 2 (03:59):
In real time, it's a file with just a bunch
of like it's called neural nets.
Speaker 3 (04:04):
Weights.
Speaker 2 (04:05):
Words go in and get shaken out like a pinball
slot machine, and then the output comes. But the
input can be a painting of a cup in the
style of Vermeer, and then it understands the nature of
painting, cup, Vermeer. But cup has so many different meanings: it
can mean this cup or cup your hands, or cup
your ears, or a world cup, and it understands those
things in place because it's been trained on images and text. Similarly,
(04:29):
a lot of these language models they've been trained on sentences,
so they look at the context of the sentence and
they say, what's coming next, you know, like that game
of improvisation where you start with the sentence and then
you provide something in the next one, and on we go.
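That "what's coming next" game can be made concrete with a deliberately tiny stand-in, a bigram counter rather than a neural network (an illustrative sketch only; real language models learn far richer context than one previous word):

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word follows which: a miniature 'what's coming next' model."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def predict_next(model, word):
    """Return the most frequently observed next word, or None if unseen."""
    followers = model[word.lower()]
    return followers.most_common(1)[0][0] if followers else None

# A toy corpus; real models train on vastly more text.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
```

Here `predict_next(model, "the")` returns "cat", because that continuation was seen most often; a large language model does the same kind of prediction, but conditioned on the whole preceding context rather than a single word.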
Speaker 1 (04:45):
And it's moving so so quickly. So last summer I
was away in Tanzania on a safari and there's really
no mobile signal. We were away for about two and
a half weeks, and when I came back, the Internet
was full of Stable Diffusion, Stable Diffusion, Stable Diffusion, Stable Diffusion.
And that is one of your generative products. What
(05:06):
it does is take text and produce images. So
I can say I want an image of a badger
playing football on a bicycle, and Stable Diffusion will
produce that image. It could also produce more useful, commercially
interesting images. Recently, you've brought out a new generative product
(05:29):
which is StableLM, which looks a lot like
these text models that we've seen running around to date.
Speaker 2 (05:37):
StableLM is our language model suite, and what we do
is a bit different from a lot of the other companies,
in that a lot of the focus and the breakthroughs
were because a lot of these research labs, the OpenAIs
and Anthropics and DeepMinds of the world, have this
focus on AGI, artificial general intelligence.
Speaker 3 (05:52):
Can you be an AI that can do anything?
Speaker 1 (05:54):
Yes?
Speaker 3 (05:54):
It turns out maybe you can so.
Speaker 1 (05:57):
Making a machine that's in our own image, in a sense,
that's what AGI sounds like.
Speaker 2 (06:00):
More than that, it's a general intelligence. So it's the
kid that made you look bad at school because he
was good at everything. You know, the top performer like
GPT-4 now can pass the bar exam, the medical
licensing exam, the GRE. It's probably going to Stanford, you know,
next fall. But then our take was: that's great,
(06:21):
you can have these amazing giant general models. What will
be better is to mimic humanity where our companies are
not one generalist doing everything. What if we made it
so you could bring your own data to the models
and have lots of specialists working together and have those
models working for you that you own.
Speaker 3 (06:37):
Right, What if you.
Speaker 2 (06:38):
Had open data, open models and allowed it to be
customized and specialized, so rather than relying on one model
to do everything, instead you optimize them.
Speaker 3 (06:48):
Right.
Speaker 1 (06:49):
So, but the idea is still the same. The idea
is still constructing a model that is generative. I can
give it some text and it can produce something that
is commercially useful or even emotionally useful. So working software
code or a working invitation to a meeting, or you know,
things that will save us time and help us perhaps
be a bit more creative.
Speaker 2 (07:10):
So again, they're like a talented graduate, and they can learn
very quickly from a few examples. So they have this
base of generalized knowledge. They've been through kindergarten, high school,
and university, but they're not specialized yet. You can train
them yourself, or you can just show them some examples
and they learn very quickly. Unlike classical AI models that
you had to train on the whole data set.
Speaker 3 (07:29):
They weren't good at adaptation, like the badger riding
a bike.
Speaker 2 (07:33):
That's not really a normal thing, you know. And so
that has been the province of, well, just us, right?
We were able to take these concepts together, whereas a
computer could never merge together concepts. Now we've got that
missing link of being able to take concepts, merge them together,
understand some of these hidden meanings.
Speaker 1 (07:59):
I'm curious about what you think the economic impact of
all of this will be. I mean, there have been
any number of papers coming out from the investment banks
and the research houses and economists in the last few months.
I think Goldman Sachs had a report that suggested that,
with one of their scenarios, they could see fast
(08:20):
implementation of generative AI across the world, leading to a
seven percent increase in global GDP in about ten years.
What's your sense of what it could do economically.
Speaker 2 (08:30):
I think it's the biggest thing since the Gutenberg Press,
maybe even fire.
Speaker 1 (08:34):
The Gutenberg Press six hundred years ago, or fire two
million years ago.
Speaker 2 (08:39):
I think that humans are driven by stories. It's what
allowed us to form tribes and then money and things
like that are stories. The press allowed us to write
down the stories. But it's very lossy. Again, you, me,
everyone listening: you're looking through your things right now.
We do PowerPoints, we write things, but it doesn't
capture the richness of humanity. Our organizations are built on
(09:02):
layers of text, which is painful, and that's why it
turns us into cogs in the machine, shall we say.
Speaker 1 (09:08):
I mean there's something that's quite powerful about the models.
I think that you are getting to here, which is
that you can feed them text and through that the
machination of the billions of different switches and cogs you
guys call them parameters. Yes, in the system, it starts
to find those underlying relationships that we know, probably deeply
(09:31):
in our brains, but don't express. What we do is
we express words, one word at a time, and they
look at all these words and they're able to find
some representations of reality that actually humans use but can't
touch and describe.
Speaker 2 (09:45):
I think there's that part of it. Another part of
it is just being able again. AI is about information classification.
When you're writing, the hardest thing is to write something
terse and compressed. It's a bit easier to write big,
but it's still difficult. The easiest thing for us to
do is talk.
Speaker 1 (10:00):
Now there's the old adage, you know: I couldn't send
you a short letter, so I've sent you a long one.
Speaker 2 (10:05):
Right now, anyone anywhere can create any image, soon any
PowerPoint slide, almost instantly, that looks beautiful. So the fact
that it understands concepts is a big deal, because the
barriers to information flow are reduced, so information can
flow better around our organizations and systems.
Speaker 1 (10:20):
As you eliminate barriers to information flow, you're taking friction
out of systems. You're taking friction out of daily life,
You're taking friction out of business processes, You're taking friction
out of the economy and so we would hope to
see improvements in productivity and with that improvements in prosperity.
Speaker 2 (10:40):
All of finance can be broken down into two things,
securitization and leverage. Securitization is a representation of an asset
of some sort: it's money, the trust of the American government,
it is a bond, it is a property deed, something
like that.
Speaker 3 (10:56):
But you can only have so much information on that.
Speaker 2 (10:58):
You and I, we have our credit score based on
the information of who we are, what we do, our
functional identities.
Speaker 3 (11:04):
Most of the world is invisible.
Speaker 1 (11:05):
The global South. This is people who are underbanked,
people who perhaps don't have formal IDs and so on.
Speaker 2 (11:11):
You need identity and you need information to allow for banking.
You need that for finance, and our financial systems are
quite slow. As you get increases in information flow, you
get increases in prosperity because you can direct assets to
where they're needed. You can direct resources to where they're needed.
It's like I always tell people in the team about roadmaps:
are they resource constrained or story constrained? Because if
(11:32):
it's a good idea, as a leader, I hope
I will find you the resources. But you have to
convince me first.
Speaker 1 (11:36):
I can imagine these models in rich, advanced economies
where there's a huge service sector. There are lots of
people who sit behind computers typing away creating spreadsheets and
PowerPoint slides. You can imagine these models helping economies like that.
But how can we see them helping the Global South
or poorer, less advanced economies.
Speaker 2 (11:57):
One of the reasons these models have got everywhere is
they've become good enough, fast enough, and cheap enough. Stuff
that used to cost dollars, tens of dollars, hundreds of dollars,
thousands of dollars, you can now do with a few
simple prompts now. I think it will remove a lot
of the basic tasks and make people more productive, as
opposed to leading to mass unemployment.
Speaker 1 (12:15):
And we're not seeing demand for coders drop off, right?
Coders can still get work pretty quickly as they need it.
Speaker 2 (12:21):
Because you will write better code, and again, a productivity increase.
Smartphones can take amazing pictures, but there are more
employed photographers in the world now than ever. You know, again,
we adapt, we improve, we use the technology. However, the
Global South is a very interesting phenomenon. We had mobile phones,
you remember, it used to be for the rich only,
these big, big things.
Speaker 1 (12:40):
And now in the global South there are mobile phones everywhere.
Speaker 3 (12:42):
There.
Speaker 2 (12:43):
They leapt over the PC to mobile. Yeah, they leapt
over to instant payments, whereas we took a while to
catch up. Certain Western countries still haven't
caught up to instant payments. I think what will happen
is these models become good enough, fast enough and cheap
enough just within the next few years that they will
leap forward to intelligence augmentation.
Speaker 1 (13:01):
To the benefit of these emerging markets. Yes, so let's
think about where we've got to. We've got this very
powerful technology that we've characterized. It's extremely helpful in many
many different ways. But it is the case that these
systems are extremely expensive to build, to train as it's called;
(13:27):
they require lots and lots of data. The British government
has allocated more than a billion dollars to build a
supercomputer just to train these models. The rumors are that
the GPT-4 model from OpenAI cost hundreds of
millions of dollars to train. But they're also trained on
pretty much everything that you can find on the internet,
(13:48):
a large part of which will be Western, American, English language,
a strong cultural bias. So it sounds like not
only can poorer countries not afford this, but even if
they could, the technologies wouldn't necessarily be suitable for
the economic requirements or the cultural requirements of Tanzania or Bangladesh.
Speaker 2 (14:11):
Yeah, I think this is a real problem. I think
the quality of data we're feeding to these incredible models
is poor. It's scraped from the whole Internet. We need
better data; we need that as infrastructure. There is a
monetary equation if we need giant supercomputers, but more it's
a question of talent and expertise. It's complicated to build
these things. This is one of the reasons, again, kind
of, that we had Stability do an open version and
(14:33):
build these data sets for each country on an open basis.
Speaker 1 (14:36):
So what do you actually mean by an open
model and how does that solve the computational problem.
Speaker 2 (14:42):
We got the giant supercomputers, we at Stability,
yes, and then we made it available, and
then we released it open source so people could take
these models as a base and then extend them.
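That "take a model as a base and then extend it" workflow can be illustrated with a toy example (a hypothetical one-parameter model in plain Python, not Stability's actual tooling): start from released base weights, then run a few gradient-descent steps on your own data.

```python
# Toy illustration of extending an open model: start from "downloaded"
# base weights, then fine-tune them on your own small data set.

def predict(w, b, x):
    """A one-parameter linear 'model': the stand-in for a released network."""
    return w * x + b

def fine_tune(w, b, data, lr=0.1, steps=100):
    """A few passes of gradient descent on new examples, from base weights."""
    for _ in range(steps):
        for x, y in data:
            err = predict(w, b, x) - y
            w -= lr * err * x  # gradient step on the weight
            b -= lr * err      # gradient step on the bias
    return w, b

base_w, base_b = 1.0, 0.0           # stand-in for the open base weights
my_data = [(1.0, 3.0), (2.0, 5.0)]  # your own domain data (here: y = 2x + 1)
w, b = fine_tune(base_w, base_b, my_data)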
Speaker 1 (14:52):
But I'm familiar with open source in software.
Whereas with closed source, if you're getting Microsoft Word,
you buy it from Microsoft and you can't inspect the
source code, the instructions that make it run. You just
run it, so you are fundamentally a consumer of it. But
with an open source project like LibreOffice, which is
(15:15):
an open source office product, you just download the code.
You can look at the code, you can inspect the code,
you can modify the code, and you can tailor it
to your own requirements. So that's open source in software.
What's open source in a model?
Speaker 2 (15:29):
So in a model, you can inspect the code, you
can inspect the data sets, and the model weights themselves
are freely available, as a fresh trained graduate, as it were.
And by releasing this openly so you could take it
and adapt it, it's a massive development boom where you
start seeing it everywhere.
Speaker 1 (15:46):
Help me understand the mechanics of all this, because I
think it's important. Stability has its own machine learning AI supercomputer.
So you run up the cost of training these models
for the first time. Yeah, you then release them as models,
data and weights which any developer can take and use.
(16:07):
And when the developer runs them, they run them on
their own computing hardware, and then they're paying for that.
Speaker 3 (16:15):
Yes, in some sense, they don't pay for it.
Speaker 2 (16:17):
But if you want enterprise support, then you work with
us and our partners, or if you want customized versions,
because it's our view that every enterprise will want their
own versions with their own data sets underlying it. Every
country will want their own version because this is the
next generation of infrastructure. The actual comparison is five G.
This is five gv phonology works right, Yes, this is
(16:39):
5G for creativity, as it were; it's 5G
for information flow. And a trillion dollars has been spent
on 5G.
Speaker 1 (16:46):
Right, so we can spend a lot more on AI
systems across our economies. If we come back to your
open source models, it strikes me that one of the
things that you can do with them is you could
make them very culturally relevant. And I think back to
this idea that sort of Western values get exported
(17:09):
for every country regardless. Back in the day, when you
registered for Facebook, it would ask you what your marital
status was, and it was sort of single, divorced, married,
or it's complicated.
Speaker 3 (17:21):
Yes.
Speaker 1 (17:21):
And my mum, who was in her late seventies at
the time, is registering for Facebook and is on the
phone to me going, what does it's complicated mean?
Speaker 3 (17:28):
Because it just.
Speaker 1 (17:28):
Didn't exist within her sort of mental space. And it
seems like given how important AI is going to be
as infrastructure, I mean, it is going to be the
layer between me and the services that I access as
a consumer, as a citizen or as an employee. It's
a really critical gatekeeper. So is that part of your vision?
(17:50):
An Indian version of ChatGPT, a Brazilian version of
ChatGPT, an Indonesian version of ChatGPT.
Speaker 3 (17:56):
Yes.
Speaker 2 (17:56):
My vision is that every person, company, country, culture has
their own models that they themselves build and have the
data sets for, because this is vital infrastructure to represent themselves,
to extend their abilities.
Speaker 1 (18:10):
How much of what you're saying is theory? Do you
actually have national models being built across the global South?
Speaker 2 (18:19):
A lot of this stuff is still in the research
phase and now is only just entering the engineering phase,
and so there's a lot we still don't know about
these models. But they're good enough, fast enough, and cheap
enough to do it.
Speaker 1 (18:28):
Okay, even if the models are cheaper, you still have
to get the relevant data because you know, the Internet
doesn't necessarily have lots of information about Pakistani culture,
Pakistani broadcasts, Pakistani media. Exactly.
Speaker 2 (18:43):
And so what you need to have is you need
to have Pakistani newspapers, Pakistani broadcasts, and then have Pakistanis
come together to build better data sets that teach these models.
We know the technology now, but we lack the data
and so that is the key blocking point.
Speaker 1 (18:59):
So getting access to the data is what's needed.
Speaker 2 (19:03):
No, to enable it, so that as these data sets
are built, the models can then come from there and
then people can build on those models for their own people.
Speaker 1 (19:11):
It almost sounds philanthropic, right, So what is your model
for making money from these open source models that you
are effectively giving away after all your hard work.
Speaker 2 (19:21):
The whole of the infrastructure here in London in the
West is all based on open source. And the model
for open source is that you have an open version
that anyone can use, they can start experimenting with, and
then there are variant enterprise versions that you provide full
support around other services and facilities integration, and you're making
(19:41):
money that way with these models. We have our open
models based on open data and we have open models
based on licensed data from our partners. Because when you
talk to regulated industries and others they want models they own.
They don't want to send their data away to other people,
and they want to know every single piece of data
in there, and they want it to be the best
data you know.
Speaker 1 (20:02):
Essentially, your average young developer or a small startup can
download your models and use them, but if there's an issue,
there's not a lot of support. But if you're a
big company, you're a national government, you might enter into
a more detailed contract where there is support and advice
and potentially even data, yes, coming through. We know that
AI is a very powerful technology, and Stability has taken
(20:26):
a different path to other firms by going through an
open source approach, which could democratize the technology, making it culturally, linguistically,
locally relevant for any nation, any business, any region, any individual.
But you're up against firms like Microsoft and OpenAI
(20:46):
and Google and DeepMind and others. How do you
plan to compete?
Speaker 2 (20:51):
I think there's a caveat here on addressable market. Our
addressable market is all the private data in the world,
data you can't send anyone, be it your personal data or
enterprise data, financial regulated data. And so our models will
go in and transform that into knowledge, and we'll have
a hybrid AI, I think: we've got our models on your
private data, and we standardize all of that, make it
(21:12):
very predictable, loads of support, and then you use these
proprietary systems for the best outcomes. You'll have your own
graduates that you hire, and you'll hire from McKinsey, and.
Speaker 1 (21:21):
You'll put them together.
Speaker 3 (21:22):
You'll put them together.
Speaker 1 (21:23):
But I'm curious because other companies are taking a closed
source approach to proprietary data. So there are companies like
Cohere and Anthropic who will build a powerful generative AI
model just on a company's own private data. They're competing
(21:43):
with you as well, right, and so why is your
approach better than that?
Speaker 2 (21:46):
They will not give that company ownership of that model.
They will not share the detail of every single piece
of data that's in that model.
Speaker 1 (21:55):
But that's the case today with lots of the technology
that we use. You know, when I'm running my
e-commerce application on the cloud, I don't know the details
of every configuration of the servers that I'm renting from
Amazon or Microsoft or others, so businesses are used to that.
Speaker 2 (22:11):
One hundred percent, and you have data that you can
share with people, but there is a core of regulated
data and other things, HIPAA-compliant data, medical data,
that you cannot send to other companies, and you have
to build your own systems for inside regulated environments. The
feedback we've got from regulated entities and again from policymakers
and others is open, transparent models, even if it's licensed
(22:32):
data are something that we would like a lot, and
we would like to own this technology if it's going
to be vital infrastructure.
Speaker 1 (22:38):
Right, we're in London. Now there's a deep bench of
AI skills. How do we expand this and democratize it
out to countries where there's just less talent in these
breakthrough areas.
Speaker 3 (22:52):
I think there's talent, it just hasn't been accessed.
Speaker 2 (22:54):
And these models are very interesting and you can use
AI to help you develop applications of the AI.
Speaker 3 (23:00):
Quite a funny kind of recursion there.
Speaker 1 (23:01):
Right, So you effectively will start to support places where
perhaps the workforce doesn't have the depth of San Francisco's
AI talent with the tools themselves.
Speaker 2 (23:11):
Yes, because the AI models are pre-computed, the actual
running of the AI is very computationally non-intensive. The
creation of the AI is ridiculously intensive, right? So you've
got all the energy at the start rather than the end.
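That front-loading of energy can be made concrete with a back-of-envelope calculation, using rule-of-thumb approximations common in the scaling-laws literature: roughly 6 × parameters × training-tokens FLOPs to train a transformer, and roughly 2 × parameters FLOPs per generated token at inference. The model size and token count below are illustrative assumptions, not any specific model's real figures.

```python
# Rule-of-thumb compute estimates (assumptions, not measured figures):
# training a transformer costs roughly 6 * params * tokens FLOPs,
# while generating one token at inference costs roughly 2 * params FLOPs.

params = 7e9          # a hypothetical 7-billion-parameter model
train_tokens = 1e12   # trained on one trillion tokens

train_flops = 6 * params * train_tokens  # one-off cost, paid up front
flops_per_token = 2 * params             # recurring cost, paid per token

# How many generated tokens would equal the one-off training bill?
equivalent_tokens = train_flops / flops_per_token
```

Under these assumptions the single training run costs as much compute as generating about three trillion tokens, which is why pre-computed open weights can be run comparatively cheaply once someone has paid the upfront bill.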
Speaker 1 (23:23):
So that takes us back to what Stability will do.
Stability will take the cost upfront, and then it'll find
rich companies, rich nations, rich clients to tailor the models,
which allows you to continue to make the base foundation model,
which can then be given as open source to anyone
(23:45):
else who wants it. Yes.
Speaker 2 (23:45):
It stimulates demand, and then as people go up and
they need the support, they come to us and our partners.
For any customization, they come to us and our partners, right?
And it's open models for private data, and then other
models are for data that is either semi-private
or that you don't mind sharing, and you combine the two so
you have models of both types.
Speaker 1 (24:04):
There are, of course concerns with the safety of AI systems,
and one argument is that with closed models that are controlled,
there's a lot more safety because if I'm accessing it
over the web, the organization that's running the AI system
can stop me if I'm trying to do something
bad with it. And with open source models, of course
(24:24):
they're just available for anyone to download, so the cat
is literally out of the bag many many times over,
millions of times over. Is your approach less safe than
the closed approach?
Speaker 2 (24:37):
I think it's more safe. There's a reason that our
infrastructure is based on open source databases and servers and
others because it can be checked, it can be tested,
and it can be fully audited and battle tested. Our
approach at Stability is to create the standard around this,
so there aren't thousands of different models. There is an
entity, a partnership and an ecosystem that standardizes around
(24:58):
this: principles like safety, watermarking, and other things, so
it becomes predictable.
Speaker 1 (25:04):
But you've put the models out for any bad actor,
any hacker, any annoyed employee to build something difficult with.
Speaker 2 (25:13):
We weren't the ones that came up with open models.
We're standardizing it. We're supporting open innovation for detection and
prevention as well as creation.
Speaker 1 (25:21):
But it does sound like bad actors will end up
having a little bit of a field day, which creates,
I suspect, an enormous opportunity for an AI-driven security
and resilience industry.
Speaker 2 (25:33):
The reality is that we're stronger together when things are open,
and open is required for all the private, regulated and
other data out there. If you don't have open systems,
then you will only have proprietary entities, and they become
the choke points on the Internet, and that's far more
dangerous than the other side. Open is there anyway, But
like I said, let's standardize it, let's make it safer,
(25:54):
and let's work together to combat the bad as opposed
to leaving it with a few unelected giant companies.
Speaker 1 (26:00):
Part of the story of technology is that technology has
been exported from one place really for everyone else to use.
I think one exciting opportunity now is the idea that
the people on whom this technology is going to operate
could potentially build their own Now, many of those people
are going to be in the Global South, and the
(26:23):
premise of our conversation is that within five years, safe
and open source generative AI could make a meaningful contribution
to increasing the GDP of the world's poorest nations. How
likely is it that this vision could become reality? I
think it's incredibly likely. The desire, talent and passion to
(26:43):
adopt technology like this is huge within the Global South,
and it is where it can have the most impact,
the highest ROI. So I think they'll take the building
blocks that we and others provide and they'll build some
amazing things to activate their potential. Emad, it's a great vision.
Thank you so much. My pleasure, thank you for having me.
(27:08):
Reflecting on my conversation with Emad, I'm reminded that much
of the software that powers the Internet today, used by
billions of us, is actually open source. It's proven to
be resilient, stable, and importantly affordable. The open approach is
one reason why the Internet is today ubiquitous. So why
wouldn't that be true for generative AI? And if the
(27:31):
technology can live up to its promise of improving productivity,
wouldn't the open approach make it more widely accessible to
the poorest countries in the world. That seems to make
sense to me. Thanks for listening to the Exponentially podcast.
(27:53):
If you enjoy the show, please leave a review or rating.
It really does help others find us. The podcast is
presented by me, Azeem Azhar. The sound designer is Will Horrocks.
The research was led by Chloe Ippah and music composed
by Emily Green and John Zarcone. The show is produced
by Frederick Cassella, Maria Garrilov and me, Azeem Azhar. Special
(28:15):
thanks to Sage Bauman, Jeff Grocott and Magnus Henrikson. The
executive producers are Andrew Barden, Adam Kamiski, and Kyle Kramer.
David Rivella is the managing editor. Exponentially was created by
Frederick Cassella and is an e to the pi i
plus one Limited production in association with Bloomberg LLC.