
May 30, 2025 • 15 mins

Every aspect of Amazon is leveraging artificial intelligence, says Matt Garman, CEO of Amazon Web Services. Garman discusses Amazon’s AI roadmap and reflects on his first year in the role with Ed Ludlow on “Bloomberg Technology.”



Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:02):
Bloomberg Audio Studios, podcasts, radio news.

Speaker 2 (00:16):
Welcome to our Bloomberg radio and television audiences worldwide. We
go right now to a conversation with Matt Garman, AWS CEO. Matt,
it's good to catch up. It has been basically one
year that you've been in the role as AWS CEO.
As a place to start, what has been the biggest
achievement in that time for AWS?

Speaker 3 (00:39):
Yeah, thanks for having me on. It's nice to be
here again. Yeah, it's been a fantastic year of innovation.
It's really been incredible and as I look out there,
one of the things that I've been most excited about
is how fast our customers are innovating and then adopting
many of the new technologies that we have. And as
you think about customers that are on this cloud migration journey,

(01:01):
many of them have been doing that for over the
last several years, but this year in particular, that we've
really seen an explosion of AI technologies, of agentic technologies,
and increasingly we're seeing more and more customers move their
entire estates into the cloud and AWS.

Speaker 4 (01:17):
So it's been really fun to see.

Speaker 3 (01:18):
It's been an incredible pace of technology and it's been
a really fun first year.

Speaker 2 (01:23):
The moment that investors kind of sat up and paid
attention was when Amazon said that its AI business was
at a multi billion dollar run rate in terms of sales.
What we don't understand as well is what proportion of
that is AWS infrastructure?

Speaker 3 (01:41):
Yeah, that is AWS, right, And so the key is
that's a mix of customers running their own models. Some
of that is on Amazon Bedrock, which is our
hosted models offering, where we have first party models like Amazon Nova,
as well as many of the third party models like
Anthropic's models, and some of those are applications, things like
Amazon Q, which helps people do automated software development, as

(02:04):
well as a host of other capabilities, and so there's
a mix of that, and I think part of the
most interesting thing about being at a multi billion dollar
run rate is we're at the very earliest stages of
how AI is going to completely transform every single customer
out there.
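
For context, the mix Garman describes (first party models like Nova, third party models like Anthropic's) is served through a single Bedrock runtime API. Below is a minimal sketch using boto3's Converse API; the region, model ID, and prompt are illustrative assumptions, not details from the interview.

```python
import boto3

# Bedrock exposes hosted models (first party like Amazon Nova, third party
# like Anthropic's Claude) behind one runtime client. Region and model ID
# are assumptions; use whatever your account actually has access to.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-lite-v1:0",  # illustrative model ID
    messages=[
        {"role": "user", "content": [{"text": "In one sentence, what is Amazon Bedrock?"}]}
    ],
    inferenceConfig={"maxTokens": 256, "temperature": 0.2},
)

# The Converse API returns the assistant's reply under output.message.
print(response["output"]["message"]["content"][0]["text"])
```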

Speaker 4 (02:18):
And we talk to customers and we look at where the.

Speaker 3 (02:20):
Technology landscape is, and we firmly believe that every single business,
every single industry, and really every single job is going
to be fundamentally transformed by AI. And I think we're
starting to see the early stages of that.
But again, we're just at the very earliest stages of
what I think is going to be possible, and so that
multi billion dollar business that we have today is really
just the start.

Speaker 2 (02:42):
Can you give me a generative AI revenue number?

Speaker 4 (02:47):
For the world or for AWS?

Speaker 2 (02:49):
For you guys, for AWS. Maybe Amazon as a whole.

Speaker 3 (02:52):
Yeah, Like I said, we are in multiple billions of dollars,
and that's for customers using AWS. We also use lots
of generative AI inside of Amazon for a wide range
of things. We use it to optimize our fulfillment centers.
We use it when you go to the retail site
to summarize reviews, or to help customers find products in
a faster and more interesting way. We use AI in

(03:15):
Alexa in our new Alexa Plus offering, where we conversationally
talk to customers through the Alexa interface and help them
accomplish things through voice that they were never able to
do before. So every single aspect of what Amazon does
leverages AI, and our customers are exactly the same. Customers
are looking to AWS to completely change, whether it's their

(03:38):
contact centers through something like Amazon Connect, which has
AI capabilities so that you don't have to go program
it, all the way down to our custom chips or
Nvidia processors, where customers at the metal are
building their own models. We have the whole range of
people that are building AI on top of AWS as
well as Amazon themselves.

Speaker 2 (03:59):
We always credit AWS as being the number one hyperscaler. But
just what you said there about what the clients are using, from
the silicon level through to capacity, it would really help
if you could proportionately tell me what percentage of workloads
are being run for training and what proportion of workloads

(04:20):
are being run for inference.

Speaker 4 (04:21):
Sure, yeah, and that changes over time. I think.

Speaker 3 (04:25):
Look, as we progress over time, more and more of
the AI workloads are inference. I'd say in the
early stages of AI and generative AI, a lot
of that usage was dominated by training, as people were
building these very large models with small amounts of usage.
Now the models are getting bigger and bigger, but the
usage is exploding at a rapid rate, and so I

(04:45):
expect that over the fullness of time, eighty percent, ninety percent,
the vast majority of usage is going to be
inference out there. And just for all those
out there, inference really is how AI is embedded
in the applications that everybody uses. And so as we
think about our customers building, you know, there's a small
number of people who are going to be building these models,

(05:05):
but everyone out there is going to use inference as
a core building block in everything they do. And every
application is going to have inference, and already we're starting
to see inference built into every application. And we
think about it as just the new building block. It's
just like compute, it's just like storage, it's just like
a database.

Speaker 4 (05:23):
Inference is a core building block.

Speaker 3 (05:25):
And so as you talk to people who are building
new applications, they don't think about it as AI is
over here and my application is over here. They really
think about AI as embedded in the experience. And so
increasingly, I think, it's going to be difficult for
people to say what part of your revenue is going
to be driven by AI. It's just part of the
application that you're building, and it's going to be a
core part of that experience, and it's going to deliver

(05:47):
lots of benefits from efficiency, from capabilities, and from user
experience for all sorts of applications and industries.
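
To make "inference as a building block" concrete, here is a hedged sketch of an application handler where a database read and a model call sit side by side as ordinary primitives. Every name here (the DynamoDB table, its schema, the model ID) is hypothetical, not something from the interview.

```python
import boto3

# Hypothetical resources: inference sits next to the database call as just
# another primitive in the request path, as Garman describes.
dynamodb = boto3.resource("dynamodb")
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def product_summary(product_id: str) -> str:
    # Database read: fetch stored reviews (hypothetical table and schema).
    table = dynamodb.Table("product-reviews")
    item = table.get_item(Key={"product_id": product_id}).get("Item", {})
    reviews = item.get("reviews", [])

    # Inference call: the same shape as any other service call the app makes.
    response = bedrock.converse(
        modelId="amazon.nova-lite-v1:0",  # illustrative model ID
        messages=[{
            "role": "user",
            "content": [{"text": "Summarize these reviews:\n" + "\n".join(reviews)}],
        }],
    )
    return response["output"]["message"]["content"][0]["text"]
```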

Speaker 2 (05:54):
But present day, is it fair to say the majority is still training?

Speaker 3 (05:58):
No, I think that at this point there's definitely more
usage in inference than training.

Speaker 2 (06:02):
We want to welcome our radio and television audiences around
the world. We're speaking to AWS CEO Matt Garman, who
officially next week celebrates one year in that role leading AWS.
A new metric that has been discussed particularly this earnings season,
and we discussed it with Nvidia CEO Jensen Huang this week,
is token growth and tokenization. Has AWS got a metric

(06:27):
to share on that front?

Speaker 3 (06:29):
I don't have any metrics to share on that front,
but I think it's one of the measures that we
can look at, the number of tokens that are
being served out there, but it's not the only one,
and I increasingly think that people are going to be
thinking about these things differently. Tokens are a particularly interesting
thing to look at when you're thinking about text generation,
but not all things are created equal.

Speaker 4 (06:49):
I think, particularly as you think about.

Speaker 3 (06:51):
AI reasoning models, the input and output tokens don't necessarily
reflect the work that's being done, and increasingly you're
seeing models that can do work for a really long period
of time before they output tokens, and so you're having
these models that can sometimes think for hours at a time. Right,
you ask these things to go and actually do research

(07:12):
on your behalf. They can go out to the internet,
they can pull information back, they can synthesize, they can
redo things. If you think about coding and Q Developer,
we're seeing lots of coding where it goes and actually
reasons and does iterations and iterations and improves on itself,
looks at what it's done, and then eventually outputs the
end result. And so at some point kind of the

(07:33):
final output token is not really the best measure of
how much work is being done. If you think about images,
if you think about videos, there's a lot of content
that's being created and a lot of thought that's being done.

Speaker 4 (07:43):
And so tokens are one aspect of it.

Speaker 3 (07:46):
And it's an interesting measure, but I don't think it's
the only measure to look at. Although they are rapidly increasing.

Speaker 2 (07:53):
Project Rainier, the massive custom server design project. Yeah, what
is the operational status and latest on the project right now?

Speaker 3 (08:02):
Yeah. So we're incredibly excited about it. So Project Rainier
is a collaboration that we have with our partners at
Anthropic to build the largest compute cluster that they'll use
to train the next generation of their Claude models, and
Anthropic has the very best models out there today. Claude
4 just launched, I think it was last week, and

(08:23):
it's been getting incredible adoption out there from our customer base.
Anthropic is going to be training the next version of
their model on top of Trainium 2, which is Amazon's
custom built accelerator processor, purpose built for AI workloads, and
we're building one of the largest clusters ever released. It's

(08:44):
an enormous cluster, more than five times the size of
the last one that they trained on,
which again produced the world's leading model.

Speaker 4 (08:52):
So we're super excited about that.

Speaker 3 (08:54):
We're landing Trainium 2 servers now and they're already in
operation, and Anthropic is already using parts of
that cluster, and

Speaker 4 (09:01):
So super excited about that.

Speaker 3 (09:02):
And the performance that we're seeing out of Trainium 2
continues to be very impressive and really pushes the envelope,
I think, on what's possible, both from an absolute performance
basis as well as a cost-performance and scale basis.
I think some of those are equally going to be
really important as we move forward in this world, because
today much of the feedback you get is that AI
is still too expensive. But costs are coming down pretty aggressively,

(09:25):
and it's still too expensive, and so we think there's
a number of things that need to happen there. Innovation
on the silicon level is one of those things that
needs to help bring the cost down, as well as
innovation on the software side and algorithmic side so that
you have to use less compute per unit of inference
or training. So all of those are important to bring
that cost down to make it more and more possible

(09:46):
for AI to be used in all of the places
that we think that it will be over time.

Speaker 2 (09:52):
Matt, on Wednesday, Nvidia CEO Jensen Huang summarized inference demand for me.
I just wanted to play you that soundbite.

Speaker 4 (09:58):
Sure.

Speaker 5 (09:59):
Got a whole bunch of engines firing right now. The
biggest one, of course, is the reasoning AI inference. The
demand is just off the charts. You see the popularity
of all these AI services.

Speaker 2 (10:13):
Now, your pitch for Trainium 2, and as you know,
I've kind of taken apart the server design
and looked at it, is the efficiency and cost efficiency
relative to Nvidia tech. Are you seeing that same demand
Jensen outlined for Trainium 2 outside of the relationship with Anthropic?

Speaker 3 (10:33):
Yeah. Look, we're seeing it across a number of different places,
but it's not really Trainium 2 versus Nvidia, and
I think that's not really the right way to think
about it. I think there's plenty of room. The opportunity
in this space is massive. It's not one versus the other.
We think that there's plenty of room for both of these,
and Jensen and I speak about this all the time,
that Nvidia is an incredibly fantastic platform. They've built

(10:53):
a really strong platform that's useful and is the leading
platform for many, many applications out there, and so we
are incredible design partners with them. We make sure that
we have the latest Nvidia technology for everyone, and
we continue to push the envelope on what's possible with
all of the latest Nvidia capabilities. And we think
there's room for Trainium and other technologies as well, and

(11:14):
we're really excited about that. And so many
of the leading AI labs are incredibly excited about using
Trainium 2, and really leaning into the benefits that you
get there. But for a long time,
these things are going to be living in concert together,
and I think there's plenty of room, and customers want choice.
At the end of the day, customers don't want to
be forced into using one platform or the other. They'd

(11:37):
love to have choice, and our job at AWS is
to give customers as much choice as possible.

Speaker 2 (11:42):
What is general availability of Nvidia GB200 for AWS?
And have you, I guess, launched Grace Blackwell backed instances yet?

Speaker 3 (11:52):
Yes, yep. So we've launched our, they would call them,
P6 instances. And so those are available in AWS
today and customers are using them and liking them and
the performance is fantastic. So those are available today. We're
continuing to ramp capacity. We work very closely with the
Nvidia team to aggressively ramp capacity, and demand is strong

(12:12):
for those P6 instances. But customers are able to
go and test those out today, and like I said,
we're ramping capacity incredibly fast all around the world and
in our various different regions.

Speaker 2 (12:25):
Now, what is your attitude to Claude, Anthropic's model, being
available elsewhere, on Azure Foundry for example?

Speaker 4 (12:35):
Great. I mean, that's okay too.

Speaker 3 (12:36):
I think many of our customers make their applications available
in different places, and we understand that various different customers
want to use capabilities in different areas and different clouds.
Our job, and this is what
we do, is to make AWS the best place to
run every type of workload, and that includes Anthropic's Claude models, but

Speaker 4 (12:58):
It includes a wide range of things.

Speaker 3 (13:00):
And frankly, that's why we see big customers migrating over
to AWS. Take somebody like Mondelez, who's really gone
all in with AWS and moved some of their workloads
there. One of the reasons is that they see
that we have capabilities sometimes using AI by the way,
in order to really help them optimize their costs and

(13:20):
have the most available, most secure platform. In Mondelez's case,
they're taking many of their legacy Windows platforms and transforming
them into Linux applications and saving

Speaker 4 (13:30):
All of those licensing costs.

Speaker 3 (13:32):
But we have many customers who are doing that, and
so our job is to make AWS by far the
most technically capable platform that has the most and widest
set of services, and that's.

Speaker 4 (13:44):
What we do.

Speaker 3 (13:45):
But I'm perfectly happy for other people to use it. Like,
it's great that Claude's making their services available elsewhere, and
we see the vast majority of that usage happening in AWS.

Speaker 4 (13:54):
Though.

Speaker 2 (13:55):
Will we see open AI models on AWS this year?

Speaker 3 (13:59):
Well, just like, you know, we encourage all of our
partners to be able to be available elsewhere, I'd love
for others to take that same tack.

Speaker 2 (14:08):
Let's end it with this, a question from the audience actually,
which is where you're going to grow data center capacity
around the world. I got a lot of questions from
Latin America, and Europe in particular, where Jensen flies to
next week.

Speaker 4 (14:20):
Yeah. Great.

Speaker 3 (14:22):
So in Latin America, we're continuing to expand our
capacity pretty aggressively. Actually, earlier this year we launched our
Mexico region, which has been really well received by customers,
and we've announced a new region in Chile. And we
already have and for many years have had a region
in Brazil which is quite popular and has many of
the largest financial institutions in South America running there. So

(14:45):
across Central and South America, we are continuing to rapidly expand.
In Europe we're expanding as well. We have many regions
already in Europe. One of the things I'm most excited
about actually is at the end of this year we're
going to be launching the European Sovereign Cloud, which is
a unique capability that no one has, which is completely
designed for critical EU focused sovereign workloads, and we think

(15:08):
given some of the concerns that folks have around data sovereignty,
particularly for government workloads as well as regulated workloads, we
think that's going to be an incredibly popular opportunity
for everybody.

Speaker 2 (15:20):
Matt Garman, AWS CEO, thank you very much.

Speaker 4 (15:24):
Thank you for having me.