Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:14):
Hello everyone and
welcome back to another episode
of the Cables to Clouds podcast.
My name is Chris Miles, at BGPMain on Bluesky. Joining me, as always, is my ever happy and gleeful co-host, Tim McConaughey, at Carpe Diem VPN on Bluesky as well.
(00:34):
Today we have a very special episode for you. So we've obviously been talking a lot about AI on this podcast, sometimes when we want to, sometimes when we don't want to. Luckily, today, this is a very specific use case that we do want to talk about.
So we have a guest joining us, Sam Zamanian. Did I get that right? Sam Zamanian, who is currently a principal advisory director at
(01:00):
Infotech Research Group and, oddly enough, our first actual Australian resident that we've had on the podcast, aside from myself. So this is a big day for us. We've had Peter on here a few times, but he's since flown the coop, so he doesn't really get to count. So, yeah, we have an interesting topic today.
(01:21):
We're going to be talking specifically about cloud and infrastructure selection for your AI workloads and how to do that appropriately. Sam's been doing a lot of work in this space, so we thought we'd bring him on to have a discussion about it. So, Sam, I'll shoot it over to you. Can you tell us a little bit about yourself, who you are and what you do?
Thank you for having me.
Speaker 2 (01:43):
As you said, my name is Sam Zamanian and I'm a principal director with the infrastructure and operations advisory at Infotech, which is a global research and advisory firm. Prior to that, I've been around the IT industry for 20-plus years. I come from an application background, and then I moved to
(02:05):
architecture, which is where I've spent most of my recent career, in various capacities: application and solution architecture and a little bit of enterprise architecture as well. That was followed by a couple of years doing consulting for a global system integrator. My architecture
(02:27):
journey has been predominantly in the financial services space, and cloud has always been one of the major areas of focus there. I've seen lots of moves, shifts and evolution, both from the CSP side and from the industry. And yeah, fast forward to my current role in research and
(02:52):
advisory, I've had the opportunity to do a bit of research on how to make sense of infrastructure and AI workloads in particular, or vice versa. So yeah, we can roll with this topic, and I'm hoping that people will take away some insights from this podcast.
Speaker 1 (03:12):
Yeah, absolutely. Well, thanks for joining us, and yeah, let's not waste any time, let's hop right in. So, like I said, we want to talk about cloud and infrastructure selection for specific AI workloads that you want to run, right? So I think we probably need to do a little bit of level setting. So first, let's kind of talk about the lay of the land and get an overview of what AI workloads are, right?
(03:34):
Obviously, there's a big commotion these days around AI itself, right, but AI and ML have been around for many years. Right. This has been a very prominent thing, specifically in cloud, for quite some time. So let's kind of understand the lay of the land, like what are
(03:54):
the AI workloads that you could run, either on-prem or within cloud?
Speaker 2 (03:59):
AI generally comes down to three stages from a workload perspective, and people already know this. There is data prep, or data work, that involves bringing data from different places in various shapes and sizes. The data then has to be structured, refined, cleansed or
(04:24):
cleaned up, making sure that irrelevant entries are removed, so that it becomes meaningful and relevant to the next step, which is training. That is the foundation of what AI does. The data is used to develop AI models, and the model will learn patterns, behaviors and relationships from
(04:47):
the data. And then the next step, which I call the production mode, is where the model gets deployed to support the use case or application in question, and it uses the patterns it has learned to make predictions or decisions based on unseen new
(05:07):
data. So these are three categories, or three steps, of workloads that come under the one umbrella of AI workloads, and that gives us a lot of leverage to make decisions and drive the conversation around the selection of cloud or infrastructure for AI, and gives
(05:28):
us lots of opportunities to optimize for each of these specific categories.
Speaker 3 (05:34):
Right, okay. I mean, are we talking about, basically, high performance compute workloads? Is that basically what we're talking about? We're talking about chunking huge amounts of data and doing an ETL, you know, extract, transform, load type of operation. Is that what you're talking about?
Speaker 2 (05:48):
Yeah, well, if I take the same breakdown, we can go through them one by one. For the data step, you know, like you said, there might be HPC involved to structure the data, but we need scalable and flexible storage. Flexibility is key here, because data comes in
(06:12):
different shapes and a variety of formats. Scalability is important from a storage point of view because we want to be able to move data around pretty quickly and efficiently, and you need sufficient network bandwidth there as well. For training, HPC becomes critical in the sense that we
(06:34):
need a compute powerhouse that supports parallelism. And why parallelism? Because we're dealing with vast amounts of data and super complex calculations, and parallelism makes it possible to train on that data in a timely fashion.
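As a rough illustration of why parallelism matters for training, here is a minimal sketch of data-parallel training with PyTorch's DistributedDataParallel, assuming a single node with multiple GPUs and a toy model; the real models and datasets are obviously far larger.

```python
# Minimal sketch of data parallelism: the same model is replicated on several
# GPUs and each replica trains on its own shard of data, syncing gradients.
# Assumes PyTorch and launching via `torchrun --nproc_per_node=<num_gpus> train.py`.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")                 # one process per GPU
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(1024, 10).cuda(rank)    # stand-in for a real model
    model = DDP(model, device_ids=[rank])           # gradients sync across GPUs
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(100):                            # each rank sees its own shard
        x = torch.randn(64, 1024, device=rank)
        y = torch.randint(0, 10, (64,), device=rank)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()                             # all-reduce happens here
        opt.step()

if __name__ == "__main__":
    main()
```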
Speaker 3 (06:51):
Right, like hyper-threading, basically.
Speaker 2 (06:54):
There is the classic HPC, and we can get to this, and there is the next evolution of HPC, which is more appropriate for AI, thinking about AI accelerators; we can get to that discussion as well. And for inferencing, and that's the interesting part, you may be able to get away with general infrastructure, unless you're
(07:16):
dealing with real-time use cases such as mission-critical applications. But then there's the evolution from predictive ML/AI to generative. I don't know if I mentioned that from a different perspective: there is predictive AI, which, you know, enables people to make decisions, make recommendations and do forecasts, and there is
(07:39):
generative AI, which is around generating content. The latter has shifted things again and raised the bar for both training and inferencing from an infrastructure standpoint, and resource consumption has become a critical challenge as you move
(07:59):
from classic AI/ML to generative AI.
Speaker 1 (08:03):
So I mean, I guess maybe kind of a dumb question, but in that particular context of traditional AI/ML workloads versus Gen AI, is there any significant difference from an infrastructure perspective? What needs to be accounted for in either one of those?
Speaker 2 (08:21):
Yeah, the short answer is there are a lot. Just to give you an idea, in generative AI we talk about large language models a lot, which wasn't the case with classic AI, and it can take between a few months and a year to train a large language model. That's the ballpark. Of course, not every use case can be as aggressive as that,
(08:43):
but that's technically the world that we are in. Microsoft used a supercomputer for OpenAI, a Microsoft/NVIDIA one, and that took about a couple of weeks, if I'm not mistaken, to train the GPT-3 models; it otherwise would have taken a year.
(09:04):
So that's the scale that we're talking about here. Also, from an inferencing point of view, there is the challenge of how you can load a large language model efficiently into memory, and how you can distribute your AI models across different GPUs or across different nodes. It has become a combination of system design and infrastructure,
(09:25):
in a way that your infrastructure choices will dictate some of your design principles and vice versa. You would have to think about traffic management, for example, a lot more aggressively with Gen AI than with classic AI/ML. With large language models we need to think about caching from an inferencing point of view, which wasn't a challenge back in the classic AI/ML days,
(09:50):
and also, from an infrastructure side, GPUs and other types of accelerators are in high demand. There is a supply shortage and bottleneck around that, which has never been the case before. So the game has been taken to the next level.
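For a sense of what "fitting the model into memory and spreading it across GPUs" looks like in practice, here is a minimal sketch using the Hugging Face transformers and accelerate libraries (my assumption, not tools named in the episode): device_map="auto" shards the weights across whatever GPUs are visible, and use_cache enables the KV caching Sam mentions. The model name is a placeholder.

```python
# Minimal sketch: load an LLM across available GPUs and run cached inference.
# Assumes `pip install torch transformers accelerate` and a model that fits
# the hardware at hand; the model id below is hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "some-open-llm"                      # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,   # halve memory versus fp32
    device_map="auto",           # shard layers across visible GPUs / CPU
)

inputs = tokenizer("Cloud or on-prem for AI?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50, use_cache=True)  # KV cache on
print(tokenizer.decode(out[0], skip_special_tokens=True))
```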
Speaker 3 (10:08):
I'm just curious, because of what we just talked about being the large amount of resources, the infrastructure requirements and everything. Do you think, and I know it's really early now and I don't know if you've had a chance to look at it at all, but what do you think about the DeepSeek thing, about what they're saying, about being able to do essentially a lot of
(10:29):
the stuff that these LLMs are doing with a fraction of the resources? Do you think there's any accurate truth to that, or is it embellishment?
Speaker 2 (10:33):
I always think that optimization has been one of the challenges, but we've got better and better at it. There's been this notion that when you deal with AI, you've got to go with premium resources, and that breakdown that I just described would help you optimize.
(10:53):
But there are tons of other opportunities to optimize. Now, this DeepSeek thing is like two days old; at least, we haven't got that much information yet. But just from a cost perspective, the cost of running GPT-4 would be around $100 per million tokens,
(11:16):
and the other one is, I think, $7 per million tokens. That's the difference. We'll have to wait and see how that will come about. But again, there are tons of opportunities to optimize.
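As a back-of-the-envelope illustration of what that per-token gap means, here is a quick calculation using the rough $100 versus $7 per million tokens figures from the conversation; these are ballpark numbers cited in the episode, not official pricing for any provider, and the monthly volume is an assumption.

```python
# Rough cost comparison using the ballpark figures mentioned in the episode.
# Illustrative only; not official pricing for any provider.
PRICE_A = 100.0   # dollars per million tokens (the "GPT-4-class" figure cited)
PRICE_B = 7.0     # dollars per million tokens (the "DeepSeek-class" figure cited)

monthly_tokens = 500_000_000          # e.g., an app pushing 500M tokens a month

cost_a = monthly_tokens / 1_000_000 * PRICE_A
cost_b = monthly_tokens / 1_000_000 * PRICE_B
print(f"Option A: ${cost_a:,.0f}/month")   # $50,000/month
print(f"Option B: ${cost_b:,.0f}/month")   # $3,500/month
print(f"Ratio: {cost_a / cost_b:.1f}x")    # roughly 14x difference
```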
Speaker 3 (11:29):
Yeah, with the
hardware.
Speaker 2 (11:31):
Yeah, correct,
exactly.
Even with cloud, with virtualized hardware, the key part is to break it up and understand what those sweet spots are, and that would be a heavily use-case-driven decision as opposed to an infrastructure-driven one.
Speaker 1 (11:46):
Right, okay. So yeah, I think that's a relatively good segue. I mean, I feel like on this podcast we obviously are very cloud-centric, right, and we talk about cloud each and every day in our day jobs as well. So we kind of get lost in the sauce, so to speak, and we understand kind of the implications of why it's easy to do things in cloud, especially when you need specific
(12:08):
resources for workloads like AI. But let's kind of go back to the origin of that and maybe talk about the conversation of doing something within cloud versus doing it on-prem, right, and what are some of the common decision factors that come into play?
Speaker 2 (12:28):
When you want to choose between doing this in the cloud versus on-prem, and that's always an open question, you can start with cloud, even though you can be skeptical about choosing cloud for your workloads. And before I get to that, I'd like to clarify two things. One is that the factors that drive cloud versus on-prem aren't unique to AI. They still apply to other non-AI, general enterprise
(12:52):
workloads as well. It's just that they may matter more with AI, because the impact of getting them wrong may be larger. And the second part to that is that cloud means different things to different people.
(13:14):
If I'm in DevOps, then I think of cloud as a bunch of pre-built, readily available tools that I can just start with. If I'm in the line-of-business space, I'll see the SaaS-based CRM application as a cloud capability, and that really is one. If I'm an architect, I probably think about cloud as cloud-native architecture. And, for sure, if I'm an infrastructure person, I see
(13:36):
cloud as a destination or a hosting mechanism. And AI is no different. It's a cross-functional capability. We still need MLOps. We need to think about infrastructure. We need to think about how you embed your AI solution in the application. Well, at least from an AI point of view, the only reason that I would use cloud or recommend cloud, well,
(14:00):
let me rephrase it: the only time that I think cloud would be chosen is for the convenience factor. If I'm a beginner in the domain and I want to get something quickly up and running, I will start in the cloud because that becomes a low-friction option. I don't have to think about HPC procurement, because from an on-prem point of view that tends
(14:23):
to be cumbersome, slow and complex. I don't need to deal with supply issues. I don't need to deal with massive upfront investments and, of course, skill set. So if I need to start with tools, then my only choice becomes cloud. That doesn't mean that that's the right answer, but that might
(14:44):
be the right answer now.
Speaker 3 (14:45):
Right, like with a lot of apps, right, where you'll prototype in the cloud, you'll figure it out in the cloud, and then, you know, some people, especially if it's a 24/7 type of workload, which this could certainly be, you'll figure out the app, figure it all out, and then bring it back on-prem, and at that point you know what to procure, you know what the requirements are and everything.
Speaker 2 (15:05):
Yeah, absolutely, and that's the convenience aspect. It's convenient to start with cloud and then we can figure it out later. Although, from an on-prem point of view, I think there are two reasons that hold people back from choosing cloud. One is predictability, mostly from a performance, well, definitely from a cost perspective, because the pay-as-you-go
(15:27):
nature really, for sure, is going to create a lot of variability and surprises from that perspective, but again, especially for AI or HPC-type workloads. Also from a performance point of view as well, because with cloud, potentially, the multi-tenancy aspects to it can lead to performance fluctuations, although I haven't heard many
(15:48):
stories around that, you know, being the fault of the CSP, but that's technically possible. And the other reason, you know, to go on-prem over cloud is control.
Speaker 3 (16:01):
Of course.
Speaker 2 (16:02):
There's a classic one around data and security, but there is also a specific one which is around customization. Cloud gives you little opportunity to do deep customization, by which I mean there might be cases, and this is again heavily use-case-driven, where you need to align
(16:22):
your hardware choices back to your workload, and that gives you a lot of leverage to optimize from a performance point of view. You don't get that from the cloud, because we're dealing with a pre-selected range of services and virtual devices. So again, control and predictability,
(16:43):
two reasons why people choose on-prem.
Speaker 1 (16:45):
So, not to go back to the why-people-would-choose-cloud option, but how much are you seeing, at least in the market today, where people are choosing cloud purely from an availability standpoint? Like maybe they can't get GPUs
(17:07):
in a timely fashion to start their process. And how easy is it to switch from one to the other if you've already started down the trajectory, right? Maybe, you know, GPUs aren't going to come in for a few months. Is it easy to start and move back and forth? Are you seeing many people do that?
Speaker 2 (17:20):
So, just to your first question: in essence, most people start with cloud. That's a common choice, like I said, because of the challenges of skill set in making the right choice when it comes to picking the infrastructure for their workloads, but also the lead time around procurement
(17:41):
and the deployments. And to add to your second question, how easy it would be to go back and fix things in the future, that heavily comes down to the workload design. In the classic days we had IaaS, infrastructure as a service, and then PaaS, platform as a service, and we said that it's easy to
(18:03):
move an infrastructure-as-a-service workload around, but at the same time it's not the right way to be in the cloud, because you still have to manage it. And the same thing applies here, although the only difference is that I don't see many people doing infrastructure as a service for their AI workloads. And the more you go towards platform as a service and SaaS,
(18:25):
the harder it will be for you to shift around in the future. The reality is that AI also relies on lots of open-source frameworks and tools, such as TensorFlow and PyTorch. They can run in containers. You can use Kubernetes. So the more you sort of lean towards open source, the easier it will be to shift in the future.
Speaker 3 (18:46):
I see. Yeah, okay, yeah, that makes some sense. I've seen, like, I was doing some work with Andrew Brown on his Gen AI bootcamp that we're doing, or that he's doing and I'm helping a little bit with, very poorly, but yeah, I mean, he was showing me some of the open-source tools, you know, and where
(19:07):
you pull a lot of the open-source models, and stuff like LlamaIndex and whatnot. Do you end up running these open-source tools, just, I guess, yeah, like you said, just running them as an app on Kubernetes? You build a Kubernetes cluster to run all these applications and then, in theory, Kubernetes is portable. You should be able to, if you wanted to run your Kube on-prem, then you could just
(19:28):
move it, essentially.
Speaker 2 (19:30):
Yeah, that's the theory, right. And speaking of Kubernetes, it comes with its own tool set for AI. They've got an MLOps toolbox, which I can't remember the name of, something along the lines of Kubeflow, so they come with their own toolbox around AI. Obviously, there is also the choice of ML frameworks, where
(19:56):
most people tend to go to either Keras or PyTorch or TensorFlow. So in theory, you can shift these things around, as long as you have the infrastructure to support that. And there is still some level of lock-in with this CSP or that CSP that needs to be managed. But from a technical standpoint, there is the opportunity to do that
(20:19):
in the future if you have to.
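As a rough illustration of the portability being described, here is a minimal sketch that submits a PyTorch training container to a Kubernetes cluster and requests a GPU through the NVIDIA device plugin, using the official Python kubernetes client. The image name and script path are placeholders I've assumed; the point is that the same manifest works whether the cluster runs in a CSP or on-prem.

```python
# Minimal sketch: submit a GPU training pod to any Kubernetes cluster
# (cloud-managed or on-prem). Assumes `pip install kubernetes`, a reachable
# kubeconfig, and the NVIDIA device plugin installed on the cluster.
from kubernetes import client, config

config.load_kube_config()  # picks up the current kubectl context

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "train-job"},
    "spec": {
        "restartPolicy": "Never",
        "containers": [{
            "name": "trainer",
            "image": "pytorch/pytorch:latest",             # placeholder image
            "command": ["python", "/workspace/train.py"],   # hypothetical script
            "resources": {"limits": {"nvidia.com/gpu": "1"}},  # ask for one GPU
        }],
    },
}

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
print("Submitted pod train-job")
```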
Speaker 1 (20:21):
Yeah, I mean, I think we've been talking about this on the show for a while now, that there's a very strong difference between a company that's wanting to leverage, you know, maybe specific things in the cloud to train their own models and do their own things, versus someone that just wants to consume the off-the-shelf models that are relatively general purpose
(20:42):
or use-case focused, right. And I can see right there, like if you're consuming an AI service or something like that, just to give an example, like if you're consuming AWS Bedrock for, you know, how you run your applications, moving that on-prem is not trivial, right? That's not just a lift and shift,
(21:03):
because you don't have that option. So, yeah, I can totally see how the architecture, the design and kind of the adoption pattern is really going to matter for whether or not you can move things around.
Speaker 3 (21:12):
If you build it yourself, obviously you have all the ability in the world to take that workload and move it back on-prem or from on-prem to cloud. But yeah, with managed services like Bedrock, for example, I mean, they've essentially built the infrastructure for you and then made it available to you like a managed service provider, right? So, just like with an MSP, it'd be a lot harder to take your workload and take your data and put that back on-prem.
Speaker 1 (21:34):
All right. So yeah, let's move on to the next topic we wanted to talk about, which is exactly what kind of infrastructure we're using for AI workloads. I'm assuming we want to focus more so on the kind of build-your-own, roll-your-own type of AI deployments, but what things go into choosing the specific workloads, or choosing the
(21:59):
specific infrastructure, that we use for our AI workloads in cloud?
Speaker 2 (22:09):
Sure, and they fall into classic infrastructure domains. There is storage, there is compute, memory and networking. So I'll start with compute, and it comes down to four main kinds. There are CPUs. They're good at handling general tasks and tasks that have sequential natures, such as "I can start my job only after
(22:29):
the previous job is finished", and that's by design, and CPUs are great at handling that. And that would apply to, again, most inferencing scenarios. Very, very few kinds of deep learning and neural networks that have sequential natures can be done by CPUs, and you would win if you use CPU over GPU from a cost perspective.
(22:50):
The next would be GPUs. They are the most in-demand kind, and, again, the main value that a GPU brings to the table is parallelism; they're great at handling parallel tasks. There are TPUs, or Tensor Processing Units, which is hardware that was built by Google to support tensor
(23:14):
operations, and tensor operations are a subset, or one type, of the math techniques that are used heavily in deep learning and neural networks. And there are also NPUs, or neural processing units, which, same as GPUs and TPUs, are good at handling parallel tasks, but
(23:34):
they tend to be more energy efficient. Hence the prime use case would be edge AI, such as mobile and smart devices. From a memory point of view, memory consumption comes down to different, I guess, constraints there. From a training point of view, the size of the batch that you pick
(23:55):
for the job drives your memory decision. Like I said, in the inferencing scenario that I described, you've got to load a language model into memory. The larger the model, the more memory you need, and the more you need to think about, from a system design point of view, how you optimize that as well.
(24:17):
From a hardware point of view, DDR is still the most common, affordable type and is great working with CPUs, and then GPUs have their own DDR component, which is called GDDR, and there's a relatively newer version, which has been around for 10 years, which is high bandwidth memory, HBM.
(24:40):
Again, that wasn't built specifically to support AI, but that's one of the options that provides ultra-high bandwidth from a memory, I guess, speed point of view. Then there is networking, and you guys know the drill here, so maybe we should swap roles. But well, there is a scale-up and a scale-out scenario there.
(25:03):
Say I need premium hardware which supports my model training, and that applies very well to small-to-medium-size model training, but I want to distribute multiple jobs across different GPUs within one single physical node; I don't want to go over the network. In that case I would
(25:26):
rely on some special hardware that provides interconnectivity between GPUs and their memory components. There are two proprietary options that I know of. There is NVIDIA NVLink, and there is an AMD one, I think something along the lines of Infinity Fabric. Realistically, for most jobs you need to go over the network; in
(25:49):
other words, you need to distribute the workload across different physical nodes. So the answer is, ideally, you need a combination of both: an awesome single node, plus multiple of those, and being able to go over the network. Again, there are proprietary options that provide those direct
(26:09):
connections between GPUs across different nodes. There is InfiniBand, and there is an Ethernet version of that which is called RoCE. Most of these techniques rely on a method called RDMA, or remote direct memory access, to offload some of the network management tasks from CPUs.
(26:31):
So that's that. Of course, InfiniBand is a primary choice, and I think that's what Microsoft is using in their supercomputer that is used by OpenAI, but RoCE tends to be more popular among network folks because of the familiarity and the skill set that they can retain from an Ethernet point of view.
(26:52):
Yeah, that's right, awesome. And storage tends to be the most agnostic of these. From a workload perspective, we still need object storage for flexibility. If we deal with lots of transactional data, then we're going to sort of fall back to hard disks and SSDs, so storage
(27:14):
is less of a concern. But the question becomes how fast the storage can communicate with memory. Typically, for hard disk and SSD, you would use SATA or SAS as the protocol. But there is a flavor of SSDs that uses PCI Express, which tends to be heaps faster, exponentially faster, than SATA,
(27:39):
but that's only available to SSDs, and that's called NVM
Speaker 1 (27:45):
Express.
Speaker 2 (27:47):
Which is just basically SSD over PCI Express, right. Yeah, it's really fast.
Speaker 3 (27:52):
Fast, though. It feels like flash memory almost; it's really, it's pretty fast, yeah. Yeah, well, that's great.
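Picking up Sam's earlier point that the size of the model drives the memory decision, here is a rough back-of-the-envelope sizing sketch; the parameter counts and the 2-bytes-per-parameter rule of thumb are common assumptions of mine, not figures from the episode.

```python
# Back-of-the-envelope GPU memory sizing for loading a model for inference.
# Rule of thumb: fp16/bf16 weights cost about 2 bytes per parameter; the KV
# cache and activations add more on top, which is why batch size matters too.
BYTES_PER_PARAM_FP16 = 2

def weights_gb(num_params: float) -> float:
    """Approximate memory just to hold the weights, in gigabytes."""
    return num_params * BYTES_PER_PARAM_FP16 / 1e9

for name, params in [("7B model", 7e9), ("70B model", 70e9), ("175B (GPT-3-scale)", 175e9)]:
    print(f"{name}: ~{weights_gb(params):.0f} GB of weights in fp16")
    # 7B   -> ~14 GB  (fits on a single 24-40 GB GPU)
    # 70B  -> ~140 GB (must be sharded across several GPUs)
    # 175B -> ~350 GB (multi-node or aggressive quantization territory)
```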
Speaker 1 (27:58):
Thanks for the breakdown there. I think it's relatively obvious, as you've gone through all of that, that it's not trivial to kind of put these things together, right. You need to actually be very focused and make sure that you're picking the right tool for the job here, especially when it comes to AI. I will say, you have a note here written down about
(28:20):
confidential computing, and I will say I don't necessarily know if I know that term or what it really implies. So can we kind of break that down? What does confidential computing mean to you?
Speaker 2 (28:32):
Sure. If you have heard of enclaves, or CPU enclaves, that's basically what confidential computing is. It's a piece of hardware which lives inside the CPU and can create workload isolation. Every time data moves into the enclave it will be decrypted,
(28:53):
and every time it gets out of the enclave it will be encrypted. So that provides that encryption, or data protection in use, to complete the lifecycle of data at rest and data in transit.
Speaker 3 (29:07):
Yeah, that makes
sense.
Speaker 2 (29:08):
Yeah.
Speaker 1 (29:09):
Yeah, how important is that process when it's all within, I guess, within an application or within a network that's under your jurisdiction? Is it as important? Or, you know, we talk about going over third-party mediums; there I can obviously see a strong use case for requiring
(29:33):
encryption, but are we seeing a lot of that within fully contained environments?
Speaker 2 (29:34):
Yeah, one of the benefits, obviously, is that except for the application code, which again will be authorized to access that piece of sensitive data, no one else, and that's the advertised benefit, no one else can access the data, not even the cloud service provider. So that would help organizations to sort of, you know, get
(29:57):
through their own hurdles and establish controls when it comes to cloud. Because, again, this is less about how secure cloud is and more about how much control you've got, from a customer perspective, over cloud, and that would help tick that box or facilitate some of those activities from the customer
(30:17):
side. And there are some strictly regulated industries, such as the military, which aren't as comfortable being on the cloud, and they are reluctant to go through that because there is a lot of work to sort of go through those hoops and tick those boxes. They may not have the resources or skill set or budget to finish
(30:38):
that process, so they decide to stay on-prem. So things such as confidential computing, which again is not unique to AI but is more around data protection and compliance, would help those organizations sway towards the cloud option as opposed to on-prem. So it's more about a control theme than a security theme, at
(31:00):
least from my perspective.
Speaker 1 (31:01):
All right, yeah, thanks for breaking that down for us, that's really good. So obviously, Sam, you work for a research group, right? So probably a lot of analysis goes into this, and you look at where things are trending, what's going to happen in the future, right? So I guess, at least from what you've seen in the market today, what do you feel are going to be probably the most prominent
(31:24):
challenges people are going to have, and what's in store for the future?
Speaker 2 (31:30):
Sure, and again, this is a fluid topic, so I can only speak to the decisions that people have made in the past. But the first thing is, and I think I touched on it before, on the cloud versus on-prem question most people are still choosing cloud because it's convenient. They need a whole bunch of tools, and cloud is fairly
(31:52):
attractive from that perspective. So that's one. From a hardware point of view, one of the challenges that I've observed is supply bottlenecks, or supply challenges, and it may vary from market to market. Obviously, some markets, such as North America or Europe,
(32:13):
might be more aggressive when it comes to AI, so the type of challenges around that would be different to other markets. But also some availability issues might exist in a market like Australia or APAC compared to the US. So there might be some lagging. But that still remains a challenge for the next year
(32:42):
or so. And, as you might know, NVIDIA has partnerships with all four to five major CSPs that we know of. But new players are coming in. Last year, during Ignite, even Microsoft announced their partnership with AMD, which is good news. Other players are coming into GPU manufacturing, so it's no longer just NVIDIA; that might alleviate some of those pain
(33:06):
points. But something that comes off the back of the supply issues is cost. Obviously, typical supply and demand: the lower the supply, the higher the cost. But what I think will add to that is the fact that we need muscular resources for HPC. They are not the usual ones, so these will add extra price tags to
(33:31):
what we're dealing with. But I think that should be remediated in the future, and I wouldn't be surprised if some of the cloud costs or operational costs come down as well. The more abundance we get, the less hostile it's going to get from a procurement and provisioning perspective.
Speaker 2 (33:51):
And the last thing is optimization. Optimization is still a challenge, and with DeepSeek that was a very good lesson learned. Like, I myself always thought that OpenAI was going to make the best choices when it comes to that. It took, and this is from Forbes but also anecdotes, between 40 mil and 80 mil just to
(34:17):
train the GPT-4 model. Just to train that. And for Gemini, again, the range is wide: they mentioned somewhere between 40 mil and 190 mil. That's a wide range, but these are the ballpark figures, and DeepSeek has changed, or is going to change, a lot of that. But I think there are tons of opportunities to optimize
(34:40):
if you take a top-down approach as well as a bottom-up approach, by which I mean start from use cases, then break it up and take a divide-and-conquer approach to decouple different types of workloads, and then move your way down from that.
Speaker 1 (34:58):
Yeah, it makes sense. I mean, it's just a natural thing that we need to be able to do that, right. And I'm just thinking back to the days when, initially, when we started storing data, you know, you needed like four life-size cabinets of storage to house like 500 kilobytes. Like ENIAC, right? Yeah, and now it
(35:22):
fits in, you know, kind of like a fraction of my pinky nail. So it's natural that optimization needs to occur. And yeah, not to timestamp this episode too much, but as Sam pointed out, this is only two days after the DeepSeek model launched. So by the time we release this episode, I don't know what significant amount of information will have come out about it. But yeah, I agree with you there. Optimization needs to be at
(35:47):
the forefront of this. And I mean, I would think, obviously, the vendors are going to, you know, potentially sell fewer of the hardware components the more things get optimized. But I think that's kind of the natural lay of the land. That's what we need, in turn.
Speaker 3 (36:00):
I actually don't know about that. Like, think about it: if this stuff was easier to run, sorry, I didn't mean to cut you off, though, but if this stuff was easier to run, like it was a lower cost of entry, I think we'd actually see more hardware because more people would need it. More people would be trying to build their own data centers to run it, in theory.
Speaker 1 (36:18):
I'm just thinking in terms of quantity, right. Obviously, if you can do the same job with, you know, a fraction of the GPUs, then in turn I'm going to sell fewer GPUs to that one specific customer. Oh, per customer, sure, yeah. But yeah, to your point, adoption is key, right. It's how many people are actually doing it.
Speaker 3 (36:38):
So yeah, it's kind of a balance, right. I still don't think we need 100 or 500 billion dollars to be invested in AI or whatever it is. I think that's a huge grift, and I'm curious to see what things like DeepSeek keep showing, because, I mean, other countries are working on this too. It's not just China. There's other people out there working on this, and I'm curious
(37:02):
to see. The timing on that was not a surprise either. Right after Project Stargate, "we're going to do 500 billion dollars," and the next day China is just like, oh look, I slipped and dropped my open-source LLM on the market. Yeah, absolutely.
Speaker 1 (37:18):
Well, yeah, thanks a lot, Sam, for coming on. I think we'll go ahead and start wrapping up here. So, last thing, you know, any closing thoughts that you want to add about the episode today?
Speaker 2 (37:28):
No, I think I just formulated my final thoughts in that last question around those stats, but if you have any, you can throw them in.
Speaker 1 (37:38):
No, sounds good. Like I said, I think we're all eager, and we understand this is a very ephemeral environment right now, right, whether it be the technology or, you know, the roller coaster that is the NVIDIA stock price, et cetera. Things are changing literally every single day. So, yeah. Well, Sam, thanks for reaching out and thanks for
(38:01):
coming on the show. I think this has been informative. So if people want to reach out and talk about any of this stuff, how can they reach you?
Speaker 2 (38:10):
You can hit me up on my LinkedIn, Sam Zamanian, and that's the primary channel for a lot of people to reach out.
Speaker 1 (38:19):
Perfect, sounds great. Well, I'll make sure we get that into the show notes. So if you want to reach out to Sam, please check that. And yeah, thank you so much for listening. This has been another episode of the Cables to Clouds podcast and we will see you next week. Goodbye.
Speaker 3 (38:35):
Hi everyone. It's Tim, and this has been the Cables to Clouds podcast. Thanks for tuning in today. If you enjoyed our show, please subscribe to us in your favorite podcast catcher, as well as subscribe and turn on notifications for our YouTube channel to be notified of all our new episodes. Follow us on socials at Cables2Clouds. You can also visit our website for all the show notes at
(38:57):
Cables2Clouds.com. Thanks again for listening and see you next time.