
June 25, 2025 38 mins


The invisible threads connecting Kubernetes and networking infrastructure form the backbone of today's cloud-native world. In this revealing conversation with Marino Wijay from Kong, we unravel the complex relationship between traditional networking concepts and modern container orchestration.

Marino brings a unique perspective as someone who entered the Kubernetes ecosystem through networking, explaining how fundamental networking principles directly translate to Kubernetes operations. "If you don't have a network, there is no Kubernetes," he emphasizes, highlighting how reachability between nodes forms the foundation of cluster communication.

The network evolution within Kubernetes proves fascinating – from the early "black box" approach where connectivity was implicit to the sophisticated Container Network Interfaces (CNIs) like Cilium that offer granular control. Network engineers approaching Kubernetes for the first time might feel overwhelmed, but as we discover, concepts like DHCP with DNS registration, NAT, and load balancing all have direct parallels within the Kubernetes networking model.

Our discussion ventures into the practical challenges organizations face when implementing service mesh technologies. While offering powerful capabilities for secure pod-to-pod communication through mutual TLS, service mesh introduces significant complexity. Marino shares insights on when this investment makes sense for enterprises versus smaller organizations with more controlled environments.

The conversation takes an especially interesting turn when exploring how AI workloads are transforming Kubernetes networking requirements. From GPU-enabled clusters to specialized traffic patterns and the concept of Dynamic Resource Allocation as "QoS for AI," we examine how these resource-intensive applications are pushing the boundaries of what's possible.

Whether you're a network engineer curious about containers or a Kubernetes administrator looking to deepen your networking knowledge, this episode bridges crucial gaps between these interconnected worlds. Subscribe to Cables to Clouds for more insights at the intersection of networking and cloud technologies!

https://www.linkedin.com/in/mwijay/

Purchase Chris and Tim's new book on AWS Cloud Networking: https://www.amazon.com/Certified-Advanced-Networking-Certification-certification/dp/1835080839/

Check out the Fortnightly Cloud Networking News
https://docs.google.com/document/d/1fkBWCGwXDUX9OfZ9_MvSVup8tJJzJeqrauaE6VPT2b0/

Visit our website and subscribe: https://www.cables2clouds.com/
Follow us on BlueSky: https://bsky.app/profile/cables2clouds.com
Follow us on YouTube: https://www.youtube.com/@cables2clouds/
Follow us on TikTok: https://www.tiktok.com/@cables2clouds
Merch Store: https://store.cables2clouds.com/
Join the Discord Study group: https://artofneteng.com/iaatj


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Tim (00:13):
Hello and welcome to another episode of the Cables to
Clouds podcast.
As usual, I am Tim.
I'll be your host this week, and with me, as always, is the other guy.
What's his name again?

Chris (00:26):
Chris, I'm still the other guy. Two weeks in a row I'm the other guy. He's the other guy.

Tim (00:29):
Yeah, the yin to my yang or the yang to my yin, I don't know, we haven't figured out the... yeah. Anyway. So we have a new guest with us, new to the podcast, and we're very excited to get him here. It's Marino Wijay. Go ahead and just introduce yourself, Marino, for somebody who hasn't heard of you yet.
Yeah.

Marino (00:49):
Thank you so much, Tim, Chris. I appreciate you inviting me to the show. So my name is Marino Wijay. I am a Canadian, so I live up in Toronto, and I'm a little bit of a techie. I geek out every once in a while with the home lab and stuff like that. But I'm deep into both the networking and the Kubernetes

(01:11):
ecosystem, and I found my way into the whole Kubernetes ecosystem through networking, interestingly enough. And because of that, I figured it'd be great to just chat about it and see where the landscape is today and where it's going. But a little bit about what I do: I work at a company called Kong, and I focus in on AI and API

(01:33):
transactions, so a lot of higher-level networking. I don't really touch hardware, but I touch a lot of Docker, Kubernetes, and various cloud systems as well, because I'm using the same networking principles that I would, just with APIs. So that's a little bit about me.

Tim (01:51):
Awesome, yeah. So, Marino, we wanted to talk about Kubernetes. We reached out to you because we've followed you for a while on LinkedIn, we love what you do, and Kubernetes is so integrated with networking. And when you

(02:11):
replied and said, yeah, let's do this, that was actually one of the things that you brought up as well. Kubernetes and networking are so tightly coupled because of the distributed nature of Kube, so I think this is going to be a really, really good one. We've talked about Kubernetes a few times on the show, but let's hear your take on Kubernetes to begin with, and

(02:33):
let's kind of have a discussion from there.

Marino (02:35):
So, if you think back to maybe 2017: at the time, Kubernetes was just a container orchestration system, and all it really did was allow you to run containers at scale, and it was a very clean but also very manual environment. Today, it does pretty much the same thing, but is also a

(02:56):
platform for a variety of kinds of workloads. We're talking not only containers but VMs, networks, and also bare metal, and it's just phenomenal to see the growth of that ecosystem. Because when you think back to virtualization, that system was vSphere, so you might work with vSphere, right? And today Kubernetes has pretty much taken over that role.
Now there's just a lot of different implementations. It's a very pluggable architecture. So you're not only plugging in your workloads, you're plugging in network security, you're plugging in observability, you're plugging in various different systems just so that

(03:38):
it can interact with Kubernetes. You're even plugging in AI, and AI is also supported by Kubernetes in a lot of different ways. So we're here at this stage where it's been adopted in so many different organizations. In fact, you look at all the cloud providers, and they have their own implementation of Kubernetes, which they roll,

(03:58):
they scale, they manage for you, and all you do is run your containers, or you run your workloads, basically.

Tim (04:05):
Before we get too far into the networking side of it: what I found interesting is, I had to learn Kubernetes, and of course I'm still learning Kubernetes, like everybody else, as it continually changes, for my job. Of course, I work for Aviatrix, where our focus is on cloud networking and security, and what that means.

(04:25):
So basically, I had to take, you know, a networking background; I'm a CCIE, almost pure networking, but also I used to be a firewall jockey, so a little bit of cybersecurity, very little. And then take all of that and be like: okay, well, now let's figure out Kubernetes, which of course feels like a completely new topic.
I did know Docker. I'd used Docker years ago, and so the concept of

(04:49):
containerization, at least, was familiar.
But yeah, the orchestration platform of course feels very different, the way it's all orchestrated and built. So it's really interesting for someone coming from a networking background to understand how to use Kubernetes, essentially.

Marino (05:07):
You're absolutely right. I mean, if you think about systems like OpenStack and vSphere and how they operated, they are very much distributed systems, built to capitalize on the fact that you have all of these different resources available to you. But you have to find a clean way to carve them out so your developers aren't screaming for resources when they need them. You can just make it very easy for them to consume.

(05:30):
Now, what's really interesting about Kubernetes is that it heavily depends on networking. I mean, if you don't have a network, there is no Kubernetes. Yeah, and it's heavily reliant on this idea of reachability. You've got this cluster. This cluster is comprised of a number of nodes, and these nodes

(05:51):
all need to communicate with each other. They need to exchange information. They need to identify when something's wrong, so that they can tell another node: hey, I can't do anything anymore, can you take on the load? And everything should still stay the same and look the same to a consumer, a developer, whoever.

Chris (06:09):
And I guess, in the early days of Kubernetes, keep me honest here, but I think the idea was kind of more implicit networking, where it was just kind of implied that everything is able to talk to everything. And then once networking and network security got a seat at the table, they're like: hey guys, that's not really how these things should be operating.

(06:30):
You know, we should make this a little bit tighter, and at least be a little bit more explicit about what is allowed to talk to what. So, in your experience, how has that maybe evolved Kubernetes over time? Has it become this bigger beast that's harder for people to handle with that type of control, or do you think

(06:53):
it's been kind of for the better?

Marino (06:54):
So when you think about Kubernetes for a second, to your point, right, it was very much just this black box that would run containers. Networking in Kubernetes was a black box. I wouldn't say static, but you really couldn't do much with it. You couldn't customize it a lot, and the demands of the networking and security teams were like: no, we need to be able

(07:15):
to get in there and trace packets, we need to be able to see what's going on, we need to be able to have ports that are locked down and other ports that are open for this communication to occur. And then, oh, by the way, we also need to have a dashboard to see all of this as well. So you start to see a transformation of the networking side of it, where it's growing to accommodate what network

(07:36):
engineers want, but at the same time just aligning to what Kubernetes is. So let's think about a workload and how it gets on the network for a second. You've got a server that is connected to a network with Ethernet. It gets its IP, subnet mask, and gateway, and now it's on the network. You've got some DNS going on, and now people can hit it without having to know the IP address.

(07:57):
That same concept exists inside of Kubernetes when you have a pod that comes online as well. Except there's also this concept of IPAM, IP address management, that comes about, because it's not like someone is going in there statically assigning IPs. There's a system that has to assign these IPs. But then you also have these conditions of: okay, if I assign IP addresses,
(08:18):
how do I handle the DNS bit? So then you have this system called CoreDNS, or a DNS system, that effectively looks at and watches the system. New workloads come online; well, let's go set up an A record and a PTR record so that services can communicate with each other, or pods can communicate with each other.
But one thing that the Kubernetes ecosystem really

(08:39):
thought about, was very thoughtful of, is this concept of immutability and ephemerality. Right, things can suddenly disappear. Things also just simply cannot be changed easily. Like, we cannot go in here and modify a pod just like that. Well, we can, but that's not the right way to do things. We have to be a little bit more declarative as to how we want

(09:02):
desired state to look. So this concept of FQDN became very important in the Kubernetes ecosystem. DNS became very important, where you now have an abstraction layer where you can start swapping pods in and out, even nodes. You can swap these in and out, and no one on the consumer side should ever see that anything changed.

(09:23):
Maybe they might see a slight blip, but nothing really changed there.
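The IPAM-plus-DNS pattern described here can be sketched as a toy model. All the names below are illustrative, not any real CNI or CoreDNS API; the point is that addresses are system-assigned and ephemeral while the stable name is what consumers hold onto:

```python
import ipaddress

class ToyCluster:
    """Toy model of pod IPAM plus automatic DNS registration."""

    def __init__(self, pod_cidr: str):
        # IPAM analog: hand out addresses from the pod CIDR in order.
        self._pool = ipaddress.ip_network(pod_cidr).hosts()
        self._dns: dict[str, str] = {}  # A-record analog: name -> IP

    def schedule_pod(self, name: str) -> str:
        ip = str(next(self._pool))  # system-assigned, never static
        self._dns[name] = ip        # DNS watches the system and registers it
        return ip

    def replace_pod(self, name: str) -> str:
        # Pods are ephemeral: the replacement gets a NEW IP,
        # but consumers keep resolving the same stable name.
        return self.schedule_pod(name)

    def resolve(self, name: str) -> str:
        return self._dns[name]

cluster = ToyCluster("10.244.0.0/24")
first = cluster.schedule_pod("web")
second = cluster.replace_pod("web")
assert first != second                   # the IP changed...
assert cluster.resolve("web") == second  # ...but the name still works
```

This is the abstraction layer being described: swap pods (or nodes) in and out, and the consumer-facing name never changes.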

Tim (09:27):
So this is interesting, because I think a lot of network engineers probably already understand Kubernetes a little bit better than they think they do. For example, the concepts you were just talking about, Marino: this is just DHCP with DNS registration, right? That part of it should be very familiar to anyone who is a network engineer.
And to go back just a step before that, I think that you

(09:52):
know, the need for more granular and more declarative networking stacks is what, of course, gave rise to CNI, right, the Container Network Interface, where now we're bringing in this network-layer observability, like with Cilium and Calico and so on and so forth.

(10:13):
The CNIs are just providing something that the original project, the Kubernetes project, didn't really envision application developers needing. Right? App developers were never necessarily going to need that level of network observability. They needed it, but they didn't understand it; they didn't know they needed it.

(10:34):
You know what I mean? So CNI is basically the method by which we get that; it up-levels it. I think of it like you're installing an application, essentially, on the cluster; the application is in the kernel, it's networking. But yeah, anyway.

Marino (10:52):
Yeah, I mean, a CNI is just a switch at the end of the day. That's really what it's doing, replacing the bridge. Yeah, it's a very multi-layer switch that's capable of so many things. It's pluggable too. But what's interesting about what a pod is, is that it's a Linux network namespace. And when you think about network namespaces, these are

(11:12):
isolation boundaries that very much mimic this concept of a VLAN. Not entirely one-for-one, but the idea of having that network namespace is to provide that isolation boundary for a set of containers that might need to talk to each other but still have some access to the network, some priority on the network, if you will.

(11:32):
Now, where we've gone with Calico and Cilium, Cilium especially; I mean, if you've been watching the news, the company that built Cilium, Isovalent, got acquired by Cisco, probably about two years ago. Big fan of them, big fan of what they do, and big fan of the Cilium CNI, because it spoke to me as a network engineer,

(11:56):
because it was able to do things like BGP. BGP. Why do we need BGP in Kubernetes? Well, if you have all of these different pod networks, someone outside of your Kubernetes ecosystem needs to be able to get to them. So you're not just going to hand them a service IP and be on your way. You've got to share those networks back into BGP, and they need to be distributed to remote sites, if you will.

(12:18):
And so they really touched on and were very thoughtful of how to build a CNI that was very network-centric, still Kubernetes-centric too, but the folks that engineered this were actually network engineers. They sat there behind the scenes and really thought: you know, what would a network engineer do to move these

(12:38):
packets around? But then we could get into the nitty-gritty of eBPF and how kernel-based networking works, and I think that would take a whole hour in itself. But when you think about the networking part of Kubernetes, you're starting to see layers and layers and layers of abstraction going on here, because it doesn't stop there. You've got a service mesh on top.

(13:02):
If you're not familiar with it, or for the folks listening later on: if you think about what a VPN was aiming to do, it was supposed to securely connect branch locations. And when you think about pods for a second, they're, in a way, like their own little islands doing things. They need to communicate with other islands as well, and you can truly do so.

(13:23):
Like, if you think about the internet: if everything just talked over the internet, sure, everything would go through very cleanly, but in a very unprotected, plain-text manner. So the idea of service mesh was to not only provide a layer of encryption through something called mutual TLS, but also to add on to that network capability by bringing in this

(13:44):
concept of QoS, but not traditional QoS. QoS in the sense of, let's just think about service resiliency: implement things like timeouts and retries, inject faults so we can add some artificial delay where we need to, and then also do things like: hey, I'm going to roll out a new version of a service, something like a blue-green, or a v1/v2.

(14:07):
Let's route to that appropriately. And, by the way, we used to do these things as well with ALBs, with application gateways, and we still do. It's just that we brought this into Kubernetes because we needed a way to be able to handle this from a service-to-service standpoint.
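The timeout-and-retry flavor of "QoS" being described can be sketched in plain Python. This is a hand-rolled illustration of the idea, not any actual mesh configuration, and a real sidecar would cancel the in-flight request rather than just reject a late response:

```python
import time

def call_with_resilience(fn, retries=3, timeout_s=0.5, backoff_s=0.1):
    """Retry a flaky service call, treating slow responses as failures,
    the way a mesh applies per-request timeouts and retry policies."""
    last_err = None
    for attempt in range(retries):
        start = time.monotonic()
        try:
            result = fn()
            # Latency budget check (a real mesh cancels in-flight work).
            if time.monotonic() - start > timeout_s:
                raise TimeoutError("response exceeded latency budget")
            return result
        except Exception as err:
            last_err = err
            time.sleep(backoff_s * (attempt + 1))  # simple linear backoff
    raise RuntimeError("service unavailable after retries") from last_err

# A service that fails twice, then succeeds:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("upstream reset")
    return "ok"

assert call_with_resilience(flaky) == "ok"
assert calls["n"] == 3  # two failures absorbed by the retry policy
```

The tuning burden mentioned above lives in those parameters: retry counts, timeout budgets, and backoff all have to be baselined per service.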

Tim (14:28):
When I was trying to figure out what the hell service mesh was, the closest thing I could think of from a network engineering perspective was: here's an application-layer, software-defined VPN mesh, basically. With the whole concept of: here's a sidecar, which is like your little tiny router or firewall, or whatever you would call it, a VPN terminator, on the side

(14:48):
that's allowing the applications to have protected connectivity with each other. But it's all at the app layer from a communications perspective, right? Because you can do all of these policies and stuff at the app layer.
Yeah.

Marino (15:02):
You get to be a lot more granular, because at this point you're communicating at either that TCP layer or HTTP, where you can get super granular, because now you can inject certain kinds of headers into your HTTP request and filter or apply policy against that. You can bake your authentication in there as well. There's just a lot of fancy things you can do with HTTP that

(15:25):
you couldn't really do at the TCP layer. Like, you're kind of stuck, you're limited, it's very static; you're just working with IPs and ports and hostnames. But when you're talking about HTTP, there's a lot of data you can feed into that entire request flow. So service mesh became very popular, but then also became problematic in its own way, because now it's just adding a

(15:47):
significant amount of complexity and overhead as well. And so you have to think about your operations teams and what they have to take on as a burden to be able to support your applications running on top of Kubernetes, or even other systems that might be using a service mesh.

Chris (16:05):
Now that we're on the topic of service mesh: in my experience talking to a lot of customers and colleagues, there are kind of two camps of people running Kubernetes, either in the cloud or wherever. Either the ones that do not want to deal with the complexity of a service mesh, they think that it's just too
(16:27):
much for what they need to doand then the ones that are like
some of them are begrudginglyimplementing a service mesh and
some of them are happily doingit.
But I guess in your experience,what have you seen as being
that breaking point where acustomer is like okay, we
absolutely need to do this forthis purpose ABC, enhancing our

(16:47):
applications, enhancing the business, et cetera. What does that typically look like to you?

Marino (16:52):
The most common use case you find with companies, or organizations, wanting to use a service mesh really comes down to the security bit. Right, they have this strong requirement to adhere to things like PCI, or they have to protect their workloads and how they communicate, and so mTLS becomes that initial pathway.

(17:13):
But then organizations also begin to realize how important the service resiliency bit becomes, right, that QoS bit, if you will. The problem is, if you think about when we were doing networking at the hardware level: how often would you actually sit there and configure QoS? The better answer would be,

(17:34):
let's just throw more bandwidth or a higher-powered switch at it if there's a problem, right? And we're starting to see that same pattern arise, just in a different light, all over again. Because no one wants to sit there and baseline their services and understand the latency between their services in that service request flow. And it takes a lot of tuning to set up that resiliency as well.

(17:58):
It's not easy to do, because at a moment's notice your requirements might change, and then you have to change up everything all over again. And you also have to pair this with your testing methodology, too, which might not be bulletproof, and some things might slip through the cracks. And when you think about scaling, right, when you think about how Kubernetes operates: it scales.

(18:20):
It scales based off of certain triggers. It scales your workloads because it determines: hey, you've got an increase of requests coming inbound now, and I have to scale up, right? But you cannot infinitely scale, so you've got to do other things too.
Anyway, we can go on and on about this, but the reality is, if you're a large enterprise organization, chances are

(18:43):
you're probably investigating or currently using a service mesh. You wouldn't be using something open source; you'd probably be using something enterprise, or enterprise-grade. But the other thing, too, is most organizations that are much smaller walk away from service mesh because it's not warranted.
Within their organization, they have a better view of what

(19:06):
their workloads look like, as well as their infrastructure, so they have a lot more control. But when you're thinking about enterprise-wide scale, you have different teams needing to interact with each other, different applications doing different things, calling each other, if you will. That's where that service mesh becomes very powerful and also very important and very useful as well.

Tim (19:27):
Yeah, I can't think of a single enterprise that I'm aware of that actually has a working application catalog of, like: here's our apps, here's what's talking to what on what ports, here's the bandwidth requirements, here's the latency requirement. None of that happens, right? It's always like: oh, it's broke. Oh yeah, I guess we can't tolerate more than 50 milliseconds of latency.

(19:49):
Good to know, right? So yeah, totally agree on that. Another thing that I've noticed, and it broke my brain when I was learning Kubernetes, was the networking behind services, like service IPs, how they can be shared between nodes,

(20:09):
and how the kube control plane, the scheduler, and all of that sets up the services, and then, no matter which node you hit, you're going to end up at a pod. It's just weird to me. So I still don't know a great way to explain that to people who haven't sat down and read it.

Marino (20:26):
Basically, you read through it and figure it out. Yeah, so a lot of this really comes down to NAT, and saying to that requester, or that consumer: well, I'm going to NAT this request and then send it off to where that actual workload exists. And behind the scenes, depending on what CNI you're using and what underlay fabric you're using, a lot of

(20:50):
that is just overlay and VXLAN networking at the end of the day. Yeah, I mean, that all sits behind the scenes and we just don't see it. I mean, we don't even have to sit there and troubleshoot it anymore. Our networks just handle it at this point in time. What it really comes down to, though, for network engineers out there: if you have a very clear understanding of how NAT

(21:10):
works, that's how kube-proxy works. That's exactly what kube-proxy is doing, NAT effectively, and it knows where to direct workloads. It uses a little bit of what they call iptables. So if you've ever worked in the Linux space, you've probably messed around with iptables and, more recently, nftables. Yeah, it's just an enhancement.

(21:33):
Now, Cilium uses eBPF to handle this, but at the end of the day,

Tim (21:39):
All that's really going on is a bunch of remapping and rerouting. Yeah, I seem to remember, and again I'm so late to the game, but I seem to remember that iptables was a huge hog. It was a problem. One of the reasons that CNIs gained popularity was because iptables could just be overrun, essentially, with the requests and mapping tables and just swapping,

(22:00):
with the requests and mappingtables and just swapping,
constant swapping.

Marino (22:05):
It's a significant overhead overall, right? Because you have a bunch of nodes in your cluster, and they have the same set of, sorry, iptables rules, and you're just thinking about a system that has to read down a list, find that entry, and then route the request, which is not very efficient when you're talking about massive scale.

(22:26):
So a lot of CNIs have significantly improved upon that experience overall, but at the end of the day it's still a challenge, and that's why we're seeing a migration over to nftables, because it can process traffic a lot better, it can read and write rules a lot better as well, and it just does really

(22:48):
well with connection handling.

Tim (22:51):
Sorry, one sec, I accidentally closed the window with our list on it, so all right, hold on, let me pick up. So actually, scale's a great segue, right? I think one thing network engineers that haven't really leaned into Kubernetes haven't really considered is not just the scale itself of Kubernetes, where you can have,

(23:14):
you know, hundreds or potentially thousands of nodes and pods and all of this, but how, from a networking perspective, that is even handled by Kubernetes. So what does that look like? Just to help the network engineers understand what scale looks like from that perspective.

Marino (23:32):
Yeah.
So let's just say you're scaling workloads, right? You have a container and you need multiple copies of it. In Kubernetes, when you create a Service type, it's actually a load balancer that's fronting these services with DNS, and so by default it'll just round-robin the requests out to each one of these containers, or pods, I should say.

(23:53):
But what actually causes the scale is triggers. So within Kubernetes you have something called a Horizontal Pod Autoscaler, which effectively helps with the scaling capabilities based off of certain conditions. Okay, my CPU metric for all of my workloads, my replicas, has hit 80%.

(24:13):
Start scaling to a certain number of replicas, and then the reverse: once that demand goes down, scale back down.
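The trigger described here follows a simple proportional rule; the Kubernetes HPA documentation gives the core formula as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to configured bounds. A sketch:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_r: int = 1, max_r: int = 10) -> int:
    """Proportional scaling rule used by the Horizontal Pod Autoscaler,
    clamped to the configured min/max replica bounds."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_r, min(max_r, desired))

# 4 replicas averaging 80% CPU against a 50% target -> scale up to 7.
assert desired_replicas(4, current_metric=80, target_metric=50) == 7
# Demand drops to 20% -> scale back down to 2.
assert desired_replicas(4, current_metric=20, target_metric=50) == 2
# You cannot infinitely scale: the max bound caps it.
assert desired_replicas(4, current_metric=500, target_metric=50, max_r=10) == 10
```

The min/max clamp is the "you cannot infinitely scale" guardrail mentioned just below, and the real controller adds stabilization windows on top so it doesn't flap.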
Because the other thing that you have to consider, too, is your cluster, right, which runs all of these containers. It's not infinite. You also have to consider scaling that up too. And let's just take one of the major cloud providers

(24:35):
out there.
When you have to scale your cluster, it's not just a simple: hey, I'm going to throw a node in here, and then boom, now I have more compute and CPU and memory capacity. It actually has to bootstrap it, it has to bring it online, it has to do a bunch of validation checks to make sure that the node itself is working and can actually accept workloads, and

(24:56):
then, once it's joined to the cluster, you can start scaling your workloads. So it's a twofold operation. But the other part to this, too, is the load balancer, right? You have an internal load balancer, and then, if you are exposing your services, you'll probably use an external load balancer as well that, again, accepts the connections and

(25:16):
then distributes them to all available copies, or any available copy. But what's interesting is that the way to operate these systems is not just turning on your HPA and being on your way. You actually have to understand how your workloads operate. What might be considered demand also could be considered a denial-of-service attack.

(25:37):
Let's just say, around the holidays is when we expect to see traffic spikes and increased load. But it's not so difficult for someone to just create a denial-of-service attack, slip it into that same period of time, and no one really thinks about it.
And then, all of a sudden, if you haven't set, let's say,

(25:58):
your cloud billing alarms properly, and you also haven't set your scaling limits, well, guess what? Now you have a million-dollar cloud bill that you have to take on and worry about.
So there are other mechanisms that you can employ to handle that too, things like rate limiting. Rate limiting is important because, as you start to see scale, you'll allow a certain number of requests to make it

(26:19):
into your cluster before you hit a capacity. But that rate limit is supposed to prevent something like a denial-of-service attack, because now, once you start to see an anomaly of requests coming inbound, wherever you've implemented that rate limiter, normally it's not at the load balancer. You do it somewhere like a WAF, or maybe even an API gateway or something of that sort.

(26:41):
That's where that rate limit kicks in, so you don't run into an infinite-scaling situation. So that load balancer is important; it's just distributing load. But you also have to rely on telemetry data. That telemetry data is what's enabling that scale to be possible as well.
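One common way to implement the rate limiting described here, whether at a WAF or an API gateway, is a token bucket. This is a minimal illustrative sketch of the general algorithm, not Kong's or any gateway's actual implementation:

```python
import time

class TokenBucket:
    """Allow short bursts but cap the sustained request rate,
    so a traffic anomaly can't trigger unbounded autoscaling."""

    def __init__(self, rate_per_s: float, burst: int):
        self.rate = rate_per_s       # steady-state refill rate
        self.capacity = burst        # maximum burst size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                 # request rejected (e.g. HTTP 429)

bucket = TokenBucket(rate_per_s=10, burst=5)
verdicts = [bucket.allow() for _ in range(8)]
assert verdicts[:5] == [True] * 5   # the burst passes
assert not all(verdicts[5:])        # immediate extras get rejected
```

Rejected requests never reach the cluster, so they never register as demand, which is exactly why the rate limiter sits in front of, rather than behind, the scaling trigger.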

Tim (26:58):
I mean, that's very cloud... sorry, go ahead, go ahead.

Chris (27:00):
I was going to say, yeah, the DoS thing feels like it has a lot of parallels to a normal denial of service, right? Like, a lot of carriers offer denial-of-service scrubbing services, right? Because the thing is, if you buy a circuit from someone and they hand you basically a wire, once the

(27:22):
traffic's on your end of the wire, there's no way to stop it. It's already there, right? So with Kubernetes, it sounds like you've got to use some of these external things, as well as maybe some of the internal components, to stop it before it causes Kubernetes to think that the trigger has been invoked, right? So I think it's exactly the same.

Marino (27:44):
It's very much the case, right?
I mean, whether you're in the cloud, even outside of Kubernetes, straight-up operating in the cloud, you're going to run into the same challenges, because you're exposing your resources to public customers, public consumers. And unless you're implementing some strong authentication, and you're not just freely exposing all of your

(28:05):
APIs to everyone, you're likely going to run into situations like that.
But again, we've learned from this. We've built a lot of practices around these situations and how to design for and build for these situations as well.

Tim (28:21):
So, speaking of, because this is also a really good segue: we're talking about scale here, right? And nothing has stressed scale lately more than the deployment of AI workloads and high-performance computing and GPUs and all the stuff associated with doing a lot of data very quickly and a

(28:43):
lot at scale.
So people are doing this in Kubernetes. I don't know which pieces of the AI stack they're actually doing in Kubernetes, you know, from their training; I'm not sure exactly what it is that they're doing. But what does that actually look like? Do you have any idea what people are doing with AI and Kubernetes now?

Marino (29:03):
Yeah, before, it used to just be running systems and workflows that would have some engines doing training, running training models within your cluster, and you would still need some pretty high-grade hardware and GPUs to be able to handle that. But we're well past the training. We still do that training. We still see it in Kubernetes.

(29:24):
There's a project out there like Kubeflow which definitely helps with setting that all up and helping you set up workflows to be able to ingest different training situations.
But what's actually happening now is you're actually running models inside of Kubernetes. And not only that, like, it's not just the cloud, it's not just, I'm going to go to a cloud provider,

(29:45):
spin up a Kubernetes cluster and then run my model there. Because the reality is they're not giving you the best of the best CPU and memory. It's all shared resources. And what you're actually starting to see is that clusters need to have access to a GPU.

Tim (30:01):
That makes sense.

Marino (30:02):
GPU-enabled, and you need to effectively be able to pass that GPU through to your workload. So if you're running a model as a pod, it needs access to the GPU. And it's not just a matter of setting up a GPU pass-through capability. It also means that you're passing a lot of networking traffic too, so now your CNIs are being updated to handle that

(30:25):
kind of traffic as well.
There's something called Dynamic Resource Allocation, DRA, and DRA in Kubernetes is effectively prioritization. It's QoS for AI and LLM. Yeah, for hardware and model traffic.
So you have a few models that are running on top of Kubernetes clusters. It's expensive, because GPUs are expensive.
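As a hedged illustration of passing a GPU through to a pod, the established device-plugin style request looks roughly like this (the pod name and image are placeholders; the `nvidia.com/gpu` resource name assumes NVIDIA's device plugin is installed, and DRA moves this toward ResourceClaims in newer Kubernetes versions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference                        # illustrative name
spec:
  containers:
    - name: model-server
      image: example.com/llm-server:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1                  # request one whole GPU from the device plugin
```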

(30:46):
Um, but you also have to think about that cost control too, because this is a great way to get that million-dollar cloud bill that you probably never even thought of or planned for. But you have a team of developers. They need to build, they need to run their models, they need to test out how these models operate. They're doing very unique things and they're consuming a

(31:07):
lot of GPU power.
It's not so much about compute anymore. It's about access to good-quality GPUs that can process thousands of tokens per second, not just five, but thousands, right? And that's not cheap, right?
Now, the other side to that, too, is we're seeing that only because

(31:28):
what happens is you have teams that just go the shadow AI route, go to OpenAI, go to Claude or Anthropic, pull their API keys and just, like, keep adding credits.

Tim (31:40):
Just add credits, yeah.

Marino (31:42):
That's it, right? Consume the APIs.
But here's the problem with that. So you're starting to enable these developers to just send PII, send IP, out into these models without, like, ticking the right compliance and guardrail boxes that need to be in place. And so to control that, there are a few ways, like, you know.

(32:05):
You can use something like an AI proxy or an AI gateway, or you own the models on-prem, where you decide, I'm going to just build my cloud on-prem all over again, and then I'm going to send my developers to use local models. Yeah, they can certainly pull models down and train them if they want to customize them to their needs and still have their

(32:27):
applications communicate, do the things that applications can do. This is all, like, agent-to-agent stuff now, yeah, of course.
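The AI-proxy idea mentioned above can be sketched as a scrubbing step in front of the upstream model call. This is a toy illustration under our own assumptions (the patterns, `redact`, and `proxy_request` are hypothetical names, not Kong's or any gateway's actual API, and real PII detection is far more robust than two regexes):

```python
import re

# Illustrative PII patterns only; a real gateway would use much stronger detection.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(prompt: str) -> str:
    """Replace detected PII with placeholder tokens before the prompt leaves the network."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt

def proxy_request(prompt: str, forward) -> str:
    """Gateway-style hook: scrub the prompt, then hand it to the upstream model call."""
    return forward(redact(prompt))

cleaned = redact("Contact jane.doe@example.com, SSN 123-45-6789")
```

The point is that the developer still just "consumes the API", but the prompt is sanitized at a choke point the organization controls.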
At the end of the day, it becomes almost impossible to scale GPUs, because now you have another problem of electricity bills going through the roof.
Providers are not going to come after you because you're using

(32:48):
GPUs. They're coming after you because their electricity bills, their HVAC bills, are through the roof. So now you have other systems that come into play here, right? When you think about what Apple's doing, especially with their Apple Silicon chips, they're basically enabling GPU workloads on their devices.

(33:09):
Like, I've got this M4 that I'm basically working off of right now. That's where the stream is going on, or where our recording is going on, and I've got a few models running behind the scenes here, and I don't hear any fans. Do you hear any fans? No, right? But that's the direction they're trying to go. In fact, there was a Twitter post, probably maybe two months

(33:34):
ago, where someone had just bought a bunch of Mac Minis, M4-based Mac Minis, racked them up, used a USB-C, or, what is it, Thunderbolt 5, to connect them all up, and they had, like, you know, I think, more than 100 gigs of bandwidth between each of these nodes.
But now you have this super cluster of Macs that's running.

(33:55):
You know, you don't have a lot of the same constraints or considerations around HVAC, and it becomes a very powerful style of cloud. And then you find out, like, Apple's got their own container runtime now, right? So now you start to see that they're really trying to advance the developer game, especially when it comes to AI and working

(34:15):
with LLMs.

Tim (34:17):
Yeah, that's interesting. I think, and we've done a couple of shows on, basically, what does it look like to build an AI data center, like, what's the new and exciting thing? And I think anything we have come up with is something that will change, because the technology has to change. Because, what is it, Moore's law? We're getting so far ahead of Moore's law at this point. It's going to take a while to figure out
(34:38):
what the next iteration of thatis.
Um, but yeah, so no, thatthat's that's really, that's
really good, and uh, one thoughtabout that.

Marino (34:47):
Do you remember InfiniBand?

Tim (34:49):
Yeah, yeah.

Marino (34:49):
Yeah, right, RDMA. All of a sudden, they're re-entering the space all over again. They're becoming popular all over again.

Tim (34:57):
It's circular. As the technology hits limits, we explore what we tried before: is it going to work better? Can we do it better this time? Definitely agree. Okay.
So I think actually we need to wrap up. So before we do, Marino, where can people find you on the web? Where should they follow you and engage with you?

Marino (35:18):
So I am active pretty much everywhere. Mostly LinkedIn, a little bit of Twitter/X. Um, I mean, I do more trolling on Twitter and X than I do the professional stuff, right? But if you want to connect with me professionally and see some of the work or some of the projects that I'm working on at the moment, come hit me up there.

(35:39):
You can literally look me up by my name. I'm there as well. I use Discord, but I just kind of lurk. I'm not really big on, like, heavily contributing.
But one thing I would recommend folks do: I go to KubeCon a lot, right? I try to go to every event that's out there in North America and Europe. I'm trying to get to, like, some of the, you know, Japan and

(36:02):
China.
Maybe next year, we'll see. Yeah, and if you're planning to go out that way, definitely hit me up. Let's connect for a bit, let's chat, and even go check out those sessions. There's a lot of new AI-based sessions, and you can see the intersection of what Kubernetes is doing with AI as well.

Tim (36:21):
Awesome, and we'll get all that in the show notes as well, so everybody doesn't have to memorize it. Absolutely. All right. Well, thanks for joining us today, Marino. This has been an awesome discussion. We'll have to have you back and expand on it in a month when everything changes. I'm just kidding. But yeah, no, this is great. Any final thoughts?

Marino (36:41):
Yeah, I think that everyone at this point should start considering how they can involve the usage of AI in their workflows, in the way they build. I mean, all it takes is: go download Ollama and run a model locally. And then, if you want to take that a step further, pair that with Open WebUI or WebKit, and if you want to take that

(37:02):
even further, check out LM Studio, because I think not everyone's got the capability to go out and source an API key and then start building with public models, and at the same time they might have some restrictions as well. But if you've got a very recent Mac or some sort of ARM-based device, you can do some serious damage with some of those models

(37:25):
out there, some of the little ones. Yeah, definitely agree with that, for sure.

Tim (37:29):
Awesome, all right. Well, this has been the Cables to Clouds Podcast, and I was going to say enjoy everything we do and subscribe to us on everything, but obviously, if you're listening to us, you probably already did that. So instead of you subscribing, I'd say: don't enjoy everything we do.

Chris (37:44):
We don't get it right every time.

Tim (37:46):
No, no, to enjoy everything we do, it's a requirement. But yeah, if you liked it, at least share it with a friend, maybe several friends, and, yeah, we'll see you guys next time.

Marino (38:01):
Thanks everyone.