How Do We Solve the Cloud Visibility Problem?

Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Chris (00:00):
The Human Torch was denied a bank loan.
I haven't heard that one.

Craig (00:04):
Really, I haven't heard that vocal exercise, yeah.

Chris (00:07):
Quote from Anchorman.

Tim (00:09):
Oh, that's right. Yes, this burrito is delicious, but it is filling.
Welcome to the Cables to Clouds podcast, your one-stop shop for all things hybrid and multi-cloud networking.
Now here are your hosts: Tim, Chris and Alex.

Chris (00:30):
Hello and welcome back to another episode of the Cables to Clouds podcast.
My name is Chris Miles, at BGPMain on Bluesky.
I actually went and changed the domain, Tim, so you'll be happy.

Tim (00:41):
Excellent.

Chris (00:42):
Kept it consistent everywhere.
And yes, as you heard, as always I'm joined by my good friend Tim McConaughey, my co-host, my partner in crime, my heterosexual life mate, whatever you want to call each other.
But yeah, we have a special one today.
So this is a post-holiday recording.
We're all fat, dumb and happy from Christmas; we filled our guts, and now we want to chat about some cloud observability stuff.
So we've brought on a good pal of ours, Craig Johnson from Forward Networks.
If you've been involved in any of the podcast and technical media circuit, you've probably seen Craig out there.
I think you've been on A1 as well as the Cloud Gambit, right?
So, yeah, he's making his rounds, you know.
But no, we had a good chat at AWS re:Invent this year, and we thought it'd be a good chance to come on here and talk about some of the stuff that's happening at Forward, the concept of digital twins, and how that applies to cloud network observability.
So with that, Craig, tell us a little bit about who you are and what you do.

Craig (01:49):
Yeah, thanks for having me.
So my name is Craig Johnson, and I live down here in the great nation of Texas.
I've been at Forward for a little over five years.
I'm a technical solution architect, and I also lead our public cloud practice, but I didn't start out in cloud; I only moved to public cloud about three years ago.
In the back left corner behind me you'll see my multiple expired CCIEs, so most of my time before that was completely on-prem, working with service providers.
I spent a decade or so at Cisco on the operations side, mostly in data center, storage area networking, or campus networking.
For the past five, five and a half years at Forward, I've been focused on network modeling and digital twins, and for the last three or so it has been purely cloud modeling.

Chris (02:39):
I will say, at least you've got one of the old-school CCIE plaques.
In Tim's and my day, the plaque we got, I call it the ugly baby, because it's a hideous thing that I don't want to hang on my wall; I hate the way it looks.

Tim (02:52):
Yeah, you see, I don't have mine behind me.
I didn't have space for it after I put all the posters up, so it definitely degraded over the years.

Craig (02:59):
I've got the old metal one from, like, 2001, and then they started doing just kind of a simple one, and then, yeah, by the 2010s it was like, come on.
It's like when I was there and they took the sodas away.
It's the same thing, you know.

Tim (03:12):
Dude, I'm going to completely derail this for a second.
I just noticed that you have the Red Dragon.
I have that exact same Red Dragon.
I was looking at my bookshelf like, holy shit, we have the same Red Dragon.

Craig (03:28):
I had no idea.
Yes, yes, there is a plethora.
I just redecorated this room, so the color and the flooring are all new.
Below the Red Dragon over there are several D&D third edition books that I still have.

Tim (03:42):
They're not particularly useful anymore, but yeah, absolutely.

Chris (03:46):
Yeah, good stuff, man.
Well, yeah, Craig, thanks for that, and thanks for coming on the show, obviously.
So let's start with, I guess, an overview of the problem.
When we had the idea for the show, we wanted to talk about what the problem was, and then we can venture into how to solve it.
So, you know, network monitoring and network observability have been a thing in the on-premises world for a very long time, and I'm curious to get your take: how has that made its way into cloud?
Because cloud seems like this new paradigm where there are integrated APIs everywhere.
You should be able to get as much visibility and observability as you need, but that doesn't seem to be the case in day-to-day operations.
So, from your perspective, what is broken about cloud network monitoring and observability?

Craig (04:37):
So what I'll say is particularly broken, in my opinion, is, one, the very obvious.
In on-prem networking, if I want to swap out a Cisco router for a Juniper router for an Extreme one, they all basically do the same thing.
They're going to have RIBs, they're going to have FIBs, they're going to have forwarding tables, input and output ports, and when you go inside that, you know it's going ... at all.
If I want to switch between AWS, Azure, GCP, or OCI, obviously there's something underneath all of those that does the same job, but it's completely opaque to us, and the visibility I get within those is very, very limited.
For administrative control reasons and just by nature, I ... want to have any level of visibility.
In AWS I'm limited, of course, by which account I'm in and which region I'm in, because they made the decision to separate the control planes.
Azure's a little bit different: I'm separated by tenants and subscriptions, and similarly with GCP and projects.
So the scope of what I can actually see is quite limited, and the actual way the forwarding function works is very, very different.
As you probably know, the way I forward in AWS, and the tools I have for it, are drastically different from Azure and from GCP, so it's much, much more difficult.
And with many, many analysts saying, oh, you should go multi-cloud, or you should have many, many different accounts for separation, that sounds nice, but being able to, one, figure out where my packets are going, whether they are moving correctly, and whether they're taking the best path, is extremely difficult.
And I've not seen any other tools out there... well, there are a lot of other tools that will give you some visibility into your instances, into your databases.
Very few that I've seen will actually do it at the network and transport layer.

Tim (06:26):
Yeah, definitely.
It's interesting what you mentioned about how AWS separates control planes, and also visibility, essentially between accounts.
A lot of people are like, oh yeah, observability is tough, but you have CloudWatch and they give you all these tools.
But then they split them all out from each other, even within the same cloud, and it becomes a Herculean effort to stitch it all together across the different accounts and whatnot.

Craig (06:53):
Yeah, and it's just a very different paradigm from how we do things on... oops, I should hit myself for using that word... from how we do things on-prem.
Because, yeah, when I'm dealing with anything on-prem, I don't care how many applications, clients, and tenants I have; it's all going through the same set of infrastructure.
In the cloud, the separation just makes it extremely difficult for me to figure out anything in there.
And then there's just dealing with people who may not be as familiar with how the forwarding works, and with the fact that it changes far more often than I would like to see versus on-prem.
Since I started, 25 years or so ago, speeds and feeds have changed, but forwarding is roughly the same.
If I were still doing AWS the way I did it five years ago, I'd be woefully out of date, and things wouldn't work anywhere near the same way.

Chris (07:37):
Yeah, I mean, to your point about the way forwarding works differently between the clouds, the monitoring piece is all different as well.
In AWS, you're a network engineer, but you have to come up with this complicated Athena query to actually see what your traffic's doing on a day-to-day basis, and that doesn't really apply to Azure, GCP, or any other cloud you could be in.

Craig (08:04):
It doesn't apply, and I don't get anywhere near the same tools, though I will applaud the cloud providers for starting to get a little better on this.
But it's not like I can go log into a transit gateway and do a traceroute to figure out where my path is going, and hopefully I don't have things blocking my standard ICMP tools.
That's where you do get other data plane observability tools, which are great for figuring out latency and whether things are working or not, but they don't actually tell you, when something isn't working, where the problem happens to be.

Tim (08:31):
Well, and if you do something like, say, VPC Reachability Analyzer, God forbid you have any non-AWS architecture anywhere in the way; you're done.
Right, like, that's it.
You're not going to use that tool at all.

Craig (08:43):
Yeah, I mean, that's, of course, the pitch, you know.
It's like, oh, you want to use something else besides AWS Network Firewall?
I'm sorry, you're out of luck.

Chris (08:53):
It's kind of like the Apple approach.
I feel like when you're fully in the ecosystem, things work, and otherwise they're all shrugging their shoulders like, well, you could buy our shit, and then we know this will work.
But, I mean, the majority of network people in the cloud are using some type of third-party firewall for integration and stuff like that, and that visibility is completely lost.

Craig (09:18):
Exactly right.
Like, you can see, oh, I'm hitting this ENI here with my traffic, and, you know, cross my fingers that whatever that thing is doing, it's doing it correctly.

Tim (09:26):
Yeah, I mean, people open tickets all the time for forwarding problems.
Chris is killing something over there, but yeah, all sorts of forwarding tickets where you have to get support involved, because they're literally the only ones who can actually see what is happening to any of the packets.
Right, your VPC Flow Logs might say everything's great, but there's still a problem.
And you go and find out that, oh, well, this availability zone was having an outage at that time.
So the tools they give you are opaque, but also, I don't know, I don't feel like they're truly real-time, or they don't dig far enough to really give you the visibility you need.

Craig (10:07):
Yeah, I think that's exactly right.
I'm reminded of when I used to work in operations: the answer to those sorts of problems in a data center or a campus was, you know, let's get the sniffer out and see.
And it's the same sort of issue.
You can look at flow logs, you can try to generate packets, and that's going to tell you something, but it's not going to tell you if you're taking a good path or if I'm hitting any ...
It might tell me that it's working or not working, but it's only going to show it working if that traffic is actually running at this time.
As an engineer, when I'm troubleshooting, I don't always have the application running at the time I'm doing it, or I'm trying to do it ahead of time, before the application's even onboarded.
And that's really the problem statement we set out to solve: to be able to not just observe what's going on in the network in the cloud right now, but to model all possible flows, and anything that might be traversing the environment.

Tim (11:00):
Yeah, and one more thing, not that we haven't thought about it, but something we didn't really have to think about that much on-prem and absolutely have to think about in the cloud, is the cost of troubleshooting: the observability costs of doing VPC Flow Logs, Reachability Analyzer, port mirroring, right?
All of these things have costs associated with them, and you're doing it in every cloud you exist in.
Like, if you're doing troubleshooting in AWS, you're probably also doing it in Azure if you're there as well, so you're paying for that too.
You're double- or triple-paying for a lot of these tools, and it's something you never would have thought of on-prem, right?
On-prem, you stroke a check to, I don't know, SolarWinds or whatever the hell tool you're using, and then you just keep it.
Once a year you pay the fee and you're good, right?
But here it's consumption-based, so every time you have a problem, you're essentially paying money to solve your own problems.

Craig (12:02):
That's exactly right.
I remember, before I was at Forward, I worked for a network packet broker company, and that was the line as well: put taps and SPANs on every place in your network.
As you're probably aware, that's completely infeasible, and it's exactly the same thing in the cloud, because, one, it's far too expensive, and two, it's just far too much data to do anything with.
I mean, even if I tap everything, if I want to look at all north-south and all east-west traffic, that's a massive amount of data, and it's just too much to actually be actionable in any way.

Chris (12:33):
Well, it's funny too, because with the introduction of cloud, one of the big selling points was, hey, you get to start from scratch, pretty much.
You get to build an ideal architecture that's built for the cloud, but it's never, ever that way, because of costs.
We've inherited that problem tenfold in the cloud, just as much as on-prem, where we've had to put things in these centralized-type architectures.
And then, if it's a multi-cloud flow, you're doing it centralized in two different ways.
You're paying for the storage, you're paying for the monitoring on everything.
It's exacerbated like crazy.

Craig (13:14):
Yeah, you're precisely right.
To the account example, I've seen everything from people getting started with simple architectures and a handful of VPCs to some customers I work with that have literally 800 accounts, with a different transit gateway per account, all peered to each other and shared, and it's like, this is a nightmare.

Tim (13:35):
Yeah, that sounds truly awful.

Craig (13:40):
And AWS's solution is, to your point, oh, just use Cloud WAN, or just use VPC Lattice or something like that.
But you're kind of just masking the complexity, which, to be fair, we've done the exact same thing on-prem.
If I have too much complexity in my environment, put an overlay on top of it, I get it, make a fabric.
100%.

Chris (14:00):
So, obviously, we've talked about this for about 10, 15 minutes at this point, so we know there's a problem, right? There's something to solve.
So let's pivot here and talk a little bit about Forward Networks and how you guys are addressing and solving some of these problems.

Craig (14:19):
So, to the example I gave before: if you look at all of your devices on-prem, no matter what the vendor happens to be, they all basically do the same thing.
Whether it's a switch, router, firewall, or load balancer, a packet comes into a particular input port, the device does something with the packet (a header rewrite, a MAC rewrite, adds a header on top of it, whatever it happens to be), and sends it out an output port.
That's all the device does, no matter what; it processes the packet through a number of tables along the way.
When we started at Forward, we all came from your Ciscos and places like that, so we were very familiar with that problem.
As we started moving along, obviously we didn't want to have holes in the way we model the network, because once you take all of that data, you want it to be easily searchable and normalized.
That way you don't have to be a pure expert.
I'm pretty good at most Cisco devices, but Junipers and Palos and Fortinets, I would really struggle with.
So the key is understanding the forwarding characteristics: not necessarily "tell me the actual traffic going through that device," but "if I'm looking for a particular source and destination IP with these port characteristics, tell me how it's going to pass through all those devices based on the current RIB, FIB, and all the other tables on each device."
The innovation we had is that the public clouds essentially work the exact same way.
Now, they all have their own little quirks.
But if I'm leaving my data center and going over an ExpressRoute or a Direct Connect, that's going to connect to a VGW or a TGW or a VNet gateway.
Once I hit that TGW, it has essentially the same things.
It's going to have a number of attachments that go in and out of it.
It's going to have peering connections and associations.
It's going to have a number of route tables, which function very much like VRFs.
As you pass through each of those constructs, you go through a TGW that's going to connect to a VPC somewhere, or, more specifically, that VPC is going to connect to a route table that has connections into a subnet, and that subnet is going to have EC2 ENIs attached to it.
And we saw that, okay, we already have the concept of doing this on-prem, so it's very easy for us to extend that.
At the same time, we can take everything we have, because I don't just need to collect from one account.
I can collect from multiple accounts using just a simple IAM role, and I can do it across multiple regions.
The key insight we had, to your point earlier, is: wait, if I do have something like a Palo or a Fortinet or whatever firewall inside my cloud, that's just a collection of a couple of ENIs, maybe IPsec tunnels, whatever they happen to be.
As the traffic hits that ENI, I can process that firewall just like it was a ... my end-to-end connectivity, just by putting in that source and destination IP address.
It's going to tell me how it passes through each one of those constructs.
That was really the key innovation, and that's really where I got my start.
When I started doing any sort of cloud thing, probably around 2020, 2021, I didn't really know very much at all, but I had the ability to collect from these environments.
Like, okay, when I'm trying to configure anything on AWS, if I want to set up a file, how exactly is it forwarding?
Did I configure this route table correctly?
Did I configure this thing?
Did I configure the security group?
There are a hundred different places to check inside the cloud for any number of things.
Being able to step through it step by step and, to the point earlier, not have to log into a bunch of different accounts and a bunch of different regions, was a very, very key insight that we were able to use.
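
To make the "clouds forward the same way" idea concrete, here is a minimal Python sketch under heavily simplified, hypothetical assumptions: every construct (a TGW route table acting like a VRF, a VPC route table, a firewall ENI) is reduced to a route table mapping CIDR to next construct, and a path is found by longest-prefix match hop by hop. The `Construct` class, the `trace` helper, and all construct names are illustrative inventions, not Forward's data model, and the sketch ignores security groups, NACLs, and firewall policy entirely.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass, field


@dataclass
class Construct:
    """Hypothetical normalized hop (TGW route table, VPC route table, firewall ENI...).

    routes maps a destination CIDR to the name of the next construct, or to
    the keywords "local" (delivered here) or "blackhole" (dropped)."""
    name: str
    routes: dict[str, str] = field(default_factory=dict)


def lookup(construct: Construct, dst: ipaddress.IPv4Address) -> str | None:
    """Longest-prefix match over a single construct's route table."""
    best_len, best_hop = -1, None
    for cidr, next_hop in construct.routes.items():
        net = ipaddress.ip_network(cidr)
        if dst in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop


def trace(model: dict[str, Construct], start: str, dst_ip: str) -> list[str]:
    """Follow the forwarding decision construct by construct until the packet
    is delivered, dropped, or revisits a construct (a forwarding loop)."""
    dst = ipaddress.ip_address(dst_ip)
    path = [start]
    while True:
        next_hop = lookup(model[path[-1]], dst)
        if next_hop is None:
            return path + ["DROP (no route)"]
        if next_hop == "blackhole":
            return path + ["DROP (blackhole)"]
        if next_hop == "local":
            return path + ["DELIVERED"]
        if next_hop in path:
            return path + [next_hop, "LOOP"]
        if next_hop not in model:
            return path + [next_hop, "DROP (unmodeled hop)"]
        path.append(next_hop)


# Toy environment: the TGW route table sends one VPC directly and
# everything else through an inspection firewall's ENI.
MODEL = {
    "tgw-rt-prod": Construct("tgw-rt-prod", {"10.1.0.0/16": "vpc-app-rt",
                                             "0.0.0.0/0": "fw-eni"}),
    "fw-eni": Construct("fw-eni", {"10.2.0.0/16": "vpc-db-rt"}),
    "vpc-app-rt": Construct("vpc-app-rt", {"10.1.0.0/16": "local"}),
    "vpc-db-rt": Construct("vpc-db-rt", {"10.2.0.0/16": "local"}),
}

if __name__ == "__main__":
    for ip in ("10.1.2.3", "10.2.0.5", "192.0.2.10"):
        print(ip, trace(MODEL, "tgw-rt-prod", ip))
```

In this toy model, 10.2.0.5 goes tgw-rt-prod, fw-eni, vpc-db-rt and is delivered, while 192.0.2.10 is dropped at the firewall ENI because nothing there routes it onward. A real model would also evaluate filtering and the firewall's own policy at each hop, not just routing.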

Tim (17:50):
So, no, that's great.
So what I'm hearing is basically this: Forward got started on-prem, and now you've expanded into cloud as an add-on.
The idea is that you can onboard your accounts to Forward's... I guess it's a platform, right? An appliance, or...?
Yeah, okay.
And then, basically because of that, Forward can go reach out to all of your accounts in whatever clouds.
I also assume there's going to be login information or whatnot to hit firewalls and any third-party ISVs, like firewalls, Cisco, whatever.

Craig (18:30):
Yeah, API, CLI, whatever it happens to be.

Tim (18:32):
However you can access it, right?
And that's going to help pull all that in.
Because you get all that data, and because the ways traffic can be forwarded are limited, you know how it's going to work.
You can just model it, right?
It doesn't actually send the traffic, but you can model the entire path, because you know everything the packet's going to do.

Craig (18:52):
Basically, yeah, and that's really the cool part: when you start looking into this level of modeling, it's not anything you couldn't do yourself.
Like, when I log into a router, I can look at the FIB, the route table, all the ACL tables, all the CAM tables, any MPLS forwarding, any labels that are getting pushed.
These are all things you could do, and probably have done many times.
It's not really that much different on the cloud.
There, I'm using EC2 APIs to grab the transit gateways, transit gateway attachments, and transit gateway route tables.
Now, the cool thing is that this doesn't have a cost.
It's completely free to grab any of that from AWS; there's no observability cost.
These are the same APIs that show you the exact same thing when you log into the console.
We're just grabbing it on an API basis to show you.
So it's something you have right there, and it's also something you can track over time, which is one thing I really like.
Because, yes, to your point, you could look at CloudTrail logs and see what's changed, and you could look at CloudWatch and see if something is not working.
But going back in time to say, hey, what was the state of the forwarding information in the cloud two days ago, a week ago, between changes, is really not easy to do, and not easy to do in a way that makes sense to a network engineer.
I can grab API output from different points in time, but it's a lot of data to parse through, and not in a way that makes a lot of sense for a network engineer.
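
Going back in time is largely a matter of keeping snapshots and diffing them in network terms. A minimal Python sketch, assuming each snapshot has already been reduced to a hypothetical `{route table: {destination CIDR: target}}` dictionary (in practice you would build it from the provider's describe APIs at each collection time); the table names and targets below are made up. It reports routes that were added, removed, or retargeted between two points in time.

```python
from __future__ import annotations

# Hypothetical snapshot shape: route-table name -> {destination CIDR: target}.
Snapshot = dict[str, dict[str, str]]


def diff_snapshots(before: Snapshot, after: Snapshot) -> list[str]:
    """Return human-readable route changes between two snapshots."""
    changes: list[str] = []
    for table in sorted(before.keys() | after.keys()):
        old, new = before.get(table, {}), after.get(table, {})
        for cidr in sorted(old.keys() | new.keys()):
            if cidr not in new:
                changes.append(f"{table}: removed {cidr} -> {old[cidr]}")
            elif cidr not in old:
                changes.append(f"{table}: added {cidr} -> {new[cidr]}")
            elif old[cidr] != new[cidr]:
                changes.append(f"{table}: changed {cidr} from {old[cidr]} to {new[cidr]}")
    return changes


TWO_DAYS_AGO: Snapshot = {
    "tgw-rt-prod": {"10.1.0.0/16": "tgw-attach-app", "10.2.0.0/16": "tgw-attach-db"},
}
TODAY: Snapshot = {
    "tgw-rt-prod": {"10.1.0.0/16": "tgw-attach-app", "10.2.0.0/16": "blackhole",
                    "10.3.0.0/16": "tgw-attach-dev"},
}

if __name__ == "__main__":
    for line in diff_snapshots(TWO_DAYS_AGO, TODAY):
        print(line)
```

A route diff like this tells you what changed between collections; it does not by itself tell you whether any flow you care about was affected.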

Chris (20:10):
So it sounds like there are kind of two pieces here, the way I'm thinking about it.
There's the predictive analysis, where you're judging, based on the existing control plane at a certain snapshot in time, how a packet going through this network would get from A to B.
You can predict that.
Is there a bit of a postmortem-type analysis piece to this as well, where you're looking at flow logs and you say, okay, this packet actually did come in, and you follow its trajectory in a postmortem manner versus a predictive manner?
Is that possible?

Craig (20:44):
So the postmortem side is absolutely doing diffs between before and after and saying, okay, between two points in time, here is not just what changed on a routing or security basis, but also on an intent basis.
Say I have a particular application or flow: hey, this application exists in my data center A, it hops between region B, and maybe it goes to a different cloud, whatever.
That's an important application that we have imported from those kinds of flow logs.
Then, every time a digital twin like Forward takes a snapshot of the environment, it's always checking those dozens or hundreds of things to say: hey, has this changed?
Is it always taking the shortest path?
Are there loops in the network?
Is there anything that would stop the connection between all of those?
And because it's analyzing everything along the path (Layer 2, Layer 3, overlays, firewalls, MPLS forwarding, all the way up to the cloud), it's going to tell you, one, if it's changed, and two, if you've done something to break the connectivity.
So you can see it at that postmortem level: hey, this particular flow doesn't work anymore, and it's because somebody changed the security group, or somebody unassociated something from the transit gateway, or whatever happens.
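
The "this flow doesn't work anymore because somebody changed the security group" case can be sketched as intent checks evaluated against two snapshots. This minimal Python version assumes a hypothetical snapshot holding route tables plus TCP-only security group ingress rules as `(from_port, to_port, cidr)` tuples; each check does one longest-prefix route lookup and one security group test, then reports intents that passed before and fail after. It illustrates the idea only, is not Forward's check engine, and ignores NACLs, the return path, and every hop beyond the first.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """Hypothetical intent: src should reach dst on a TCP port."""
    name: str
    src: str
    dst: str
    port: int
    route_table: str         # route table used by the source subnet
    dst_security_group: str  # security group on the destination ENI


def lpm(routes: dict[str, str], ip: str) -> str | None:
    """Longest-prefix match: target of the most specific route containing ip."""
    addr = ipaddress.ip_address(ip)
    hits = [(ipaddress.ip_network(c), t) for c, t in routes.items()
            if addr in ipaddress.ip_network(c)]
    return max(hits, key=lambda hit: hit[0].prefixlen)[1] if hits else None


def evaluate(intent: Intent, snap: dict) -> str:
    """Return "PASS" or the first reason the intent fails in this snapshot."""
    target = lpm(snap["routes"].get(intent.route_table, {}), intent.dst)
    if target in (None, "blackhole"):
        return f"no usable route to {intent.dst} in {intent.route_table}"
    src = ipaddress.ip_address(intent.src)
    rules = snap["sg_ingress"].get(intent.dst_security_group, [])
    if not any(lo <= intent.port <= hi and src in ipaddress.ip_network(cidr)
               for lo, hi, cidr in rules):
        return f"{intent.dst_security_group} does not admit tcp/{intent.port} from {intent.src}"
    return "PASS"


def regressions(intents: list[Intent], before: dict, after: dict) -> dict[str, str]:
    """Intents that passed in the earlier snapshot but fail in the later one."""
    broken = {}
    for intent in intents:
        if evaluate(intent, before) == "PASS":
            result = evaluate(intent, after)
            if result != "PASS":
                broken[intent.name] = result
    return broken


ROUTES = {"rt-app": {"10.1.0.0/16": "local", "10.2.0.0/16": "tgw-0abc"}}
BEFORE = {"routes": ROUTES, "sg_ingress": {"sg-db": [(5432, 5432, "10.1.0.0/16")]}}
AFTER = {"routes": ROUTES, "sg_ingress": {"sg-db": [(5432, 5432, "10.1.5.0/24")]}}
INTENTS = [Intent("app-to-db", "10.1.9.10", "10.2.0.20", 5432, "rt-app", "sg-db")]

if __name__ == "__main__":
    print(regressions(INTENTS, BEFORE, AFTER))
```

Here the security group was narrowed from 10.1.0.0/16 to 10.1.5.0/24, so the app-to-db intent flips from passing to failing and the report names the security group as the reason.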

Tim (21:56):
So, I guess, the big problem with cloud observability we were talking about, of course, is the cost of doing that analysis, which Forward takes care of, because it's just predictive modeling.
But there's also the storage: how much data?
It sounds like you're doing diffs of some kind, right?
So what does that data storage model look like?
How are you storing the data?

Craig (22:24):
So there are two pieces of data you're dealing with.
One is the collected data, which tends to be API data.
Now, this is metadata.
We're not talking about going into an instance and grabbing all of the data about all of the instance storage or anything like that.
This is going to be metadata: in my VPCs, these are the entries for all my route tables, these are the associations for all the subnets, these are all the ENIs attached to the subnets.
So in a large account it can be dozens or hundreds of megabytes, but we're not talking gigabytes and terabytes of data that you're grabbing.
Now, on the back end, once that data is gathered, there's derived data.
The IP is what they call a mathematical model.
It crunches all of that and figures out all of the literally quadrillions of possible flows between all of the places.
And that's really the key to what Forward does a little differently, because there are other things out there: you've got LocalStack, which tries to emulate some AWS services, and you've got things on-prem that try to emulate what actual devices would do.
You run into a hard limit when you want to emulate a large environment.
That's where modeling really comes in: because we're modeling, you can scale up to 50, 60, 70, 80,000 devices and many, many hundreds of accounts without an issue, because all of that data is derived.
It's based on mathematical crunching, not on each individual device and all its quirks, because it's all normalized.
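
One well-known way modeling tools avoid enumerating flows one packet at a time is to group header values that every table treats identically. The Python sketch below shows that generic idea for IPv4 destination addresses only; it is named here for illustration and is not a description of Forward's algorithm. It cuts the 2^32-address space at every prefix boundary found in any route table, so every address inside one resulting range is contained in exactly the same set of prefixes and gets the same longest-prefix-match answer in every table; ranges with identical prefix sets could be merged further. Real header-space models also split on source address, ports, and protocol.

```python
from __future__ import annotations

import ipaddress


def atomic_ranges(prefixes: list[str]) -> list[tuple[str, str]]:
    """Partition IPv4 destination space into ranges that no prefix splits.

    Every address in one returned (first, last) range is contained in the
    same set of prefixes, so it gets the same longest-prefix-match result in
    every route table built from these prefixes."""
    cuts = {0, 2**32}
    for p in prefixes:
        net = ipaddress.IPv4Network(p)
        cuts.add(int(net.network_address))
        cuts.add(int(net.broadcast_address) + 1)
    bounds = sorted(cuts)
    return [(str(ipaddress.IPv4Address(lo)), str(ipaddress.IPv4Address(hi - 1)))
            for lo, hi in zip(bounds, bounds[1:])]


# Prefixes collected from every route table in a (toy) environment.
PREFIXES = ["0.0.0.0/0", "10.0.0.0/8", "10.1.0.0/16", "10.2.0.0/16"]

if __name__ == "__main__":
    ranges = atomic_ranges(PREFIXES)
    print(f"{2**32:,} destination addresses collapse into {len(ranges)} ranges:")
    for first, last in ranges:
        print(f"  {first} - {last}")
```

With these four prefixes, the whole IPv4 destination space reduces to six ranges, so a verifier only needs to trace one representative address per range instead of billions of individual addresses.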

Chris (23:49):
Yeah, so on this, you did mention digital twin a moment ago, and the concept of modeling.
Should we maybe define exactly what digital twin means to Forward in this context?
I'm assuming this is something that originated in the on-prem product and has now moved into the cloud, right?

Craig (24:07):
Yeah, it's a term you'll see a lot, and people use it in a lot of different ways.
The way we define it is, as we said before, a way to take the exact forwarding characteristics of any device you have and turn it into a common model.
If you're familiar with OpenConfig, which is still around, of course, there's already this concept of all devices sharing a common model.
Now, some vendors are better than others about conforming to that model, and, of course, the cloud providers barely conform to it at all.
So we've taken that and extended it a little bit to say what's common across all of these, and by turning it into this OpenConfig-plus-extensions kind of common model, we can use it to figure out the forwarding characteristics.
Then you can simply query that digital twin to do what I said before: tell me, for this source IP and this destination IP, with these ports, protocols, app IDs, URL filtering, whatever you have, the exact path it takes, overlays, underlays, whatever it happens to be.
On top of that, because you have that same common model, you can query it not just based on flows, but also based on configurations: golden config checking, security group analysis, all sorts of things that get layered on top.
That's what's changed over time.
When I started, most of us were in the network troubleshooting business, and it's really expanded a lot, because a lot of people are trying to figure out inventory, compliance, things like that.
When you have it all in one place, you don't have to query it based on all of these individual vendor characteristics; asking something simple across all of my clouds is very easy.
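
A vendor-neutral model is what turns "something simple across all of my clouds" into a single query. A minimal Python sketch, assuming a hypothetical `Route` record: entries shaped roughly like AWS `DescribeRouteTables` route objects (`DestinationCidrBlock`, a target ID such as `TransitGatewayId`, `State`) and Azure route-table routes (`addressPrefix`, `nextHopType`, `nextHopIpAddress`) are mapped into it, and a golden-config-style question ("where are the default routes, and where do they point?") is then answered once for both clouds. The sample records are hand-written and simplified; real responses carry more fields and target types than handled here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical vendor-neutral route record."""
    cloud: str
    table: str
    prefix: str
    next_hop: str
    active: bool


# A subset of the AWS route target keys; real entries can carry others.
AWS_TARGET_KEYS = ("TransitGatewayId", "NatGatewayId", "NetworkInterfaceId", "GatewayId")


def from_aws(table_id: str, entry: dict) -> Route:
    next_hop = next((entry[k] for k in AWS_TARGET_KEYS if k in entry), "unknown")
    return Route("aws", table_id, entry["DestinationCidrBlock"], next_hop,
                 entry.get("State") == "active")


def from_azure(table_name: str, entry: dict) -> Route:
    hop_type = entry["nextHopType"]
    # An Azure next hop type of "None" drops the traffic.
    next_hop = entry.get("nextHopIpAddress") or hop_type
    return Route("azure", table_name, entry["addressPrefix"], next_hop, hop_type != "None")


def default_routes(routes: list[Route]) -> list[Route]:
    """One query, every cloud: which tables have a default route, and to where?"""
    return [r for r in routes if r.prefix == "0.0.0.0/0"]


AWS_ROUTES = [
    {"DestinationCidrBlock": "10.1.0.0/16", "GatewayId": "local", "State": "active"},
    {"DestinationCidrBlock": "0.0.0.0/0", "TransitGatewayId": "tgw-0abc", "State": "active"},
]
AZURE_ROUTES = [
    {"addressPrefix": "0.0.0.0/0", "nextHopType": "VirtualAppliance",
     "nextHopIpAddress": "10.50.0.4"},
]
ROUTES = ([from_aws("rtb-app", e) for e in AWS_ROUTES]
          + [from_azure("rt-spoke", e) for e in AZURE_ROUTES])

if __name__ == "__main__":
    for r in default_routes(ROUTES):
        print(r.cloud, r.table, r.prefix, "->", r.next_hop,
              "(active)" if r.active else "(inactive)")
```

The design choice is that provider-specific parsing lives only in the `from_*` adapters; every query, such as checking that each spoke's default route points at the inspection firewall, is written once against `Route`.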

Chris (25:47):
I can definitely see... you know, I don't know if I'd call it a greater level of difficulty, but within on-premises this is obviously much more intricate, right?
It's going to pull the control plane, it's going to look at the forwarding information bases, things like that, and you've got several different vendors in the mix.
At least I would think that, with the move to cloud, a lot of this is API-based.
You know, we've probably all heard about the struggles of the major networking vendors supporting APIs.

Tim (26:34):
The constructs you're able to use are purely limited by what the CSP exposes to you, right?
And because of that, 99% of the time it's purely control plane.

Chris (26:40):
You're not looking at actual data plane stuff.
Does that change what you do with the concept of a digital twin, or this kind of predictiveness, at all?

Craig (26:48):
So when you start looking at data plane, those kinds of things are really more of an enhancement on top of what you're looking at from the modeling standpoint.
Once you figure out all of the possible flows, you can figure out what the actual load, drops, and things like that are on a per-link basis.
What's handy about that is that I'm not just seeing all of the hundred thousand links in my environment and the load on each.
I can actually start to troubleshoot on a per-application or per-flow basis.
If I know that this one application is going to take these 15 hops in my network and traverse these links, I can see on a per-app basis what's slowed down, what's being dropped, anything that would cause issues there.
You can also see it on an overlay and underlay basis, because if you're hitting tunnels on top of it, of course, VXLAN or whatever, it's going to be VTEP to VTEP, and you need to be able to see that as well.
So yeah, those things are a pretty well-versed vendor space, so there's no real need to reinvent that, but overlaying that data on top of what's possible in the network winds up being very useful.
Where the digital twin becomes more useful as well is when you're pre-provisioning things or, like you said before, doing that sort of post-mortem analysis, where I'm not at high transaction volume right now, but I need to be able to see what the path is before I roll out the application, or afterwards to see what's going on.
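
Overlaying existing telemetry on a modeled path is essentially a join: the model supplies the ordered links one application traverses, and the monitoring tool you already have supplies per-link counters. A minimal Python sketch, assuming hypothetical link names and a simple `{link: {"utilization": fraction, "drops": count}}` stats feed; the 0.8 "hot" threshold is an arbitrary illustrative default.

```python
from __future__ import annotations


def app_view(path_links: list[str], link_stats: dict[str, dict[str, float]],
             hot_threshold: float = 0.8) -> dict[str, object]:
    """Summarize only the links on one application's modeled path,
    ignoring every other link in the environment."""
    on_path = {link: link_stats[link] for link in path_links}
    return {
        "hops": len(path_links),
        "total_drops": sum(stats["drops"] for stats in on_path.values()),
        "hot_links": [link for link in path_links
                      if on_path[link]["utilization"] >= hot_threshold],
        "worst_link": (max(path_links, key=lambda link: on_path[link]["drops"])
                       if path_links else None),
    }


LINK_STATS = {
    "dx-1": {"utilization": 0.45, "drops": 0},
    "tgw-attach-inspection": {"utilization": 0.91, "drops": 120},
    "fw-eni-a": {"utilization": 0.30, "drops": 5},
    "tgw-attach-app": {"utilization": 0.20, "drops": 0},
    "unrelated-link": {"utilization": 0.99, "drops": 9999},  # not on the app's path
}
APP_PATH = ["dx-1", "tgw-attach-inspection", "fw-eni-a", "tgw-attach-app"]

if __name__ == "__main__":
    print(app_view(APP_PATH, LINK_STATS))
```

The busiest link in the environment is ignored because the application never crosses it; the per-app view points straight at the inspection attachment instead.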

Chris (28:10):
Right.
That makes sense. I imagine what a lot of people are using this
for is like, hey, I have an upcoming change, right, and I
need to, you know, move this VPC or, you know, start advertising
this route from on-prem, or something like that, and they
just want to see what's going to change, right?
Is that, would you say, where your customers are
extracting the biggest amount of value from the product?

Craig (28:33):
Yeah, it's where you get a huge amount of value,
doing that sort of pre- and post-change analysis.
Because, like I said before, those sort of flows that are
important to my network, those sort of intent checks, I have
those predefined in the network or just created on the fly, and
whenever I do my change, whenever I close it out, I can
verify:
here's all the checks that I have, here's all the
configuration checks, here's all the intent-based flow checks.

(28:54):
Are all those still functioning, so that you can close out
whatever change record you have in a more, you know,
holistic way?
And you have, you know, we like
to joke, we call it mean time to
innocence, where it's basically: the network is definitely not
the problem, because I can mathematically verify what the
flows are inside my network. And that's not just on a manual
basis.
We have hooks into Ansible, and I have a Terraform provider

(29:19):
written that will let you do that on a pre/post-change basis
there, because we know most people in the cloud probably
aren't doing manual changes like we're still doing on-prem.

Chris (29:27):
You'd be surprised.
You'd be surprised, more than you think.
That is true, you're not wrong, you're not.

Craig (29:32):
You're not wrong on that, but it's an ideal.

Tim (29:38):
So talk to, uh, tell us a little bit about the journey, if you will.
Like, this is new.
Well, it's not new, right, but it's new in that
it's not where Forward started.
You brought in cloud, so how did you go from, like, hey, we
have no cloud whatsoever, to, you know...
Actually, I'm kind of curious now how far Forward has
gotten in its cloud observability journey.

(30:00):
Maybe that could be the last little bit that you tie up with.

Craig (30:03):
So where it kind of came from is, you know, when you're
starting out in a particular piece of the network,
you know, when you start out with just modeling the data
center or modeling a campus, the question always comes up:
okay, you know, my applications span multiple places, so when
are you going to be able to model, you know, not just that
part, but my SD-WAN provider or anything

(30:25):
that's an overlay?
So it was kind of an incremental journey to say,
here's more and more things that we can model.
Now we can do end-to-end everything in the data center.
Okay, now we can add your wireless piece if you want to,
we can add your SD-WAN to connect all of your sites and,
as other SD-WAN providers got into, sort of, hey, we can
connect you to any one of your public clouds as well,

(30:50):
it became very obvious. People were like, well, yeah,
we're doing a... you know, everyone's somewhere
in a cloud migration, either going or coming in some way or
another, and, you know, if they're
doing multi-cloud, that's a whole other thing too.
So it became very obvious that to really extract the best value
and give people that full... you know, when
you're trying to do end-to-end path modeling, having
a hole in the middle winds up becoming a real, you know, very

(31:11):
sore spot
for people. It's like, well, yeah, I can go up to this hole
here, and then I'm kind of stuck there.
To your point earlier, if you're using native AWS tools,
you get to that hole, whether it's a third-party firewall or
some third-party overlay, where it's just, well, I can
check to this point, but that's about it.
So it became very obvious that that hole was something that
needed to be filled.
So, you know, we started on AWS, added Azure and GCP,

(31:33):
we're adding Oracle soon.
So, yeah, there are ways to add more and more things to that to
get the most out of it.

Chris (31:38):
It's just a natural progression to start filling
gaps, right?
You want no holes?
Yep, exactly right, no holes in the network.

Craig (31:43):
Exactly right.

Tim (31:45):
No holes in the network. Actually,
so what did you...
I'm kind of curious about how, like, Cloud WAN... Cloud WAN obviously came out,
what, like two, two and a half years ago now, when it went GA.
So what was the...
Because this is just fascinating to me in general.
So, like, how did it go from,
like, hey, here comes Cloud WAN, to, you know, okay, Forward has

(32:06):
to write... Like, what was the...
Like, what did you actually have to do to start supporting
Cloud WAN?
Like, did you have to go figure out, like, all the APIs,
basically?

Craig (32:14):
Yeah, and that's really the tricky part,
and that's really what kind of separates us from most.
It's the same with AWS or on-prem as well: trusting what
the documentation says is just a recipe for disaster.
You have to packet test all of this.
You have to check every feature that they have, and you have to

(32:34):
packet test every bit of it, because you have to know exactly
how it forwards.
If I have a VGW connected to a TGW, do they actually forward
between each other when they're connected to a Direct Connect
gateway?
We've added things that customers have that are even
hard for us to get access to, like AWS Outposts and things
like that.
So, yeah, there is, it takes one...

(32:55):
Yeah, the APIs, fortunately, are very public.
AWS is really good about giving you APIs, Azure slightly less
so.
They don't give you as good forwarding characteristics.
Like, if you want to figure out what the routing table is
for any VNet, you have to have a VM attached to it and
look at the effective routes, effective security groups.
So it's a little bit more of a pain there.
But yeah, it takes packet testing every bit of it to make

(33:17):
sure that we know exactly how it's going to forward in all use
cases.

Chris (33:21):
Not to go down any specific rabbit hole, but how
does it fare with VMC?
Like, all the VMware Cloud stuff?

Craig (33:33):
Yeah, I mean, that's just...
that's just NSX-T.
So NSX-T is an overlay.
We support that just the same.
So, yeah, whether that's on-prem or in the cloud, yeah,
it's going to use all of the, you know...
If it's anything, um, you know, on-prem ESX or whatever, that's
going to use vCenter APIs.
And then the NSX-T Manager, NSX-T Edge, that's going to be
just another overlay on top of it.
So anytime you're doing anything there, yeah, it's going
to have that.
And yeah, when you have any other firewalls, you know, we

(33:54):
support Gateway Load Balancer connections to those, so you can
model that as well.

Tim (33:57):
We'll have to see if we can get a demo at some point, 'cause
we've done demos before and put them on our YouTube channel and
stuff.
And yeah, I'm having trouble visualizing it, but, I
mean, I say that now, but you've actually shown
me when we were at, like, RSA or one of the other shows, and it
looked really cool.
I think that's how we first started having the discussion
about it, but, uh, and that's what I've kind of...

Craig (34:18):
Uh, when I talk to people about this, um, a lot of people
are reminded of the old, like, Packet Tracer and
things like that, where you can do it at some level.
So the idea behind this isn't particularly new.
Almost everyone says, I wish I had this 10 years ago
or 15 years ago.
And, yeah, there have been small lab versions of this sort
of technology with emulation or whatever.

(34:40):
It's the scale that really makes a difference, and the
support for pretty much everything out there changes the
game, from what we've seen.
Because, yeah, I mean, being able to see
your entire network offline and tell what the path is of
anything is hugely useful. Because, honestly, I
spent much of my career doing exactly that, just,
you know, pings and traceroutes, logging in this hop, to

(35:00):
this hop, to this hop, figuring out where it is.
And now, yeah, with the cloud, I can't really do that.
Like I said, I can't go log into a transit gateway and look
at the table.
I mean, I can pull some APIs. Just recently, at Summit DC
and at the Community Day in the Bay Area,
I did a session on VPC Reachability Analyzer, and

(35:21):
it's fairly painful.
It's got some pretty stark limitations on what you can do.

Tim (35:25):
Yeah, when we were writing the book that Chris and I were
working on, the PACT one, Reachability Analyzer was one of
the ones I had to play with.
My section of the book was observability, and so I got really
used to understanding what the limitations are, any of the
limitations on observability, and, yeah, I mean, Reachability

(35:46):
Analyzer,
it's one of the things where it works when it works, yeah.

Craig (35:50):
If you know the limitations.
Yeah, absolutely yeah.

Chris (35:52):
Yeah, because there's, like... because they have
Reachability Analyzer, then there's also Route Analyzer, and
then I think the Transit Gateway Network Manager, TGNM.

Tim (36:01):
Yeah.

Craig (36:02):
They have several different analyzers, and they
don't really work together.

Chris (36:05):
That's the thing.
I think there's one for AWS Network Firewall too.
There is now.

Tim (36:11):
But, like you said, I think it's one of those things where
AWS is shipping its org chart, in a way, because that's how you
got ahead at AWS: you come up with a new service.
So you'd have a bunch of people creating new services, and I
feel like all of these observability services are just
little pet projects, things that became, you know,

(36:32):
and that's why they don't...
they don't work together.
Basically, you need something like Q, or whatever they're
saying, you know, to stitch all that data together for you.

Craig (36:40):
Yeah, not to get on a rant there, but yeah, every
time I go to one of those sessions or one of those places
at re:Invent or something, it's like, you know, I'm struggling to
find networking content in any way.
It's just like, give me one.
It's like, I get it.
This is a developer conference, but it's like, give me something
here, guys.
I'm just trying to fill my schedule with any of the

(37:00):
networking content.
It's not so easy.

Chris (37:03):
Yeah, I couldn't agree more on that one, for sure. That
was rough. Maybe because people do networking talks and they put
them at the other end of the Vegas Strip, on the top floor of
the Mandalay.
That was rough. Finding my room was hard.

Tim (37:15):
I actually, I'll be completely honest, I did not...
I've been to the Mandalay like 20 times when I worked at Cisco.
Obviously, we used to have Impact or GSX there and
everything.
And it was this time, going to find my room, that I realized
that there was a third floor to the convention center,
because I've never gone up there in all the years that
I've been there.

Chris (37:34):
I didn't know there was a third floor.
It was the first time I'd heard about it.

Tim (37:39):
You know, they have the little meeting rooms.
It was a nice room.
To be fair, it was a nice room, but it was hard...
I'll be honest.
It was hard to find, and I appreciate you coming out to
support me for that.
But yeah, I mean, of the three other networking talks
that were there...
It's funny.

Chris (37:51):
There's only so many, and then one of them is
always a repeat of last year's.

Tim (37:58):
So it's just like.

Craig (37:59):
So it's like, yeah, we're definitely
bringing up the rear there, and
you know, I think to your point, it's very much the way
they're incentivized. There's always a new service that
they're coming out with, and something that is not really, you
know...
it's kind of orthogonal.
You know, you've got their minimum viable product out for

(38:19):
everything else, so it's like, well, we'll bring something else,
and it's like, okay, well, yeah, all...

Tim (38:26):
Yeah, I am curious to see how much more, because every
time I think that they've gotten to the point where, like, all
right, you guys have probably exposed about as much as you can
without impacting the Hyperplane,
they manage to scrape a little bit closer to the skin.
But yeah, I'm curious to see how much further down the stack
they can go to make available for people before they impact

(38:46):
themselves.

Craig (38:47):
Basically, yeah. I've noticed they're trying to move a
little more into observability.
They put out kind of a flow-checking system where it will
kind of do a service-assurance sort of thing.
So we'll see how far they go with that.
But I tend to agree, they've only kind of uncovered as much
as I think they're probably able to do at this point.

(39:07):
You know, it was definitely a, you know, trust-that-it-works sort of
mindset, which, you know...
And they definitely are pushing the more native services, which
I totally get.

Chris (39:16):
Yeah, of course.
Yeah, it's just... yeah. Craig, thanks again for coming on.
I think this was a very cool conversation.
We might have to pull you back in and do a demo.

Tim (39:25):
Yeah, I would love to see a demo.

Chris (39:27):
Prepping for this, I actually did watch one of the
Forward Networks videos on YouTube about preparing for an
audit, and you're kind of showing the visibility piece between
AWS Transit Gateway and doing a firewall VPC, and then looking
for VPC peerings.
It was really cool stuff.

Craig (39:43):
Might have been me doing it.
So yeah, yeah, it was.

Chris (39:47):
I will say, whatever camera they were using when they
were filming you guys talking, it was a very good quality
camera.
It looked very nice.

Craig (39:53):
Oh, I'll give my regards to the videographer.

Chris (39:57):
Yeah.

Tim (40:00):
Actually, where can people find you? Go ahead and plug
anything you want.

Craig (40:04):
Yeah, so you can find me on most social media.
I'm at Captain Packet, so Bluesky, X, LinkedIn, Xbox Live,
you know, PS Pro, you know, so any one of those places.
I'm pretty well... I'm on the, uh, the Discord as well,
the All About the Journey one, so I'm pretty active on there as
well.
So, yeah, you can find me at most of those places.
So, um, but yeah, I'm always available there.

(40:25):
So yeah, awesome.

Chris (40:26):
We'll have to put a link to your, uh, peer talk in there as
well, um, yeah, because I remember watching that, and that
was good as well.

Craig (40:32):
So yeah, yeah, that's true.
At re:Invent I did a couple of peer talks, and that was a lot
of fun.

Chris (40:37):
That's right. Sweet. All right.
Well, that'll do it for today.
So thanks again for joining us for another edition of the
Cables to Clouds podcast.
I think this is coming out in the first week of January, so
hopefully you had a great New Year, and maybe by now you're
starting to receive the book Tim and I wrote together.

Tim (40:54):
Maybe we'll receive it at some point.
Yeah, maybe we'll get one, who knows.

Chris (40:59):
But with that, we'll wrap it up. Thanks again, and we'll
talk to you next week.
Bye-bye.

Hi everyone, it's Chris, and this has been the
Cables to Clouds podcast.
Thanks for tuning in today.
If you enjoyed our show, please subscribe to us in your
favorite podcatcher, as well as subscribe and turn on
notifications for our YouTube channel to be notified of all

(41:20):
our new episodes.
Follow us on socials at Cables to Clouds.
You can also visit our website for all of the show notes at
cablestoclouds.com.
Thanks again for listening, and see you next time.
