Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
This is the Art of
Network Engineering podcast.
In this podcast, we'll explore tools, technologies and talented
people.
We aim to bring you information that will expand your skill
sets and toolbox and share the stories of fellow network
(00:21):
engineers.
Welcome to the Art of Network Engineering.
I am AJ Murray and tonight I am joined by a very special
co-host.
He is Tim McConnaughy from Cables2Clouds.
Tim, good to see you.
Speaker 2 (00:33):
Hey man, it's good to
be here.
It's been a while.
Speaker 1 (00:36):
Yeah, it's been a
while since we've had you on the
show.
Our guest tonight is Sean Ligon.
He is no stranger to the show either.
He's been on the show a few times.
Sean, thanks again for joining us.
Speaker 3 (00:45):
Yeah, thanks, guys,
appreciate it.
Tim, thanks for having me.
Speaker 1 (00:48):
So Sean is a senior
product manager for data center
at Juniper Networks, and we're here to talk about AIOps in the
data center. I've got to tell you, up until now we've
heard a lot about AIOps in the enterprise, particularly around
the access layer, wireless and stuff like that.
Is AIOps ready for the data center?
Speaker 3 (01:06):
Yeah, that's a good
question, and I love approaching
the subject with a healthy amount of skepticism.
Speaker 1 (01:13):
You know, I try I try
Absolutely, as you should.
Speaker 3 (01:17):
Quite, frankly,
that's how we approach it
internally as well.
You know, look, it's a little bit cliche, but
it's a bit of a journey.
So, as we say, is it ready for the data center?
Yeah, I think it is.
We're seeing some things that we're bringing into the
data center within Juniper
that we think are compelling, but it's also a
start, right? This isn't like, hey, we've launched a few things
(01:39):
around AIOps, put a bow on it and let's call it a day,
right? For me,
I really kind of look at this as a tool in the toolkit, and
so I think that's the way that I would recommend folks
kind of approach it as well.
Right, it's not a magic wand you're
going to wave that makes all
your problems go away, no matter what.
You know, here I am as a vendor on a podcast telling you that,
(02:01):
right, it's not a magic wand, but look, we
think that there's something there and, like I said, it's
kind of the start of the journey around AIOps and the
data center.
So, yeah, looking forward to the
conversation. And for the folks that are familiar with Mist and
Marvis and kind of what we're doing there,
(02:22):
you'd all be able to kind of relate some stuff to what we've done in the
data center.
It's similar in a lot of ways.
Speaker 2 (02:28):
I've heard some
really good stuff about the AI
in your Mist product, so I mean, if it's the same general idea
under the hood, it's probably pretty good, I imagine.
Speaker 3 (02:39):
Yeah, it's the same
idea, but the use cases
are a bit different, right? The way I kind
of think about it:
when you think about the campus,
especially on Wi-Fi, and
caveat, I'm not a wireless guy, I like packets on
the wire. You start doing, like, RF and I get lost.
I'll leave that up.
(03:02):
If we think about the campus
side of the house, right, it's kind of like, okay, I have a
bunch of clients and those clients are trying to connect to
applications that exist somewhere, right? But the idea
is you don't necessarily control those applications.
They might reside in your data center.
They might not, right? They might be some SaaS servers,
or it just might be general internet traffic going out to
other things, and so there you're really concerned
(03:24):
and looking at how the client connects back to
these different services. In the data center I think it's
quite a bit different.
There you're saying,
well, I have these services that
I'm advertising out and I have all these other clients that I
don't necessarily control connecting to my services, right?
It's the opposite side of it.
And so, a little bit about what we're
(03:45):
doing in the data center.
First of all, it does start with essentially the Mist/Marvis
platform, and so what do I mean by that?
We've taken the platform built on Mist,
leveraging all the Marvis AI stuff, and essentially
expanded that throughout the product portfolio in Juniper.
(04:05):
So you'll hear about things outside of the data center too
that we're doing AIOps in, in other areas within our
product portfolio.
But when we look at it from the data center perspective,
essentially what we've done is, I like to
think of it like a PaaS layer.
We've taken Mist and Marvis, not the controller
and the access points, but the AI engine aspect of it, and think
(04:28):
about that like we've just created this PaaS layer,
and we've said, okay, what are the use cases that are
worthwhile to go solve in the data center, and are
there things here where we can use what's been
done in Mist and Marvis to do that? And so we've
found a good handful of them and we have a bunch more that we're
working on too. All right.
Speaker 1 (04:48):
Well, I mean, let's
dive into it a little bit.
Just thinking about
what I've seen of Marvis
in enterprise and Wi-Fi,
some of the common things are misconfigurations, right?
Like maybe it's a wrong native VLAN on a trunk or a mismatch
somewhere else.
Is that going to apply here in the data center?
Let's start to build on top of that, like let's take what we
(05:10):
know and kind of go from there.
Yeah.
Speaker 3 (05:12):
I love it.
It does apply and build on top of that.
And I think it's worthwhile mentioning too, the
approach that we've taken around AIOps in the data
center, first and foremost, revolves around Apstra.
Without spending a ton of time on it,
there's a couple of reasons for that,
and that's pretty important.
Some folks may be familiar with Apstra.
(05:35):
For those that aren't, essentially it's
Juniper's data center fabric management solution.
But it does more than just manage a bunch of switches
together.
One of the things that it really focuses on is
understanding the operational state of the network.
So we have this concept of what we call intended state
and actual state of the network, and it's a matter of rectifying
(05:56):
those two.
And this all gets built and put into this graph
database that operates under the hood as the engine of Apstra.
And the reason that's important is because we know
things like what the network topology looks like.
We know that from the graph database, so we can easily
go render that in a UI.
Quite frankly, I don't need AI to tell me what the
(06:16):
topology is, right?
So there's a bunch of things that we've done there, and so
when we think about something like a cabling mismatch,
we know from the
cabling map that Apstra builds out exactly what the
cabling should be between all the leafs and spines,
and we also know what that should be going down to the
(06:37):
server as well.
So we can essentially map all that out.
So we'll know, hey, if there's a cabling mismatch,
Apstra knows that as a single source of truth, but how
do we convey that back to users in a meaningful way?
I have a couple of videos.
I grabbed some screenshots and some things of that
I can kind of show that might be helpful.
But we look at that exact use case around
(07:00):
a cabling mismatch, right? If you think about that,
one:
if you're just deploying the network and
there's a cabling mismatch, okay, there's really
no operational impact, right?
You might have to get back
on the phone with smart hands or whoever was racking
everything, or maybe you're the one
racking it, right?
It just all depends. Figure out what's happening,
(07:21):
and so we do things like fetch LLDP information just to
easily tell you, hey, we think the cable should be in port
xe-0/0/2 and it's actually in xe-0/0/3, right?
Do you want us to either reconfigure it or go move the
cable back?
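The check described in this answer can be sketched as a comparison between an intended cabling map and the neighbors actually observed over LLDP. This is only an illustration under stated assumptions, not how Apstra implements it: the function name `find_cabling_mismatches`, the dictionary shapes, and the device and port names are all hypothetical.

```python
from typing import Dict, List, Tuple

# One end of a link: (device name, port name). Hypothetical data shape.
Endpoint = Tuple[str, str]


def find_cabling_mismatches(
    intended: Dict[Endpoint, Endpoint],
    observed: Dict[Endpoint, Endpoint],
) -> List[str]:
    """Compare intended links with LLDP-observed links.

    Both maps go from a local endpoint to the remote endpoint it connects to.
    """
    # Index observed links by remote end so we can find where a cable landed.
    seen_at = {remote: local for local, remote in observed.items()}
    findings = []
    for local, remote in intended.items():
        if observed.get(local) == remote:
            continue  # cabled as intended
        actual = seen_at.get(remote)
        if actual is not None and actual[0] == local[0]:
            # Same device, wrong port: the classic miscabling case.
            findings.append(
                f"{local[0]}: cable to {remote[0]} {remote[1]} should be in "
                f"port {local[1]} but is in {actual[1]}"
            )
        else:
            findings.append(
                f"{local[0]} {local[1]}: expected neighbor "
                f"{remote[0]} {remote[1]} not seen"
            )
    return findings
```

Either remediation mentioned in the conversation (updating the intended map to match reality, or moving the cable) would make the two maps agree and clear the finding.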
Speaker 2 (07:37):
How are you
expressing the intent of how it
should be cabled?
And then Juniper's, you know, Apstra is saying, oh well, it's
actually miscabled, and that's how it's communicating to you?
Because the part I'm missing is, you're saying, hey,
it should be cabled this way.
But are you, as an administrator, expressing that
intent and then Apstra's kind of keeping you to it, or how does
(07:57):
that work?
Speaker 3 (07:58):
Yeah, great question,
appreciate that.
So what happens in Apstra:
you can start to build out your network.
You define what a rack looks
like.
There's basically
pre-populated all the network
devices that are supported in
there, whether it's
Juniper or Cisco or Arista, right?
We support a handful, including Dell
SONiC. You could select, hey, it's a
(08:29):
three-stage Clos fabric or it's five-stage.
You can go in there and pick an EVPN-VXLAN fabric or an IP
fabric.
You specify what
all that looks like, and what
will happen is Apstra builds out the cabling map for you so
you don't have to go and specify each particular physical link.
Apstra kind of does it, and there's a bunch of value in
(08:50):
that.
It's repeatability.
The great thing is,
Tim, if you use Apstra to
go build one blueprint in the data center, and then I
turn around and use Apstra to build the other blueprint in the
data center,
the cool thing is all the ports are laid out the same way.
It's not like, well, I decided to put all the leaf-spine
connections in the middle of the switch and use port 26 at
(09:11):
random and you use 00 and 01,
for example. So it keeps it
consistent.
And so by being the operator
using Apstra, expressing that
intent, yes, Apstra takes it,
builds it out, stores it
all in the graph database, and that's how it keeps track
and it knows.
And the cool thing about that:
you can do it before any
hardware shows up, so you can actually go build all that out.
(09:34):
You take the cabling map, you export that and you go hand it
to somebody, maybe smart hands or whoever's doing
the install if it's not yourself, and then they're able
to go implement everything, and
Apstra
alone will alert you
if there's a cabling issue.
Yeah, great, great question.
Hopefully that helps.
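The repeatability point above, where the same design inputs always yield the same port assignments, can be illustrated with a small deterministic generator for a three-stage leaf-spine fabric. This is a sketch under assumptions: the function `build_cabling_map` and the `uplink-N` / `port-N` naming are invented for the example and do not reflect Apstra's actual port-selection rules.

```python
from typing import List, Tuple

# (leaf, leaf port, spine, spine port). Hypothetical link record.
Link = Tuple[str, str, str, str]


def build_cabling_map(leaves: List[str], spines: List[str]) -> List[Link]:
    """Derive every leaf-to-spine link from the device lists alone.

    Leaf uplink N always goes to spine N, and spine port M always faces
    leaf M, so two operators with the same inputs get the same cabling.
    """
    links = []
    for leaf_idx, leaf in enumerate(leaves):
        for spine_idx, spine in enumerate(spines):
            links.append(
                (leaf, f"uplink-{spine_idx}", spine, f"port-{leaf_idx}")
            )
    return links


if __name__ == "__main__":
    # The map can be produced and exported before any hardware arrives.
    for link in build_cabling_map(["leaf1", "leaf2"], ["spine1", "spine2"]):
        print(" ".join(link))
```

Because nothing in the function depends on operator choice, the output can be handed to whoever is doing the install and later reused as the reference for mismatch checks.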
Speaker 2 (09:53):
Hey, that was great
man.
That's exactly what I waslooking for.
Speaker 3 (09:55):
Yeah, so we
kind of take that issue.
But let's take it a step further.
Let's think about this as a day two issue.
Your network's up and running,
it's online and active, there's a bunch of applications and
services running out of it.
And, I don't know, maybe something happens and
there's some sort of day two change that needs to happen,
(10:16):
and it could be a number of things.
And all of a sudden somebody unplugs a cable, right,
and they decide
to go plug that back in, and
they misidentified the port.
They plugged it into port three instead of port two.
So in this case there's a couple of different things that'll
happen.
One, yes, Apstra, without any AIOps stuff, Apstra's just going
(10:38):
to tell you that. Same thing applies, right, that cabling
event.
But what we're doing with
the suite of services that
we're essentially throwing AIOps against is, now
we're making a correlation. One:
what are the services and applications that are running on
that switch, and out of what ports are they running, right?
(10:59):
What physical interfaces, right?
So we can render that and visualize that and show it
to you.
But then the other one is actually turning
around and saying, okay,
maybe that was a
connection between a leaf and a spine, and maybe a BGP
session went down, and traditionally
all these messages would get generated.
(11:19):
We all know from looking at show log messages,
you're going to get a
bunch of cryptic stuff and
you're going to get some things that actually just make
sense, right?
And then, well, maybe you just saw BGP went down and
you don't know that it's an interface issue
yet. How do you separate the signal from
the noise, if you will?
And so that was the first thing that we've really focused on
around some of the AIOps stuff that we're doing: we're
(11:40):
essentially running a bunch of correlation
against that to say, okay,
these types of events are
grouped together, we know that these are all related to one
another.
And go look here, right? Go look at the interface, don't
look at BGP.
That's a symptom, that's not actually the cause of the issue
here.
Go look at the cabling that's been
reconfigured or set up differently, right? Either you
(12:02):
correct it by fetching LLDP and updating the config, or you
correct it by having somebody move the cable back.
Right, you have some options there.
Speaker 1 (12:11):
I love that because
we've all troubleshot
something, right?
And like you said,
you're looking at this sea of log messages and you spot
something like, oh, BGP, oh, I know BGP, I can troubleshoot
that.
And then you just grab onto that for dear life and think,
maybe I can fix this problem if I troubleshoot BGP.
And then, like you said, it's an interface down issue.
(12:32):
You're never going to get that session back up if your cable's
plugged into the wrong port.
So just to be able to use this to say, hey, no, like you
said, that's a symptom, that's not the problem,
go look here.
Speaker 3 (12:42):
That's a time saver,
right? That was kind of our
target really too.
What we wanted was the simple
use cases, not because we're
afraid of the complexity, but it was like, I want the things that
are high value and happen often.
I don't want to go after the corner case that you ran into
three years ago in your data center, and it was really hard.
Those are good to solve too, but it's just like, man, where
can we make it easier for usersto troubleshoot these things?
And so that's the approach thatwe've taken.
Speaker 2 (13:10):
Yeah, aim for 80% and
you'll get most of your
absolute use cases.
Speaker 3 (13:18):
Yeah, that's
definitely been the goal, so it
makes a ton of sense.
We've done other things, like
if people are familiar with the
Marvis chatbot, I think everybody.
Now, chatbots are going to become table stakes, right?
Let's just be real, they're going to become table stakes.
This kind of goes back to the other thing, though.
I was like, the first thing we should do with our
(13:39):
chatbot is make sure that people don't have to go into docs
ever again, right? If I can go
ask a chatbot a question and get
an answer back with a high degree of confidence
in that answer, that it's accurate, and then also
give me some proof behind that, right?
So we've taken the Marvis chatbot that people are used to
with Wi-Fi and we've kind of expanded that everywhere,
(13:59):
you know.
So I'm here talking to you about data center.
We've added in all the documentation to it, so
now you can just go query the chatbot and ask your questions
and get information back.
But we've added other Juniper products in there as well,
as far as documentation goes. And so
the next step on that, where
we're going next, is actually
using the chatbot as a conversational interface to pull
(14:21):
state data out of the network, right?
So if you think about that graph database that I talked
about, we have this really powerful
single source of truth.
What if now I just wrap this conversational
interface in front of it that allows you to just ask normal
questions in normal human language and get back
meaningful information about the live state of your network,
(14:43):
and again, not on a switch-by-switch level?
I think that's the other part of
this that is really nice, right?
You could take a chatbot and say, okay, let me go ask on a
switch-by-switch level what's happening.
That doesn't help, right?
Especially in modern data centers,
the scale-out's pretty big.
Speaker 1 (15:00):
Yeah, absolutely.
So we've kind of talked about the physical interface, the
cabling and stuff like that, but now let's go up a layer, right?
Can it help us troubleshoot routing protocol issues?
Speaker 3 (15:13):
Yeah, it can.
We'll call out things like a BGP
mismatch. So for example,
if the peer ASN is different
than what you have configured, we'll call that out.
But we won't just call that out in a message that you
have to go dig through.
We actually use the
virtual network assistant to
show you what that looks like.
Speaker 2 (15:30):
AIOps is great
if you, you know, but if you don't
know what to chat for or what to look for, it's almost like
it doesn't help you that much.
So how does Apstra surface these problems to you as the?
Speaker 3 (15:43):
Yeah.
So again, great question. We're using essentially
what's called the Marvis Virtual Network Assistant, and we have
the Marvis Virtual Network Assistant for data center.
So again, the Wi-Fi folks
are going to be pretty familiar with it.
It gives a little bit
of this octopus layout view
and tells you,
here's the Marvis Actions to go
(16:04):
take, and what it does is it
summarizes different
information under categories, right?
And what we've done on the data center side is we've said, okay,
the different categories, like layer one and
two,
we bundle things underneath that:
connectivity, device stuff,
things that are going on around traffic capacities,
so we might be looking at hot and cold interfaces.
(16:30):
We'll surface this in this easy-to-consume view.
Some folks call it a coffee cup
view, and I kind of like that
because it's
abstracting away a lot of
the low-level detail, just
providing it to you at a really
high level.
You come in in the
morning, you're in a NOC
or something, right? Maybe you
turn around, you just take a
look at this while you're sipping your coffee,
and you say, hey, is there something
here that I need to pay attention to?
Maybe you go look at it and you say, all right, man, this
(16:52):
device has an issue and it's triggering an environmental
alarm with a power supply.
So we're going to call that to the forefront, right? And you
might look at it and go, oh
man, that's no big deal, that's
a new rack getting installed.
We put that switch in last night.
All right, I'll get somebody to look at that, but it's not really
impacting.
Now we then take that a step further.
So the other thing that we've done, and to me this one's
(17:17):
pretty important: we've taken it a step further with what we call
service awareness, and what service awareness does is it's
actually mapping the services and applications
running in your data center to the infrastructure and the
resources that they are dependent upon.
So what do I mean by that?
I kind of think back to my days as a network engineer
(17:37):
and doing NetOps, and it would be some application issue.
Right, you're running a big enough company,
there's always something going on.
Some application issue going on in an enterprise,
and the application is doing a database query and it's slow.
The database team's like, database looks fine.
Everybody's like, must be a network issue, right? Like,
(17:58):
it said slow, so it's got to be a network problem.
So
there's a P1 going on because, I don't know,
maybe it's a
booking engine for a hotel,
right, that's a pretty big impact.
So you're like, all right,
we've got to get this
fixed, and so you're the network guy, you get called,
you jump on the P1, you're like, what application is it?
Oh, okay, well,
(18:19):
it's the lodging booking app.
Oh, okay, what server does that use?
Well, there's this
load balancer VIP, and it's
actually, okay,
well, which server is it actually using right now, though,
for that one query?
You kind of spend all this time going back and forth,
right?
Then you're like, all right, let
me go look at an ARP
table and go figure out what that server's connected to.
And then, oh, I see, it's
on leaf 6, port 14, right? And so you're looking at
(18:43):
that and then you realize,
oh, actually port 14 is part of
a LAG.
I'm kind of going on and on about this, but
I think anybody who's worn a NetOps
hat is
familiar with this, right?
You've kind of gone down this
track.
And then you go look
at everything and you're like,
well, I looked at both.
I looked at the BGP table, I looked at the type
(19:05):
twos, I'm good, everything looks fine there.
And so they're like,
hey, you've been quiet
on this P1.
Hey, network engineer, how are things
looking?
And you're like, well, I checked the EVPN table and I see the
type two routes.
Everybody else on the call is like, I don't know
what this guy's talking about.
Right, it's utterly meaningless to the rest of your
(19:25):
organization.
And what you really need to be able to do is, how do we get to
mean time to resolution?
I actually think mean time to innocence is such a thing in the
networking world.
Nobody, what do you do?
You hang up the phone and you're like, good luck
troubleshooting this.
The network's fine, right?
You're not helping your employer,
right?
So to me this is all about,
(19:46):
how do I help the network person?
You're still the network
engineer, you're not the
application guy,
you're not the DBA, right?
But how do we empower you
to be able to talk back to your
organization in ways that are meaningful?
And so, with service awareness, what we've done is we're
actually turning around and taking flow data plus the
(20:06):
topology, and we map it out. So we'll show you, hey,
this server is
connected to this leaf, and out
of that server we have
these services that are running.
So what I mean by that: a SQL
service is running out of server
one, but it's also running out
of server four and server seven and server eight,
(20:28):
out of these different ESXi
clusters, because, again, data
center, everything's distributed, right? And so
you say, well, these
services are running out of here,
and so I see where this is going.
So you don't have to ask those questions, right? You
automatically know already.
And even if that's in a particular VRF or a routing zone,
again, with Apstra, and Apstra Cloud Services is an
(20:49):
extension,
it's just another way that we're delivering
services around this.
It tells you all of that, it does all the discovery for you.
All of that is essentially built in, right?
You're not having to go do this.
So now you can map those services that make up your
critical application, or any application, quite frankly, to it
(21:10):
.
And the other thing that we've done is
bring to the surface
what we think is important back to the user, in
a way that we can build
into a product.
So we do things like bundle
traffic together.
So when you first go look at this, we're going to show
you, within a given time frame, and it could be a
15-minute increment, it could be an hour, it could be a day,
(21:30):
we're going to show you, hey, here's some chunks of data.
So these 14 services make up
700 gigabytes of data that was
transferred in the last hour,
for example.
And then there's 27 other services, but they made up, I
don't know, two gigabytes of data.
So we show that to you
in these different blocks.
(21:51):
But the idea is to draw your attention to the 700 gigabytes
of data, because more than likely either, A, that's the
one that could be of interest to you, or, B, the inverse
could be true.
It could be that there's no data because there's a network
issue and nobody can access the application.
But either way,
you can start to
divide these things up.
Everything that we've done is kind of like, and I think it's
(22:15):
an important approach, right,you got to kind of like abstract
things out a little bit tofigure out, like how do I give a
high level view first beforepeople dive deep, Right, cause
if you dive deep too quick, youknow you kind of put yourself
down a rabbit hole and you mightbe in the complete wrong
direction than where you need tobe.
Speaker 1 (22:33):
Well, you know, I
think this is great because,
when I think back to when I owned a network, I had all of
these tools that checked a box,
right? It was a requirement
to have a log collector and a requirement to have this and a
requirement to have that.
I didn't use it, and even when it came to troubleshooting I
forgot that I had it at my disposal.
And even if I did use it, it was just so full of information
(22:55):
I couldn't make anything of it.
It just caused more confusion and heartache than it actually
helped me.
So to have something that can look at and examine this data
and make it useful and provide me some, I don't know, decisions
or some really useful information, rather than
cloudiness, I think this is great, man.
Speaker 3 (23:14):
I know that pain all
too well.
I'm right there with you.
I remember
working in a fairly large
enterprise.
I mean, we had 9,000 switches and routers in our environment.
It was a pretty decent size: four data centers,
58 different campuses. And
the security team
brought in a new SIEM solution.
(23:36):
And it was like,
hey, we need you to send
syslog messages from all your devices to this thing.
It was like, okay,
what do you want me to send?
They're like, any, any. And I'm like, I'm not sending any, any.
So we kind of had this: well, why not?
Why are you trying to keep stuff from us?
No, I'm not. My network is not going to be the top talker,
right? My switches are not going to be the number one
thing using bandwidth on this network. All the servers,
(23:58):
everybody.
And it was like they wouldn't let me get access to it.
Our team can't troubleshoot.
I just want read access to be able to.
And they're like, no, no, no, that's a security risk.
Okay.
Speaker 2 (24:19):
You know, so yeah.
Speaker 3 (24:21):
I feel the pain.
Speaker 2 (24:22):
Yeah, I think the
number one thing that AIOps at
any level can do for network teams is that correlation
piece, to be able to draw insights and surface them in
a meaningful way, because we all have access to thousands and
thousands of data sources.
We all got SolarWinds or some version of NetFlow that's
(24:43):
running, and we've all got the SIEM.
Maybe not the SIEM, but syslog and
all the boxes checked, like AJ said. But when it comes down
to it and there's a P1 and everybody's breathing down your
neck, you don't have time to correlate 40 different data
streams and try to figure out where they correlate, right?
Yeah,
100%.
Speaker 3 (25:04):
And you know the
other thing that I think about
too, in that same aspect of it,
you're absolutely right,
you're getting all this data.
The other thing is, you
end up with people that
spend a ton of time configuring and setting up the tools, because,
and don't get me wrong,
the intent is there and
(25:26):
it's like, well, but what if a user wants to do
this?
And what if a user wants to do that?
Well, let's give them every option so they can configure
this thing however they like.
Right, and a lot of times
I don't want to
configure it however I want, I don't want to become an expert
in this one tool, right? I
just want it to set up and work
and give me value.
(25:46):
And that's the other thing that I really like about
the idea behind
AIOps as well, right? It's like,
if I have a whole set of data
and I could just
point something at it and have it give me meaning back, right? And I
mean, that's a gross oversimplification of what's
happening, but if I can go point something at it and have it give me
meaning back, like if I can go leverage
(26:07):
a RAG
and understand
how to go do that,
and now I just turn
around and use the RAG to go do
that, right? To
me, that's the part that
gets me excited. It's like, oh wait,
I don't have to.
Yeah, here's a tool that we have for you, but by the
way, here's a 1,400-page book on all the different
(26:28):
configuration options.
It'll take you a month to go figure out what it's capable of
and another three months to set it up, right?
Speaker 2 (26:33):
Like that, or you've got
to get a vendor to set it up for
you,
whose whole sole business is going to be setting up this
application.
Speaker 3 (26:43):
Absolutely, yeah, no,
absolutely.
And I think like that's thepart that kind of gets me
excited.
And then even better, right,like the technology.
(27:04):
Just, I mean, as we've been working on this in Juniper,
we've watched the paradigm
shift in the advancements in.
You hear claims.
It's like, well, let's go actually validate that.
But maybe in the networking
space, there's all these
tools and interesting things
that have been happening everywhere
else in the infrastructure
stack. There's tools that the
server teams have had for quite some time.
Let's be honest, we've been pretty slow to adopt these
(27:27):
things in the network realm, and we're at the source of
all the actual information,
right? We could actually see
packet-level data, right?
Speaker 1 (27:36):
It's like what are we
doing?
Speaker 3 (27:38):
So yeah, I'm pretty,
you know, healthy skepticism, as
I said, but pretty optimistic.
I've seen proof in the pudding so far, and I like what
I've been seeing around AIOps, and I think there's some real
value there.
Speaker 1 (27:52):
Yeah, Sean, I'm a
longtime VMware
and Windows guy. I did infrastructure for a really long
time.
So to put an analogy on this, a long time ago on early
versions of Windows you would have to deploy things like
Active Directory or any Windows service in a very manual fashion.
But as new versions of Windows came out, there's wizards where
(28:15):
you click a few boxes and put in a little bit of information and
the wizard does it for you.
In early versions of VMware, if you
wanted to deploy VMware vCenter, you had to install Windows
Server, install vCenter, set up the database server, and then in
later versions you put in some information and it deploys it
all for you, and it uses the best practices and security
(28:36):
standards and all this otherstuff.
So why do I want to go hand jama bunch of that standards and
best practices and stuff on abunch of switches in a data
center when I can put in somebasic information about what I
want my network to look like andhave something else go do that
for me?
It saves me a ton of time.
It lets me rise up my skillsand focus on other things that
AI ops can't handle.
(28:57):
That I can handle.
That I should be handling.
I think this is long overdue,right, Like you were saying.
Speaker 3 (29:04):
Yeah, agreed, and
it's a great analogy and rings,
you know, so true, right? I mean, it's just, um, yeah, agreed,
definitely, definitely rings true for sure.
And, you know, the next kind of phase of this, when we
look at it too, right, and I think, like, Tim, you mentioned
something a little bit earlier that kind of made me think about
it.
Right, you know, we're also going to get to this point, like,
(29:27):
and we're actually fairly close because we can get to it
pretty quick, but getting to the point of, like, you know,
predictive analysis around stuff.
So let me bring up kind of an example, one that's there today.
You know, we have this piece that's then called, like, so I
talked about service awareness.
So if you kind of grab, like, the mental model, and apologies, I
(29:49):
can't share my screen, but if you grab, like, the mental model
around, you know, okay, so I have a visual topology that maps
ports and protocols, you know, and the services in the data
center, you know.
And oh, by the way, we also show, like, all the clients that are
connecting to it as well.
Again, we aggregate that, right? Who wants to see, you know,
(30:13):
hey, there's 30,000 clients connecting to the service? Like,
in the data center, that's not.
You don't really need to drill down into one.
We give you the option to do that.
But you're more interested, you know, in kind of like, hey,
there's 30,000 clients connected to the service, that's a great
thing, right?
Like that's good.
Hey, there should be 30,000 and there's only one, that's a
problem, right? And so, one,
we surface that right away in service awareness.
(30:35):
But then you get to impact analysis, and what we've done
there is we're actually turning around and we're taking those
anomalies that occur that I talked about from Apstra, right?
Like the cabling mismatch, we call that an
anomaly.
And we have a bunch of others, BGP mismatch, there's a bunch of
predefined probes in Apstra itself that we turn around and
highlight these anomalies on, and with impact analysis, we send
(30:58):
these anomalies basically from your Apstra cluster.
They're able to get sent into Apstra Cloud Services, which
is where all this stuff lives.
This again, is that PAS layer.
And so now, you know, now we're able to say like, okay, you,
you have this service and these applications running and then,
and then you have this event or this anomaly that happened on
your network.
(31:19):
And then we do, we turn around and we say, all right, like, you
know, because of these events, one, we're going to group them
together where there's already
correlation, like we're doing that correlation for you, right?
So we're going to group those together.
And then, two, we also turn around and we tell you, hey,
these are the services that that could impact.
Case in point, let's just say that you have a fairly large
(31:41):
data center and you come in and one of the power supplies is
down on a leaf in a rack and you're like, oh well, if I don't
fix it, it's got dual power supplies, so I'm okay.
Maybe the other power supply is running a little hot right now
but, all right, I'm all right, and you're like, man, do I need
to go get that power supply replaced immediately or have
(32:04):
somebody reseat the power cable, whatever?
Do I need to get eyes on that right away?
And what we're able to show you, one, is again mapping those
services.
So power supply has an issue.
We'll turn around and tell you these are the services that
could be affected if you don't do something about this.
We're not saying that they're impacted right now, but we're
saying that if something doesn't happen, they could be impacted.
(32:26):
So you can easily draw the distinction between a switch or
a leaf that there's nothing connected to, there's no
services running on it.
Or, like, maybe there's servers connected to it.
Right, there's physical servers connected, but there's no
active services being run on that, for whatever reason, who
knows?
Right, okay, I don't need to prioritize that one, but
maybe I have four leafs in my data center where this is the
(32:47):
case.
I'm going to go prioritize the one that I do have active
services running on.
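The prioritization described here can be sketched in a few lines of Python. This is only an illustration of the idea, not how Apstra implements impact analysis: the `Anomaly` class, the `rank_by_impact` function, and the device and service names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    device: str  # switch where the anomaly was raised (hypothetical name)
    kind: str    # e.g. "power_supply_down"


def rank_by_impact(anomalies, services_by_device):
    """Pair each anomaly with the services at risk, most services first."""
    ranked = []
    for anomaly in anomalies:
        at_risk = sorted(services_by_device.get(anomaly.device, ()))
        ranked.append((anomaly, at_risk))
    # Anomalies on devices carrying more active services sort to the front.
    ranked.sort(key=lambda pair: len(pair[1]), reverse=True)
    return ranked


# Hypothetical data: four leafs, each with a failed power supply.
services_by_device = {
    "leaf1": set(),
    "leaf2": {"sql", "web"},
    "leaf3": set(),
    "leaf4": {"dns"},
}
anomalies = [Anomaly(f"leaf{i}", "power_supply_down") for i in range(1, 5)]

for anomaly, at_risk in rank_by_impact(anomalies, services_by_device):
    print(anomaly.device, anomaly.kind, at_risk or "no active services")
```

With this made-up data, the leaf carrying the SQL and web services comes out on top, and the leafs with no active services fall to the bottom of the list.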
The other one I bring up is, like, RMAing a switch. It happens.
Right, we'd love to say it.
Like, switches never fail. RMAing a switch, or even
differently, right, like I need to,
I need to swap some switches.
Maybe you're doing, like, a generation update of your
fabric,
you know, inside the data center, you know, and you have
to go to a change board and you have to go coordinate with
(33:09):
another team to go say, hey, I need to take this switch
offline, can you vMotion some servers away from the switch?
We automatically map all that stuff out for you so you know
right away where all these things are connected, what
services are going to be impacted.
You'll know, oh, okay, again, hey, it's the SQL service that's
(33:29):
running and it's running in, like, you know, a VRF.
You know, for this VRF, for this point of sale system, you know,
I have it in its own virtual network and it's in a VRF.
And so you'll know, like, okay, it's the SQL service that's
running in that VRF.
Let me go and talk to that team, you know, because I need to go
kind of map that out.
So it helps with capacity planning, maintenance actions,
(33:50):
like all those types of things.
Again, you know, they're not always, like, the sexiest thing to
go and talk about, you know, make a big marketing splash
about, but, like, these are impactful for everyday type of
stuff, right? To people that are doing the job.
Speaker 1 (34:02):
Yeah, I mean, how
many times have I had to like
temporarily move a service or whatever and forgot it was there
and did?
Speaker 3 (34:11):
Yeah, and other
things like just, you know, in
Apstra too, right, because it's just an extension of it.
Like we have things like drain mode, so you can go put the
switch in drain mode and when you do that, right, it's going
to basically drain all the traffic off that switch.
You don't have to go in there and figure out how to manually
configure it and change BGP values and all those types of
things and manipulate the NLRI.
(34:32):
It'll turn around and do that for you to drain all the traffic
off of it.
And then we'll actually alert you if there's
traffic over, like, a certain threshold.
It's configurable, but there's, like, a default threshold.
So it'd be like, hey, if there's over, like, you know, a
megabit per second of traffic on the switch as a whole,
right, like, it'll generate an anomaly, and it only does it when
it's in drain mode, by the way.
(34:53):
So it's like, that switch is in drain mode and you're getting
more than one megabit per second of traffic on there. Like,
you know, there's something
going on.
Speaker 1 (34:59):
Yeah, right.
Speaker 3 (34:59):
Yeah.
Speaker 2 (35:01):
That makes sense.
Speaker 3 (35:02):
Yeah, so, so there's
some cool stuff like that.
And then, you know where this is going to, right?
You can, you know, predictive type of analysis that's very
specific to your environment, right? So, you know, these are
the things to me that's always important as well, right? Like,
(35:23):
we might try and say, like, hey, these interfaces on the switch
are cold, right? But if you go back and you think about, like,
whatever your type of business is, maybe your type of, you know,
operation, that, like, Monday through Friday is when all the
traffic is, and the weekends nobody's using your data center
that much. So, rather than getting alerts that the
interfaces are cold on a Saturday, now it could be like,
(35:46):
all right, well, look, we know for your environment because
we've turned around.
And essentially, I remember, like, trending analysis isn't new,
that's not a new thing.
There's all sorts of tools that could do that.
I remember having to do that and, like, you had to baseline it
for at least 30 days and usually 60, because I had to get, like,
the full cycle of whatever your normal business
operation was in to do even a minor trend, right? Yeah,
(36:08):
now, like, with ML, right, you can go run that calculation that
was being run before much faster.
Right, and that's the cool part about it.
Like, you can do this much faster and get value out of it a
lot quicker.
You don't have to be like, well, buy this tool and in six
months we'll know what your baseline looks like in your
environment.
Like, no, you can do this a lot faster.
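To make the weekday-versus-weekend point concrete, here is a minimal sketch of a per-day-of-week baseline. It uses plain averaging rather than any ML technique, and the function names, the `fraction` cutoff, and the sample numbers are all hypothetical; it is not a description of what the product does.

```python
from collections import defaultdict
from statistics import mean


def build_baseline(samples):
    """Average traffic per day of week from (weekday, mbps) samples."""
    buckets = defaultdict(list)
    for weekday, mbps in samples:
        buckets[weekday].append(mbps)
    return {day: mean(values) for day, values in buckets.items()}


def is_cold(weekday, mbps, baseline, fraction=0.2):
    """Flag traffic far below what is normal for that day of the week."""
    return mbps < fraction * baseline[weekday]


# Hypothetical history: busy Mondays (0), quiet Saturdays (5).
history = [(0, 900.0), (0, 1100.0), (5, 8.0), (5, 12.0)]
baseline = build_baseline(history)

print(is_cold(5, 9.0, baseline))  # 9 Mbps on a Saturday is normal here
print(is_cold(0, 9.0, baseline))  # 9 Mbps on a Monday stands out
```

Comparing against the baseline for the same day of the week is what keeps a quiet Saturday from generating a cold-interface alert.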
Speaker 1 (36:28):
Awesome. This is, uh, I'm
surprised. I'm glad I
brought a healthy dose of skepticism, but you're winning
me over.
Speaker 3 (36:47):
Well, that's good.
And, like I said, look, the skepticism is good.
We have different things too.
People can go take a look at this.
For Apstra customers, it's really easy to use.
For Marvis customers, there's also more integration between
Marvis and what we're doing on the data center too.
It has the same look and feel, you know.
So that part is really nice.
And then, you know, for folks that aren't a Marvis customer,
not an Apstra customer, you know, look, there's
(37:10):
marketing videos and demo videos out there.
But of course, you know, anybody at Juniper will be happy
to walk you through it and you can always reach out to
me as well, you know.
So, yeah, the skepticism again is, like I said, I think
healthy, and people should, you know, approach it
and ask questions and be inquisitive and, you know,
(37:32):
ask to see it, right? Like, don't, from any vendor, like from
anybody, us included.
Right, I agree, you know, vendors should be earning your
business every day.
Yeah, I'm a big believer of that.
Speaker 1 (37:43):
And you know two
important points here.
So one is we're talking about Apstra, and that's not just
Juniper, that's vendor agnostic, that works with.
So everything that you've talked about here tonight works.
Speaker 3 (37:54):
That's a good point.
Speaker 1 (37:57):
Works with.
You've talked about here tonight works.
Speaker 3 (37:58):
Good point.
Works with any switch, right? Well, anything,
there's switches that we qualify. Yeah,
any switch qualified by Apstra. So yes, you know,
obviously we, you know, first and foremost qualify all
the Juniper QFXs and Juniper devices first.
But of course, yes, you know, Cisco devices, Arista
devices, and, you know, we support SONiC as well.
Speaker 1 (38:18):
And then I feel like
whenever you start to talk about
AI, there's always somebody in the group that rolls their eyes
and is like, AI is going to take our jobs, and I don't hear that
in this conversation.
This sounds like the coworker I wish I had, not somebody that's
going to replace me.
Speaker 3 (38:37):
Well, you know the
way I look at it, and I can't
remember, you know, I wish I remembered exactly who said it,
because I'd want to give them appropriate credit, but it was
like,
AI is not going to take our job.
Somebody that uses AI is going to take your job.
Yeah, and, you know, the difference is, right, like, I mean,
I use different AI tools all the time in my day to day, and
(39:04):
it took me, like, I was slow.
I was slow to do it.
To be honest with you, it took me a little while to be like,
okay, you know, I need to change my way of thinking around
this, right? But it's like it just allows you to be more
efficient, you know.
And so, really, it's like, you know, that same approach here,
right? Like, it allows you, you know, to be more efficient.
Like, that's the goal.
So, again, it's not, you know, like you're not going to be able
(39:24):
to.
Like, AI is not, again, like the magical wave of a wand.
Let me just throw AI at something and we'll make it all,
you know, make it all better.
You know you need good data, right? The data.
Like, data in, data out still applies.
You can't get rid of that. If you think about it, and I'm going
to oversimplify it quite a bit, but it's a lot of math.
(39:47):
It's tons and tons of math and math transactions.
It still needs good data to run that math against.
This is why, you know, again, like, kind of tooting
the Juniper horn a little bit,
but this is why, like, in the Apstra framework, because we
have that graph database, this is why we went that route in the
data center, right? It was like, we have the data that we're
(40:09):
collecting and we also collect all this telemetry data, right,
like already.
So now it's like I'm gonna throw AI at that, like I don't,
I don't need to, you know, since I know the state.
I already know all these things, I know what it should look
like.
You know, like now, like, let me go help, you know, solve some
kind of operational challenges here.
So yeah, it's a
(40:30):
much different kind of approach, I would say, than, you
know, people are probably used to.
Speaker 1 (40:35):
I mean 100%.
If you're responsible for updating documentation and you
run into a problem and you pull out your network
drawings and you go, God, these network drawings suck, they're
not up to date.
Well, like you said, good data in, good data out.
If you don't update your network drawings, you can't use
them to troubleshoot. Now, with the same concept, you know, it's a
(40:56):
live network map that just updates itself as changes are
made, and when problems happen it's correct data right there,
100%.
Speaker 3 (41:04):
I've been, uh, I've
been sipping on my water cup,
but I, um, have my network engineering stick.
Speaker 1 (41:13):
Awesome. Well, we appreciate
that.
Uh, Sean, this has been a fun conversation.
Is there anything else you want to add before we put a bow on
it?
Speaker 3 (41:31):
in the Juniper space.
You know, if you have further questions, reach out, and
if you also just have questions, you know, if you're just
interested.
I mean, at the end of the day, like, I'm a network nerd at heart
so I don't need to come around and, you know, give everyone a
product pitch around stuff.
If people are just interested in kind of, like, what we're, you
know, like, just our experience and kind of what we've done,
you know, around this, you know, feel free to reach out as well.
(41:53):
So happy to share.
Speaker 1 (41:55):
Awesome.
Well, if you want to learn more, you can go to juniper.net
forward slash A1.
And he is at Sean LV on X slash Twitter, whatever we're calling
it these days, so if you have any follow-up questions, you can
certainly reach out to him there or leave a comment wherever you
found the show.
Sean, thank you so much for joining us.
I really appreciate it.
And Tim, thanks for co-hosting tonight.
Speaker 2 (42:16):
Yeah, that's been
great, man.
Thanks for dropping by. Awesome.
Speaker 1 (42:18):
Yeah, thank you,
gentlemen.
Appreciate it.
Great conversation, absolutely, and we'll see you next time on
another episode of the Art of Network Engineering podcast.
Hey everyone, this is AJ.
If you like what you heard today, then make sure you
subscribe to our podcast in your favorite podcatcher. Smash
(42:41):
that bell icon to get notified of all of our future episodes.
Also, follow us on Twitter and Instagram.
We are at Art of NetEng, that's Art of N-E-T-E-N-G.
You can also find us on the web at artofnetworkengineering.com,
where we post all of our show notes.
You can read blog articles from the co-hosts and guests and
(43:02):
also a lot more news and info from the networking world.
Thanks for listening.
We'll see you next time.