Episode Transcript
Speaker 1 (00:00):
Welcome back to the deep dive. We are really getting
into the weeds today. Oh yeah, yeah, we're tackling VXLAN
BGP EVPN fabrics. I mean, this is the stuff running
modern data centers, the real backbone.
Speaker 2 (00:12):
It definitely is. And you know, it's a huge shift
from how things used to be done totally.
Speaker 1 (00:17):
We're talking about moving away from those older kind of
clunky three tier designs.
Speaker 2 (00:22):
Right exactly. This is the fast track guide basically to
understanding that jump to well high speed, really scalable software
defined networks.
Speaker 1 (00:30):
So what problem does VXLAN actually solve? Why do we
need this shift?
Speaker 2 (00:34):
Well, the old way just hit a wall. Traditional campus designs,
heavy on layer two spanning tree, they just couldn't scale,
not for the kind of density and speed modern applications demand.
Speaker 1 (00:46):
Yeah, you think about virtualization, multi gig traffic everywhere. The
old models just choked, didn't they?
Speaker 2 (00:50):
They really did. You need predictable performance, active
paths everywhere, not just stand by links, and that's what
this fabric is built for.
Speaker 1 (00:58):
Okay, so let's unpack this, starting right at the bottom, the
physical setup, the spine and leaf architecture.
Speaker 2 (01:03):
Right. The keyword here is symmetry. Every leaf switch connects
to every spine switch.
Speaker 1 (01:10):
Full mesh between layers, basically. Exactly.
Speaker 2 (01:12):
And what that does for you is a super predictable traffic pattern.
Any device connected to a leaf is always just two
hops away from any other device: source leaf
Speaker 1 (01:21):
To a spine to the destination leaf. Always, always.
Speaker 2 (01:25):
That predictability is golden for performance.
Speaker 1 (01:27):
Makes sense. Yeah, so let's define those roles a bit
more clearly. The spine layer it connects the leaves, aggregates traffic.
Speaker 2 (01:35):
It connects the leaves, yes, but it's doing much more
than just layer two aggregation. Now, in these layer three fabrics,
the spines are pure routers. They are the high speed
core and crucially they act as the BGP EVPN route reflectors.
Speaker 1 (01:49):
Ah, reflecting the routes down to the
Speaker 2 (01:51):
Leaves, precisely. And they often handle multicast too, acting as
rendezvous points, or RPs, for the underlay network. And
Speaker 1 (01:57):
The leaf layer. This is where things really change. The
traditional core functions move down here.
Speaker 2 (02:02):
That's the fundamental shift. The leaves are where your servers,
your devices connect, but they're also making the main
routing decisions now. They are the layer three cores distributed
across the whole fabric.
Speaker 1 (02:14):
So instead of one big core, you have lots of
smaller active.
Speaker 2 (02:17):
Cores, exactly, all working together.
Speaker 1 (02:19):
Okay, quick question. Then, if the spines are central for
communication, like those route reflectors, what happens if one fails?
Doesn't that break things?
Speaker 2 (02:30):
That's where the design shines. You always have at least
two spines, since every leaf connects to all of them.
If one spine goes down, traffic just instantly moves over
the links to the other active spine or spines.
Speaker 1 (02:42):
Okay, built-in redundancy. Totally.
Speaker 2 (02:44):
Built in, and you get redundancy at the leaf level
too for servers connecting to multiple leaves using things like vPC.
The physical resilience is just inherent.
Speaker 1 (02:52):
Okay, physical structure makes sense. Now, This is where my
brain starts to hurt a little. The underlay and the overlay.
Speaker 2 (02:58):
Ha. Yeah, let's use that roller coaster analogy. It actually
works pretty well.
Speaker 1 (03:02):
Okay, hit me with it.
Speaker 2 (03:03):
So the underlay, think of it as the physical roller
coaster track, the motors, the brakes. It's the foundation, the
physical links between spines and leaves, exactly. Its only job
is basic IP reachability, making sure all the key interfaces
on the leaves and spines can talk to each other.
It usually runs a simple routing protocol like OSPF or
IS-IS just to.
Speaker 1 (03:25):
Build that basic connectivity.
Speaker 2 (03:26):
Map, right, And because of that spine and leaf connection pattern,
we get to use equal cost multipath routing ECMP. All
those links between a leaf and the multiple spines, they're
all active, all forwarding traffic at the same time.
Speaker 1 (03:39):
So it's like layer three link aggregation, super fast.
Speaker 2 (03:42):
Super fast, and super resilient. That's the underlay, the solid
fast foundation.
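A minimal sketch of what that underlay looks like from one leaf's perspective, in NX-OS-style syntax. The interface numbers, addresses, and the OSPF process name here are purely illustrative, not from the episode:

  feature ospf

  ! Loopback0: stable underlay address for routing and BGP peering.
  interface loopback0
    ip address 10.0.0.1/32
    ip router ospf UNDERLAY area 0.0.0.0

  ! Routed point-to-point uplinks, one to each spine.
  interface Ethernet1/1
    description uplink to spine-1
    no switchport
    ip address 10.1.1.0/31
    ip ospf network point-to-point
    ip router ospf UNDERLAY area 0.0.0.0
    no shutdown

  interface Ethernet1/2
    description uplink to spine-2
    no switchport
    ip address 10.1.1.2/31
    ip ospf network point-to-point
    ip router ospf UNDERLAY area 0.0.0.0
    no shutdown

  router ospf UNDERLAY
    router-id 10.0.0.1

Because both uplinks are equal-cost paths to every other loopback, ECMP across the spines happens with no extra configuration.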
Speaker 1 (03:46):
Okay, foundation laid. Now the overlay, the roller coaster cars
and the riders.
Speaker 2 (03:51):
That's it. This is where the VXLAN magic happens. The
overlay runs on top of that physical underlay. VXLAN
takes your normal layer two frame like an Ethernet frame
from a server, and wraps it up inside a layer
three UDP packet. It tunnels L two over L three.
Speaker 1 (04:05):
Okay, and it uses BGP EVPN to manage all this. Exactly.
Speaker 2 (04:12):
EVPN is the control plane for this overlay. But the
real game changer VXLAN brings is multi-tenancy. Ah.
Speaker 1 (04:20):
Separating different customers or departments. So in the analogy, each
roller coaster car is a tenant like a VRF.
Speaker 2 (04:28):
Perfect. Each car is a VRF, a separate routing world,
and the riders in that car are the VLANs belonging
to that tenant.
Speaker 1 (04:36):
And riders in different cars can't just talk to each other.
Speaker 2 (04:38):
Nope, completely isolated by default unless you specifically build bridges,
configure route leaking between them.
Speaker 1 (04:45):
Okay, that isolation is huge. Now, practical point. Yeah, that wrapping,
that encapsulation adds overhead, right? Yeah.
Speaker 2 (04:51):
Big time, about fifty bytes or so. So you
absolutely must enable jumbo frames on all those underlay links,
an MTU of nine thousand, maybe
Speaker 1 (04:58):
Higher, right? Because if you don't, what happens?
Speaker 2 (05:01):
Fragmentation city. Small pings might work, but try moving a
real file. Packets get chopped up, performance tanks. It's like
the number one deployment mistake.
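The fix itself is tiny. Roughly, on every leaf and spine underlay interface (9216 is a common jumbo value; the interface here is illustrative):

  ! Jumbo MTU so the roughly 50 bytes of VXLAN/UDP/IP overhead
  ! never pushes a full-size frame past the link MTU.
  interface Ethernet1/1
    mtu 9216

The point is that every hop in the underlay path has to carry the bigger packets, not just the leaf the server plugs into.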
Speaker 1 (05:07):
Gotcha, good tip, don't forget the MTU. Okay, so moving
up to the control plane. This is really the heart
of EVPN, isn't it? Shifting from data plane learning.
Speaker 2 (05:16):
Like flooding and praying with spanning tree.
Speaker 1 (05:18):
Yeah, that mess, shifting that intelligence to the control plane
with BGP EVPN. What's the big win there, operationally?
Speaker 2 (05:26):
Oh, operationally it's night and day. Think about troubleshooting layer
two loops before.
Speaker 1 (05:30):
Was awful, tell me about it.
Speaker 2 (05:32):
With EVPN, the local leaf sees a MAC address, learns its
IP, and bang, it advertises that MAC-IP pair as a
BGP route to
Speaker 1 (05:41):
The route reflectors the spines right.
Speaker 2 (05:43):
The spines reflect it to other leaves that need it. No
more network-wide flooding to learn MAC addresses.
Speaker 1 (05:48):
So finding where a device is becomes a routing lookup,
not a frantic MAC table search across dozens.
Speaker 2 (05:54):
Of switches, exactly. Troubleshooting L2 problems becomes essentially L3
troubleshooting, much, much easier.
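A hedged sketch of what that control plane looks like on a leaf, NX-OS-style: an iBGP session to each spine carrying the l2vpn evpn address family. The ASN and loopback addresses are invented for the example:

  feature bgp

  router bgp 65001
    router-id 10.0.0.1
    ! Peer to spine-1 and spine-2 over loopback0; the EVPN routes
    ! (those MAC-IP advertisements) ride in the l2vpn evpn family.
    neighbor 10.0.0.11
      remote-as 65001
      update-source loopback0
      address-family l2vpn evpn
        send-community extended
    neighbor 10.0.0.12
      remote-as 65001
      update-source loopback0
      address-family l2vpn evpn
        send-community extended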
Speaker 1 (06:01):
Okay, so we've killed most broadcast issues, but what about
necessary evils like, you know, ARP, broadcast, unknown unicast, multicast,
the BUM traffic.
Speaker 2 (06:10):
Right, you still need some way to handle that. VXLAN
uses multicast in the underlay for this. Basically, BUM
traffic for a specific VXLAN segment, a VNI,
gets mapped to a specific multicast group in the underlay, and
Speaker 1 (06:22):
The spines act as the rps for those multicast groups.
Speaker 2 (06:26):
Usually, yeah, you'd configure the spines as the rendezvous points,
and you'd want redundancy there too, using something like anycast
RP, so both spines can handle it actively. Makes sense.
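Sketched in the same NX-OS style, the anycast-RP piece on each spine looks roughly like this. The addresses and the multicast group range are illustrative:

  feature pim

  ! Shared anycast RP address, advertised from both spines.
  interface loopback254
    ip address 10.254.254.1/32
    ip router ospf UNDERLAY area 0.0.0.0
    ip pim sparse-mode

  ! Map the BUM multicast groups to that RP, and list both spines
  ! as members of the anycast-RP set so either can serve actively.
  ip pim rp-address 10.254.254.1 group-list 239.1.1.0/24
  ip pim anycast-rp 10.254.254.1 10.0.0.11
  ip pim anycast-rp 10.254.254.1 10.0.0.12

The underlay links between leaves and spines would also run ip pim sparse-mode so the multicast trees can actually be built.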
Speaker 1 (06:36):
Let's nail down a couple of interface terms, NVE and VTEP.
They sound similar, they work together.
Speaker 2 (06:42):
The NVE, the network virtualization edge interface, is kind of the
logical engine on the leaf switch that does the actual VXLAN
encapsulation and decapsulation.
Speaker 1 (06:51):
The thing doing the wrapping and unwrapping, right.
Speaker 2 (06:53):
And the VTEP, the VXLAN tunnel endpoint, is the IP
address used as the source and destination for those VXLAN tunnels.
Speaker 1 (07:00):
And that VTEP IP, it's usually a special loopback interface.
Speaker 2 (07:04):
Always. Typically loopback one, and this is important. It's
different from the loopback you might use for the underlay
BGP peering, which is often loopback zero.
Speaker 1 (07:12):
Why the separation? Why two loopbacks? Stability?
Speaker 2 (07:16):
Mainly. Loopback zero is for the underlay routing protocol itself,
rock solid reachability. Loopback one, the VTEP address, is what
the overlay tunnels use. Keeping them separate means if
something weird happens with your BGP peering, it doesn't necessarily
break your established VXLAN tunnels. It isolates the planes.
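On a leaf, that two-loopback split plus the NVE interface boils down to something like this, again NX-OS-flavored with illustrative addresses:

  ! loopback0: underlay routing and BGP peering.
  interface loopback0
    ip address 10.0.0.1/32
    ip router ospf UNDERLAY area 0.0.0.0

  ! loopback1: the VTEP address the overlay tunnels terminate on.
  interface loopback1
    ip address 10.0.1.1/32
    ip router ospf UNDERLAY area 0.0.0.0

  feature nv overlay
  nv overlay evpn

  ! The NVE interface does the VXLAN encap/decap, sources its tunnels
  ! from the VTEP loopback, and learns remote hosts via BGP EVPN.
  interface nve1
    no shutdown
    host-reachability protocol bgp
    source-interface loopback1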
Speaker 1 (07:34):
Hm, good design practice. Okay, and we mentioned the spines
are route reflectors. Why is that so vital for scaling?
Speaker 2 (07:40):
Imagine if they weren't, every leaf switch would need a
direct BGP peering with every other leaf switch.
Speaker 1 (07:46):
The full mesh nightmare.
Speaker 2 (07:48):
Total nightmare. Ten leaves, forty-five BGP sessions; twenty leaves,
one hundred and ninety sessions. With route reflectors on the spines,
you add a new leaf, it just peers with the,
say, two spines. Two new sessions.
Speaker 1 (07:59):
That's it. That makes adding capacity way.
Speaker 2 (08:01):
Way easier, massively easier. It's built for scale.
Speaker 1 (08:04):
All right. Let's dig into that multi-tenancy piece, more
separation and routing between tenants. You mentioned route targets, RTs.
Speaker 2 (08:09):
Yeah. Route targets are basically tags we attach to the
EVPN routes. Think of them like labels. They usually look
like AS number, colon, ID, maybe 655501:1001 or something.
Speaker 1 (08:20):
And these tags control who gets which routes. Exactly.
Speaker 2 (08:23):
Each VRF, each tenant, has specific route targets it exports,
tags its routes with, and imports, accepts routes tagged with.
It's the BGP way of controlling visibility between virtual networks.
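A hedged sketch of what that looks like per tenant; the VRF name, VNI, and RT values are invented for illustration:

  vrf context TENANT-A
    ! L3 VNI for this tenant (covered a bit further on).
    vni 50001
    rd auto
    address-family ipv4 unicast
      ! This tenant tags the routes it exports, and only accepts
      ! routes carrying an RT it is configured to import.
      route-target export 65001:50001
      route-target import 65001:50001
      ! Same tags applied to the EVPN routes for this VRF.
      route-target export 65001:50001 evpn
      route-target import 65001:50001 evpn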
Speaker 1 (08:34):
Okay, and we have two kinds of VNIs involved,
the L2 VNI and the L3 VNI.
Back to the roller coaster? Let's do it.
Speaker 2 (08:39):
The L2 VNI, think of that as
the identifier for a specific group of riders within one car.
It maps directly to a traditional VLAN. So
Speaker 1 (08:48):
VLAN ten might become L2 VNI 1010, something
like that.
Speaker 2 (08:51):
Yeah, it carries that layer two traffic across the fabric,
and the big plus, you can have over sixteen million
VNIs, way past the old four-thousand-VLAN
Speaker 1 (09:01):
Limit. Huge scale increase. Okay, so that's the L2 VNI
for L2 traffic within a tenant. What
about the L3 VNI?
Speaker 2 (09:07):
The L3 VNI is different. It's not
for carrying VLAN traffic directly. It's the dedicated routing interface
for the entire VRF, the whole tenant car.
Speaker 1 (09:15):
Okay, so what's its job?
Speaker 2 (09:16):
Its only job is layer three routing, inter-VLAN routing, for
devices within that tenant, especially when they live on different
leaf switches.
Speaker 1 (09:23):
Ah. So if a server on leaf one in VLAN
ten needs to talk to a server on leaf five
in VLAN twenty, but they're in the same tenant VRF.
Speaker 2 (09:31):
The traffic goes from the source server to its leaf,
leaf one, gets routed using the shared gateway, encapsulated using
the L3 VNI tunnel, sent across the
underlay to leaf five, decapsulated, and then routed to the
destination server in VLAN twenty.
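To make that walkthrough concrete, the leaf-side plumbing might look roughly like this. The VLAN, VNI, and VRF values are illustrative, loosely following the VLAN-ten example:

  feature vn-segment-vlan-based

  ! L2 VNI: VLAN 10 travels the fabric as VNI 1010.
  vlan 10
    vn-segment 1010

  ! L3 VNI: a dedicated VLAN/SVI pair that exists only to route
  ! for the tenant VRF between leaves.
  vlan 2500
    vn-segment 50001
  interface Vlan2500
    no shutdown
    vrf member TENANT-A
    ip forward

  ! Both VNIs attach to the NVE tunnel interface.
  interface nve1
    member vni 1010
      mcast-group 239.1.1.10
    member vni 50001 associate-vrf

  ! The L2 VNI is announced into EVPN with its own RD and RTs.
  evpn
    vni 1010 l2
      rd auto
      route-target import auto
      route-target export auto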
Speaker 1 (09:46):
Got it. The L3 VNI is the
dedicated inter-VLAN highway for that tenant across the fabric.
Speaker 2 (09:51):
Perfect analogy, and.
Speaker 1 (09:53):
This ties into that fabric any cast gateway feature right,
making every leaf an active router.
Speaker 2 (09:57):
Absolutely. Forget old HSRP or VRRP, where one router was
active and the other just sat there waiting to fail
Speaker 1 (10:03):
Over, wasting half your capacity.
Speaker 2 (10:05):
Right. With anycast gateway, all the leaf switches share
the exact same virtual MAC address and the same default
gateway IP address for a given VLAN.
Speaker 1 (10:13):
So a server just sends traffic to its gateway.
Speaker 2 (10:15):
IP, and whichever leaf receives it can immediately route it,
no tromboning traffic to a specific active core switch. Every
leaf is an active layer three core for the VLANs
it serves. It distributes the routing load beautifully.
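The anycast gateway piece is short. A sketch with an invented gateway MAC and subnet:

  ! Same virtual gateway MAC on every leaf in the fabric...
  fabric forwarding anycast-gateway-mac 2020.0000.00aa

  ! ...and the same gateway IP on the tenant SVI on every leaf.
  interface Vlan10
    no shutdown
    vrf member TENANT-A
    ip address 10.10.10.1/24
    fabric forwarding mode anycast-gateway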
Speaker 1 (10:28):
Very cool. Okay, final big section, getting traffic in and
out of this fancy fabric, external connectivity. We need a
border leaf for that. Yep.
Speaker 2 (10:38):
One or more leaves need to be designated as border leafs.
They're the ones physically connected to the outside world, the WAN,
maybe a firewall cluster, the rest of the campus network.
Speaker 1 (10:47):
And they need some extra configuration.
Speaker 2 (10:48):
Obviously, for sure, because now you're bridging the BGP EVPN world
with, you know, potentially OSPF, static routes, whatever is running externally.
Speaker 1 (10:55):
So when that border leaf learns an external route, say
from OSPF, how does it tell the rest of the
fabric about it.
Speaker 2 (11:02):
Inside EVPN, it advertises those external prefixes as EVPN Type
five routes. That's the key type for external.
Speaker 1 (11:09):
Reachability. Okay, type five. And what does that route carry?
Speaker 2 (11:12):
It carries the external prefix, like your WAN subnet. And critically,
the next hop for that route is set to the
VTEP IP address of the border leaf itself.
Speaker 1 (11:21):
Ah. So if leaf seven needs to send traffic to
the WAN, it
Speaker 2 (11:24):
Sees the type five route sees, the next hop is
the border leaf's VTEP encapsulates the packet using the L three,
V and I tunnel pointed at the border leaf and
fires it off slick.
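On the border leaf, a hedged sketch of how those external prefixes end up as Type 5 routes; the OSPF process and route-map names are invented:

  ! Anything learned from the external OSPF process in this VRF
  ! gets redistributed into BGP and advertised into EVPN as
  ! Type 5 prefix routes, next hop = this border leaf's VTEP.
  route-map EXTERNAL-TO-FABRIC permit 10

  router bgp 65001
    vrf TENANT-A
      address-family ipv4 unicast
        redistribute ospf EXTERNAL route-map EXTERNAL-TO-FABRIC
        advertise l2vpn evpn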
Speaker 1 (11:34):
Okay. But now the tricky part, integrating OSPF or something externally.
Isn't there a big risk of asymmetric routing?
Speaker 2 (11:42):
Huge risk. This is probably the second biggest deployment headache
after MTU.
Speaker 1 (11:46):
Traffic goes out border leaf A, tries to come back
in via border leaf B, and the firewall freaks
out because it didn't see the outbound
Speaker 2 (11:52):
Flow. Exactly that scenario, state mismatch, broken connections. So the
goal has to be symmetric routing.
Speaker 1 (11:59):
How do you force that?
Speaker 2 (12:00):
You have to play with the routing protocol metrics or
administrative distances. You need to make sure the path back
into the fabric from the external network prefers the same
border leaf the traffic went out on.
Speaker 1 (12:09):
So maybe making the EVPN route learned via iBGP more
preferred than the OSPF route.
Speaker 2 (12:15):
That's a very common way. Default iBGP AD is two hundred,
but you can lower it, say, below OSPF's one ten.
That usually ensures return traffic prefers the EVPN path back
through the correct border leaf. You've got to make sure
forward and return paths match.
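One illustrative way to express that preference, with the caveat that the right lever depends on the design and could just as well be metrics on the external side, is adjusting distances in the tenant VRF on the border leaves:

  ! Defaults: eBGP 20, iBGP 200, OSPF 110. Dropping the iBGP
  ! distance below OSPF's 110 makes the EVPN-learned route win
  ! the comparison, the knob described above for keeping the
  ! return path on the same border leaf.
  router bgp 65001
    vrf TENANT-A
      address-family ipv4 unicast
        distance 20 100 220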
Speaker 1 (12:29):
Gotcha. Last detail, the default route, getting 0.0.0.0/0 advertised
across the fabric so everything can reach the Internet, presumably
via the border leaf. You mentioned needing two BGP commands for this.
Speaker 2 (12:43):
Yeah, people trip over this sometimes. You need default-information
originate under the BGP VRF config. That tells BGP, hey,
I want to advertise a default route.
Speaker 1 (12:52):
Okay, step one.
Speaker 2 (12:52):
But BGP won't advertise a route unless it's actually in
its BGP table. So you also need the network 0.0.0.0/0
command in that same VRF address-family configuration.
Speaker 1 (13:03):
To actually inject the route into BGP so it can
be originated exactly.
Speaker 2 (13:06):
You need both. Default-information originate enables it, network 0.0.0.0/0
provides the route itself, then the border leaf advertises it
as a Type five and the whole fabric knows how to get out.
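Put together on the border leaf, and assuming for the sketch that the actual default points at an external firewall (the VRF name and next hop are invented):

  vrf context TENANT-A
    ! A default route has to exist in the VRF's routing table
    ! before the network statement can pick it up.
    ip route 0.0.0.0/0 192.0.2.1

  router bgp 65001
    vrf TENANT-A
      address-family ipv4 unicast
        ! Step 1: allow BGP to originate a default in this VRF.
        default-information originate
        ! Step 2: put 0.0.0.0/0 into the BGP table so it can be sent.
        network 0.0.0.0/0
        ! Sent into EVPN as a Type 5 route from this border leaf.
        advertise l2vpn evpn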
Speaker 1 (13:17):
Makes perfect sense. Wow. Okay, we covered a lot of ground,
we really did, but the summary feels clear now. VXLAN
BGP EVPN. It smashes the old VLAN limits for massive scale,
gives you amazing performance with ECMP, and crucially uses that
smart control plane to ditch the nightmare of layer two
loops and flooding.
Speaker 2 (13:34):
Yeah, the big picture is this blend of layer two
VPN technology running over a solid layer three foundation. It's
all about flexibility, speed and making network deployments, especially using templates,
much faster and more reliable.
Speaker 1 (13:49):
And for you listening, if you want to go even deeper,
the source material mentioned things like multi-pod and multi-site,
extending this fabric idea across different data centers.
Speaker 2 (13:58):
Right, making multiple physical sites look like one giant logical
fabric, multi-pod, or connecting separate fabrics together, multi-
site, which leads to our final thought.
Speaker 2 (14:08):
If you're stretching one logical fabric across multiple physical locations,
maybe even miles apart, what's the absolute most critical routing
principle you have to maintain everywhere across all sites to
keep things working smoothly.
Speaker 1 (14:20):
It's not just about can packets get there?
Speaker 2 (14:23):
No, it comes back to symmetry and filtering, ensuring traffic
flows follow the same path out and back globally and
making sure your route filtering is consistent everywhere. That becomes
absolutely paramount. Get that wrong in a multi site setup
and you're in for a world of pain.
Speaker 1 (14:37):
Something to definitely keep in mind. Okay, that was a
fantastic deep dive. Thanks for bringing this complex topic.
Speaker 2 (14:43):
Glad we can unpack it.
Speaker 1 (14:44):
And thank you for joining us. We'll catch you next
time on the Deep Dive