How's About a little VXLAN...Just for Fun


by Micheline Murphy

 

Have you ever trouble-shot a system that wasn’t broke? What? Troubleshooting something that isn’t broken might seem like total madness, but it is a really good way to become intimately familiar with something while it’s actually working. So, in the spirit of fun and unbridled exploration, let’s tackle one of the more complex technologies in the data center arsenal: VXLAN.

 

I’m not going to go into the hows and wherefores of VXLAN. If you’re interested (and you should be because it’s interesting!), I would suggest reading the Cisco Programmable Fabric with VXLAN BGP EVPN Configuration Guide, or any of a number of available resources on VXLAN. If you feel like you need to go on a VXLAN binge, I would recommend Building Data Centers with VXLAN BGP EVPN: A Cisco NX-OS Perspective.

 

To start off with, here’s my topology.

 

 

There are two VRFs in this VXLAN environment, GREEN and BLUE.  Leaf 1, 2, and 3 have VTEP IP addresses 4.4.1.1, 4.4.1.2 and 4.4.1.3, respectively. Notice that Leaf 1 and Leaf 2 have been configured as a vPC pair. The VMs directly attached to these two leaf switches are attached by vPC member ports. The pair has been configured with a vIP of 4.4.1.100.  Each VM can reach each other VM within their VRF.  (Going between the VRFs is a post for another day!)

 

Within each VRF you can find:

 

VRF     VNI/VLAN    Server  IP
GREEN   10001/401   VM1     172.24.1.101
GREEN   10001/401   VM3     172.24.1.103
GREEN   10002/402   VM2     172.24.2.102
GREEN   10003/403   ---     ---
BLUE    10011/411   VM11    172.24.10.101
BLUE    10011/411   VM13    172.24.10.103
BLUE    10012/412   VM12    172.24.20.102
BLUE    10013/413   ---     ---

 

 

One rational strategy for troubleshooting VXLAN closely follows how a MAC, or a MAC and IP pair, gets learned and then distributed throughout the fabric. [i]

That is:

 

  1. The VTEP learns the MAC address of a locally connected host.  The MAC is entered into the VTEP’s MAC address table.
  2. The MAC address and an associated VTEP IP address get entered into the L2RIB.
  3. From the L2RIB, two routes are generated in the BGP table: a Type 2 route for the MAC address alone, and a Type 2 route for the MAC address with its associated host IP address. Both carry the VTEP IP as the next hop.
  4. Routes get advertised to remote neighbors.  The two Type 2 routes generated by the originating VTEP get placed into the BGP table of the remote VTEPs.
  5. From the BGP table, the remote MAC address and its corresponding VTEP IP get placed in the L2RIB.
  6. From the L2RIB, an entry is made in the remote VTEP’s MAC address table.
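
To keep the six steps straight while working through them, here is a minimal sketch of the sequence as a checklist. The commands are the ones used later in this post; pairing steps 5 and 6 with the same commands on the remote leaf is my assumption (mirroring steps 1 and 2), and the `CHECKS` and `checklist` names are purely illustrative.

```python
# Checklist sketch: each learning step paired with a show command that
# can verify it. "where" marks which VTEP to run the command on.
# Steps 5 and 6 mirror steps 2 and 1 on the remote leaf (an assumption).
CHECKS = [
    (1, "local", "show mac address-table dynamic"),
    (2, "local", "show l2route evpn mac-ip all"),
    (3, "local", "show bgp l2vpn evpn vni-id {vni}"),
    (4, "remote", "show bgp l2vpn evpn vni-id {vni}"),
    (5, "remote", "show l2route evpn mac-ip all"),
    (6, "remote", "show mac address-table dynamic"),
]


def checklist(vni):
    """Return printable lines, filling in the VNI where a command takes one."""
    return [
        f"step {n} ({where} VTEP): {cmd.format(vni=vni)}"
        for n, where, cmd in CHECKS
    ]


for line in checklist(10002):
    print(line)
```

If a step fails, the idea is to stop there and dig in, rather than jumping ahead to the remote side.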

 

Now before we can even begin this troubleshoot, we should check to make sure that our NVE interface is in good shape, and that we have peering with the VTEP(s). After all, no interface, no traffic for you! No peers, no traffic for you! To start, use show nve interface.

 

pod4-leaf-1# show nve int

Interface: nve1, State: Up, encapsulation: VXLAN

VPC Capability: VPC-VIP-Only [notified]

Local Router MAC: 5897.bd50.223f

Host Learning Mode: Control-Plane

Source-Interface: loopback0 (primary: 4.4.1.1, secondary: 4.4.1.100)

 

There’s a ton of really good information here. You get the status of the interface and its encapsulation type. When everything is working right, you should have “State: Up”, “encapsulation: VXLAN”, and “Host Learning Mode: Control-Plane.” Also included in this command are such goodies as the local router MAC, the local VTEP IP listed as the primary IP, and, in the case of a vPC pair, the vIP listed as the secondary IP.
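
If you collect this output with a script, the three health checks can be automated. The following is a rough sketch that assumes the output looks exactly like the sample above; `nve_health` is a made-up helper name, not a Cisco tool.

```python
import re

# Sample text copied from the "show nve interface" output above.
SAMPLE = """\
Interface: nve1, State: Up, encapsulation: VXLAN
VPC Capability: VPC-VIP-Only [notified]
Local Router MAC: 5897.bd50.223f
Host Learning Mode: Control-Plane
Source-Interface: loopback0 (primary: 4.4.1.1, secondary: 4.4.1.100)
"""


def nve_health(text):
    """Return the three checks named in the post as booleans."""
    state = re.search(r"State:\s*(\S+?),", text)
    encap = re.search(r"encapsulation:\s*(\S+)", text)
    mode = re.search(r"Host Learning Mode:\s*(\S+)", text)
    return {
        "state_up": bool(state) and state.group(1) == "Up",
        "encap_vxlan": bool(encap) and encap.group(1) == "VXLAN",
        "control_plane": bool(mode) and mode.group(1) == "Control-Plane",
    }


print(nve_health(SAMPLE))
```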

 

Moving on, who are the VTEPs in our neighborhood? (Sorry, I grew up on Mr. Rogers’ Neighborhood!) For this information, use show nve peer detail.

 

pod4-leaf-3# show nve peer det
Details of nve Peers:
----------------------------------------
Peer-Ip: 4.4.1.100
    NVE Interface       : nve1
    Peer State          : Up
    Peer Uptime         : 00:19:27
    Router-Mac          : 5897.bd50.223f
    Peer First VNI      : 10001
    Time since Create   : 00:19:27
    Configured VNIs     : 10001-10003,10011-10013
    Provision State     : peer-add-complete
    Learnt CP VNIs      : 10001,10003,10011-10013
    vni assignment mode : SYMMETRIC
    Peer Location       : N/A

 

This output was taken from Leaf 3, and you can see that it peers to the vIP representing the vPC pair. You also get the router MAC of your peer, the VNIs configured on the peer, and the VNIs learned. One thing interesting to note is that this command will give you all VNIs regardless of what VRF they might belong to. In this case, 10001, 10002, and 10003 belong to VRF GREEN, and 10011, 10012, and 10013 belong to VRF BLUE.

 

Another interesting thing to note is that this leaf switch doesn’t have an NVE adjacency with any spine. In my topology, the spines are not configured with any of the VXLAN commands: no VLAN to VNI mapping, no NVE interface, not even a VRF. The only things the spines are configured with are the underlay and BGP peering (as route reflectors).

 

Now, stepping through the troubleshooting strategy I just outlined, the first question is whether the MAC address got entered into the MAC address table. For this exercise, we’re interested in traffic between VM2 and VM3. So, let’s start with Leaf 1 and use an oldie-but-goodie: show mac address-table dynamic.

 

pod4-leaf-1# show mac add dy
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
C  401     0050.5688.7637   dynamic  0         F      F    nve1(4.4.1.3)
*  401     0050.5688.ea66   dynamic  0         F      F    Po402
+  402     0050.5688.8847   dynamic  0         F      F    Po402
C  411     0050.5688.1963   dynamic  0         F      F    nve1(4.4.1.3)
*  411     0050.5688.7072   dynamic  0         F      F    Po402
+  412     0050.5688.cbec   dynamic  0         F      F    Po401

 

This should be pretty familiar. You get the VLAN, the MAC address and the port that the MAC address is associated with. The MAC address we’re interested in is 0050.5688.8847 in VLAN 402, and you can see from the + flag that VM2 is actually “directly attached” to this leaf’s vPC peer, and was learned via the vPC peer-link.
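
If you’d rather not eyeball the table, here is a small sketch that looks up one MAC and translates the leading flag using the legend printed by the command. It assumes every data row splits into exactly eight whitespace-separated fields, as the rows above do; `find_mac` is just an illustrative name.

```python
# Flag meanings copied from the legend in the command output.
FLAGS = {
    "*": "primary entry",
    "+": "primary entry using vPC Peer-Link",
    "C": "ControlPlane MAC",
}

# Three data rows taken from the output above.
ROWS = """\
C  401     0050.5688.7637   dynamic  0         F      F    nve1(4.4.1.3)
*  401     0050.5688.ea66   dynamic  0         F      F    Po402
+  402     0050.5688.8847   dynamic  0         F      F    Po402
"""


def find_mac(rows, mac):
    """Return how, where, and in which VLAN a MAC was learned, or None."""
    for line in rows.splitlines():
        fields = line.split()
        # Expected fields: flag, VLAN, MAC, type, age, secure, ntfy, port
        if len(fields) == 8 and fields[2] == mac:
            return {
                "how": FLAGS.get(fields[0], fields[0]),
                "vlan": int(fields[1]),
                "port": fields[7],
            }
    return None


print(find_mac(ROWS, "0050.5688.8847"))
```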

 

Next step… did this MAC address—8847—get into Leaf1’s L2RIB? Let’s check, using sh l2route evpn mac-ip all.

 

pod4-leaf-1# sh l2route e mac-ip all
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (Ro):Re-Originated
Topology    Mac Address    Prod   Flags         Seq No     Host IP         Next-Hops
----------- -------------- ------ ---------- --------------- ---------------
401         0050.5688.ea66 HMM    --            0          172.24.1.101    Local
401         0050.5688.7637 BGP    --            0          172.24.1.103    4.4.1.3
402         0050.5688.8847 HMM    --            0          172.24.2.102    Local
411         0050.5688.7072 HMM    --            0          172.24.10.101   Local
411         0050.5688.1963 BGP    --            0          172.24.10.103   4.4.1.3
412         0050.5688.cbec HMM    --            0          172.24.20.102   Local

 

 

Here’s our MAC address. You can see it’s associated with VM2’s IP address, 172.24.2.102. The producer HMM (Host Mobility Manager) indicates that this switch learned the host locally, as opposed to learning it through BGP.
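
The local-versus-remote distinction is easy to pull out of this output programmatically. This is only a sketch: it assumes each data row has seven whitespace-separated fields, and it treats a producer of HMM with a next hop of Local as “learned locally,” which is how the rows above read. The function names are mine.

```python
# Three data rows taken from the "show l2route evpn mac-ip all" output.
SAMPLE = """\
401         0050.5688.ea66 HMM    --            0          172.24.1.101    Local
401         0050.5688.7637 BGP    --            0          172.24.1.103    4.4.1.3
402         0050.5688.8847 HMM    --            0          172.24.2.102    Local
"""


def l2rib_entries(text):
    """Parse data rows into dicts; header and separator lines are skipped."""
    entries = []
    for line in text.splitlines():
        f = line.split()
        # Fields: topology, MAC, producer, flags, seq no, host IP, next hop
        if len(f) == 7 and f[0].isdigit():
            entries.append({
                "topology": int(f[0]),
                "mac": f[1],
                "producer": f[2],
                "host_ip": f[5],
                "next_hop": f[6],
            })
    return entries


def learned_locally(entries, mac):
    """True if the MAC was produced by HMM with a Local next hop."""
    for e in entries:
        if e["mac"] == mac:
            return e["producer"] == "HMM" and e["next_hop"] == "Local"
    return None


print(learned_locally(l2rib_entries(SAMPLE), "0050.5688.8847"))
```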

 

The next step is to check whether this information got put into the BGP table. For that we use show bgp l2vpn evpn vni-id.

 

pod4-leaf-1# sh bgp l2 e vni-id 10002

BGP routing table information for VRF default, address family L2VPN EVPN

BGP table version is 201, Local Router ID is 4.0.0.101

Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best

Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected

Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

 

Network            Next Hop            Metric    LocPrf Weight Path

Route Distinguisher: 4.0.0.101:33169    (L2VNI 10002)

*>l[2]:[0]:[0]:[48]:[0050.5688.8847]:[0]:[0.0.0.0]/216

                      4.4.1.100                        100      32768 i

*>l[2]:[0]:[0]:[48]:[0050.5688.8847]:[32]:[172.24.2.102]/272

                      4.4.1.100                        100      32768 i

 

 

You can issue this command without specifying a VRF or a VNI, but then you might get a ton of output to wade through, so here I am using the command specific to the VNI of the MAC address we are following.

 

Again, this command gives you output that is pretty dense, so let’s break it down a bit. First, you can find the route distinguisher. If you auto-generated the RD (like I did here), the RD is composed of the router ID of the originating switch (in this case 4.0.0.101) and a number that NX-OS derives for the L2VNI: the VLAN ID plus 32767 (here 402 + 32767 = 33169).

 

Route Distinguisher: 4.0.0.101:33169    (L2VNI 10002)
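
The 33169 in that RD is consistent with how NX-OS is documented to auto-derive RDs for L2VNIs: router ID, then VLAN ID plus 32767. Here is that arithmetic as a tiny sketch. Treat the offset as an assumption to check against the configuration guide for your release; `auto_rd` is an illustrative name.

```python
# Assumption: NX-OS "rd auto" for an L2VNI uses router-ID:(VLAN ID + 32767).
L2VNI_RD_OFFSET = 32767


def auto_rd(router_id, vlan_id):
    """Build the auto-derived RD string for a VLAN mapped to an L2VNI."""
    return f"{router_id}:{vlan_id + L2VNI_RD_OFFSET}"


# VLAN 402 maps to VNI 10002 on Leaf 1, whose router ID is 4.0.0.101.
print(auto_rd("4.0.0.101", 402))  # 4.0.0.101:33169
```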

 

Next, you can see that BGP has generated two routes. Both are Type 2, as indicated by the very first number in brackets.

 

*>l[2]:[0]:[0]:[48]:[0050.5688.8847]:[0]:[0.0.0.0]/216

                      4.4.1.100                       

*>l[2]:[0]:[0]:[48]:[0050.5688.8847]:[32]:[172.24.2.102]/272

                      4.4.1.100 

 

Type 2 routes are routes to an end host. Notice how both routes carry VM2’s MAC address; the second route also carries its IP address. The other kind of route that BGP uses in a VXLAN fabric is the Type 5 route, which is a route to a subnet with a variable-length mask. Type 5 routes carry IP address information only.[ii]
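
To make the bracketed fields less cryptic, here is a sketch that pulls a Type 2 route apart. Labeling the second and third fields as ESI and Ethernet tag is my reading of the NX-OS display format, so treat those names as assumptions; the MAC, IP length, and IP positions are visible in the output itself.

```python
import re

# Pattern for the bracketed route string as NX-OS prints it.
ROUTE_RE = re.compile(
    r"\[(\d+)\]:\[(\d+)\]:\[(\d+)\]:\[(\d+)\]:"
    r"\[([0-9a-f.]+)\]:\[(\d+)\]:\[([0-9.]+)\]/(\d+)"
)


def parse_type2(nlri):
    """Split a displayed EVPN Type 2 route into its visible parts."""
    m = ROUTE_RE.search(nlri)
    if not m:
        return None
    # esi and etag are assumed field names; they are not used further here.
    rtype, esi, etag, mac_len, mac, ip_len, ip, plen = m.groups()
    return {
        "route_type": int(rtype),
        "mac": mac,
        # An IP length of 0 means this is the MAC-only route.
        "ip": ip if int(ip_len) else None,
        "prefix_len": int(plen),
    }


print(parse_type2("*>l[2]:[0]:[0]:[48]:[0050.5688.8847]:[0]:[0.0.0.0]/216"))
print(parse_type2("*>l[2]:[0]:[0]:[48]:[0050.5688.8847]:[32]:[172.24.2.102]/272"))
```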

 

 

*>l[2]:[0]:[0]:[48]:[0050.5688.8847]:[0]:[0.0.0.0]/216

                      4.4.1.100                       

*>l[2]:[0]:[0]:[48]:[0050.5688.8847]:[32]:[172.24.2.102]/272

                      4.4.1.100

 

The other thing to notice is that each route identifies the next-hop. Notice how in this case, the next-hop is the vIP of the vPC pair, 4.4.1.100. By using the vIP as the next hop for all end-hosts attached to vPC, BGP can take advantage of redundant routes to the end-host through either vPC switch.

 

OK, so now we know that BGP has generated the right routes. The next step is to verify that those routes got distributed to the BGP table of the remote leaf. For that we want to look at the same command on the remote leaf.

 

pod4-leaf-3# sh bgp l2 e vni 10002

pod4-leaf-3#

 

Wait a minute! What’s going on? Before we panic and start clucking like a chicken, let’s backtrack a bit and use the strategy we just used. First step, what does Leaf3’s NVE interface look like?

 

pod4-leaf-3# sh nve int

Interface: nve1, State: Up, encapsulation: VXLAN

VPC Capability: VPC-VIP-Only [not-notified]

Local Router MAC: 84b8.02ca.595f

Host Learning Mode: Control-Plane

Source-Interface: loopback0 (primary: 4.4.1.3, secondary: 0.0.0.0)

 

Does this look OK to you? Looks good to me, too. The NVE is in the Up state, it’s using VXLAN encapsulation, and its host learning mode is Control-Plane. So, let’s look at who our NVE peers are.

 

pod4-leaf-3# sh nve peer det
Details of nve Peers:
----------------------------------------
Peer-Ip: 4.4.1.100
    NVE Interface       : nve1
    Peer State          : Up
    Peer Uptime         : 22:12:31
    Router-Mac          : 5897.bd50.223f
    Peer First VNI      : 10001
    Time since Create   : 22:12:31
    Configured VNIs     : 10001-10003,10011-10013
    Provision State     : peer-add-complete
    Learnt CP VNIs      : 10001,10003,10011-10013
    vni assignment mode : SYMMETRIC
    Peer Location       : N/A

 

Do you see anything out of place here? I do. Compare the configured VNIs with the Learnt CP VNIs. This leaf is not learning any routes associated with VNI 10002. Hmm. Why not? Let’s explore.
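
With only six VNIs the gap is easy to spot by eye, but on a bigger fabric a quick set difference helps. A minimal sketch, assuming the VNI lists use the comma-and-range format shown above (`expand_vnis` is an illustrative name):

```python
def expand_vnis(spec):
    """Expand a string like '10001-10003,10011' into a set of ints."""
    vnis = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            vnis.update(range(int(lo), int(hi) + 1))
        else:
            vnis.add(int(part))
    return vnis


# Values copied from the "show nve peer detail" output on Leaf 3.
configured = expand_vnis("10001-10003,10011-10013")
learnt = expand_vnis("10001,10003,10011-10013")

missing = sorted(configured - learnt)
print(missing)  # [10002]
```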

 

The first thing is to check the VLAN to VNI mapping. Use show vxlan.

 

pod4-leaf-3# sh vxlan
Vlan            VN-Segment
====            ==========
401             10001
402             10002
403             10003
411             10011
412             10012
413             10013

 

Here you can see the VLAN we’re interested in paired to the correct VNI. So now let’s take a look at VLAN 402’s SVI, using show interface brief | include Vlan.

 

pod4-leaf-3# sh int br | i Vlan
Vlan1    --    down  Administratively down
Vlan401  --    up    --
Vlan402  --    down  Administratively down
Vlan403  --    up    --
Vlan411  --    up    --
Vlan412  --    up    --
Vlan413  --    up    --

 

AH-HA! The SVI for VLAN 402 has been administratively shut down. But wait! I thought you said this system was passing traffic! No, I didn’t trick you. Check it out.

 

[cisco@pod4-vm3 ~]$ ping 172.24.1.101 -c 5

PING 172.24.1.101 (172.24.1.101) 56(84) bytes of data.

64 bytes from 172.24.1.101: icmp_seq=1 ttl=64 time=0.475 ms

64 bytes from 172.24.1.101: icmp_seq=2 ttl=64 time=0.302 ms

64 bytes from 172.24.1.101: icmp_seq=3 ttl=64 time=0.315 ms

64 bytes from 172.24.1.101: icmp_seq=4 ttl=64 time=0.307 ms

64 bytes from 172.24.1.101: icmp_seq=5 ttl=64 time=0.310 ms

 

--- 172.24.1.101 ping statistics ---

5 packets transmitted, 5 received, 0% packet loss, time 4000ms

rtt min/avg/max/mdev = 0.302/0.341/0.475/0.070 ms

 

[cisco@pod4-vm3 ~]$ ping 172.24.2.102 -c 5

PING 172.24.2.102 (172.24.2.102) 56(84) bytes of data.

64 bytes from 172.24.2.102: icmp_seq=1 ttl=62 time=0.419 ms

64 bytes from 172.24.2.102: icmp_seq=2 ttl=62 time=0.278 ms

64 bytes from 172.24.2.102: icmp_seq=3 ttl=62 time=0.268 ms

64 bytes from 172.24.2.102: icmp_seq=4 ttl=62 time=0.294 ms

64 bytes from 172.24.2.102: icmp_seq=5 ttl=62 time=0.272 ms

 

--- 172.24.2.102 ping statistics ---

5 packets transmitted, 5 received, 0% packet loss, time 4000ms

rtt min/avg/max/mdev = 0.268/0.306/0.419/0.058 ms

 

VM3 can, in fact, still reach VM1 and VM2, even though VM2 is in a different VLAN and a different VNI, and VM2’s VLAN and VNI are shut down on VM3’s leaf. Take a moment and think about why this might be.

 

VM3 can reach VM2 even though VM2’s VNI/VLAN is shut down on VM3’s leaf because of the way Cisco switches handle routing: symmetric Integrated Routing and Bridging (IRB). With symmetric IRB, a leaf switch isn’t required to have both the source and the destination VNI. The source leaf requires the source VNI and the transit VNI (the VRF’s Layer 3 VNI). In our case, that means Leaf 2 needs the VNI for VLAN 402 (10002) and the transit VNI (10003). The destination leaf requires the transit VNI and the destination VNI. For us, Leaf 3 needs the transit VNI and the VNI for VM3, which is 10001. That is exactly what we are seeing here.[iii]
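
Here is that rule written out as a small sketch, just to make the VNI bookkeeping explicit. It models only which VNIs each leaf needs under symmetric IRB as described above, not the forwarding itself, and the function name is made up for illustration.

```python
def symmetric_irb_vnis(src_vni, dst_vni, transit_vni):
    """Return the VNIs each leaf needs for routed traffic (symmetric IRB)."""
    return {
        # Source leaf: bridge in the source VNI, route into the transit VNI.
        "source_leaf": {src_vni, transit_vni},
        # Destination leaf: route out of the transit VNI into the destination VNI.
        "destination_leaf": {transit_vni, dst_vni},
    }


# VM2 (VNI 10002) talking to VM3 (VNI 10001) in VRF GREEN, transit VNI 10003.
need = symmetric_irb_vnis(src_vni=10002, dst_vni=10001, transit_vni=10003)
print(sorted(need["source_leaf"]))       # [10002, 10003]
print(sorted(need["destination_leaf"]))  # [10001, 10003]
print(10002 in need["destination_leaf"])  # False
```

The last line is the punchline: VNI 10002 simply isn’t in the set that VM3’s leaf needs.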

 

I hope that you enjoyed this troubleshooting jaunt through VXLAN, and I encourage you to try out these show commands yourself. Finally, I encourage you to share your favorite VXLAN show command.

 

If folks find this post useful, I would be happy to produce more. I welcome suggestions on topics.

 

 

 


[i] Cisco Live Las Vegas 2017 BRKDCN-3040 by Vinit Jain. If you want to go deeper, I suggest Mr. Jain’s textbook Troubleshooting BGP: A Practical Guide to Understanding and Troubleshooting BGP, at 687-690.

[ii] Building Data Centers with VXLAN BGP EVPN, at 39-40.

[iii] Building Data Centers with VXLAN BGP EVPN, at 69-73.  Cisco supports only symmetric IRB. Per Mr. Krattiger, “So far, nobody was able to convince us to drop all the scale and functional advantage of [symmetric IRB] and go in an Asymmetric IRB direction.”