How's about a little VXLAN ...just for fun?

    Have you ever trouble-shot a system that wasn’t broke? What?  Troubleshooting something that isn’t broken might seem like total madness, but it is a really good way to become intimately familiar with something while it’s actually working.  So, in the spirit of fun and unbridled exploration, let’s tackle one of the more complex technologies in the data center arsenal: VXLAN.

     

    I’m not going to go into the hows and wherefores of VXLAN.  If you’re interested (and you should be because it’s interesting!), I would suggest reading the Cisco Programmable Fabric with VXLAN BGP EVPN Configuration Guide, or any of the many other available resources on VXLAN. If you feel like you need to go on a VXLAN binge, I would recommend Building Data Centers with VXLAN BGP EVPN:  A Cisco NX-OS Perspective.

     

    To start off with, here’s my topology. 

     

    [Figure: topology diagram]

     

    There are two VRFs in this VXLAN environment, GREEN and BLUE.  Leaf 1, 2, and 3 have VTEP IP addresses 4.4.1.1, 4.4.1.2 and 4.4.1.3, respectively. Notice that Leaf 1 and Leaf 2 have been configured as a vPC pair.  The VMs directly attached to these two leaf switches are attached by vPC member ports. The pair has been configured with a vIP of 4.4.1.100.  Every VM can reach every other VM within its VRF.  (Going between the VRFs is a post for another day!)

     

     

    Within each VRF you can find:

     

    VRF     VNI/VLAN     Server   IP              VM MAC (last 4 digits)
    -----   ----------   ------   -------------   ----------------------
    GREEN   10001/401    VM1      172.24.1.101    ea66
            10001/401    VM3      172.24.1.103    7637
            10002/402    VM2      172.24.2.102    8847
            10003/403    ---      ---             ---
    BLUE    10011/411    VM11     172.24.10.101   7072
            10011/411    VM13     172.24.10.103   1963
            10012/412    VM12     172.24.20.102   cbec
            10013/413    ---      ---             ---

     

     

    One rational strategy for troubleshooting VXLAN closely follows how a MAC address, or a MAC and IP pair, gets learned and then distributed throughout the fabric. [i]

    That is:

     

    1.     The VTEP learns the MAC address of a locally connected host.  The MAC is entered into the VTEP’s MAC address table.

    2.     The MAC address and an associated VTEP IP address get entered into the L2RIB.

    3.     From the L2RIB, two routes are generated in the BGP table: a Type 2 route for the MAC address alone, and a Type 2 route for the MAC address with its associated host IP address.

    4.     Routes get advertised to remote neighbors.  The two Type 2 routes generated by the originating VTEP get placed into the BGP table of the remote VTEPs.

    5.     From the BGP table, the remote MAC address and its corresponding VTEP IP get placed in the L2RIB.

    6.     From the L2RIB, an entry is made in the remote VTEP’s MAC address table.
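
    The six steps above can be treated as an ordered checklist.  Here is a minimal Python sketch of that idea, assuming you run each command by hand and note whether the MAC showed up.  The stage labels and the function name are my own inventions for illustration; only the show commands come from this post.

```python
# Checkpoints in the order a host MAC travels through the fabric.
# The commands are the ones used in this post; the stage labels are hypothetical.
STAGES = [
    ("local MAC address table", "show mac address-table dynamic"),
    ("local L2RIB", "show l2route evpn mac-ip all"),
    ("local BGP table", "show bgp l2vpn evpn vni-id <vni>"),
    ("remote BGP table", "show bgp l2vpn evpn vni-id <vni>"),
    ("remote L2RIB", "show l2route evpn mac-ip all"),
    ("remote MAC address table", "show mac address-table dynamic"),
]


def first_missing_stage(seen):
    """Return (stage, command) for the first checkpoint where the MAC is absent.

    `seen` is a list of booleans, one per entry in STAGES, filled in by hand
    after running each command. Returns None if the MAC was seen everywhere.
    """
    for (stage, command), present in zip(STAGES, seen):
        if not present:
            return stage, command
    return None


if __name__ == "__main__":
    # MAC present on the local leaf but missing from the remote BGP table onward.
    print(first_missing_stage([True, True, True, False, False, False]))
```

    The point of keeping the stages in order is that the first checkpoint where the MAC disappears tells you where to dig.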

     

    Now, before we can even begin troubleshooting, we should check to make sure that our NVE interface is in good shape, and that we have peering with the VTEP(s).  After all, no interface, no traffic for you!  No peers, no traffic for you!  To start, use show nve interface.

     

    pod4-leaf-1# show nve int

    Interface: nve1, State: Up, encapsulation: VXLAN

    VPC Capability: VPC-VIP-Only [notified]

    Local Router MAC: 5897.bd50.223f

    Host Learning Mode: Control-Plane

    Source-Interface: loopback0 (primary: 4.4.1.1, secondary: 4.4.1.100)

     

     

    There’s a ton of really good information here.  You get the status of the interface and its encapsulation type.  When everything is working right, you should have “State: Up”, “encapsulation: VXLAN”, and “Host Learning Mode: Control-Plane.”   Also included in this command are such goodies as the local router MAC, the local VTEP IP listed as the primary IP, and, in the case of a vPC pair, the vIP listed as the secondary IP.
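
    If you collect this output from many leaf switches, the three "healthy" values can be checked mechanically.  The sketch below is one way to do it in Python, assuming the output has already been captured as text (how you capture it is up to you).  The function name is hypothetical, and the field labels are taken from the output shown above.

```python
import re


def check_nve_interface(output):
    """Return a list of problems found in 'show nve interface' text (empty if healthy)."""
    expected = {
        "State": "Up",
        "encapsulation": "VXLAN",
        "Host Learning Mode": "Control-Plane",
    }
    problems = []
    for field, want in expected.items():
        # Grab the word that follows "<field>:" wherever it sits on the line.
        match = re.search(rf"{re.escape(field)}:\s*([\w-]+)", output)
        got = match.group(1) if match else None
        if got != want:
            problems.append(f"{field}: expected {want}, got {got}")
    return problems


if __name__ == "__main__":
    sample = (
        "Interface: nve1, State: Up, encapsulation: VXLAN\n"
        "Host Learning Mode: Control-Plane\n"
    )
    print(check_nve_interface(sample))
```

    An empty list means all three fields match what this post describes as the working state.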

     

    Moving on, who are the VTEPs in our neighborhood? (Sorry, I grew up on Mr. Rogers’ Neighborhood!)  For this information, use show nve peer detail.

     

    pod4-leaf-3# show nve peer det

    Details of nve Peers:

    ----------------------------------------

    Peer-Ip: 4.4.1.100

        NVE Interface : nve1

        Peer State : Up

        Peer Uptime : 00:19:27

        Router-Mac : 5897.bd50.223f

        Peer First VNI      : 10001

        Time since Create : 00:19:27

        Configured VNIs : 10001-10003,10011-10013

        Provision State : peer-add-complete

        Learnt CP VNIs      : 10001,10003,10011-10013

        vni assignment mode : SYMMETRIC

        Peer Location : N/A

     

    This output was taken from Leaf 3, and you can see that it peers to the vIP representing the vPC pair.   You also get the router MAC of your peer, the VNIs configured on the peer, and the VNIs learned.  One thing interesting to note is that this command will give you all VNIs regardless of what VRF they might belong to.  In this case, 10001, 10002, and 10003 belong to VRF GREEN, and 10011, 10012, and 10013 belong to VRF BLUE.

     

    Another interesting thing to note is that this leaf switch doesn’t have an NVE adjacency with any spine.  In my topology, the spines are not configured with any of the VXLAN commands: no VLAN-to-VNI mapping, no NVE interface, not even a VRF.  The only things configured on the spines are the underlay and BGP peering (with route-reflector).

     

    Now, stepping through the troubleshooting strategy I just outlined, the first question is whether the MAC address got entered into the MAC address table.  For this exercise, we’re interested in traffic between VM2 and VM3. So, starting with Leaf 1, and using an oldie-but-goodie, show mac address-table dynamic.

     

    pod4-leaf-1# show mac add dy
    Legend:
            * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
            age - seconds since last seen,+ - primary entry using vPC Peer-Link,
            (T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
       VLAN     MAC Address      Type      age     Secure NTFY Ports
    ---------+-----------------+--------+---------+------+----+------------------
    C  401     0050.5688.7637   dynamic  0         F      F    nve1(4.4.1.3)
    *  401     0050.5688.ea66   dynamic  0         F      F    Po402
    +  402     0050.5688.8847   dynamic  0         F      F    Po402
    C  411     0050.5688.1963   dynamic  0         F      F    nve1(4.4.1.3)
    *  411     0050.5688.7072   dynamic  0         F      F    Po402
    +  412     0050.5688.cbec   dynamic  0         F      F    Po401

     

    This should be pretty familiar.  You get the VLAN, the MAC address, and the port that the MAC address is associated with.  The MAC address we’re interested in is VM2’s, 0050.5688.8847 in VLAN 402.  Its “+” flag shows that VM2 is actually “directly attached” to this leaf’s vPC peer, and that the entry was learned via the vPC peer-link.
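
    On a busy leaf the MAC table is much longer than six rows, so it can help to pull out a single entry programmatically.  This is a rough Python sketch that assumes each data row has the shape shown above (an optional flag, then VLAN, MAC, type, age, secure, NTFY, port).  Both function names are hypothetical.

```python
def parse_mac_table(output):
    """Parse data rows of 'show mac address-table dynamic' into dicts."""
    entries = []
    for line in output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # Rows may start with a flag such as '*', '+' or 'C' before the VLAN.
        flag = "" if tokens[0].isdigit() else tokens.pop(0)
        # A data row is: VLAN, MAC, type, age, secure, ntfy, port.
        if len(tokens) != 7 or not tokens[0].isdigit():
            continue
        vlan, mac, kind, age, secure, ntfy, port = tokens
        entries.append({"flag": flag, "vlan": int(vlan), "mac": mac, "port": port})
    return entries


def find_mac(output, mac):
    """Return the first entry for `mac`, or None if it is not in the table."""
    for entry in parse_mac_table(output):
        if entry["mac"] == mac:
            return entry
    return None
```

    Legend and header lines are skipped because they never reduce to a numeric VLAN followed by exactly six more fields.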

     

    Next step: did this MAC address (8847) get into Leaf 1’s L2RIB?  Let’s check, using show l2route evpn mac-ip all.

     

    pod4-leaf-1# sh l2route e mac-ip all
    Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
    (Dup):Duplicate (Spl):Split (Rcv):Recv(D):Del Pending (S):Stale (C):Clear
    (Ps):Peer Sync (Ro):Re-Originated
    Topology    Mac Address    Prod   Flags      Seq No     Host IP         Next-Hops
    ----------- -------------- ------ ---------- --------------- ---------------
    401         0050.5688.ea66 HMM    --         0          172.24.1.101    Local
    401         0050.5688.7637 BGP    --         0          172.24.1.103    4.4.1.3
    402         0050.5688.8847 HMM    --         0          172.24.2.102    Local
    411         0050.5688.7072 HMM    --         0          172.24.10.101   Local
    411         0050.5688.1963 BGP    --         0          172.24.10.103   4.4.1.3
    412         0050.5688.cbec HMM    --         0          172.24.20.102   Local

     

    Here’s our MAC address.  You can see it’s associated with VM2’s IP address, 172.24.2.102.  The producer, HMM (Host Mobility Manager), indicates that this switch learned the information locally, from its own hardware, as opposed to learning it through BGP.
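
    The producer column is what separates local hosts from remote ones, so here is a small Python sketch that splits the L2RIB output along that line.  It assumes every data row has exactly the seven whitespace-separated fields shown above, which will not hold if the Flags column ever contains spaces.  The function name and dictionary keys are made up for this example.

```python
def parse_l2route(output):
    """Parse data rows of 'show l2route evpn mac-ip all' into dicts."""
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        # Data rows: topology, MAC, producer, flags, seq no, host IP, next hop.
        if len(tokens) != 7 or not tokens[0].isdigit():
            continue
        topology, mac, producer, flags, seq, host_ip, next_hop = tokens
        rows.append({
            "vlan": int(topology),
            "mac": mac,
            "host_ip": host_ip,
            "producer": producer,
            # HMM means the host was learned on this switch; BGP means remote.
            "learned": "locally" if producer == "HMM" else "via " + producer,
            "next_hop": next_hop,
        })
    return rows
```

    Filtering the result on "producer" gives you a quick list of which hosts this leaf owns and which ones it only knows about through a remote VTEP.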

     

    The next step is to check whether this information got put into the BGP table.  For that, use show bgp l2vpn evpn vni-id followed by the VNI.

     

    pod4-leaf-1# sh bgp l2 e vni-id 10002

    BGP routing table information for VRF default, address family L2VPN EVPN

    BGP table version is 201, Local Router ID is 4.0.0.101

    Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best

    Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected

    Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

     

    Network            Next Hop            Metric     LocPrf Weight Path

    Route Distinguisher: 4.0.0.101:33169    (L2VNI 10002)

    *>l[2]:[0]:[0]:[48]:[0050.5688.8847]:[0]:[0.0.0.0]/216

                          4.4.1.100                         100      32768 i

    *>l[2]:[0]:[0]:[48]:[0050.5688.8847]:[32]:[172.24.2.102]/272

                          4.4.1.100                         100      32768 i

     

     

    You can issue this command without specifying a VRF or a VNI, but then you might get a ton of output to wade through, so here I am limiting the command to the VNI of the MAC address we are following.

     

    Again, this command gives you output that is pretty dense, so let’s break it down a bit.  First, you can find the route distinguisher.  If you auto-generated the RD (like I did here), the RD is composed of the router ID of the originating switch (in this case 4.0.0.101) and a number assigned to the VNI.  That number only looks random: for a Layer 2 VNI, NX-OS derives it as 32767 plus the VLAN ID, so VLAN 402 gives 33169.

     

    Route Distinguisher: 4.0.0.101:33169    (L2VNI 10002)
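
    The second half of an auto-derived RD can be tied back to a VLAN.  The Python sketch below assumes the NX-OS convention, as I understand it, that the number for a Layer 2 VNI is 32767 plus the VLAN ID; treat that base value as an assumption to verify on your own release.  Both function names are hypothetical.

```python
def split_rd(rd):
    """Split 'router-id:number' into its two parts."""
    router_id, _, number = rd.rpartition(":")
    return router_id, int(number)


def vlan_from_auto_rd(rd, base=32767):
    """Recover the VLAN ID from an auto-derived L2VNI RD.

    Assumes the number part equals base + VLAN ID (an assumption, see above).
    """
    _, number = split_rd(rd)
    return number - base


if __name__ == "__main__":
    print(split_rd("4.0.0.101:33169"))
    print(vlan_from_auto_rd("4.0.0.101:33169"))
```

    With the RD from this output, 33169 minus 32767 lands on 402, which matches the VLAN mapped to VNI 10002 in this topology.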

     

    Next, you can see that BGP has generated two routes.  Both are Type 2, as indicated by the very first number in brackets.

     

     

    *>l[2]:[0]:[0]:[48]:[0050.5688.8847]:[0]:[0.0.0.0]/216

                          4.4.1.100                        

    *>l[2]:[0]:[0]:[48]:[0050.5688.8847]:[32]:[172.24.2.102]/272

                          4.4.1.100  

     

    Type 2 routes are routes to an end host.  Notice how both routes carry VM2’s MAC address.  The first route carries only the MAC address (its IP field is 0.0.0.0 with a length of 0); the second carries VM2’s MAC address and its IP address.  The other kind of route that BGP uses in a VXLAN fabric is the Type 5 route, which is a route to a subnet with a variable-length mask.  Type 5 routes carry IP address information only.[ii]
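
    To make the bracketed fields less cryptic, here is a short Python sketch that takes one of these route keys apart.  It assumes the seven fields are, in order, route type, ESI, Ethernet tag, MAC length, MAC, IP length, and IP; that ordering is my reading of the NX-OS display rather than something stated in this post, and the function name is hypothetical.

```python
import re


def parse_type2(route):
    """Split an NX-OS style EVPN Type 2 route key into named fields."""
    # Collect whatever sits between each pair of square brackets.
    fields = re.findall(r"\[([^\]]*)\]", route)
    if len(fields) != 7 or fields[0] != "2":
        raise ValueError("not a Type 2 route: " + route)
    _, esi, eth_tag, mac_len, mac, ip_len, ip = fields
    # An IP length of 0 means the route advertises the MAC alone.
    has_ip = int(ip_len) > 0
    return {
        "mac": mac,
        "ip": ip if has_ip else None,
        "kind": "MAC-IP" if has_ip else "MAC-only",
    }


if __name__ == "__main__":
    print(parse_type2("*>l[2]:[0]:[0]:[48]:[0050.5688.8847]:[0]:[0.0.0.0]/216"))
    print(parse_type2("*>l[2]:[0]:[0]:[48]:[0050.5688.8847]:[32]:[172.24.2.102]/272"))
```

    Run against the two routes above, the first comes back as MAC-only and the second as MAC-IP with 172.24.2.102.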

     

     

    *>l[2]:[0]:[0]:[48]:[0050.5688.8847]:[0]:[0.0.0.0]/216

                          4.4.1.100                        

    *>l[2]:[0]:[0]:[48]:[0050.5688.8847]:[32]:[172.24.2.102]/272

                          4.4.1.100

     

    The other thing to notice is that each route identifies the next-hop.  Notice how in this case, the next-hop is the vIP of the vPC pair, 4.4.1.100.  By using the vIP as the next hop for all end-hosts attached to vPC, BGP can take advantage of redundant routes to the end-host through either vPC switch.

     

    OK, so now we know that BGP has generated the right routes.  The next step is to verify that those routes got distributed to the BGP table of the remote leaf.  For that we want to look at the same command on the remote leaf.

     

    pod4-leaf-3# sh bgp l2 e vni 10002

    pod4-leaf-3#

     

    Wait a minute!!  What’s going on?  Before we panic and start clucking like a chicken, let’s backtrack a bit and use the strategy we just used.  First step, what does Leaf3’s NVE interface look like?  Do you remember the command?

     

    pod4-leaf-3# sh nve int

    Interface: nve1, State: Up, encapsulation: VXLAN

    VPC Capability: VPC-VIP-Only [not-notified]

    Local Router MAC: 84b8.02ca.595f

    Host Learning Mode: Control-Plane

    Source-Interface: loopback0 (primary: 4.4.1.3, secondary: 0.0.0.0)

     

    Does this look OK to you?  Looks good to me, too.  The NVE is in the Up state, it’s using VXLAN encapsulation, and its host learning mode is Control-Plane.  So, let’s look at who our NVE peers are.

     

    pod4-leaf-3# sh nve peer det
    Details of nve Peers:
    ----------------------------------------
    Peer-Ip: 4.4.1.100
        NVE Interface       : nve1
        Peer State          : Up
        Peer Uptime         : 22:12:31
        Router-Mac          : 5897.bd50.223f
        Peer First VNI      : 10001
        Time since Create   : 22:12:31
        Configured VNIs     : 10001-10003,10011-10013
        Provision State     : peer-add-complete
        Learnt CP VNIs      : 10001,10003,10011-10013
        vni assignment mode : SYMMETRIC
        Peer Location       : N/A

     

    Do you see anything out of place here?  I do.  Compare the configured VNIs with the Learnt CP VNIs.  This leaf is not learning any routes associated with VNI 10002. Hmm.  Why not?  Let’s explore. 
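
    Spotting one missing VNI by eye is easy with six of them and much harder with six hundred.  The comparison can be sketched in a few lines of Python, assuming you paste in the two VNI strings exactly as the command prints them.  The function names are hypothetical.

```python
def expand_vnis(text):
    """Expand a string like '10001-10003,10011' into a set of ints."""
    vnis = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as 10001-10003 includes both endpoints.
            low, high = part.split("-")
            vnis.update(range(int(low), int(high) + 1))
        else:
            vnis.add(int(part))
    return vnis


def unlearnt_vnis(configured, learnt):
    """Return configured VNIs that do not appear in the learnt list, sorted."""
    return sorted(expand_vnis(configured) - expand_vnis(learnt))


if __name__ == "__main__":
    print(unlearnt_vnis("10001-10003,10011-10013", "10001,10003,10011-10013"))
```

    Fed the two lines from the output above, the only VNI left over is 10002.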

     

    The first thing is to check the VLAN to VNI mapping.  Use show vxlan.

     

    pod4-leaf-3# sh vxlan
    Vlan            VN-Segment
    ====            ==========
    401             10001
    402             10002
    403             10003
    411             10011
    412             10012
    413             10013

     

     

    Here you can see the VLAN we’re interested in paired to the correct VNI.  So now let’s take a look at VLAN 402’s SVI, using show interface brief | include Vlan.

     

    pod4-leaf-3# sh int br | i Vlan
    Vlan1     --                                      down   Administratively down
    Vlan401   --                                      up     --
    Vlan402   --                                      down   Administratively down
    Vlan403   --                                      up     --
    Vlan411   --                                      up     --
    Vlan412   --                                      up     --
    Vlan413   --                                      up     --

     

    AH-HA! VLAN 402 has been administratively shut down.  But wait! I thought you said this system was passing traffic!  No, I didn’t trick you.  Check it out.

     

    [cisco@pod4-vm3 ~]$ ping 172.24.1.101 -c 5

    PING 172.24.1.101 (172.24.1.101) 56(84) bytes of data.

    64 bytes from 172.24.1.101: icmp_seq=1 ttl=64 time=0.475 ms

    64 bytes from 172.24.1.101: icmp_seq=2 ttl=64 time=0.302 ms

    64 bytes from 172.24.1.101: icmp_seq=3 ttl=64 time=0.315 ms

    64 bytes from 172.24.1.101: icmp_seq=4 ttl=64 time=0.307 ms

    64 bytes from 172.24.1.101: icmp_seq=5 ttl=64 time=0.310 ms

     

    --- 172.24.1.101 ping statistics ---

    5 packets transmitted, 5 received, 0% packet loss, time 4000ms

    rtt min/avg/max/mdev = 0.302/0.341/0.475/0.070 ms

     

    [cisco@pod4-vm3 ~]$ ping 172.24.2.102 -c 5

    PING 172.24.2.102 (172.24.2.102) 56(84) bytes of data.

    64 bytes from 172.24.2.102: icmp_seq=1 ttl=62 time=0.419 ms

    64 bytes from 172.24.2.102: icmp_seq=2 ttl=62 time=0.278 ms

    64 bytes from 172.24.2.102: icmp_seq=3 ttl=62 time=0.268 ms

    64 bytes from 172.24.2.102: icmp_seq=4 ttl=62 time=0.294 ms

    64 bytes from 172.24.2.102: icmp_seq=5 ttl=62 time=0.272 ms

     

    --- 172.24.2.102 ping statistics ---

    5 packets transmitted, 5 received, 0% packet loss, time 4000ms

    rtt min/avg/max/mdev = 0.268/0.306/0.419/0.058 ms

     

     

    VM3 can, in fact, still reach VM1 and VM2, even though VM2 is in a different VLAN and a different VNI, and VM2’s VLAN and VNI are shut down on VM3’s leaf.  Take a moment and think about why this might be.

     

    The reason VM3 can reach VM2 even though VM2’s VNI/VLAN is shut down on VM3’s leaf is the way Cisco switches handle routing, using symmetric Integrated Routing and Bridging (IRB).  With symmetric IRB, a leaf switch isn’t required to have both the source and the destination VNI.  The source leaf requires the source VNI and the transit VNI.  In our case, for traffic from VM2, that means Leaf 2 needs the VNI for VLAN 402, which is 10002, and the transit VNI, 10003.  The destination leaf requires the transit VNI and the destination VNI.  For us, Leaf 3 needs the transit VNI and the VNI for VM3, which is 10001.  The same holds in the other direction: for traffic from VM3, Leaf 3 is the source leaf and again needs only 10001 and 10003.  That is exactly what we are seeing here.[iii]
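
    The symmetric IRB rule is simple enough to write down as code.  This Python sketch just encodes the two requirements from the paragraph above and checks them against the VNIs that are still active on Leaf 3.  The function name and the hard-coded VNI set are illustrative assumptions, not anything read from a switch.

```python
def required_vnis(src_l2vni, dst_l2vni, transit_vni):
    """Return the VNIs each leaf needs for one direction of routed traffic."""
    return {
        # The source leaf routes from the source VNI into the transit VNI.
        "source_leaf": {src_l2vni, transit_vni},
        # The destination leaf routes from the transit VNI into the destination VNI.
        "destination_leaf": {transit_vni, dst_l2vni},
    }


if __name__ == "__main__":
    # Leaf 3 with VLAN 402 / VNI 10002 shut down (values from this post).
    leaf3_active = {10001, 10003, 10011, 10012, 10013}

    # VM3 (VNI 10001) to VM2 (VNI 10002): Leaf 3 is the source leaf.
    outbound = required_vnis(10001, 10002, 10003)
    print(outbound["source_leaf"] <= leaf3_active)

    # VM2 back to VM3: Leaf 3 is the destination leaf.
    inbound = required_vnis(10002, 10001, 10003)
    print(inbound["destination_leaf"] <= leaf3_active)
```

    In both directions the set Leaf 3 needs never includes 10002, which is why the ping keeps working.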

     

    I hope that you enjoyed this troubleshooting jaunt through VXLAN, and I encourage you to try out these show commands yourself.  Finally, I encourage you to share your favorite VXLAN show command. 

     

    If folks find this post useful, I would be happy to produce more.  I welcome suggestions on topics. 


    References:

    [i] Cisco Live Las Vegas 2017 BRKDCN-3040 by Vinit Jain.  If you feel the need to go deeper, I suggest Mr. Jain’s textbook Troubleshooting BGP: A Practical Guide to Understanding and Troubleshooting BGP, at 687-690.

    [ii] Building Data Centers with VXLAN BGP EVPN, at 39-40.

    [iii]  Building Data Centers with VXLAN BGP EVPN, at 69-73.  Cisco supports only symmetric IRB. Per Mr. Krattiger, “So far, nobody was able to convince us to drop all the scale and functional advantage of [symmetric IRB] and go in an Asymmetric IRB direction.”