17 Replies Latest reply: May 10, 2019 12:53 PM by Micheline

    Connect L3 device to VxLAN vPC peers

    Artem

      Hello Friends!

       

      Please help me understand the right way to configure L3 connectivity (routing adjacencies) between a pair of VxLAN vPC Leafs and an L3 device (router, firewall, etc.).

       

      At this point I've already done some configuration.

      I have two pairs of regular vPC peers, a pair of Border-Leafs (also vPC), and dedicated Spines.

       

      I am trying to emulate a firewall connection to the VxLAN domain (specifically, to the Border-Leafs).

      The picture below shows some details.

       

      Here is a sample config from the Border-Leafs (Nexus 7706 with F3 cards).

      I just use an SVI to establish the routing adjacencies.

      Is this the right way to configure this connectivity?

      I could not find good documentation about it. If you have any, please share some config examples for this situation.


      Is it mandatory to configure a special SVI over the peer-link? I don't understand why it would be needed.

      And is it necessary to configure "layer-3 peer-gateway" on the vPC domain? The BGP routing adjacencies come up without it.


      vxlan1111.JPG

       

      vrf context OVERLAY

        vni 99999

        rd auto

        address-family ipv4 unicast

          route-target both auto

          route-target both auto evpn

       

      bridge-domain 999

        member vni 99999

       

      interface nve1

        no shutdown

        host-reachability protocol bgp

        source-interface loopback1

        member vni 99999 associate-vrf

       

      interface Bdi999

        no shutdown

        mtu 9216

        vrf member OVERLAY

        no ip redirects

        ip forward

        no ipv6 redirects

       

      interface Vlan222

        description ## For establishing routing adjacencies ##

        no shutdown

        vrf member OVERLAY

        no ip redirects

        ip address 2.2.2.71/24 tag 999

        no ipv6 redirects

       

      interface Vlan1999

        description ## SVI over PeerLink ##

        no shutdown

        no ip redirects

        ip address 10.71.72.0/31

        no ipv6 redirects

        ip ospf network point-to-point

        ip router ospf UNDERLAY area 0.0.0.0

        ip pim sparse-mode

       

      interface port-channel222

        description ## vPC to L3 device ##

        switchport

        switchport mode trunk

        switchport trunk allowed vlan 222

        vpc 222

       

      vpc domain 173

        peer-switch

        peer-keepalive destination 10.99.99.76 source 10.99.99.75

        delay restore 150

        peer-gateway

        ip arp synchronize

       

       

      interface loopback0

        description ## For IGP/iBGP ##

        ip address 10.10.0.73/32

        ip router ospf UNDERLAY area 0.0.0.0

        ip pim sparse-mode

      interface loopback1

        description ## For NVE ##

        ip address 10.1.1.73/32

        ip address 10.10.73.74/32 secondary

        ip router ospf UNDERLAY area 0.0.0.0

        ip pim sparse-mode

       

      router ospf UNDERLAY

        bfd

        log-adjacency-changes

        auto-cost reference-bandwidth 400 Gbps

      router bgp 65510

        log-neighbor-changes

        address-family l2vpn evpn

        template peer BGP_FW

          update-source Vlan222

          address-family ipv4 unicast

            default-originate

            soft-reconfiguration inbound

        template peer SPINE_PEERS

          update-source loopback0

          address-family l2vpn evpn

            send-community both

        neighbor 10.10.0.71 remote-as 65510

          inherit peer SPINE_PEERS

        neighbor 10.10.0.72 remote-as 65510

          inherit peer SPINE_PEERS

        vrf OVERLAY

          router-id 2.2.2.71

          log-neighbor-changes

          address-family ipv4 unicast

            advertise l2vpn evpn

            redistribute direct route-map REDISTR_VRF_NETWORKS_IN_BGPL2VPNEVPN_L3VNI

          neighbor 2.2.2.1 remote-as 65511

            inherit peer BGP_FW

            no update-source Vlan222

          neighbor 2.2.2.31 remote-as 65511

            inherit peer BGP_FW

            no update-source Vlan222

          neighbor 2.2.2.72 remote-as 65510

            update-source Vlan222

            address-family ipv4 unicast

              soft-reconfiguration inbound

         
        • 1. Re: Connect L3 device to VxLAN vPC peers
          Micheline

          Hello Artem--I think typically, service appliances such as firewalls are deployed in active/active or active/standby with one FW attached to each vPC peer.  (If you're using active/standby, connect the active device to the primary peer and the standby device to the secondary peer.)  You can use the command vpc orphan-port suspend on the secondary peer's link to the FW to force it to shut this port down in the event that the peer-link fails.  This would force the secondary's FW to fail over to the one attached to the primary.
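
          A minimal sketch of the orphan-port idea above, assuming a hypothetical interface Ethernet1/10 on the vPC secondary peer that connects to the standby FW and carries VLAN 222.  The interface number is illustrative only, and the exact keyword spelling (orphan-port vs. orphan-ports) differs between Nexus platforms, so check the command reference for your release:

          ```
          ! vPC secondary peer only (sketch; interface number is hypothetical)
          interface Ethernet1/10
            description ## Orphan port to standby FW ##
            switchport
            switchport mode trunk
            switchport trunk allowed vlan 222
            ! Suspend this non-vPC port if the peer-link goes down
            vpc orphan-port suspend
          ```

          The command applies to physical ports that are not vPC members, which is why this sketch uses a plain trunk rather than a port-channel with a vpc number.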

           

          Straight L3 devices such as routers should be attached to a vPC using L3 links (and whatever routing protocol you want for ECMP)  rather than creating a back-to-back vPC and trying to leverage L2 multipathing.

           

          The peer-gateway feature lets each vPC peer act as the gateway for packets addressed to the other peer's router MAC address, so it routes them locally instead of sending them across the peer-link.  (With an FHRP such as HSRP, both peers already forward traffic sent to the virtual MAC; peer-gateway covers devices that send frames to a peer's real MAC instead.)  It's not necessary, but it's a good feature to enable to tune your system. I saw that you had included it in your config.

           

          I looked at your attached config and I didn't see that you had included the configuration for the peer-link for vPC domain 173.  It will need to carry all VLANs that you expect to travel on member ports, and (depending on your platform) might also need to carry a VXLAN-specific VLAN.  (The command to designate the VXLAN VLAN across the peer-link is vpc nve peer-link-vlan xx.)  The VXLAN VLAN will need to be configured on both peers and carried on no other trunks in the network.
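
          As a rough sketch of the missing peer-link piece, assuming a hypothetical port-channel1 as the peer-link and taking the VLAN numbers from the posted Border-Leaf config (222 for the FW adjacency, 1999 for the SVI over the peer-link):

          ```
          ! Peer-link sketch; the port-channel number is an assumption
          interface port-channel1
            description ## vPC peer-link ##
            switchport
            switchport mode trunk
            ! Carry every VLAN used on vPC member ports, plus the peer-link SVI VLAN
            switchport trunk allowed vlan 222,1999
            spanning-tree port type network
            vpc peer-link
          ```

          Any additional VLANs trunked on vPC member ports would need to be added to the allowed list on both peers.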

           

          Last question, is your VXLAN fabric passing traffic?  I see that you've configured a template for BGP adjacency to the spine, but I think you need to have the template also have address-family ipv4 unicast and enable sending extended community and community information.

           

          I found this chapter in the VXLAN Config Guide about L2 firewall insertion, but I'm not sure this will help much.  I found it interesting, though: Cisco Nexus 9000 Series NX-OS VXLAN Configuration Guide, Release 7.x - EVPN with Transparent Firewall Insertion [Cisco…

           

          Good luck!  MM

          • 2. Re: Connect L3 device to VxLAN vPC peers
            Artem

            Hello Micheline!

             

            Thanks for the reference. In the end it works (though I haven't tested any failover scenarios yet).

             

            It is the 7700 platform, and according to the documentation there is no need for the vpc nve peer-link-vlan feature.

            Do I need to configure anything else for traffic to travel across the vPC peer-link (like vpc nve peer-link-vlan for the 5600 series)?

             

            It works without address-family ipv4 unicast in the Spine template, but I will try to implement this.

             

            I am just trying to achieve the best level of redundancy. If I connect the firewalls in a straight-through fashion, I lose the traffic load-balancing and redundancy I get from dual-homing each FW cluster member to the Border-Leafs.

            Our cluster works in Active/Standby mode. Right now the Active node (the one responsible for routing adjacencies) has established eBGP sessions with both vPC members. It looks fine, but I don't know whether it is a good approach or not.

             

             

            P.S.

            It's slightly off-topic, but...


            I've run into a weird problem. We use 5600 series switches in the Leaf role (vPC pair, dual-homed servers).

            If I simulate a failover scenario and bring down the uplinks on the first vPC Leaf, traffic from the server destined to the rest of the network is dropped (that traffic now has to travel across the peer-link).

            If I understand correctly, the vpc nve peer-link-vlan feature is meant to address this issue, but it doesn't help.


            Here is sh mac address-table from the second vPC node (uplinks UP, member ports DOWN):


            nx5-4_DC2# sh mac address-table
            Legend:
                    * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
                    age - seconds since last seen,+ - primary entry using vPC Peer-Link
              VLAN    MAC Address      Type      age    Secure NTFY  Ports/SWID.SSID.LID
            ---------+-----------------+--------+---------+------+----+------------------
            * 1999    00de.fb4f.fa3c    dynamic  0          F    F  Po1   (Special SVI for peer-link "nve peer-link-vlan 1999")
            * 999     00de.fb4f.fa3c    static    0         F    F  nve1/10.1.1.53
            * 22      0000.0000.1050    static    0         F    F  sup-eth2
            * 22      0000.0000.2222    dynamic  0          F    F  nve1/10.10.55.56
            * 11      0000.0000.1050    static    0         F    F  sup-eth2
            + 11      0000.0000.1111    dynamic  0          F    F  nve1/10.1.1.53   (Host behind vPC peer, through peer-link)
            * 11      00de.fb4f.fa3c    static    0         F    F  nve1/10.1.1.53
            
            
            
            
            
            


            Here is sh mac address-table from the first vPC node (uplinks DOWN, member ports UP):


            nx5-3_DC2# sh mac address-table
            Legend:
                    * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
                    age - seconds since last seen,+ - primary entry using vPC Peer-Link
               VLAN     MAC Address      Type      age     Secure NTFY   Ports/SWID.SSID.LID
            ---------+-----------------+--------+---------+------+----+------------------
            * 1999     00de.fb50.4801    dynamic   0          F    F  Po1   (Special SVI for peer-link "nve peer-link-vlan 1999")
            * 999      00de.fb50.4801    static    0          F    F  nve1/10.1.1.54
            * 22       0000.0000.1050    static    0          F    F  sup-eth2
            * 11       0000.0000.1050    static    0          F    F  sup-eth2
            * 11       0000.0000.1111    dynamic   0          F    F  Po222  (Host behind PO 222)
            * 11       00de.fb50.4801    static    0          F    F  nve1/10.1.1.54
            
            
            
            
            

             

            Here is a sample config from the first node (the second is similar):

             

            vpc domain 151
              peer-switch
              peer-keepalive destination 10.99.99.54 source 10.99.99.53
              delay restore 150
              peer-gateway
              ip arp synchronize
            
            vpc nve peer-link-vlan 1999
            
            interface Vlan11
              description ## PC1 ##
              no shutdown
              vrf member INSIDE_NET
              no ip redirects
              ip address 11.0.0.2/24 tag 999
              no ipv6 redirects
              fabric forwarding mode anycast-gateway
            
            
            interface Vlan999
              description ## L3-VNI ##
              no shutdown
              mtu 9216
              vrf member INSIDE_NET
              no ip redirects
              ip forward
              no ipv6 redirects
            
            
            interface Vlan1999
              description ## PeerLink NVE ##
              no shutdown
              mtu 9216
              no ip redirects
              ip address 10.10.53.0/31
              no ipv6 redirects
              ip ospf cost 100
              ip ospf network point-to-point
              ip router ospf 1 area 0.0.0.0
              ip pim sparse-mode
            
            interface port-channel222
              description ## to PC1 ##
              switchport mode trunk
              switchport trunk allowed vlan 11
              speed 1000
            
            interface nve1
              no shutdown
              source-interface loopback1
              host-reachability protocol bgp
              member vni 11111
                suppress-arp
                mcast-group 224.11.11.11
              member vni 99999 associate-vrf
            
            interface loopback0
              description ## For IGP/BGP ##
              ip address 10.10.0.53/32
              ip router ospf 1 area 0.0.0.0
              ip pim sparse-mode
            
            
            interface loopback1
              description ## For NVE ##
              ip address 10.1.1.53/32
              ip address 10.10.53.54/32 secondary
              ip router ospf 1 area 0.0.0.0
              ip pim sparse-mode
            
            evpn
              vni 11111 l2
                rd auto
                route-target import auto
                route-target export auto
            
            
            
            

             

            Here is an example of ping (before and after bringing the uplinks down):

             

            64 bytes from 11.0.0.11: icmp_seq=59 ttl=126 time=1.053 ms
            64 bytes from 11.0.0.11: icmp_seq=60 ttl=126 time=1.036 ms
            64 bytes from 11.0.0.11: icmp_seq=61 ttl=126 time=1.028 ms
            64 bytes from 11.0.0.11: icmp_seq=62 ttl=126 time=1.075 ms
            64 bytes from 11.0.0.11: icmp_seq=63 ttl=126 time=1.036 ms
            64 bytes from 11.0.0.11: icmp_seq=64 ttl=126 time=0.919 ms
            64 bytes from 11.0.0.11: icmp_seq=65 ttl=126 time=1.004 ms
            64 bytes from 11.0.0.11: icmp_seq=66 ttl=126 time=1.051 ms
            64 bytes from 11.0.0.11: icmp_seq=67 ttl=126 time=1.112 ms
            64 bytes from 11.0.0.11: icmp_seq=68 ttl=126 time=1.405 ms (down uplinks on Second node)
            Request 69 timed out
            Request 70 timed out
            Request 71 timed out
            Request 72 timed out
            Request 73 timed out
            Request 74 timed out
            Request 75 timed out
            Request 76 timed out
            Request 77 timed out
            64 bytes from 11.0.0.11: icmp_seq=78 ttl=126 time=1.178 ms   (Sometimes ping works, I don't understand what's going on)
            Request 79 timed out
            Request 80 timed out
            Request 81 timed out
            Request 82 timed out
            64 bytes from 11.0.0.11: icmp_seq=322 ttl=126 time=1.032 ms
            Request 84 timed out
            Request 85 timed out
            Request 86 timed out
            Request 87 timed out
            Request 88 timed out
            Request 89 timed out
            64 bytes from 11.0.0.11: icmp_seq=322 ttl=126 time=1.032 ms
            Request 91 timed out
            64 bytes from 11.0.0.11: icmp_seq=300 ttl=126 time=1.05 ms
            Request 93 timed out
            Request 94 timed out
            Request 95 timed out
            Request 96 timed out
            .......
            
            
            

             

            On the second node, the type-2 route in sh bgp l2vpn evpn flaps at this moment, which is strange:

            Route Distinguisher: 10.10.0.54:32778    (L2VNI 11111)
            *>l[2]:[0]:[0]:[48]:[0000.0000.1111]:[0]:[0.0.0.0]/216
                                  10.10.53.54                       100      32768 i
            x l[2]:[0]:[0]:[48]:[0000.0000.1111]:[32]:[11.0.0.11]/248          
                                  10.10.53.54                       100      32768 i
            
            Route Distinguisher: 10.10.0.54:32778    (L2VNI 11111)
            *>l[2]:[0]:[0]:[48]:[0000.0000.1111]:[0]:[0.0.0.0]/216
                                  10.10.53.54                       100      32768 i
            *>l[2]:[0]:[0]:[48]:[0000.0000.1111]:[32]:[11.0.0.11]/272          (after 2-5 seconds state change to *)
                                  10.10.53.54                       100      32768 i
            
            

             

            And sh ip arp vrf INSIDE_NET

             

            nx5-4_DC2(config-if)# sh ip arp  vrf iNSIDE_NET 
            
            Flags: * - Adjacencies learnt on non-active FHRP router
                   + - Adjacencies synced via CFSoE
                   # - Adjacencies Throttled for Glean
                   D - Static Adjacencies attached to down interface
            
            IP ARP Table for context INSIDE_NET
            Total number of entries: 1
            Address         Age       MAC Address     Interface
            11.0.0.11       00:00:01  INCOMPLETE      Vlan11    
                 
            nx5-4_DC2(config-if)# sh ip arp  vrf iNSIDE_NET 
            
            IP ARP Table for context INSIDE_NET
            Total number of entries: 1
            Address         Age       MAC Address     Interface
            11.0.0.11       0.074243  0000.0000.1111  Vlan11           +
            
            nx5-4_DC2(config-if)# sh ip arp  vrf iNSIDE_NET 
            
            IP ARP Table for context INSIDE_NET
            Total number of entries: 1
            Address         Age       MAC Address     Interface
            11.0.0.11       0.741738  0000.0000.1111  Vlan11  
            
            nx5-4_DC2(config-if)# sh ip arp  vrf iNSIDE_NET 
            
            IP ARP Table for context INSIDE_NET
            Total number of entries: 1
            Address         Age       MAC Address     Interface
            
            
            
            • 3. Re: Connect L3 device to VxLAN vPC peers
              Micheline

              Hi Artem--glad to hear you got it working.

               

              I am just trying to achieve the best level of redundancy. If I connect the firewalls in a straight-through fashion, I lose the traffic load-balancing and redundancy I get from dual-homing each FW cluster member to the Border-Leafs.

              Our cluster works in Active/Standby mode. Right now the Active node (the one responsible for routing adjacencies) has established eBGP sessions with both vPC members. It looks fine, but I don't know whether it is a good approach or not.

               

              So I'm not entirely certain I understand you correctly here.... two border leaf switches in a vPC pair dual-connected to an active/standby pair of FW.    And they're all eBGP peers?  Do you need L2 connectivity to the FW cluster?  If you don't I think a cleaner approach would be to not have the vPC to the firewalls, and just have the border leafs L3-peered to them.  The routing protocol will take care of your loop prevention, failover, and also load-balancing via ECMP, and you can attach the firewalls using redundant port-channels to guard against single link failure. 

               

              There's a failure scenario that's covered pretty well in the INE course on vPC that has to do with a L2/L3 hashing mismatch when vPC peers also participate in routing protocols as individual switches.  It happens when the routing protocol hashes one way but the L2 hashes the other way and the traffic ends up crossing the peer-link and getting black-holed. It manifests as intermittent performance issues. 

               

               

              I've run into a weird problem. We use 5600 series switches in the Leaf role (vPC pair, dual-homed servers).

              If I simulate a failover scenario and bring down the uplinks on the first vPC Leaf, traffic from the server destined to the rest of the network is dropped (that traffic now has to travel across the peer-link).

               

              You mention that you're failing uplinks... are we talking about the server's uplinks to the vPC pair or the VTEP's uplinks to the spine? 

               

              If we're talking about server uplinks, then the behavior is weird.  What should be happening is that the server becomes an orphan and its traffic goes across the peer-link and gets delivered.  Orphan traffic is the one exception to the vPC loop-prevention rule.  So I would expect that traffic would still flow from the server to the network.

               

              If we're talking about the VTEP's uplink to the spine... well in a fully-redundant Clos fabric, the loss of one leaf-spine uplink won't result in leaf isolation, but just for the sake of argument, let's say that your leaf did get isolated.  If you don't plan failover in the event that a vPC peer loses its northbound connections, the traffic will get black-holed... with the exception of any traffic that is for or from orphans on the isolated peer.  Best practices recommend that vPC switches be configured with a track object that monitors the connectivity of northbound uplinks and fails the vPC over in the event that the northbound link fails.  Also recommended is that the peer-link and any northbound links be configured from different line cards and use port-channels.

               

              These comments above aren't VXLAN-related responses, BTW.  They are all vPC.  But if you're losing your northbound links from VTEP to spine, VXLAN BGP EVPN will respond by reporting that the MAC routes previously available on the failed VTEP are no longer available.  Assuming the failed VTEP's vPC peer has routes to the failed VTEP's endpoints, all the traffic will now get routed to the remaining leaf.  The remaining leaf will then use vPC rules of loop prevention to deliver the traffic.  That is, any traffic that goes to member ports on the failed leaf (after having crossed the peer-link) will get dropped unless it is destined for an orphan.

               

              Not sure this helps or not.... just thinking the "problem" through.  Have a great Wednesday.  MM

              • 4. Re: Connect L3 device to VxLAN vPC peers
                Artem

                Hello Micheline)

                 

                So I'm not entirely certain I understand you correctly here.... two border leaf switches in a vPC pair dual-connected to an active/standby pair of FW.    And they're all eBGP peers?  Do you need L2 connectivity to the FW cluster?  If you don't I think a cleaner approach would be to not have the vPC to the firewalls, and just have the border leafs L3-peered to them.  The routing protocol will take care of your loop prevention, failover, and also load-balancing via ECMP, and you can attach the firewalls using redundant port-channels to guard against single link failure.

                I connect the vPC pair to a Check Point FW cluster. In active/standby mode, only the active node establishes eBGP/IGP/PIM adjacencies with the vPC peers (using the cluster IP); when a failover occurs, the standby node takes over the routing adjacencies (using the same cluster IP).

                If I understand you correctly, the better way is to configure point-to-point L3 links from each vPC peer to each FW cluster member.

                It looks more resilient.

                But in my case I am preparing new infrastructure for our new DC, and unfortunately L2-to-cluster is currently our way of establishing routing adjacencies. I think it will be very difficult to change this method right now, but I will try.

                There's a failure scenario that's covered pretty well in the INE course on vPC that has to do with a L2/L3 hashing mismatch when vPC peers also participate in routing protocols as individual switches.  It happens when the routing protocol hashes one way but the L2 hashes the other way and the traffic ends up crossing the peer-link and getting black-holed. It manifests as intermittent performance issues.

                Is this a case for using the layer3 peer-router feature to address the issue? In my case it resolved the issue with L3 over vPC. Maybe I just misunderstood you.

                 

                You mention that you're failing uplinks... are we talking about the server's uplinks to the vPC pair or the VTEP's uplinks to the spine?

                It is not about server uplinks. I am talking about the VTEP-to-Spine and FEX-to-parent-Leaf links. Here is a picture (dotted lines represent the links that are DOWN at the moment of the "failure"):

                000000000000.JPG

                 

                If we're talking about the VTEP's uplink to the spine... well in a fully-redundant Clos fabric, the loss of one leaf-spine uplink won't result in leaf isolation, but just for the sake of argument, let's say that your leaf did get isolated.  If you don't plan failover in the event that a vPC peer loses its northbound connections, the traffic will get black-holed... with the exception of any traffic that is for or from orphans on the isolated peer.  Best practices recommend that vPC switches be configured with a track object that monitors the connectivity of northbound uplinks and fails the vPC over in the event that the northbound link fails.  Also recommended is that the peer-link and any northbound links be configured from different line cards and use port-channels.

                Oh, I thought that traffic from PC .11 would be able to traverse the peer-link in this case... am I wrong?

                It's strange, but sometimes traffic did traverse the peer-link when I was testing it.

                The ARP behavior in this situation looks weird: ARP entries flap every 1-2 seconds, and the BGP routes flap accordingly. Sometimes, when an ARP entry lives longer than usual, the second Leaf has enough time to install the host route, and at that moment ping works fine; 1-2 seconds later the flap repeats, the BGP entry is withdrawn, and so on.


                • 5. Re: Connect L3 device to VxLAN vPC peers
                  Micheline

                  If I understand you correctly, the better way is to configure point-to-point L3 links from each vPC peer to each FW cluster member.

                  It looks more resilient.

                   

                  Yes.  Don't double up on L3 routing protocol adjacency and vPC member port.  Just do the L3 adjacency.  Dual home each device in your cluster.  Use port-channels for link redundancy.
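
                  A minimal sketch of the routed alternative on one border leaf, assuming hypothetical member interfaces Ethernet1/31-32, a hypothetical port-channel31, and documentation-range addressing (192.0.2.0/31).  The VRF name and AS numbers are taken from Artem's posted config; everything else is illustrative:

                  ```
                  ! Border-Leaf 1 (sketch): routed port-channel to one FW node, not a vPC member
                  interface Ethernet1/31-32
                    no switchport
                    channel-group 31 mode active
                    no shutdown

                  interface port-channel31
                    description ## L3 link to FW node 1 (hypothetical) ##
                    no switchport
                    vrf member OVERLAY
                    ip address 192.0.2.0/31
                    no shutdown

                  router bgp 65510
                    vrf OVERLAY
                      address-family ipv4 unicast
                        ! Allow ECMP across the eBGP paths learned from the FW nodes
                        maximum-paths 2
                      neighbor 192.0.2.1 remote-as 65511
                        address-family ipv4 unicast
                  ```

                  The second border leaf would carry a mirror-image config with its own /31, so each leaf peers with the FW as an individual router and no adjacency depends on a vPC member port.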

                   

                   

                  Is this a case for using the layer3 peer-router feature to address the issue? In my case it resolved the issue with L3 over vPC. Maybe I just misunderstood you.

                   

                  No, that command just enables L3 routing over vPC.  Cisco Nexus 7000 Series NX-OS Interfaces Command Reference - L Commands [Cisco Nexus 7000 Series Switches] - Cisco. But just because you can do a thing... . There was a bug in earlier code that the hashing could get mismatched between the layers but later code and line cards were amended to address the issue.  Still, the best way to avoid it is not to double up the L3 adjacency and the vPC. 
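
                  For reference, a short sketch of where that command lives, using vPC domain 173 from the posted Border-Leaf config.  It assumes a release that supports the feature, and it relies on peer-gateway already being enabled in the domain:

                  ```
                  vpc domain 173
                    peer-gateway
                    ! Permit routing-protocol adjacencies with a device attached over a vPC
                    layer3 peer-router
                  ```

                  This would be configured on both vPC peers; it permits the adjacency over the vPC but does not change the advice above to prefer plain routed links.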

                   

                  RE: PC .11.  OK, I see what's going on.  Thanks for the picture!  So a couple things:

                   

                  • FEX parent--the dual-homed config for the FEX is a little bit confusing.  I'm going to assume that your FEXes are also vPC members and then from the FEX south to the server is also a vPC member.  This double-decker vPC config is called "enhanced vPC" and while it is supported on F3 line cards for the 7000 series, it is a bit overly complicated since both FEXes have two parent switches.  My preference is for simplicity with the "host FEX" configuration, in which the FEXes are attached singly to their parent with port-channels and the vPC member port is from the FEX to the server.  Kinda like so:

                  host vpc.png

                   

                   

                  • So as configured in your picture, both of your peers are isolated--N5-3 has no northbound connections and N5-4 has no southbound connections.... so basically, this is a really rough day for the peer-link.  I say that because as far as I can tell in this failure scenario, both peers remain active/active and from Server .11's POV, it's sending traffic up both sides depending on whatever load-balancing hash it happens to be using.    So let's walk this through:
                    • Server .11 sends traffic up the A side (left).  It gets to FEX101.  FEX101 has only one northbound route, so the traffic gets to Leaf 5-3.  If the traffic is not east-west to another server on the vPC pair, Leaf 5-3 is going to have to send the traffic across the peer-link for it to get routed northbound.  Since 5-3 stopped responding to VXLAN, its MAC routes are going to get flushed and southbound traffic to .11 will get re-routed to 5-4.
                    • Server .11 sends traffic up the B side (right).  It gets to FEX102.  FEX102 has only one northbound route, again to Leaf 5-3.  Same scenario as above.
                  • Here's where I'm scratching my head:  If Server .11 is fully attached (to the vPC pair) then traffic will all get dropped.  The vPC loop prevention rule will prevent the delivery of traffic that has crossed the peer-link.  But if the server is an orphan, you should get traffic to go through, because now the traffic qualifies as an exception to the rule.  I'm not sure why you're getting intermittent connectivity.   I might suspect that the intermittent connectivity would occur in the event that (1) Server .11 is an orphan and (2) traffic is dropping because of congestion at the peer-link if your pings were of any significant size, but they're not, so I'm not sure.

                   

                  To avoid the isolation issue, what you want to do after you simplify your FEX config is to have the vPC switch failover in the event that it loses its northbound connections to the spines.  So in your example, you'd want 5-3 to demote itself to vPC secondary and 5-4 take over as operational primary.  When this happens, the secondary switch will shut down all of its member ports and SVIs.  Traffic will be forced to the operational primary switch and there won't be any peer-link crossover. 
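
                  A minimal sketch of that kind of tracking, assuming hypothetical spine-facing uplinks Ethernet1/49 and Ethernet1/50, vPC domain 151 from the posted 5600 config, and a platform and release that support vPC object tracking:

                  ```
                  ! Track each spine-facing uplink (interface numbers are hypothetical)
                  track 1 interface Ethernet1/49 line-protocol
                  track 2 interface Ethernet1/50 line-protocol

                  ! The list goes down only when every tracked uplink is down
                  track 10 list boolean or
                    object 1
                    object 2

                  vpc domain 151
                    ! Tie vPC behavior to the state of the tracked list
                    track 10
                  ```

                  With boolean or, losing a single uplink does not trigger anything; only the loss of all tracked uplinks does.  Whether to add the peer-link interfaces to the same list is a design choice to check against the vPC best-practice guidance for the platform.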

                   

                  MM

                  • 6. Re: Connect L3 device to VxLAN vPC peers
                    Artem

                    Hello!

                     

                    Yes.  Don't double up on L3 routing protocol adjacency and vPC member port.  Just do the L3 adjacency.  Dual home each device in your cluster.  Use port-channels for link redundancy.

                    I think I don't completely understand you... How do we leverage the cluster IP in this point-to-point topology? Without the cluster IP I think we have no transparent failover. When the Active FW becomes Standby, all its adjacencies break, and the second node has to establish new adjacencies (going through the full OSPF/BGP state machine).

                    Is this picture what you mean?

                    111111111.JPG

                    Server .11 sends traffic up the A side (left).  It gets to FEX101.  FEX101 has only one northbound route, so the traffic gets to Leaf 5-3.  If the traffic is not east-west to another server on the vPC pair, Leaf 5-3 is going to have to send the traffic across the peer-link for it to get routed northbound.  Since 5-3 stopped responding to VXLAN, its MAC routes are going to get flushed and southbound traffic to .11 will get re-routed to 5-4.

                     

                    Actually, 5-3 didn't stop responding to VxLAN. Maybe this is the issue, but 5-3 keeps its underlay OSPF and iBGP adjacencies with the Spines through... the peer-link.

                    Because we establish an underlay OSPF adjacency on the special SVI, 5-3 can still reach the Spines' NVE IP.

                     

                    interface Vlan1999
                      description ## PeerLink NVE ##
                      no shutdown
                      mtu 9216
                      no ip redirects
                      ip address 10.10.53.0/31
                      no ipv6 redirects
                      ip ospf cost 100
                      ip ospf network point-to-point
                      ip router ospf 1 area 0.0.0.0
                      ip pim sparse-mode
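
To illustrate why 5-3 stays reachable in the underlay, here is a minimal Python sketch (not taken from the switches) that runs a plain shortest-path calculation over a three-node underlay. The node names and the uplink cost of 40 are assumptions for illustration; only the peer-link cost of 100 comes from the Vlan1999 config above.

```python
import heapq


def shortest_cost(links, src, dst):
    """Plain Dijkstra over an undirected graph; returns the total cost or None."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    best = {src: 0}
    queue = [(0, src)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == dst:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, link_cost in graph.get(node, []):
            new_cost = cost + link_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return None


UPLINK_COST = 40     # hypothetical value, not from the thread
PEERLINK_COST = 100  # "ip ospf cost 100" on the peer-link SVI

healthy = {
    ("leaf5-3", "spine"): UPLINK_COST,
    ("leaf5-4", "spine"): UPLINK_COST,
    ("leaf5-3", "leaf5-4"): PEERLINK_COST,
}
# Same underlay after leaf5-3 loses its own uplink to the spine.
uplinks_lost = {k: v for k, v in healthy.items() if k != ("leaf5-3", "spine")}

cost_healthy = shortest_cost(healthy, "leaf5-3", "spine")
cost_failed = shortest_cost(uplinks_lost, "leaf5-3", "spine")
print(cost_healthy, cost_failed)
```

Under these assumed costs, the spine is still reachable from 5-3 after the uplink failure, only over the more expensive path through the peer-link and 5-4. That matches the adjacencies staying up.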
                    
                    
                    
                    
                    
                    
                    

                     

                    To avoid the isolation issue, what you want to do after you simplify your FEX config is to have the vPC switch failover in the event that it loses its northbound connections to the spines.  So in your example, you'd want 5-3 to demote itself to vPC secondary and 5-4 take over as operational primary.  When this happens, the secondary switch will shut down all of its member ports and SVIs.  Traffic will be forced to the operational primary switch and there won't be any peer-link crossover.

                     

                    Thanks for your advice. I think this failure situation is some kind of fantastic :) but it's true that the first step to avoid it is to configure a "conditional" vPC, and I will implement this in my environment.

                    But in general, I think there may be a bug, because in our case traffic destined to the spine has to go through the peer-link.

                    As a result I see strange ARP flapping and BGP type-2 route flapping; it looks abnormal. I will contact TAC about it.

                    • 7. Re: Connect L3 device to VxLAN vPC peers
                      Micheline

                      Good Morning Artem:

                       

                      I'm not hugely up on configuring firewalls, but my point about how to connect them to the vPC pair was one or the other.  Either use a L2 trunk/access port-channel to the vPC pairs OR an IP routed connection.  If you use an IP routed connection, the FWs peer with each vPC switch individually and you use the ECMP features of the routing protocol to handle loop-prevention, failover, load-balancing, etc. 

                       

                      If you need the L2 connection, then don't make the FW peer with the vPC switches in the routing protocol. 

                       

                      Actually, 5-3 didn't stop responding to VxLAN. Maybe this is the issue, but 5-3 preserves its underlay OSPF and iBGP adjacencies with the spine through the peer-link.

                      Because we established underlay OSPF adjacencies over a special SVI, 5-3 could still reach the spines' NVE IP.

                       

                      That might be causing your flap.  If 5-3 doesn't go down in the VXLAN fabric, then the spines will keep sending it traffic.  The traffic that gets to 5-3 will actually be allowed to go through (because 5-3 has a good link to Server .11).  But if the spine's load-balancing hash directs traffic to 5-4, then the traffic will be required to cross the peer-link and then get black-holed.  That makes much more sense to me.
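
A rough way to picture the hashing is the Python sketch below. It is written under stated assumptions: real spines use a hardware hash, so SHA-256 is only a stand-in, and the addresses, ports, and the `pick_next_hop` helper are made up for illustration. The point is just that a per-flow hash spreads flows across both vPC peers, so only part of the traffic lands on the peer that needs the peer-link.

```python
import hashlib


def pick_next_hop(flow, next_hops):
    """Deterministically map a flow 5-tuple to one of the equal-cost next hops."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


NEXT_HOPS = ["leaf5-3", "leaf5-4"]

# Hypothetical flows: (src IP, dst IP, protocol, src port, dst port).
flows = [("10.0.0.1", "10.0.0.11", 6, 40000 + i, 443) for i in range(1000)]

share = {hop: 0 for hop in NEXT_HOPS}
for flow in flows:
    share[pick_next_hop(flow, NEXT_HOPS)] += 1

print(share)
```

The same flow always picks the same peer, which would be consistent with some sessions working and others failing instead of a clean outage.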

                       

                      Here's the config for the failover....  it is some kind of fantastic, indeed!  In the config po1 is the peer-link and po1000 and po1001 are the northbound links to the spines.  It's taken directly from https://www.cisco.com/c/dam/en/us/td/docs/switches/datacenter/sw/design/vpc_design/vpc_best_practices_design_guide.pdf There's also a section in there on attaching service appliances.  It might be helpful.

                       

                      track 1 interface port-channel1 line-protocol
                      track 2 interface port-channel1000 line-protocol
                      track 3 interface port-channel1001 line-protocol

                      track 10 list boolean OR
                        object 1
                        object 2
                        object 3

                      vpc domain 1
                        track 10
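
The boolean OR list is easy to misread, so here is a small Python sketch of the logic as I understand it. The function names are hypothetical and this only models the up/down decision, not anything the switch actually runs.

```python
def track_list_or(states):
    """A boolean OR track list is up while at least one tracked object is up."""
    return any(states.values())


def should_fail_over(states):
    """The vPC domain reacts only when the combined track object goes down."""
    return not track_list_or(states)


# Interface names follow the config: po1 is the peer-link,
# po1000 and po1001 are the uplinks to the spines.
only_uplinks_down = {"po1": True, "po1000": False, "po1001": False}
everything_down = {"po1": False, "po1000": False, "po1001": False}

print(should_fail_over(only_uplinks_down))
print(should_fail_over(everything_down))
```

Note that with the peer-link (track 1) in the list, losing only po1000 and po1001 leaves track 10 up. It goes down only when all three objects are down.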

                       

                       

                      Thanks for your advice

                       

                      You're very welcome!  I have really enjoyed this thread, and being able to try and help you with your troubleshooting.  MM

                      • 8. Re: Connect L3 device to VxLAN vPC peers
                        Micheline

                        Hello Artem--I just had a chat with my partner, who is much more of a security wizard than I am.  He said that connecting firewalls to VXLAN was pretty tricky, and suggested a call to TAC about this as well.

                         

                        Good luck!  MM

                        • 9. Re: Connect L3 device to VxLAN vPC peers
                          Brandon

                          This is a great post with lots of good information. I wanted to correct something. Dynamic routing over vPC is supported now, but only on specific versions of code. I have had customers that have needed this for various reasons. Personally, I'd say keep it simple and don't combine L3 and vPC if you can.

                           

                          Reference below:

                          https://www.cisco.com/c/en/us/support/docs/ip/ip-routing/118997-technote-nexus-00.html

                           

                          and in the release notes, search for "dynamic routing over vpc":
                          Cisco Nexus 7000 Series NX-OS Release Notes, Release 7.2 - Cisco

                           

                           

                          • 10. Re: Connect L3 device to VxLAN vPC peers
                            Radek

                            Hello,

                            I have a very similar issue. In my topology I have 2 vPC pairs of N9k with EVPN-VXLAN configured between them.

                            L2 traffic works as expected, but we have a new feature requirement to establish eBGP peering from one vPC pair to an L3 device connected to the second pair of vPC switches.

                             

                            eBGP.png

                             

                            From an L2 perspective, SW03 and the FW are layer-2 adjacent, but the eBGP peering is not established.

                            Local eBGP peering between FW and SW01 is working fine.

                             

                             

                            SW03# show ip arp 10.208.223.147
                            Address         Age       MAC Address     Interface       Flags
                            10.208.223.147  00:14:30  001c.7f00.00f0  Vlan885         +

                            SW03# sh mac address-table address 001c.7f00.00f0
                              VLAN     MAC Address      Type      age     Secure NTFY Ports
                            C  885     001c.7f00.00f0   dynamic  0         F      F    nve1(10.208.251.41)
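
For anyone checking many MACs, the remote VTEP can be pulled out of this output with a short script. This is a minimal Python sketch that assumes the column layout shown above; the `remote_vtep` helper is hypothetical and other NX-OS releases may format the line differently.

```python
import re

# One row of "show mac address-table", using the column order shown above.
MAC_LINE = re.compile(
    r"^(?P<flag>\S)\s+(?P<vlan>\d+)\s+"
    r"(?P<mac>[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})\s+"
    r"(?P<type>\S+)\s+(?P<age>\S+)\s+(?P<secure>\S)\s+(?P<ntfy>\S)\s+"
    r"(?P<port>\S+)$"
)
# A MAC learned over VXLAN shows the NVE interface with the peer VTEP IP.
NVE_PORT = re.compile(r"^nve\d+\((?P<vtep>[\d.]+)\)$")


def remote_vtep(line):
    """Return the VTEP IP if the MAC was learned via NVE, otherwise None."""
    row = MAC_LINE.match(line.strip())
    if row is None:
        return None
    port = NVE_PORT.match(row.group("port"))
    return port.group("vtep") if port else None


sample = "C  885     001c.7f00.00f0   dynamic  0         F      F    nve1(10.208.251.41)"
print(remote_vtep(sample))
```

Here the FW's MAC is learned through nve1 from 10.208.251.41, so the L2 path from SW03 to the FW goes across the fabric.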

                             

                             

                             

                             

                            • 11. Re: Connect L3 device to VxLAN vPC peers
                              Micheline

                              Hello Radek--so I'm a little bit confused by your topo, so let me ask a few questions so I can be clear what the lay of the land is.

                               

                              I see that you have SW1 and SW3 as a vPC pair.  You have the FW connected to the vPC pair as a member, plus an additional link to SW2?  Is that correct?  Who is 10.208.251.41?  Is 10.208.223.147 your FW? 

                               

                              MM

                              • 12. Re: Connect L3 device to VxLAN vPC peers
                                Radek

                                Hello,

                                answering your questions:

                                - SW1 and SW2 are VPC pair

                                - SW3 and SW4 are another VPC pair

                                - FW with IP 10.208.223.147 is connected to SW1/SW2 with redundant links (L2 LACP trunk)

                                - VLAN885 is allowed on trunk to FW

                                - SW3 has configured interface Vlan885 (SVI) with IP 10.208.223.145 (this is not anycast gateway mode SVI)

                                - VLAN885 is configured as VXLAN and transported over EVPN-VXLAN between sites (SW1,SW2, FW is one DC-site, SW3 and SW4 is second DC-site)

                                - 10.208.251.41 is EVPN underlay loopback interface of SW1-SW2 VPC pair (NVE interface has source interface of this loopback secondary IP)

                                 

                                If I configure SVI Vlan885 on SW1 and enable eBGP peering it works fine.

                                • 13. Re: Connect L3 device to VxLAN vPC peers
                                  Micheline

                                  OK, I think I know what's going on.

                                   

                                  When you configure the SVI of VLAN 885 on the vPC pair (in your topo SW1 or SW2) directly connected to the firewall, you have a connection between the firewall and the vPC pair that does not involve the VXLAN fabric. 

                                   

                                  When you move the SVI to a remote switch (like SW3), the traffic has to go through the VXLAN fabric to get to the remote switch.  If you configure that SVI as a non-anycast gateway, as you have, I think you disable that VLAN's access to the VXLAN fabric on that switch, breaking your connectivity.

                                   

                                  I think you might need to peer your FW with a switch that is *attached to SW3*.   I don't have a VXLAN fabric right now to test it, but if I get access in the next few days, I'll try to mock your scenario up.  Sorry I couldn't be more helpful. 

                                   

                                  MM

                                  • 14. Re: Connect L3 device to VxLAN vPC peers
                                    Radek

                                    What do you mean: "If you configure that SVI as a non-anycast gateway, as you have, I think you disable that VLAN's access to the VXLAN fabric on that switch, breaking your connectivity." ?

                                     

                                    ICMP is working for all nodes in VLAN885.

                                     

                                    Telnet from SW03 to FW on port 179 works:

                                     

                                    SW03#telnet 10.208.223.147 179

                                    Trying 10.208.223.147...

                                    Connected to 10.208.223.147.

                                     

                                    But telnet from FW to SW03 is not working:

                                    FW# telnet 10.208.223.146 179

                                    Trying 10.208.223.146...

                                    telnet: connect to address 10.208.223.146: Connection timed out

                                     

                                    Very weird.
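
In case it helps to repeat the test from a host in VLAN885 without telnet, here is a minimal Python sketch of the same TCP check. The `tcp_port_open` helper is hypothetical, and it only tests whether a TCP handshake to the port completes, nothing BGP-specific.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False


# Example call, using the FW address and BGP port from this thread:
# tcp_port_open("10.208.223.147", 179)
```

A timeout in one direction only, as in the telnet output above, means the check has to be run separately from each side to see the asymmetry.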
