8 Replies Latest reply: May 13, 2019 10:21 AM by Micheline

    vPC Split Brain Failure Scenario

    vicky_cisco

      Hello All,

       

      I need some more help with vPC split-brain failure scenarios. I have the following questions:

       

      a. Suppose we use a 7K-5K-server design, with the 5Ks running vPC toward the servers and the Layer 3 boundary at the 7Ks. If split brain occurs, the 7Ks will start receiving the same spanning-tree bridge ID from two different switches. In that case, will STP ever be able to converge and agree on a root port?

       

      b. I guess that if spanning tree can converge, the only issue would be duplicate-MAC errors in the logs, with MAC entries flapping on the N7Ks depending on the BPDUs or data packets received.

       

      c. However, if the Layer 3 boundary is at the N5Ks, I guess the impact will be smaller. We might see some unnecessary flooding, especially since ARP responses might also be treated as unknown unicast.

       

      However, I still cannot see what would cause a loop in the network in a split-brain scenario. Is there an example that demonstrates a loop created this way? Please help.

        • 1. Re: vPC Split Brain Failure Scenario
          Leonardo

          Hello,

          Take a look at the document at the following link. You will find very good information about the vPC split-brain scenario, STP, etc.

           

          https://www.cisco.com/c/dam/en/us/td/docs/switches/datacenter/sw/design/vpc_design/vpc_best_practices_design_guide.pdf

           

          vPC Data-Plane Loop Avoidance

          vPC performs loop avoidance at data-plane layer instead of control plane layer for Spanning Tree Protocol.

          All logic is implemented directly in hardware on the vPC peer-link ports, avoiding any dependency on CPU utilization.

          vPC peer devices always forward traffic locally when possible. vPC peer-link does not typically forward data packets and it is usually considered as a control plane extension in a steady state network (vPC peer-link used to synchronize information between the 2 peer devices as mac address, vPC member state information, IGMP).

          vPC loop avoidance rule states that traffic coming from vPC member port, then crossing vPC peer-link is NOT allowed to egress any vPC member port; however it can egress any other type of port (L3 port, orphan port, …).

          The only exception to this rule occurs when a vPC member port goes down. vPC peer devices exchange member-port states and reprogram in hardware the vPC loop avoidance logic for that particular vPC. The peer-link is then used as a backup path for optimal resiliency. Traffic need not ingress a vPC member port for this rule to be applicable. The vPC loop avoidance rule exception is depicted in a figure in the guide.
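
          The rule and its exception can be sketched as a small decision function. This is only an illustration of the logic quoted above, not how the hardware is programmed; the function name, the port-kind strings, and the `peer_member_port_up` flag are assumptions made for the example.

```python
def may_egress(crossed_peer_link, egress_kind, peer_member_port_up=True):
    """Return True if a frame may leave through the given egress port.

    crossed_peer_link   -- the frame reached this switch over the vPC peer-link
    egress_kind         -- "vpc_member", "orphan", or "l3" (hypothetical labels)
    peer_member_port_up -- state of the peer's member port for the same vPC
    """
    if not crossed_peer_link:
        return True   # locally received traffic is not restricted by the rule
    if egress_kind != "vpc_member":
        return True   # orphan and L3 ports are always allowed
    # Exception: the peer lost its member port, so the peer-link is the backup path.
    return not peer_member_port_up


print(may_egress(True, "vpc_member"))                             # blocked in steady state
print(may_egress(True, "orphan"))                                 # allowed
print(may_egress(True, "vpc_member", peer_member_port_up=False))  # allowed by the exception
```

          In steady state the first call is refused, which is what stops a frame from being delivered twice to a dual-attached device.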

           

          HTH

           

          Regards

          Leonardo

          • 2. Re: vPC Split Brain Failure Scenario
            vicky_cisco

            Leonardo,

             

            Thanks for the document, I will go through it. However, my question is not about loop prevention for orphan ports; I understand that part. I just want to understand whether STP would ever converge in a split-brain scenario, when two switches are advertising the same bridge ID, and also which scenarios can cause a split-brain condition in our network.

            • 3. Re: vPC Split Brain Failure Scenario
              Micheline

              Dear Vicky_Cisco--so let's talk about your split brain scenario.  So split brain occurs when the peer-link goes down, and for whatever reason (usually bad design) both peers think that they're primary.  The symptoms of this scenario include black holing traffic, congestion, L2 loops, and (believe it or not) nothing.

               

              So before we even get to a split brain, let's understand the mechanisms in place to prevent this situation from happening in the first place.  If the peer-link fails, the vPC peers (presumably) still can reach each other via the keep-alive.  So here's the blow-by-blow of this failure scenario:

              1. The peer-link fails.
              2. The secondary switch pings the primary via the keep-alive.  If the ping comes back, the secondary shuts down all of its member ports and SVIs.  If the ping doesn't come back, the secondary assumes the primary is dead and assumes primary position.
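
              As a rough sketch, the two steps above amount to the decision below. The function names and the returned strings are made up for illustration and do not correspond to any NX-OS output.

```python
def secondary_action(peer_link_up, keepalive_reply):
    """What the vPC secondary does, following the two steps described above."""
    if peer_link_up:
        return "no change"
    if keepalive_reply:
        # Primary is alive: the secondary gets out of the way.
        return "suspend member ports and SVIs"
    # No answer on the keep-alive: the secondary assumes the primary is dead.
    return "assume primary role"


def is_split_brain(peer_link_up, keepalive_reply, primary_alive):
    """Split brain: the secondary takes over while the primary is still up."""
    took_over = secondary_action(peer_link_up, keepalive_reply) == "assume primary role"
    return took_over and primary_alive


# Peer-link and keep-alive both fail, but the primary never went down.
print(is_split_brain(peer_link_up=False, keepalive_reply=False, primary_alive=True))
```

              The only input the secondary cannot observe directly is `primary_alive`, which is why the keep-alive path must not fail together with the peer-link.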

               

              So in the case that you have a peer-link failure, one or the other switch is very likely to be going down in the first place.

               

              If you have a worst-case scenario, in which both peers remain up, then both peers will interact with STP according to normal STP rules.  If you have both switches advertising the same system ID (using the peer-switch command) and you followed vPC best practice and made the vPC peer pair the root, then the whole STP domain will flap as the root switch appears to be moving back and forth between peers.  If there's more STP domain northbound of the vPC, and the peers aren't the root bridge, you will still get flapping as the STP domain keeps trying to converge on a seemingly ever-changing topology.  Either way, things are no bueno.

               

              How does a L2 loop occur?  Let's say that we have Host1 connected to SW1 and SW2, which were our vPC pair, but now they are split brained.  SW1 and SW2 are dual connected to SW3 and SW4, which are our L2/L3 gateways.  Host1 needs a MAC address so ARPs for it.  The ARP goes to SW1 and SW2, our broken vPC pair.  Both SW1 and SW2 flood the ARP request out all ports.  So the ARP request goes up to SW3 and SW4, twice.  When SW3 and SW4 receive the ARP, it'll get flooded back down to SW1 and SW2.... round and round we go.
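
              A toy flooding model makes the "round and round" part visible. It assumes the simplest possible behaviour: every link acts as an independent L2 link (the port-channels no longer behave as one logical link), every switch floods a broadcast out all ports except the one it arrived on, and nothing is blocked by STP unless listed in `blocked`. The topology table and function names are made up for this sketch.

```python
# Links from the example: Host1 dual-homed to SW1/SW2, which uplink to SW3/SW4.
LINKS = [
    ("Host1", "SW1"), ("Host1", "SW2"),
    ("SW1", "SW3"), ("SW1", "SW4"),
    ("SW2", "SW3"), ("SW2", "SW4"),
]


def neighbors(node, blocked=frozenset()):
    """Devices reachable from node over links that are not blocked."""
    result = []
    for a, b in LINKS:
        if frozenset((a, b)) in blocked:
            continue
        if a == node:
            result.append(b)
        elif b == node:
            result.append(a)
    return result


def flood(hops, blocked=frozenset()):
    """Count copies of one broadcast in flight between switches after each hop."""
    in_flight = [(sw, "Host1") for sw in neighbors("Host1", blocked)]
    history = []
    for _ in range(hops):
        next_round = []
        for switch, came_from in in_flight:
            for peer in neighbors(switch, blocked):
                # Flood everywhere except the ingress port; the host does not forward.
                if peer != came_from and peer != "Host1":
                    next_round.append((peer, switch))
        in_flight = next_round
        history.append(len(in_flight))
    return history


print(flood(5))  # no link blocked: the count never drops to zero

# Same topology with SW2's uplinks blocked, as a converged STP might leave it.
stp_blocked = {frozenset(("SW2", "SW3")), frozenset(("SW2", "SW4"))}
print(flood(5, stp_blocked))  # the broadcast dies out after one hop
```

              In this model a single ARP request keeps circulating regardless of whether anyone answers it, because switches flood broadcasts without knowing that a reply was sent.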

               

              Best practice is to avoid the split brain scenario altogether.

              • Configure the peer-link and the keep-alive from entirely independent resources.  Different line cards, different VRFs, different networks.  This is why best practice recommends that the keep-alive be in the management VRF.
              • Create a port-channel for the keep-alive.  Although the keep-alive works just fine on a single link, port-channels are much more hardy.
              • Use a dedicated sub-interface for the keep-alive.  This can be used in conjunction with a port-channel.
              • Create a dedicated SVI for the keep-alive using a non-vPC VLAN with an independent L2 link.

               

              If you have access to INE, Brian McGahan does a great class on vPC (you'll want the one in the CCIE Data Center v2 bundle) and spends quite a bit of time on failure scenarios. 

               

              MM

              • 4. Re: vPC Split Brain Failure Scenario
                Micheline

                Oh oops... you also wanted to know what kind of scenarios would result in a split brain.  Typically split brain results when you've configured the peer-link and the keep-alive using the same resources.  So for instance, the peer-link and the keep-alive are configured using ports from the same line card.  The line card goes bad and the peer-link and keep-alive both fail.  The secondary cannot ping the primary, and assumes the operational primary role, even though the primary is still up.

                 

                Mr. McGahan called this "fate sharing"
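
                A quick way to reason about fate sharing is to list the resources each path depends on and look for overlap. The sketch below is only a thinking aid; the dictionary keys and values are hypothetical labels, not anything read from a switch.

```python
def shared_resources(peer_link, keepalive):
    """Return the resource types on which both paths use the same instance."""
    common_keys = peer_link.keys() & keepalive.keys()
    return {key for key in common_keys if peer_link[key] == keepalive[key]}


# Hypothetical design: both paths use ports on the same line card.
peer_link = {"line_card": "slot1", "vrf": "default"}
keepalive = {"line_card": "slot1", "vrf": "management"}

print(shared_resources(peer_link, keepalive))  # a non-empty set means fate sharing
```

                Any non-empty result is a single failure that can take down the peer-link and the keep-alive at the same time.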

                 

                MM

                • 5. Re: vPC Split Brain Failure Scenario
                  Mohamed Edrees

                  Hi Micheline

                  I am still confused by the part of your example about the ARP loop between the four switches. There are two possible scenarios for the destination of the ARP request:


                  1- The destination of the ARP is inside the same domain (VLAN/VXLAN) and connected to SW1 or SW2, so any possible loop will end once the destination sends the ARP reply.

                  2- The destination is in another domain, so the request should hit the L3 gateways on SW3/SW4, and any possible loop will end once the gateway replies.


                  So in both cases the split brain won't cause an actual loop; we can say the ARP request only made one additional, unneeded round trip.

                   

                  Please correct me if I missed anything.

                  • 6. Re: vPC Split Brain Failure Scenario
                    Micheline

                    Hello Mohamed--ARP isn't a cause of vPC split brain.  vPC split brain ONLY occurs when:

                    1. The peer-link goes down, AND
                    2. Secondary switch mistakenly believes that the primary has gone down, so assumes the primary role.

                     

                    The most common cause of split brain is when the peer-link and the peer-keepalive are configured on the same resources (such as a line card) and both fail.

                     

                    Does this make sense?  This thread is two years old now, so if you have a specific topology in mind I would suggest you start a new thread.  MM

                    • 7. Re: vPC Split Brain Failure Scenario
                      Mohamed Edrees

                      Thanks Micheline

                       

                      I know the cause of split brain; my concern is its actual effect and when it will impact the network. ARP was just an example of a *** traffic type that might cause a loop. As I explained, even this loop won't have a real effect on the network: it may cause traffic to take an extra round, but it won't continue and become a real loop. In short, I can't see a real impact from this split-brain issue.

                      • 8. Re: vPC Split Brain Failure Scenario
                        Micheline

                        There is a wide range of split-brain symptoms, from no impact at all to black-holing traffic.  Obviously black-holing traffic is very bad, but the other big impact of a split-brain scenario is that it makes the STP domain very unstable, and THAT is often what causes more problems.  Nothing happens in a vacuum.

                         

                        MM