BGP path selection

BGP is without question my favorite protocol! It is so powerful and flexible. It is unique, because it is the protocol that connects the biggest network on this planet so far: the Internet. And one of the most important topics in BGP is its path selection algorithm. For consistency, I'll use the term "prefix" to denote a route in BGP and the term "path" to denote a specific way this prefix can be reached by the evaluating router. Each path carries multiple attributes, and some of them are propagated to the peers (neighbors). When a router knows several paths to the same prefix, it compares these attributes, and based on the result of that comparison, the champion path is chosen. The attributes are compared in a strict order, which we'll talk about in just a moment. As soon as one path has a better attribute value, the process stops, declares that path the best, and offers it to the RIB. If, however, a comparison of one attribute ends up in a draw, then the next attribute is compared, and the process repeats itself until it comes to an attribute that can't end up in a tie, in other words, one that is always unique to each path.

 

At first glance, the order in which these attributes are evaluated seems very straightforward. However, sometimes a newbie (like myself) can get a little confused with this order, because some points seem, well... out of order! This post is just a little story about what confused me in this path selection. So, let's start by listing the tie breakers in the order they appear in any Cisco document:

 

0. Synchronized

1. Weight

2. Local preference

3. Self-originated

4. AS-Path

5. Origin

6. MED

7. External-Internal

8. IGP cost

9. EBGP Peering

10. RID
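
The "compare in strict order, stop at the first difference" idea can be sketched in a few lines of Python. This is only an illustration of the list above, not how IOS actually implements it: the Path class and all of its field names are made up for the example, step 0 is assumed to have passed already, and the MED handling is simplified as noted in the comments.

```python
from dataclasses import dataclass

# Origin codes ranked so that a lower number is preferred.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    # Hypothetical container; the field names are illustrative only.
    name: str
    weight: int = 0
    local_pref: int = 100
    self_originated: bool = False
    as_path: tuple = ()
    origin: str = "igp"
    med: int = 0
    external: bool = False
    igp_cost: int = 0
    age_seconds: int = 0             # how long the path has been known
    router_id: tuple = (0, 0, 0, 0)  # advertising router's ID as four octets


def selection_key(p):
    """Build a tuple that sorts the preferred path first.

    Python compares tuples element by element and stops at the first
    difference, which mirrors the "stop at the first non-tie" rule.
    """
    return (
        -p.weight,                            # 1. higher weight wins
        -p.local_pref,                        # 2. higher local preference wins
        not p.self_originated,                # 3. self-originated wins
        len(p.as_path),                       # 4. shorter AS-Path wins
        ORIGIN_RANK[p.origin],                # 5. IGP, then EGP, then incomplete
        p.med,                                # 6. lower MED wins (simplified: real routers
                                              #    by default compare MED only between
                                              #    paths from the same neighboring AS)
        not p.external,                       # 7. external beats internal
        p.igp_cost,                           # 8. lower IGP cost wins
        -p.age_seconds if p.external else 0,  # 9. older path wins, external paths only
        p.router_id,                          # 10. lower router ID wins
    )


def best_path(paths):
    """Return the path that wins the ordered comparison."""
    return min(paths, key=selection_key)


short = Path("short", as_path=(100,))
preferred = Path("preferred", local_pref=200, as_path=(300, 200, 100))
print(best_path([short, preferred]).name)  # prints "preferred": step 2 decides before step 4
```

Building one tuple per path keeps the order of the steps visible in a single place, and the earlier an element sits in the tuple, the more weight it carries in the decision.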

 

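As many of you might know, BGP has a rule called the Synchronization rule. It states that if a prefix was learned via IBGP, but the same prefix wasn't also learned via an IGP, this prefix won't be installed in the routing table or advertised to EBGP neighbors. This is done to make sure that non-border routers in the AS that don't run BGP also know how to route traffic to the destination, so that transit traffic doesn't get blackholed. A path that fails this check is not a candidate at all, which is why it sits at step 0, before any attribute is compared. Keep in mind that current IOS releases have synchronization disabled by default, so in practice this step rarely matters.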
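
As a rough sketch of the rule, assuming it applies only to IBGP-learned prefixes and only when synchronization is enabled (the function and parameter names are made up for illustration):

```python
def passes_synchronization(prefix, learned_via_ibgp, igp_prefixes, sync_enabled=True):
    """Return True if the prefix may take part in best-path selection (step 0)."""
    if not sync_enabled or not learned_via_ibgp:
        # The rule only restricts IBGP-learned prefixes, and only when enabled.
        return True
    # An IBGP-learned prefix must also be known to the IGP.
    return prefix in igp_prefixes
```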

 

The first attribute is weight. It is a Cisco proprietary attribute that is locally significant: it lives only on the router where it is set and is never sent to any neighbor. It is the ultimate tool for route control on a Cisco BGP router. It is evaluated before any other attribute, and a higher value wins. So, if you want to make sure that prefixes learned from certain neighbors always have preference, you can just assign a higher weight to those neighbors and be sure that paths via those neighbors will be preferred over the others.

 

The second attribute is local preference. This is a standard, well-known discretionary attribute: every BGP implementation understands it, but it doesn't have to be present in every update. The word "local" in its name means that it only travels within its own AS, between IBGP neighbors, and doesn't get propagated outside of it. A higher value wins, and the default value for local preference is 100. But what happens to the prefixes that are being learned via EBGP neighbors? As I said before, the local preference attribute is not passed across the AS boundary. So, a prefix received from an EBGP neighbor won't have any local preference attached to it, right? Let's check that in the BGP table! I've got a router with two neighbors, one IBGP and one EBGP, both advertising the same prefix to me: 10.1.24.0/24. I check the BGP table, and what do I see?

 

IOU1#sh ip bgp
BGP table version is 14, local router ID is 1.0.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
*>  10.1.24.0/24     10.1.13.3                              0 100 i
* i                  1.0.0.2                  0    100      0 100 i

 

Sure enough, the path learned via the IBGP neighbor does have a local preference, and the one learned via EBGP doesn't. So are we safe to assume it to be 0? If it is, then why is the EBGP path chosen as the best? With a local preference of 0 it should have lost at step 2, and the external-versus-internal check is only made at step 7. Something doesn't seem right here. Let's dig a bit deeper and see what the BGP table has to say specifically about the prefix 10.1.24.0/24. Surprise, surprise!

 

IOU1#sh ip bgp 10.1.24.0
BGP routing table entry for 10.1.24.0/24, version 14
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     2
  Refresh Epoch 1
  100
    10.1.13.3 from 10.1.13.3 (1.0.0.3)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  100
    1.0.0.2 (metric 11) from 1.0.0.2 (1.0.0.2)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0

 

Both paths DO have a local preference, and it is equal to 100! Apparently, when a path arrives from an EBGP neighbor without this attribute, the router assigns it the default local preference of 100, and that is the value used in the comparison and sent on to IBGP neighbors. So the two paths tie at step 2, and the EBGP path only wins later, at step 7. However, the output of the "show ip bgp" command doesn't display that value for some reason. I guess it is to show you that this prefix was learned without this attribute, which gives you a clue that this is an EBGP-learned prefix.
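
One way to picture what the two outputs show is the small Python sketch below. It is modeled purely on the outputs above, not on IOS internals, and both function names are invented for the example: a missing local preference is treated as the default of 100 for comparison, while the summary table leaves the column blank.

```python
DEFAULT_LOCAL_PREF = 100  # the default value mentioned in the text


def effective_local_pref(received_local_pref):
    """Value used in the comparison; None means the update carried no local preference."""
    if received_local_pref is None:
        return DEFAULT_LOCAL_PREF
    return received_local_pref


def locprf_column(received_local_pref):
    """Mimic the summary table, which leaves the LocPrf column blank for such paths."""
    return "" if received_local_pref is None else str(received_local_pref)


ebgp_received = None  # EBGP update: no local preference attached
ibgp_received = 100   # IBGP update: attribute present

# Both paths compare as 100, so step 2 ends in a tie.
print(effective_local_pref(ebgp_received) == effective_local_pref(ibgp_received))  # prints True
```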

 

Right, let's move on :). The next attribute is self-originated. The next catch awaits you right here. You see, some prefixes cheat a little bit and enter a conspiracy with some attributes. They can "bribe" them! So do self-originated prefixes, or simply put, the prefixes that the local router itself injects into BGP. They are preferred over prefixes received from neighbors. But these prefixes are very impatient! They don't want to wait their turn at step 3, and they don't want to risk losing their battle at step 2 by having a lower local preference. So what they do is bribe the router and receive 32768 in their weight attribute just for originating locally! That means they usually win at step 1 already, which really breaks this strict order and provides some shortcuts.
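
To see the shortcut in action, here is a tiny Python sketch that looks only at the first two steps. The dictionaries and function names are hypothetical, and the local preference values are picked just to make the point.

```python
LOCAL_ORIGIN_WEIGHT = 32768  # weight given to locally originated prefixes, per the text


def assign_weight(self_originated, configured_weight=0):
    """Locally originated prefixes get 32768; learned ones keep the configured weight."""
    return LOCAL_ORIGIN_WEIGHT if self_originated else configured_weight


def preferred(a, b):
    """Compare only the first two steps: weight, then local preference (higher wins)."""
    return min(a, b, key=lambda p: (-p["weight"], -p["local_pref"]))


local = {"name": "local", "weight": assign_weight(True), "local_pref": 50}
learned = {"name": "learned", "weight": assign_weight(False), "local_pref": 500}

# The learned path has a far better local preference, but it never gets to use it.
print(preferred(local, learned)["name"])  # prints "local"
```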

 

OK, so self-originated prefixes are a little naughty, but let's move forward. The next attribute is AS-Path. This one is easy to understand, as it is the bit where BGP follows distance vector logic. AS hops are treated just the same as router hops in RIP: the shorter the path, the better. This attribute also implements BGP's second loop prevention mechanism. If a router sees a prefix with its own AS number in the AS-Path attribute, it discards it. Nothing naughty or unusual about these guys. One thing to bear in mind about them, however, is not to trust them blindly. They can lie! Some BGP routers can override the AS-Path and pretend that the route originated in their own AS, when in fact it has been through many. Sometimes it is done for a good reason. For example, when you have multiple sites using the same AS number and each one peers with your ISP using BGP, your ISP might just override your AS number with its own, so that when a prefix arrives at another site, the border router there doesn't discard it because it sees its own AS number in the path.
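
Both the loop check and the override trick are easy to sketch in Python. The AS numbers below (65001 for the customer sites, 100 for the ISP) and the function names are assumptions chosen for illustration only.

```python
def accept_update(local_as, as_path):
    """Loop prevention: reject a path that already contains our own AS number."""
    return local_as not in as_path


def as_override(as_path, customer_as, provider_as):
    """What the ISP's router does: replace the customer's AS number with its own."""
    return [provider_as if asn == customer_as else asn for asn in as_path]


# Site A (AS 65001) advertises a prefix; the ISP (AS 100) prepends its own AS
# before passing it on to site B, which also uses AS 65001.
path_at_site_b = [100, 65001]
print(accept_update(65001, path_at_site_b))  # prints False: site B sees its own AS

overridden = as_override(path_at_site_b, customer_as=65001, provider_as=100)
print(overridden)                            # prints [100, 100]
print(accept_update(65001, overridden))      # prints True: the prefix is now accepted
```

Note that the override keeps the path the same length, so step 4 of the comparison is not affected; only the loop check is.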

 

The next attribute is origin. This one has no tricks. The origin is the way that a prefix entered the BGP domain. Today there are only two ways to do it: advertise it with the "network" command (origin code "i", IGP) or redistribute it from another protocol (origin code "?", incomplete). The code legend still shows a third option, EGP, but that protocol has been dead for longer than I can remember. The preference is on the side of prefixes that have been advertised using the "network" command.

 

The next attribute is MED. MED is very similar to an IGP metric. It is just a numeric value, and the lower it is, the more preference the prefix gets.

 

External-Internal. These ones do cheat BIG TIME! The rule states that externally learned paths (EBGP) are preferred over internally learned ones (IBGP). Seems simple. But for me, the greatest confusion came from the fact that IBGP prefixes are placed into the RIB with a higher Administrative Distance than EBGP ones (200 versus 20 by default on Cisco). Here, the thing that saved my little brain from overheating and exploding like a melon was that I remembered the order in which the RIB gets populated. First, prefixes get into their IGP or BGP tables and get processed there by the protocols. Once a protocol determines its best route/prefix, it "offers" it to the RIB with a set metric and AD. And this is when the RIB starts to choose which ones it wants to accept. So, BGP first runs through its own comparison process, where it prefers external paths to internal ones. If the winner is external, BGP offers it to the RIB with the better AD; if the winner is internal, it offers it with the worse AD. This way the cheating goes beyond the boundaries of the BGP protocol and reaches into the RIB as well: EBGP routes beat nearly every other protocol's default AD, and IBGP routes are beaten by nearly every other protocol's default AD. You can, however, configure the AD for internal (IBGP) and external (EBGP) prefixes to be the same. Then your offering to the RIB will always carry the same AD, but the preference inside BGP still remains, and the prefix offered to the RIB will be the EBGP one. A bit of a head bender, right? It certainly was for me.
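
The two-stage process (BGP picks its winner first, then the RIB picks among the protocols) can be sketched like this. The AD values are Cisco's defaults; the function names and the idea of representing an offer as a plain string are simplifications made up for the example.

```python
# Cisco default administrative distances (lower is more trusted).
DEFAULT_AD = {
    "connected": 0, "static": 1, "ebgp": 20, "eigrp": 90,
    "ospf": 110, "isis": 115, "rip": 120, "ibgp": 200,
}


def bgp_offer(best_path_is_external):
    """BGP offers only its single best path, labeled by how that path was learned."""
    return "ebgp" if best_path_is_external else "ibgp"


def rib_winner(offers, ad=DEFAULT_AD):
    """The RIB installs the offer with the lowest administrative distance."""
    return min(offers, key=lambda proto: ad[proto])


# The same prefix is also known via OSPF.
print(rib_winner(["ospf", bgp_offer(True)]))   # prints "ebgp": AD 20 beats 110
print(rib_winner(["ospf", bgp_offer(False)]))  # prints "ospf": AD 110 beats 200
```

The AD only matters in the second stage. Even if you make both BGP distances equal, the first stage has already picked the external path inside BGP.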

 

IGP cost. How come BGP has anything to do with IGP? Well, as you know, BGP neighbors don't have to be directly adjacent to each other. They can sit on opposite ends of an MPLS core that knows nothing at all about BGP. So, to optimize the exit point selection for a certain destination, it is logical to compare the IGP cost to the BGP next hop of each path and prefer the lowest one. In the output above, that is the "(metric 11)" shown next to the next hop 1.0.0.2. This is the "hot potato routing" paradigm: the sooner you can throw the packet out of your network in the right direction, the cheaper/faster/better it is!
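
In Python, hot potato routing boils down to a one-liner. The next hop 1.0.0.2 with metric 11 comes from the output earlier in this post; the second next hop, 1.0.0.4 with metric 30, is invented for the example, as is the function name.

```python
def closest_exit(igp_cost_to_next_hop):
    """Return the BGP next hop with the lowest IGP cost (hot potato)."""
    return min(igp_cost_to_next_hop, key=igp_cost_to_next_hop.get)


costs = {"1.0.0.2": 11, "1.0.0.4": 30}
print(closest_exit(costs))  # prints "1.0.0.2"
```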

 

EBGP peering. Now we come closer to the end of the list, and the attributes or selection choices seem to become less important. At this point you are pretty safe to assume that two (or more) paths that have tied at all previous steps are very, very similar and there is really not much difference between them. However, one word comes to mind, and that is "stability". If both paths are external, the router prefers the one it learned first, that is, the oldest one. If you have a good enough path in terms of forwarding costs, but it comes from a neighbor that is not very reliable, then you might want (or pretty certainly want) to choose a path that has stayed up for a longer period of time, showing its stability.

 

Finally, we come to one attribute that will never ever end up in a tie, because it compares the router ID, or RID, of the routers advertising the paths, and the lowest one wins. By its very nature, the RID is unique; you should never see (in a properly configured network) two routers with identical RIDs. This is the ultimate tie breaker, and if it comes to this point, you just know that two paths are absolutely equal and there is really no difference between using one or the other. But because by default BGP only selects one route to be offered to the RIB, this is the last decision the algorithm makes before determining the all-time winner. The BGP champion! At this point a path is finally crowned to be The Route!

 

Curtain!