Demystifying BGP Session Establishments

Introduction

 

All of the advanced features of BGP begin with establishing a BGP session between two BGP speakers. BGP in turn rides on TCP. As such, it is important to recognize and understand the symbiotic relationship between TCP and BGP and the intricate details of BGP session establishment as it relates to BGP’s use of TCP as a transport protocol.

This blog aims to provide information for certification candidates who need to know the basic principles surrounding BGP’s reliance on TCP as transport as well as the specific configuration commands required to form BGP sessions in the most common scenarios. Configuration steps are detailed. Debugging and verification output has been included to help clarify certain points in the configurations.

 

The configurations and statements in this blog were tested on Cisco IOS software. Other operating systems such as NX-OS or IOS-XR may behave differently than described in this blog.

Establishing BGP Sessions

 

The process by which a BGP session is established is a bit different from that of IGPs such as EIGRP and OSPF. The IGPs bootstrap themselves to the topology and dynamically discover neighbors. BGP speakers must be explicitly configured to attempt peering sessions with another BGP speaker. In addition, BGP has to wait for a reliable connection to be established before beginning the rest of its session establishment. The reason for this requirement is rooted in enhancements made to BGP to overcome some issues experienced with its predecessor, EGP.

 

With EGP, as the Internet grew, the size of EGP’s update messages grew to the point where it took multiple IP fragments to transmit the updates. Some of these fragments were lost in transit causing severe routing flaps in the Internet. To avoid this situation with BGP, there needed to be a robust system to ensure reliable delivery of BGP messages.

 

To achieve reliable delivery, the developers could have either built a new transport protocol to do the reliable exchange or used an existing transport protocol. Instead of reinventing the wheel, the creators of BGP decided to leverage TCP's already robust reliability mechanisms. This integration with TCP creates two phases of BGP session establishment:

 

  1. TCP connection establishment phase

  2. BGP session establishment phase


BGP uses a finite state machine (FSM) to keep track of session establishment with an intended BGP peer throughout the two phases of session establishment. A finite state machine is a construct in which an object (the machine, in this case) operates in one of a fixed number of states. Each state carries out a specific purpose and set of operations. The machine exists in only one of these states at any given moment. A change in state is triggered by input events. BGP’s FSM has six states in total. The following three states of BGP’s FSM pertain to the TCP connection establishment phase:

 

  • Idle

  • Connect

  • Active

 

In these states, TCP messages are exchanged to build the TCP connection required for reliable delivery of BGP messages. After the TCP connection establishment phase, BGP enters the following three states of the BGP FSM, which pertain to the BGP session establishment phase:

 

  • Opensent

  • Openconfirm

  • Established

 

In these states, BGP exchanges messages relating to the BGP session. The OPENSENT and OPENCONFIRM states correspond to the exchange of BGP session attributes between the BGP speakers. The ESTABLISHED state indicates the peer is in a stable state and can accept BGP routing updates.

 

Combined, these six states make up the entire BGP FSM. BGP maintains a separate FSM for each intended peer. A peer transitions between these states based upon the receipt of input events. For example, if a TCP connection is successfully established in the CONNECT or ACTIVE states, the BGP speaker sends an OPEN message and enters the OPENSENT state. However, an error event in any state could cause the peer to transition to the IDLE state.
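The state machine described above can be sketched in a few lines of Python. This is a deliberately simplified model and not the full RFC 4271 FSM: the event names (start, tcp_established, and so on) are hypothetical labels chosen for illustration, and any event without a listed transition is treated as an error that returns the peer to IDLE.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPENSENT = "OpenSent"
    OPENCONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


# Simplified transition table: (current state, event) -> next state.
# Event names are illustrative labels, not the RFC 4271 event numbers.
TRANSITIONS = {
    (State.IDLE, "start"): State.CONNECT,
    (State.CONNECT, "tcp_established"): State.OPENSENT,
    (State.CONNECT, "tcp_failed"): State.ACTIVE,
    (State.ACTIVE, "tcp_established"): State.OPENSENT,
    (State.ACTIVE, "connect_retry_expired"): State.CONNECT,
    (State.OPENSENT, "open_received"): State.OPENCONFIRM,
    (State.OPENCONFIRM, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "update_received"): State.ESTABLISHED,
}


class PeerFsm:
    """One FSM instance is kept per configured peer."""

    def __init__(self):
        self.state = State.IDLE

    def handle(self, event):
        # Assumption for this sketch: any event without a listed
        # transition is an error and sends the peer back to Idle.
        self.state = TRANSITIONS.get((self.state, event), State.IDLE)
        return self.state


fsm = PeerFsm()
for event in ["start", "tcp_failed", "connect_retry_expired",
              "tcp_established", "open_received", "keepalive_received"]:
    print(f"{event:>22} -> {fsm.handle(event).value}")
```

Running the loop walks one peer from IDLE through CONNECT and ACTIVE and on to ESTABLISHED, which mirrors the two phases of session establishment.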

 

The next section details the TCP connection establishment phase, a prerequisite to establishing BGP sessions.


TCP Connection Establishment Phase

 

A successful TCP connection is required before a BGP session can be negotiated between two peers. TCP provides the reliable transport between the two peers over which BGP-related messages can be exchanged. If the TCP connection is broken, the BGP session is also broken. It is also important to understand that not all successful TCP connections lead to an established BGP session.

 

The BGP session establishment phase operates as an independent stage on top of an established TCP connection, which is what it means for BGP to “ride” on top of TCP. As such, it is possible for two peers to form a TCP connection but disagree on BGP parameters, resulting in a failed peering attempt. The BGP FSM oscillates between IDLE, ACTIVE, and CONNECT states while establishing the TCP connection.

 

TCP is a connection-oriented protocol. This means TCP first establishes a connection between two speakers through which it ensures ordered, reliable delivery of information. To create this connection, TCP uses the concept of servers and clients.

 

  • Clients connect to servers making them the connecting side

  • Servers listen for incoming connections from prospective clients

 

TCP uses port numbers to identify services and applications hosted by a server. There are a few well-known TCP port numbers such as 80 for HTTP traffic. These are the port numbers to which TCP clients initiate connections in order to access a specific service from a TCP server. The TCP clients will source their messages from a randomly generated TCP port number.

 

With TCP connections, there is always a passive side awaiting connections and an active side making connection attempts. This leads to two ways that TCP connections can be established or opened:

 

  • Passive Open

  • Active Open

 

A passive open is performed by a TCP server that is configured to accept connection attempts on a specific TCP port from a TCP client. For example, a WebServer is configured to accept connections on its TCP port 80. This is also referred to as the WebServer “listening” on TCP port 80.


An active open is performed when a TCP client initiates a connection attempt to a specific port on a TCP server. For example, if Client A wants to connect to the WebServer, it can initiate a connection attempt to the WebServer’s TCP Port 80.


Clients and servers exchange TCP control messages with each other to set up and manage a TCP connection. Control bits in the TCP header indicate the type and purpose of the control messages being sent. For the purpose of our discussion, the SYN and ACK bits, shown in the TCP header of the Wireshark capture below, play a key role in the basic TCP connection setup process:


Transmission Control Protocol, Src Port: 179, Dst Port: 59580, Seq: 0, Ack: 1, Len: 0

[--omitted--]

  Flags: 0x012 (SYN, ACK)

      000. .... .... = Reserved: Not set

      ...0 .... .... = Nonce: Not set

      .... 0... .... = Congestion Window Reduced (CWR): Not set

      .... .0.. .... = ECN-Echo: Not set

      .... ..0. .... = Urgent: Not set

      .... ...1 .... = Acknowledgment: Set

      .... .... 0... = Push: Not set

      .... .... .0.. = Reset: Not set

      .... .... ..1. = Syn: Set

      .... .... ...0 = Fin: Not set

      [TCP Flags: ·······A··S·]


The SYN bit indicates a connection attempt is being made. It is used to synchronize the TCP sequence numbers used for reliable communication. The ACK bit indicates the acknowledgement of a received TCP message. The requirement of message acknowledgement is the cornerstone of TCP’s reliability mechanism.
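To make the flag bits concrete, here is a small Python sketch that decodes a TCP flags value such as the 0x012 seen in the capture. The function name decode_tcp_flags is made up for this example; the bit positions follow the standard TCP header layout shown in the capture.

```python
# Names of the TCP control flags, lowest-order bit first.
TCP_FLAGS = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR", "NS"]


def decode_tcp_flags(value):
    """Return the names of the flags that are set in a TCP flags field."""
    return [name for bit, name in enumerate(TCP_FLAGS) if value & (1 << bit)]


print(decode_tcp_flags(0x012))  # ['SYN', 'ACK'], the SYN-ACK from the capture
print(decode_tcp_flags(0x002))  # ['SYN'], a plain SYN
print(decode_tcp_flags(0x010))  # ['ACK'], a plain ACK
```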

 

The general process used to establish a TCP connection includes the exchange of three control messages as follows:

 

  1. Client initiates an active open by sending a TCP/IP packet that has the SYN bit set in the TCP header. This is a SYN message.

  2. Server responds with its own SYN message (SYN bit set in the TCP header) thus performing a passive open. The server also acknowledges the client's SYN segment by setting the ACK bit in the same control message. Since both the SYN and ACK bit are set in the same message, this message is referred to as the SYN-ACK message.

  3. Client responds with a TCP/IP packet that has the ACK bit set in the TCP header to acknowledge it received the Server’s SYN segment.

1.png

The control messages or segments exchanged above are collectively referred to as the TCP three-way handshake. Upon completion of the three-way handshake, the TCP connection is established and the two devices can begin sending application data.
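The passive and active open can be observed with ordinary sockets. The following Python sketch runs both sides on the loopback address; it is only an illustration of the client/server roles, not of a BGP implementation. It listens on an OS-assigned port because binding to a well-known port such as 179 normally requires elevated privileges.

```python
import socket

# Passive open: the "server" listens. Port 0 asks the OS for any free port;
# a real BGP speaker would listen on TCP port 179 instead.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server_port = server.getsockname()[1]

# Active open: the "client" connects. The OS sends the SYN, receives the
# SYN-ACK, and replies with the ACK before connect() returns.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", server_port))

# accept() hands the server side the now-established connection.
conn, client_addr = server.accept()
client_port = client.getsockname()[1]

print(f"server listening port: {server_port}")
print(f"client source port:    {client_port} (chosen by the OS)")

conn.close()
client.close()
server.close()
```

The client never picks its own source port here; the operating system assigns one, which is the behavior the text describes for TCP clients.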

 

In the context of BGP, the three-way handshake is performed as follows:

 

  1. BGP speakers register the BGP process on TCP port 179 and begin listening for connection attempts from configured clients.

  2. One speaker, acting as the TCP client, performs an active open by sending a SYN packet destined to the remote speaker’s TCP port 179 and sourced from a random port number.

  3. The remote speaker, acting as a TCP server, performs a passive open by accepting the SYN packet from the TCP client on TCP port 179 and responds with its own SYN-ACK packet.

  4. The client speaker responds with an ACK packet acknowledging it received the server’s SYN packet.


NOTE: This client/server model has no effect on BGP operation other than who connects to port 179 and who sources from port 179. Either side of the BGP session can become the client or server. However, for certain designs, designating the TCP server and client roles to particular devices might be desired. A good example of such a client/server interaction with BGP is in a hub-spoke topology like DMVPN where the hub is configured to be a route-reflector and the spokes are the route-reflector clients. The BGP dynamic neighbor feature can be used to ensure that the hub listens and accepts connections from a range of potential IP addresses, thus making it a TCP server that passively waits for the spokes to perform a TCP active open to initiate a TCP session.

 

Upon completion of the three-way handshake, a TCP connection is established between the BGP speakers. They can now begin exchanging session attributes as TCP payload, leading to the next half of the FSM for BGP session establishment.


BGP Session Establishment Phase

 

The BGP session establishment phase consists of exchanging BGP control packets. These packets are OPEN, KEEPALIVE, NOTIFICATION, and UPDATE messages. These messages are sent and received in the final three states of the BGP FSM:

 

  • Opensent

  • Openconfirm

  • Established

 

In the OPENSENT state, the BGP speaker has sent an OPEN message to its peer and is waiting for the peer to send its own OPEN message. When the peer’s OPEN message arrives, the BGP speaker compares the attributes in it against its own. If the terms are acceptable, the BGP speaker responds with a KEEPALIVE message and transitions to the OPENCONFIRM state, where it waits for the peer’s KEEPALIVE message. Once that KEEPALIVE is received, the state progresses to ESTABLISHED. After that, UPDATE messages are sent to exchange routing information, and KEEPALIVE messages are sent to verify the peer is still functioning.

 

NOTIFICATION messages are sent in all three states to indicate error conditions. For example, if the OPEN message contains an unacceptable parameter, a NOTIFICATION message is sent with the exact nature of the error, and in most cases the state transitions back to IDLE.

 

For BGP session establishment, the OPEN message is the most important and is expounded upon in this section. As mentioned, once the TCP connection is established, each BGP speaker sends an OPEN message that contains its local attributes for building the BGP relationship. There are five main attributes contained in the OPEN message:


  • The version of BGP supported by the router

  • The router’s Autonomous System Number

  • The router’s hold timer

  • The router’s BGP identifier

  • The optional parameters the router supports


The version field identifies the version of BGP that the BGP speaker runs. This field is for backwards compatibility with previous BGP versions and is typically set to 4 to represent the current BGP version number.

 

BGP speakers also verify that the AS number received in the OPEN message is valid. That is, it contains a valid value and is the value the speaker expects the peer to have. For example, if the BGP speaker thinks the peer is in AS 100, but the OPEN message received from the peer indicates an AS number of 50, there is a mismatch. The BGP speaker will send a NOTIFICATION message and the peering attempt fails.

 

BGP speakers must also agree on the Hold Timer that is used to determine whether a peer has gone offline. BGP speakers exchange their configured Hold Time values in OPEN messages. The lower of the two values is used as the Hold Time for the BGP session. The Hold Time must be either zero, which disables the KEEPALIVE mechanism, or at least three seconds; values of 1 or 2 are not valid. Also, any individual BGP implementation can reject a Hold Time value that it considers unacceptable.
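The Hold Time selection can be sketched as a small Python function. The name negotiate_hold_time is hypothetical, and the validity rule in the code (zero, or at least three seconds) is taken from RFC 4271 rather than from any particular router implementation, which may impose stricter limits of its own.

```python
def negotiate_hold_time(local, remote):
    """Return the session Hold Time in seconds, or raise if unacceptable."""
    for value in (local, remote):
        # Assumption (RFC 4271): a Hold Time is zero or at least 3 seconds.
        if value != 0 and value < 3:
            raise ValueError(f"unacceptable Hold Time: {value}")
    # The lower of the two advertised values wins.
    return min(local, remote)


print(negotiate_hold_time(180, 90))  # 90
print(negotiate_hold_time(180, 0))   # 0
```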


The next parameter is the BGP identifier. This is synonymous with the RID in the IGP world. It is a 4-octet number that identifies the BGP speaker. By default, for Cisco IOS, it is set to the highest IP address configured on a loopback interface. If there are no loopback interfaces, then it is set to the highest IP address of any up interface configured on the router. The BGP identifier can also be set using the bgp router-id command in BGP configuration mode and cannot be the same between two BGP peers.
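The default selection order described above can be modeled roughly as follows. This Python sketch is an illustration only: the function name, the tuple layout, and the interface names are hypothetical, and it assumes that a shut-down loopback is skipped in the same way as any other down interface.

```python
import ipaddress


def select_bgp_identifier(interfaces, configured=None):
    """Pick a BGP identifier.

    interfaces: list of (name, ipv4_address, is_up) tuples (hypothetical).
    configured: value of an explicit "bgp router-id", if any.
    """
    if configured is not None:
        return ipaddress.IPv4Address(configured)
    # Assumption: only interfaces that are up are considered.
    up = [(name, ipaddress.IPv4Address(ip)) for name, ip, is_up in interfaces if is_up]
    loopbacks = [ip for name, ip in up if name.lower().startswith("loopback")]
    if loopbacks:
        return max(loopbacks)          # highest loopback address wins
    if up:
        return max(ip for _, ip in up)  # otherwise highest address on any up interface
    raise ValueError("no usable IPv4 address for the BGP identifier")


r1 = [("Ethernet0/0", "12.1.1.1", True), ("Loopback0", "1.1.1.1", True)]
print(select_bgp_identifier(r1))                        # 1.1.1.1
print(select_bgp_identifier(r1[:1]))                    # 12.1.1.1
print(select_bgp_identifier(r1, configured="9.9.9.9"))  # 9.9.9.9
```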


The final attribute exchanged in the OPEN message is the optional parameters attribute. The optional parameters contained in the OPEN message are one of the most interesting and powerful features of BGP. They allow a BGP speaker to signal what features it supports. The most notable optional parameter is the capabilities optional parameter.


The capabilities optional parameter is used to advertise what optional features the BGP speaker supports, most notably the multiprotocol extensions for BGP. Originally, BGP was intended to transport routing information for IPv4 only. The multiprotocol capabilities extension allows BGP speakers to carry information regarding different network-layer protocols. With the multiprotocol extensions, routing information carried by BGP is abstracted as network layer reachability information or NLRI.


For example, the Wireshark capture of an OPEN message below shows the sending BGP speaker negotiating support for IPv4, VPNv4, and IPv6 NLRIs. Each feature is advertised in a separate multiprotocol extension optional capability parameter statement.


Border Gateway Protocol - OPEN Message

    Marker: ffffffffffffffffffffffffffffffff

    Length: 73

    Type: OPEN Message (1)

    Version: 4

    My AS: 100

    Hold Time: 180

    BGP Identifier: 12.1.1.1

    Optional Parameters Length: 44

    Optional Parameters

        Optional Parameter: Capability

            Parameter Type: Capability (2)

            Parameter Length: 6

            Capability: Multiprotocol extensions capability

                Type: Multiprotocol extensions capability (1)

                Length: 4

                AFI: IPv4 (1)

                Reserved: 00

                SAFI: Labeled VPN Unicast (128)

        Optional Parameter: Capability

            Parameter Type: Capability (2)

            Parameter Length: 6

            Capability: Multiprotocol extensions capability

                Type: Multiprotocol extensions capability (1)

                Length: 4

                AFI: IPv6 (2)

                Reserved: 00

                SAFI: Unicast (1)

        Optional Parameter: Capability

            Parameter Type: Capability (2)

            Parameter Length: 6

            Capability: Multiprotocol extensions capability

                Type: Multiprotocol extensions capability (1)

                Length: 4

                AFI: IPv4 (1)

                Reserved: 00

                SAFI: Unicast (1)

 

The optional parameters are encoded in the OPEN message using Type, Length, Value (TLV) records. These TLVs are evaluated by both BGP speakers. In order to successfully exchange these different NLRIs, both peers must be configured to support the capabilities.
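The TLV layout can be illustrated by building and parsing an OPEN message by hand. The Python sketch below is a simplified illustration under stated assumptions: the helper names are hypothetical, only the multiprotocol extensions capability is encoded, each optional parameter carries exactly one capability (as in the capture), and the AS number is assumed to fit in two octets.

```python
import socket
import struct

MARKER = b"\xff" * 16
OPEN = 1               # BGP message type
CAPABILITY_PARAM = 2   # optional parameter type: Capability
MP_EXTENSIONS = 1      # capability code: multiprotocol extensions


def mp_capability(afi, safi):
    """Encode one optional parameter carrying one multiprotocol capability."""
    value = struct.pack("!HBB", afi, 0, safi)  # AFI, reserved, SAFI
    capability = struct.pack("!BB", MP_EXTENSIONS, len(value)) + value
    return struct.pack("!BB", CAPABILITY_PARAM, len(capability)) + capability


def build_open(my_as, hold_time, bgp_id, capabilities):
    """Build an OPEN message; capabilities is a list of (afi, safi) pairs."""
    opt = b"".join(mp_capability(afi, safi) for afi, safi in capabilities)
    body = struct.pack("!BHH4sB", 4, my_as, hold_time,
                       socket.inet_aton(bgp_id), len(opt)) + opt
    length = len(MARKER) + 3 + len(body)  # 19-byte header plus body
    return MARKER + struct.pack("!HB", length, OPEN) + body


def parse_capabilities(message):
    """Yield (afi, safi) for each multiprotocol capability in an OPEN."""
    opt_len = message[28]  # Optional Parameters Length field
    pos, end = 29, 29 + opt_len
    while pos < end:
        param_type, param_len = message[pos], message[pos + 1]
        value = message[pos + 2:pos + 2 + param_len]
        if param_type == CAPABILITY_PARAM and value[0] == MP_EXTENSIONS:
            afi, _reserved, safi = struct.unpack("!HBB", value[2:6])
            yield afi, safi
        pos += 2 + param_len


# Same three address families as the capture: VPNv4, IPv6 unicast, IPv4 unicast.
msg = build_open(100, 180, "12.1.1.1", [(1, 128), (2, 1), (1, 1)])
print(len(msg), list(parse_capabilities(msg)))
```

Each capability parameter built this way is 8 octets on the wire (a parameter length of 6 and a capability length of 4), which matches the lengths shown in the capture.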

 

After reviewing the optional parameters, the BGP speakers send a KEEPALIVE message signaling that the OPEN message received from the remote speaker was accepted. The state machine then transitions to the ESTABLISHED state. From here, KEEPALIVE messages are exchanged to prevent the Hold Timer from expiring, and routing information is exchanged between the two peers using UPDATE messages.

 

The previous sections detailed the high-level view of BGP session establishment. The next section reviews the different peering session options that exist within BGP.

 

BGP Peer Types

 

IGPs such as OSPF and EIGRP generally require neighbors to be directly connected in order to form neighbor adjacencies. BGP, due to its use of TCP as transport, carries no such limitation. Unlike IGP control traffic, BGP control traffic (OPEN, KEEPALIVE, UPDATE, and NOTIFICATION messages) can be routed across multiple subnets. This means BGP supports both single-hop and multihop point-to-point peering sessions.


In addition to single and multihop peering sessions, BGP makes a distinction between internal peers and external peers, resulting in two types of peering sessions, eBGP and iBGP.


The AS number (configured using the router bgp ASN command in global configuration mode in Cisco IOS) sent in the OPEN messages that are exchanged between neighbors during the peer establishment stage determines the type of peering session established between the BGP speakers:


  • If the AS numbers differ between two BGP speakers, it is called an external or eBGP session.

  • If the AS numbers match between two BGP speakers, it is called an internal or iBGP session.
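The two rules above, together with the AS number check performed on a received OPEN message, reduce to simple comparisons. The Python sketch below uses hypothetical function names and is only meant to show the logic, not how any router implements it.

```python
def session_type(local_as, remote_as):
    """Classify a peering session from the two AS numbers."""
    return "iBGP" if local_as == remote_as else "eBGP"


def check_open_as(configured_remote_as, received_as):
    """Compare the AS in the peer's OPEN with the configured remote AS."""
    if received_as != configured_remote_as:
        # A real speaker would send a NOTIFICATION and drop the session.
        raise ValueError(
            f"peer AS mismatch: expected {configured_remote_as}, got {received_as}")


print(session_type(100, 200))  # eBGP
print(session_type(100, 100))  # iBGP
check_open_as(200, 200)        # matches, so nothing is raised
```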

 

BGP behaves differently depending on the type of session, each having its own requirements when establishing the TCP connection and in exchanging routing information. The next sections detail single-hop and multihop, eBGP, and iBGP session characteristics and behaviors.


Single-hop vs Multihop Peers

 

Two BGP speakers can share a common physical or logical link, creating a physical connection, or be separated by multiple router links, creating a logical connection. The only requirement is that there is IP reachability between the peering addresses to establish a TCP session between the two BGP speakers.

 

This paradigm creates unique situations regarding how BGP speakers are connected in the network. A BGP speaker’s intended peer can be located a single hop or multiple hops away.

2.png

Single-hop sessions are created when two BGP speakers are located a single physical or logical router hop away. The peers can be configured to use the interface IP address assigned on their shared link to form the TCP session, or they can be configured to use an IP address from another interface configured on the routers, such as a loopback interface. If the BGP speakers choose to use an IP address other than the one assigned to their physical links, each BGP speaker needs a valid route to its intended peer’s peering address.

 

NOTE: A shared link, in the context of this BGP discussion, is any connected routing entry that is designated as "C" in the routing table, for example a tunnel interface or a directly connected physical interface through which each peer can communicate.

3.png

A multihop session is created when two BGP peers are separated by multiple router hops as illustrated in the topology below. As with the single-hop session, peers can be configured to use an IP address configured on any one of their interfaces to form the peering relationship. Each peer needs to have a valid route to the remote peer's peering address. In addition, all routers in between should also be able to route traffic between the two peers. In other words, a ping sourced from peer A’s peering address should be able to reach peer B’s peering address.

4.png 

Special consideration must be made when establishing single-hop and multi-hop sessions depending on the type of BGP session (eBGP/iBGP) and the desired peering addresses. The next sections detail the configuration steps required to establish both single and multi-hop eBGP and iBGP sessions.


eBGP Sessions

 

eBGP sessions are designed for interaction between different Autonomous Systems. eBGP sessions are typically seen at the edge of the network between a customer and a service provider. The service provider advertises Internet routes to the customer, and the customer advertises its public IP address space to the service provider. The customer edge routers use the service provider routers as the next hop for the Internet prefixes the service provider advertises to them.

 

NOTE: In MPLS L3VPN environments, the customer also exchanges private IP addresses with the service provider. This is to provide connectivity to the customer’s remote sites that are connected by the service provider.


BGP speakers represent the local AS as a single router hop to all of their eBGP peers. They hide the topology details of their local AS by setting the next-hop in the BGP updates to the interface IP address used to form the BGP session. This is illustrated in the topology below.

5.png

R5 in AS 100 owns the prefix 5.5.5.5/32. The edge router in AS 100 advertises this prefix to AS 200 via BGP, setting itself as the next hop. By doing so, it hides the internal topological information of AS 100 from the external peer in AS 200. From the edge router in AS 200’s perspective, AS 100 appears as a single router hop away.

 

NOTE: On a multi access segment with multiple BGP peers, eBGP will advertise routing information received from other eBGP peers on the multi access segment with the next hop set to the IP address of the eBGP peer that advertised the information. This behavior is called BGP third-party next hop.

 

By default, BGP speakers require their eBGP neighbors to be directly connected. To ensure this, BGP speakers operate with the following rules whenever establishing an eBGP peering session with a BGP speaker:


  1. IOS routers perform a directly connected check on the peering address to verify the remote peer is reachable by a route that is “connected” in the routing table

  2. eBGP peers send control messages with a TTL value of 1.
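The effect of the second rule can be shown with a toy model of TTL handling. This Python sketch is not router code; the function name delivered is made up, and it assumes only that every router forwarding the packet decrements the TTL and drops the packet when the TTL reaches zero.

```python
def delivered(initial_ttl, routers_in_path):
    """True if a packet survives every router between sender and receiver."""
    ttl = initial_ttl
    for _ in range(routers_in_path):
        ttl -= 1          # each forwarding router decrements the TTL
        if ttl <= 0:
            return False  # expired in transit, so the router drops it
    return True


print(delivered(1, 0))  # True: the peer is directly connected
print(delivered(1, 1))  # False: one router sits between the peers
print(delivered(2, 1))  # True: a TTL of 2 survives one router hop
```

With a TTL of 1, control messages reach only a peer that has no forwarding router in between.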


The first rule deserves some extra explanation with aid of the following example. R1’s interface ethernet0/0 has been configured with an IP address of 12.1.1.1 along with a static route for 2.2.2.2 that points to an exit interface.


R1:

interface Ethernet0/0

 ip address 12.1.1.1 255.255.255.0

!

ip route 2.2.2.2 255.255.255.255 Ethernet0/0

 

The routing table on R1 shows the result of the above configuration.


R1#show ip route

        2.0.0.0/32 is subnetted, 1 subnets

S      2.2.2.2 is directly connected, Ethernet0/0

    12.0.0.0/8 is variably subnetted, 2 subnets, 2 masks

C      12.1.1.0/24 is directly connected, Ethernet0/0

L      12.1.1.1/32 is directly connected, Ethernet0/0


There are two key points that deserve mentioning in this output:

 

  • The route to 12.1.1.0/24 shows up as “C” type in the routing table. This is a result of the IP address 12.1.1.1 with a /24 mask being configured on the interface ethernet0/0.

  • The route to 2.2.2.2 shows up as “S” type. This is the result of a static route pointing out of an interface being configured on R1.


In order to attempt an eBGP session, by default, IOS routers perform a connected check on the peering address. This check determines if the configured peering address falls within a directly connected network in the routing table by checking if such a route exists as a “C” type in the show ip route output. This connected check in Cisco IOS is internally known as the samecable test. Inherently, all routes in the routing table derived from configuring the ip address command on an interface, including loopback and tunnel interfaces, would pass the samecable test. Additionally, the routing entry for a peer’s address added by PPP IPCP would also show up as a “C” type route in the routing table and will pass the samecable test. However, static routes fail the samecable test as they are installed as static “S” entries in the routing table.

 

Whether or not an IP address passes this samecable test can be verified with the hidden command show ip cef x.x.x.x samecable in Cisco IOS as shown below.

 

If the output shows an interface for the given IP, the IP address has passed the samecable check.


R1#show ip cef 12.1.1.2 samecable

Prefix 12.1.1.2, connected interface: Ethernet0/0


If the output shows none, the IP address has failed the samecable check.


R1#show ip cef 2.2.2.2 samecable

Prefix 2.2.2.2, connected interface: none
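The logic of the check, as described above, can be approximated in Python. This is a model of the described behavior and not the actual IOS implementation: the ROUTES table simply mirrors the example routing table on R1, and the function name samecable is borrowed from the command for readability.

```python
import ipaddress

# Routing table from the example on R1: (code, prefix, interface).
ROUTES = [
    ("S", "2.2.2.2/32", "Ethernet0/0"),
    ("C", "12.1.1.0/24", "Ethernet0/0"),
    ("L", "12.1.1.1/32", "Ethernet0/0"),
]


def samecable(peer, routes=ROUTES):
    """Return the interface of the "C" route covering peer, or None."""
    address = ipaddress.ip_address(peer)
    for code, prefix, interface in routes:
        # Only connected ("C") entries count; static ("S") entries do not.
        if code == "C" and address in ipaddress.ip_network(prefix):
            return interface
    return None


for peer in ("12.1.1.2", "2.2.2.2"):
    print(f"Prefix {peer}, connected interface: {samecable(peer) or 'none'}")
```

The static route to 2.2.2.2 points out of Ethernet0/0, yet the address still fails the check because no "C" entry covers it.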


NOTE: Cisco Press books and documentation refer to this test as the connected or directly connected check. However, throughout the remainder of this blog, this test is referred to by the name used internally within Cisco IOS, the samecable test.

 

The rules mentioned above become obstacles in certain scenarios. The next sections explain the following connectivity scenarios along with the configurations required to override these default behaviors. In short, there are three scenarios:

  • Single-hop over directly-connected interfaces

  • Single-hop over indirectly-connected interfaces

  • Multi-hop over indirectly-connected interfaces

 

Single-hop eBGP Session over Directly Connected Interfaces

 

The configuration for establishing an eBGP session over directly connected interfaces is very simple. There are only three requirements for such an implementation:

 

  1. An IP interface towards the intended BGP peer

  2. The peer’s AS number

  3. The peer’s peering address

 

This information makes up the details (source/destination IP addresses) needed to build the TCP connection. It also provides the AS number the BGP speaker expects to receive in the peer’s OPEN message.


To better understand how these requirements are filled in the BGP configuration, refer to the following topology:

6.png

BGP speakers R1 in AS 100 and R2 in AS 200 are directly connected. The IP addresses assigned to their E0/0 interfaces (12.1.1.1 and 12.1.1.2 respectively) are being used for the eBGP peering session between them. The following lists the configuration commands applied on both routers:


R1:

router bgp 100

 bgp log-neighbor-changes

 neighbor 12.1.1.2 remote-as 200

R2:

router bgp 200

 bgp log-neighbor-changes

 neighbor 12.1.1.1 remote-as 100


The router bgp command is used to initialize the BGP process on the two routers. It also specifies to which AS the BGP speaker belongs. Looking at R1’s configuration only, the router bgp 100 command specifies that R1 belongs to AS 100.


The neighbor command is used in BGP configuration mode to identify the neighbor to which the BGP speakers will try to create a BGP session. It identifies the remote BGP speaker’s peering address and AS number. R1 is configured to form a BGP session with neighbor 12.1.1.2, which is in AS 200. It compares the AS number in the neighbor 12.1.1.2 remote-as 200 command with its own configured AS number from the router bgp 100 command. Since these two values do not match, R1 knows its session with R2 will be an eBGP session.


The neighbor command also configures the TCP stack on the router to only accept connection attempts from the specified peer address. On R1, the neighbor 12.1.1.2 remote-as 200 command ensures that R1 will accept a connection attempt from 12.1.1.2 in AS 200 only. This can be verified using the show tcp brief all command. The output below indicates that R1 is listening on TCP port 179 for a connection from client 12.1.1.2 only:


R1#show tcp brief all

TCB    Local Address            Foreign Address          (state)

F3543858  0.0.0.0.179             12.1.1.2.*                LISTEN

 

NOTE: In the configurations above, R1 and R2’s configuration is mirrored. R1’s neighbor command points to R2’s peering address and AS number while R2’s neighbor command points to R1’s peering address and AS number. Verifying this mirroring between two BGP peer configurations can help avoid some common mistakes, such as an incorrect AS number or peering address configuration.

R1 has determined its peering session with R2 will be an eBGP peering session. As such, R1 will first perform the samecable test against R2’s peering address 12.1.1.2 to make sure 12.1.1.2 falls within a route in the routing table that is labelled as a “C” type. R1 will also send TCP and BGP messages with a TTL value of 1 to R2.

The output of show ip cef 12.1.1.2 samecable command shows “connected interface: Ethernet0/0”. This indicates that the peering address 12.1.1.2 passes the samecable test.


R1#show ip cef 12.1.1.2 samecable

Prefix 12.1.1.2, connected interface: Ethernet0/0

This means R1 can attempt to initiate a peering session with R2. R1 sends its TCP and BGP control messages, including the initial TCP SYN packet, with a TTL value of 1.

Once the BGP session is established, the neighbors can exchange routing information using UPDATE messages. The state of the session can be verified using the show ip bgp neighbors command on R1:

R1#show ip bgp neighbors

BGP neighbor is 12.1.1.2,  remote AS 200, external link

BGP version 4, remote router ID 2.2.2.2

BGP state = Established, up for 00:08:47

 

Single-hop eBGP sessions between directly connected interfaces are the most common type of eBGP session and require no modification of BGP’s default behavior for eBGP sessions. However, there are use cases in which directly connected BGP speakers do not form eBGP sessions over their directly connected interfaces. This case is detailed in the next section.


Single-hop eBGP Session over Loopback Interfaces

 

The case above illustrated a topology where an eBGP session was established between two peers using their directly connected interfaces. However, there are situations where using an indirectly connected interface such as a loopback interface may be desired for a particular peering session. Consider the topology below illustrating two directly connected links between R1 and R2.

7.png


The eBGP sessions between R1 and R2 can be configured and established in the following two ways, each with its own drawbacks:

 

  1. Configure a single eBGP session over a single link (either E0/0 or E0/1)

    1. The drawback here is a missed opportunity for providing high availability. If the link used for the peering session goes down, the eBGP session fails even though a redundant link still connects the two routers, leaving that link wasted. All routing information learned from the external peer would be flushed from the BGP table.

  2. Configure two eBGP sessions, one over each link (E0/0 and E0/1)

    1. The drawback in this case is reduced scalability. All routing information will be advertised twice, one advertisement from the eBGP session formed over the E0/0 interface and a second advertisement from the eBGP session formed over the E0/1 interface.

 

So in such cases, where routers have multiple interfaces connecting them, a solution that brings about scalability and high availability in case of link failures is needed. Peering over loopback interfaces provides such scalability and high availability. Loopback interfaces are logical interfaces that exist on a router. They are assigned a logical IP that can be used to represent the entire router.


Loopback interfaces also remain in an up/up state unless the router is physically shut off or the loopback interface is manually shut down. BGP sessions can be established between these logical interfaces that represent the router.

 

However, since the loopback interfaces are not directly connected, the routing table on each router needs to be populated with a route to the other router’s loopback interface. This information can be provided by manually configuring a static route on each router that points to the other router’s loopback address. Alternatively, an IGP can be used to provide this reachability.

 

With such an implementation, regardless of which interface is being used to reach the router, as long as reachability exists between the loopback interfaces, the BGP session stays up. The BGP session is no longer dependent on the state of the physical interfaces interconnecting the routers. Additionally, BGP routing information is only advertised a single time because there is only a single peering session.

 

NOTE: It is more common to use static routes with external neighbors because typically IGPs are not run between external peers with the exception of MPLS L3VPN PE-CE routing.

 

The BGP configuration from the preceding example has been updated on R1 and R2. The destination of the BGP session on each router is now the other router’s loopback address.

 

  • R1’s neighbor statement points to R2’s loopback address, 2.2.2.2

  • R2’s neighbor statement points to R1’s loopback, 1.1.1.1.

  • Static routes have been configured on R1 and R2 to provide reachability to the loopback addresses
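The static routes mentioned in the last bullet could look something like the sketch below. It assumes Loopback1 carries each router's /32 peering address and that 12.1.1.1 and 12.1.1.2 are the R1 and R2 addresses on one of the two links, which is what the debug output later in this section suggests. The addressing of the second link is not given here, so that route is only described in a comment.

```
! R1 (sketch; loopback mask and next hop are assumptions)
interface Loopback1
 ip address 1.1.1.1 255.255.255.255
!
ip route 2.2.2.2 255.255.255.255 12.1.1.2
! A second static route via R2's address on the other link would be
! added here so 2.2.2.2 stays reachable if the first link fails.
!
! R2 (sketch; same assumptions)
interface Loopback1
 ip address 2.2.2.2 255.255.255.255
!
ip route 1.1.1.1 255.255.255.255 12.1.1.1
! A second static route via R1's address on the other link goes here.
```

Without the second route on each side, the loopback peering would be no more resilient than peering over a single link.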

R1:

router bgp 100

neighbor 2.2.2.2 remote-as 200

 

R2:

router bgp 200

neighbor 1.1.1.1 remote-as 100


It may appear on the surface that the preceding configuration on R1 and R2 is all that is required to fully establish the BGP session. However, if we configure the session and run debug ip bgp, the following is logged to the CLI:

R1(config)#
BGP: 2.2.2.2 Active open failed - update-source NULL is not available, open active delayed 12288ms (35000ms max, 60% jitter)

At first glance, this error message can be misleading since there is no update source specified in the configuration. This particular message is unique to eBGP sessions. By default, an IOS router uses the address of the outgoing interface toward the peering address as the source of the BGP control messages. So in actuality, the error message is hinting at the fact that the router is unable to select an outgoing interface IP address to use as its update source as a result of failing the samecable test.

 

R1 performs the samecable test for R2’s peering address 2.2.2.2, looking for a connected route to 2.2.2.2 in its routing table. However, the output of the show ip cef 2.2.2.2 samecable command reveals there is no connected route in the routing table for the peering address:

 

R1#sh ip cef 2.2.2.2 samecable

Prefix 2.2.2.2, connected interface: none

 

Since the peering address fails the samecable test, R1 determines the neighbor is not a directly connected neighbor. This fact is recorded in the output of the show ip bgp neighbor command:

 

R1#show ip bgp neighbors 2.2.2.2 | include External BGP neighbor

External BGP neighbor not directly connected.

 

R1 is unable to select an appropriate outgoing interface, which means it is unable to select a proper update source to use when performing the TCP active open. Thus, the entire active open process fails, giving the error message indicated above. The same process would happen on R2 as well.

 

In order to fix the above issue, R1 and R2 need to be told to skip the samecable test and attempt the connection to the peering address anyway. The neighbor x.x.x.x disable-connected-check command is used to do so on both routers.


The configuration on R1 and R2 has been modified to include the neighbor x.x.x.x disable-connected-check command.


R1:

router bgp 100

bgp log-neighbor-changes

neighbor 2.2.2.2 remote-as 200

neighbor 2.2.2.2 disable-connected-check


R2:

router bgp 200

bgp log-neighbor-changes

neighbor 1.1.1.1 remote-as 100

neighbor 1.1.1.1 disable-connected-check

 

With the addition of the neighbor x.x.x.x disable-connected-check command, R1 can now perform an active open towards R2.


However, upon applying this configuration, the BGP session still does not come up. The debug ip bgp command again reveals a new error:


R1#

*Nov 24 14:26:01.825: BGP: 2.2.2.2 active went from Idle to Active

*Nov 24 14:26:01.825: BGP: 2.2.2.2 open active, local address 12.1.1.1

*Nov 24 14:26:01.825: BGP: 2.2.2.2 open failed: Connection refused by remote host

*Nov 24 14:26:01.825: BGP: 2.2.2.2 Active open failed - tcb is not available, open active delayed


The debug shows that R1 is indeed performing a TCP active open towards 2.2.2.2. However, the connection is being refused by R2. The line “active open failed tcb is not available” points to the fact that the TCP session R1 created internally for the connection attempt has been removed. R2, acting as the TCP server, is actually resetting the TCP connection with R1, refusing the BGP peering attempt.


This happens because R1 is sourcing its connection attempt from 12.1.1.1, as stated in the “2.2.2.2 open active, local address 12.1.1.1” portion of the debug output. Unless modified explicitly, the source IP address of the packets for a neighbor session is always the IP address assigned to the outgoing interface. This means the TCP SYN packet is being delivered to R2 with a source address of 12.1.1.1 as shown in the packet capture below:

Looking closely at R2’s configuration, we notice that R2 is configured to only accept connection attempts from 1.1.1.1. As a result, R2 refuses active open attempts for BGP peering sessions from any IP address other than 1.1.1.1 by sending a TCP RST back as shown below.


Likewise, R1 is configured to only accept connection attempts from 2.2.2.2. This is a basic security mechanism to guard against malicious attackers who attempt to form a BGP session with an edge router and feed improper routing information.


In cases where the BGP messages need to be sourced from an address other than the one configured on the outgoing physical interface, the neighbor x.x.x.x update-source [interface] command can be used to specify the interface whose address should be used as the source. R1 and R2 need to make sure they use the IP address of their loopback interface as the update source when performing the TCP active open.


The configuration on R1 and R2 has now been modified to include the neighbor x.x.x.x update-source loopback1 command:


R1:

router bgp 100

bgp log-neighbor-changes

neighbor 2.2.2.2 remote-as 200

neighbor 2.2.2.2 disable-connected-check

neighbor 2.2.2.2 update-source Loopback1


R2:

router bgp 200

bgp log-neighbor-changes

neighbor 1.1.1.1 remote-as 100

neighbor 1.1.1.1 disable-connected-check

neighbor 1.1.1.1 update-source Loopback1


With these configuration changes, the BGP session between the two peers comes up and R1 and R2 can begin exchanging routing information with each other.


R1#show ip bgp neighbors

BGP neighbor is 2.2.2.2,  remote AS 200, external link

  BGP version 4, remote router ID 2.2.2.2

  BGP state = Established, up for 00:00:08

 

NOTE: If the neighbor is configured with a proper update source but without the neighbor x.x.x.x disable-connected-check command configured, the router logs a “no route to peer” message indicating there is not an acceptable route in the routing table to pass the samecable test. This message varies slightly from the “update-source NULL” error because the router has a proper update source but does not have a proper route to reach the peering address of its intended peer. To remedy this, check the routing table for a valid route to the peering address and add the neighbor x.x.x.x disable-connected-check command.


Is the Update-source Command Required on Both Peers?

 

In the example above, both R1 and R2 are configured with the neighbor x.x.x.x update-source loopback1 command. However, configuring this command on only one router would still result in a successful peering session.


All BGP routers act as both TCP servers and TCP clients. Whenever a router responds to a connection attempt (performs a TCP passive open), it is acting as a TCP server. Whenever a router initiates a connection attempt (performs a TCP active open), it is acting as a TCP client. The router performs both tasks simultaneously. 

 

If the router receives a valid connection attempt from a potential client, it will process it and respond appropriately. For example purposes let’s review the following configuration:


R1:

router bgp 100

bgp log-neighbor-changes

neighbor 2.2.2.2 remote-as 200

neighbor 2.2.2.2 disable-connected-check

neighbor 2.2.2.2 update-source Loopback1


R2:

router bgp 200

bgp log-neighbor-changes

neighbor 1.1.1.1 remote-as 100

neighbor 1.1.1.1 disable-connected-check


In the example above, R1 is configured to use its loopback interface as the update-source and R2 is not. The following sequence of events occurs between the two routers:

 

  1. R1 initiates a connection attempt to 2.2.2.2 with a source of 1.1.1.1.

  2. R2 initiates a connection attempt to 1.1.1.1 with a source of 12.1.1.2.

  3. R1 receives R2’s connection attempt and rejects it because it is configured to accept connections from 2.2.2.2 and not 12.1.1.2.

  4. R2 receives R1’s connection attempt and accepts it because it is configured to peer with 1.1.1.1.

  5. R2 responds with a TCP SYN-ACK packet sourced from 2.2.2.2 and destined to 1.1.1.1.

  6. R1 receives this packet and responds with the TCP ACK.

  7. The TCP connection is established. R1 is the TCP client and R2 is the TCP server.

 

Because R1 and R2 act as both clients and servers, they make simultaneous connection attempts to each other as shown in the capture below. The result is that, as long as at least one peer initiates the connection with the proper source address, the TCP connection will come up and the session can be established.

In the above, R1 is initiating the session with the correct source address from which R2 expects connection attempts to be made. R2 receives the TCP SYN and, because it owns the address 2.2.2.2, responds to the connection attempt with a TCP SYN-ACK packet sourced from 2.2.2.2, the address R1 is expecting to hear a response from. R2 does this even without the proper update-source configuration.


This is normal IP behavior. Because R2 is receiving the TCP SYN destined to 2.2.2.2, it will also respond to the TCP SYN as 2.2.2.2. The key is, the neighbor x.x.x.x update-source command only affects how the router acts when initiating the connection as the TCP client. It does not affect how the router responds to connection attempts as the TCP server. As long as a connection attempt is received from the expected peering address, the router will allow the connection attempt.
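One way to observe this split between client and server roles in a lab is to pin each router to a single role. The sketch below assumes an IOS release that supports the neighbor x.x.x.x transport connection-mode command. It is offered only as an illustration and is not part of the configuration tested in this blog.

```
! R1: only initiates connections (TCP client), so update-source matters
router bgp 100
 neighbor 2.2.2.2 remote-as 200
 neighbor 2.2.2.2 disable-connected-check
 neighbor 2.2.2.2 update-source Loopback1
 neighbor 2.2.2.2 transport connection-mode active
!
! R2: only listens on port 179 (TCP server), no update-source configured
router bgp 200
 neighbor 1.1.1.1 remote-as 100
 neighbor 1.1.1.1 disable-connected-check
 neighbor 1.1.1.1 transport connection-mode passive
```

With the roles fixed this way, R2 never performs an active open, so its lack of an update-source has no opportunity to cause a rejected connection attempt.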


The output of the show ip bgp neighbors command reinforces the fact that R2 is the TCP server because its local port is 179, the BGP server listening port:


R2# sh ip bgp neigh | begin Connection state

Connection state is ESTAB, I/O status: 1, unread input bytes: 0

Connection is ECN Disabled, Mininum incoming TTL 0, Outgoing TTL 1

Local host: 2.2.2.2, Local port: 179

Foreign host: 1.1.1.1, Foreign port: 59535

 

In the output of the same command on R1, the foreign host and port point to TCP port 179 on R2, meaning R1 is the client connecting to port 179 on the TCP server.


R1#sh ip bgp neigh | begin Connection state

Connection state is ESTAB, I/O status: 1, unread input bytes: 0

Connection is ECN Disabled, Mininum incoming TTL 0, Outgoing TTL 1

Local host: 1.1.1.1, Local port: 59535

Foreign host: 2.2.2.2, Foreign port: 179

 

Even though it is not required to have the update-source configured on both peers, it is a good practice to do so for configuration consistency and ease of troubleshooting.

As a recap, the following outlines the requirements for establishing single-hop eBGP sessions over loopback interfaces:

 

  1. The neighbor x.x.x.x remote-as [as number] command must specify the IP address of the remote peer’s loopback interface and the remote peer’s AS number

  2. The two routers must be configured to bypass the samecable test for the peering address in the routing table with the neighbor x.x.x.x disable-connected-check command

  3. The neighbor x.x.x.x update-source [interface] command needs to be used to explicitly identify the loopback interface that should be used to source the TCP and BGP control messages

    1. This command is only required on one side of the session. However, for configuration consistency, it is recommended to apply it on both sides of the session.


Once all of these configurations are in place, the eBGP peering is established between R1 and R2:


R1#show ip bgp neighbors

  BGP neighbor is 2.2.2.2,  remote AS 200, external link

  BGP version 4, remote router ID 2.2.2.2

  BGP state = Established, up for 00:01:35

 

The concepts explained in this section have similar applications when establishing a multi-hop eBGP session. The details behind that configuration are explained in the next section.

 

Multihop eBGP Session

 

Multihop eBGP peering sessions are eBGP peering sessions between two routers that are separated by a device that introduces an additional router hop such as a routed firewall or intermediary router. Consider the reference topology below:

8.png


R1 and R2 are intended to become eBGP peers but they are separated by an additional router, R3. R3 is not going to run BGP but can be used to route the BGP control messages between R1 and R2. The IP addresses assigned to the E0/0 interfaces on R1 and R2 can be used to form an eBGP session or the session can be formed using a logical interface such as a Loopback interface.

Irrespective of the interface chosen to form the eBGP session between them (Physical interface or Logical), the peer address will not pass the samecable test as the IP addresses do not fall within directly connected networks. In order for the session to come up, R1 and R2 need to skip the samecable test.

For this example, R1 and R2 are going to form an eBGP session over their loopback interfaces meaning the update source needs to be explicitly specified. The following is the configuration for both routers using the same principles from the previous section about forming eBGP sessions over indirect interfaces:
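A sketch of that configuration, following the pattern from the previous section, is given here. The static routes are an assumption: they use 13.1.1.x for the R1-R3 link and 23.1.1.x for the R3-R2 link, matching the addresses that appear in the tunneling section later in this blog.

```
! R1
router bgp 100
 bgp log-neighbor-changes
 neighbor 2.2.2.2 remote-as 200
 neighbor 2.2.2.2 disable-connected-check
 neighbor 2.2.2.2 update-source Loopback1
!
ip route 2.2.2.2 255.255.255.255 13.1.1.3
!
! R2
router bgp 200
 bgp log-neighbor-changes
 neighbor 1.1.1.1 remote-as 100
 neighbor 1.1.1.1 disable-connected-check
 neighbor 1.1.1.1 update-source Loopback1
!
ip route 1.1.1.1 255.255.255.255 23.1.1.3
!
! R3 (no BGP, only routes toward the two loopbacks)
ip route 1.1.1.1 255.255.255.255 13.1.1.1
ip route 2.2.2.2 255.255.255.255 23.1.1.2
```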

 

The update-source and disable-connected-check commands are applied and all routers possess the appropriate routes in their routing tables for the connection. However, the session between R1 and R2 will not come up. There exists another culprit here preventing R1 and R2 from forming an eBGP session. 

 

Recall, by default, that eBGP packets are sent out with a TTL value of 1. This is to ensure the intended peer is truly only one router hop away. The problem in this scenario is, with R3 as an intermediary routing device in between, R1 and R2 are two IP hops away. With this default, the following happens when R1 attempts a TCP connection to R2:

 

  1. R1 correctly sources the TCP SYN from its loopback interface destined to R2’s peering address 2.2.2.2

  2. The packet arrives at R3.

  3. R3 determines the packet needs to be routed to R2

  4. R3 decrements the TTL value by 1 resulting in a TTL of 0.

  5. R3 realizes it cannot route the packet further, drops it, and sends an ICMP “Time to live exceeded in transit” message back to R1.

 

NOTE: Due to the uncertainty of the inner workings of how IOS decrements the TTL value, steps 3 and 4 can be reversed and lead to the same result.

 

The above indicates that R2 will never receive the TCP SYN from R1 to begin with. The packet expires at R3 because of its TTL value of 1, so it cannot make the extra routing hop required to reach R2. To fix this, any TCP SYN packet that R1 sends toward 2.2.2.2 needs a TTL value of at least 2.


R1 and R2 need to be configured to send their BGP control messages to each other with at least a TTL value of 2. This is done with the command neighbor x.x.x.x ebgp-multihop [hop count]. The exact effect of this command varies depending on the hop count value specified:


  • If no TTL value is specified (neighbor x.x.x.x ebgp-multihop)

    • TTL value is set to 255

    • Router forgoes the samecable test

  • If TTL value of 1 is specified (neighbor x.x.x.x ebgp-multihop 1)

    • TTL value remains 1

    • Router performs the samecable test

  • If TTL value of 2 or more is specified (neighbor x.x.x.x ebgp-multihop [2 - 255])

    • TTL value is set to the specified value

    • Router forgoes the samecable test
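The three forms listed above can be written out as follows. This is only a side-by-side illustration using R1's neighbor from this example. Only one form is configured for a given neighbor at a time, and the comments restate the behavior described in the list rather than adding anything new.

```
router bgp 100
 neighbor 2.2.2.2 remote-as 200
!
! Form 1: no hop count. TTL is set to 255 and the samecable test is skipped.
 neighbor 2.2.2.2 ebgp-multihop
!
! Form 2: hop count of 1. TTL stays at 1 and the samecable test is still performed.
 neighbor 2.2.2.2 ebgp-multihop 1
!
! Form 3: hop count of 2 or more. TTL is set to that value and the samecable test is skipped.
 neighbor 2.2.2.2 ebgp-multihop 2
```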


Notice from the above that the neighbor x.x.x.x ebgp-multihop command also has the added effect of causing the router to forgo the samecable test. This means that, when forming multihop eBGP sessions, it is no longer necessary to configure the neighbor x.x.x.x disable-connected-check command whenever the neighbor x.x.x.x ebgp-multihop command is configured with no TTL value specified or with a TTL value of 2 or more.

 

The resulting configuration on R1 and R2, after performing the changes outlined above, is as follows:


R1:

router bgp 100

bgp log-neighbor-changes

neighbor 2.2.2.2 remote-as 200

neighbor 2.2.2.2 update-source Loopback1

neighbor 2.2.2.2 ebgp-multihop 2


R2:

router bgp 200

bgp log-neighbor-changes

neighbor 1.1.1.1 remote-as 100

neighbor 1.1.1.1 update-source Loopback1

neighbor 1.1.1.1 ebgp-multihop 2

 

With the addition of the neighbor x.x.x.x ebgp-multihop [hop count] command, the eBGP session comes up as verified by this output from the show ip bgp neighbors command:


R1#show ip bgp neighbors

BGP neighbor is 2.2.2.2,  remote AS 200, external link

  BGP version 4, remote router ID 2.2.2.2

  BGP state = Established, up for 00:11:55

 

Tunneling BGP Over Multiple Router Hops

 

The above scenario could also be solved by creating a GRE tunnel between R1 and R2.

9.png

In this situation, eBGP session establishment takes place within the GRE tunnel making it a single-hop eBGP peering session. The configuration implementation is as follows:

 

  1. A tunnel interface is created on R1 and R2, assigned to a common IP subnet 12.1.1.0/24

  2. R1’s interface IP 13.1.1.1 and R2’s interface IP 23.1.1.2 are used as the tunnel endpoints

  3. Static routes are configured to advertise reachability to the tunnel endpoints on R1 and R2.

  4. Static routes are configured to advertise the loopback interface addresses that are being used for the eBGP peering session. The next hop points to the remote tunnel interface IP.

  5. The BGP configuration is modified to use the disable-connected-check command instead of using the ebgp-multihop command.


R1:

router bgp 100

bgp log-neighbor-changes

neighbor 2.2.2.2 remote-as 200

neighbor 2.2.2.2 disable-connected-check

neighbor 2.2.2.2 update-source Loopback1

!

interface Tunnel1

ip address 12.1.1.1 255.255.255.0

tunnel source Ethernet0/0

tunnel destination 23.1.1.2

!

ip route 2.2.2.2 255.255.255.255 12.1.1.2

ip route 23.1.1.0 255.255.255.0 13.1.1.3


R2:

router bgp 200

bgp log-neighbor-changes

neighbor 1.1.1.1 remote-as 100

neighbor 1.1.1.1 disable-connected-check

neighbor 1.1.1.1 update-source Loopback1

!

interface Tunnel1

ip address 12.1.1.2 255.255.255.0

tunnel source Ethernet0/2

tunnel destination 13.1.1.1

!

ip route 1.1.1.1 255.255.255.255 12.1.1.1

ip route 13.1.1.0 255.255.255.0 23.1.1.3

 

With the above configuration, R1 and R2 can form an eBGP session over R3 without modifying the TTL value of the BGP and TCP control packets. This type of connection is seen by BGP as a logical single-hop session over loopback interfaces and not a multihop eBGP session.

 

This is a subtle but important distinction to make. From BGP’s perspective, this is a single-hop session over loopback interfaces, requiring the use of the disable-connected-check command and not the ebgp-multihop command.

 

If we examine the payload of the packets exchanged over the tunnel interface, we see the inner TCP SYN packet used to establish the TCP connection is still sent with a TTL value of 1 encapsulated within a GRE/IP header.


Internet Protocol Version 4, Src: 13.1.1.1, Dst: 23.1.1.2 ! outer tunnel header

Generic Routing Encapsulation (IP)

Internet Protocol Version 4, Src: 1.1.1.1, Dst: 2.2.2.2 ! inner header

  0100 .... = Version: 4

  .... 0101 = Header Length: 20 bytes (5)

  Differentiated Services Field: 0xc0 (DSCP: CS6, ECN: Not-ECT)

  Total Length: 44

  Identification: 0xa807 (43015)

  Flags: 0x02 (Don't Fragment)

  Fragment offset: 0

  Time to live: 1

  --- omitted ---

Transmission Control Protocol, Src Port: 49910, Dst Port: 179, Seq: 0, Len: 0

  Source Port: 49910

  Destination Port: 179


At this point it may be a bit confusing when to use the disable-connected-check and ebgp-multihop commands. This is due largely to the fact that both commands can accomplish the same goal: allowing a BGP speaker to create an eBGP session with a neighbor that fails the samecable test. The discussion surrounding the distinction between the effects of these two commands merits its own section.


Clarification of disable-connected-check and ebgp-multihop usage

 

The effects of the ebgp-multihop and disable-connected-check commands are so similar with certain configurations that it has led to confusion about the exact purpose of each command. The cause of this confusion lies in the side effect of the ebgp-multihop command, which automatically causes the router to ignore the samecable test.


The common perception is that establishing sessions over a loopback interface and over an additional router hop have the same requirement, namely increasing the default TTL. As such, any session over an indirect address is thought to be a multihop session, even if the peer routers are a single hop away. The assumed sequence of events is as follows:


  1. Router A sends a TCP SYN with TTL 1 sourced from its loopback to Router B’s loopback

  2. Router B receives the TCP SYN and needs to route it to its own loopback interface

  3. Router B decrements the TTL to 0 when sending to its own loopback interface causing the “TTL time exceeded in transit” event


This logic has led to the recommended solution of always configuring neighbor x.x.x.x ebgp-multihop 2 when peering over loopback interfaces, believing the extra TTL value ensures Router B can “route” the packet to its Loopback interface.


In reality, the reason the session is not established is because Cisco IOS routers will not be able to send a TCP SYN to begin with if the peering address fails the samecable test. This causes the TCP active open to fail and the connection is never even attempted.


When the ebgp-multihop 2 command is configured, the side effect of disabling the samecable test implicitly allows the router to perform the TCP active open as long as there is any valid route in its routing table. The disable-connected-check command has the same effect, but it does not modify the TTL value.


Configuring disable-connected-check alone for single-hop eBGP sessions over loopback interfaces and observing the packet capture proves this functionality. The TCP SYN is still sent with a TTL value of 1 to the remote peering address and the session will come up. If the TTL value needed to be 2 to complete the transaction, the session would not come up and there would be ICMP “TTL time exceeded in transit” messages sent back to the originating router.


In conclusion, with a proper update source, the ebgp-multihop command is only necessary when the two peers are actually separated by physical routing devices that introduce an extra routing hop. If the peers are directly connected and peering over loopback interfaces, the disable-connected-check command on a per-neighbor basis is sufficient to establish the session.


NOTE: Exactly when IOS decrements the TTL is not certain. However, the point of this section is to emphasize that disable-connected-check is used for single-hop sessions over loopback interfaces, leaving the TTL value as 1. On the other hand, ebgp-multihop is used for multihop sessions where the TTL value needs to be set to account for the additional router hops.


This concludes the different configuration scenarios encountered when trying to establish eBGP peering sessions. The next section details different configuration scenarios encountered when trying to establish iBGP peering sessions.


iBGP Sessions

 

After wading through the special configuration for eBGP sessions, it is refreshing to examine iBGP sessions. Unlike eBGP peers, iBGP peers do not care how many router hops are between each other. This is because iBGP sessions are designed to allow edge routers to exchange prefixes learned from eBGP peers with other edge routers in the same AS.

 

Consider the topology illustrated below:


10.png

AS 100 exchanges prefixes with AS 200 and 300 as a transit AS. R2 and R4 are edge routers that run BGP in AS 100. In order for AS 100 to transit traffic between AS 200 and AS 300, the edge routers need a way to exchange the contents of their BGP databases with each other.

 

This is where iBGP comes into play, to exchange routing information from one AS border router to another AS border router in the same AS. By creating an iBGP session between R2 and R4, the BGP speakers can exchange prefixes with each other as normal. However, the edge routers (R2 and R4) are separated by multiple router hops, specifically R3 in this case. This is the reason why iBGP does not carry the requirement that peers be directly connected. Any route in the routing table that points to the peering address can be used to establish the peering. As such, it is not necessary to configure disable-connected-check or ebgp-multihop commands for iBGP sessions.

 

With R3 in between the BGP speakers R2 and R4, static routes or dynamic routing can be used to provide reachability to whatever BGP peering address is chosen. If R3 does not run BGP and is not configured to peer with R2 or R4, it will forward the BGP update messages between R2 and R4 as normal transit traffic, without examining the BGP prefixes or adding them to its routing table. As a result, R3 will not learn the external prefixes and will not be able to properly route transit traffic between AS 200 and AS 300, creating a routing black hole in the transit AS 100.

 

This blackholing of traffic could be solved by redistributing the external prefixes from BGP into an IGP on R2 and R4. Doing so is not considered best practice as the BGP table typically contains more prefixes than IGPs are designed to handle.

 

A better solution is to create a full-mesh of iBGP sessions in AS 100 by configuring R3 to run BGP with R2 and R4 as well.

 

NOTE: Creating a full mesh of iBGP peers is not the optimal solution because it creates scalability issues as the number of peers grows. There are more scalable ways of solving the above scenario that are out of the scope of this blog.

 

The next two sections detail the two main configuration scenarios for establishing iBGP sessions: directly connected and over loopback interfaces.


iBGP Sessions Over Directly Connected Interfaces

 

iBGP sessions can be formed over directly connected interfaces in the same manner as eBGP sessions. The only difference is that the AS number used in the neighbor command should match the AS number configured in the router bgp [ASN] command. Matching AS numbers identify that both routers belong to the same AS and intend to create an iBGP peering session.

 

Using the above topology as a base, R2 is configured to form an eBGP session with R1 and iBGP sessions with R3 and R4 in AS 100. Connectivity to the peering addresses between the iBGP peers is provided by an IGP:


R2:

router bgp 100

bgp log-neighbor-changes

neighbor 21.1.1.1 remote-as 200

neighbor 23.1.1.3 remote-as 100

neighbor 43.1.1.4 remote-as 100

 

The configuration is straightforward. The key points again are:

  • AS number used in the neighbor statement for the iBGP peers matches the AS number used in the router bgp command.

  • the IP addresses assigned to the physical interfaces are being used for the session.

 

Similar configurations are used on R3 and R4 as well:


R3:
router bgp 100
bgp log-neighbor-changes
neighbor 23.1.1.2 remote-as 100
neighbor 43.1.1.4 remote-as 100

R4:

router bgp 100

bgp log-neighbor-changes

neighbor 45.1.1.5 remote-as 300

neighbor 23.1.1.2 remote-as 100

neighbor 43.1.1.3 remote-as 100


The iBGP sessions come up, as evidenced by the show ip bgp neighbor output on R2:

R2#sh ip bgp neighbor | include neighbor|Established

BGP neighbor is 23.1.1.3,  remote AS 100, internal link

  BGP state = Established, up for 00:02:45

  BGP table version 3, neighbor version 3/0

BGP neighbor is 43.1.1.4,  remote AS 100, internal link

  BGP state = Established, up for 00:02:45

  BGP table version 3, neighbor version 3/0

--omitted--


Also noteworthy with this topology, R2 and R4 are separated by a router hop, but can peer with each other without adding disable-connected-check or ebgp-multihop commands. This is because unlike eBGP peers, iBGP peers do not perform the "samecable" test for the peering addresses nor do they use a default TTL value of 1 for the iBGP peering session. Instead the TTL is set to 255 by default. This means, R2 and R4 can form an iBGP peering session as long as they have a route to each other’s peering addresses.


In this example, R2’s only route to R4’s peering address 43.1.1.4 points out R2’s Ethernet0/2 interface as shown below.

R2#show ip route 43.1.1.4

Routing entry for 43.1.1.0/24

  Known via "eigrp 100", distance 90, metric 1536000, type internal

  Redistributing via eigrp 100

  Last update from 23.1.1.3 on Ethernet0/2, 00:00:13 ago

  Routing Descriptor Blocks:

  * 23.1.1.3, from 23.1.1.3, 00:00:13 ago, via Ethernet0/2

    Route metric is 1536000, traffic share count is 1

    Total delay is 2000 microseconds, minimum bandwidth is 10000 Kbit

    Reliability 255/255, minimum MTU 1500 bytes

    Loading 1/255, Hops 1


R2 will perform the TCP active open towards R4 using that interface address (23.1.1.2) as the source. R4 has a matching neighbor 23.1.1.2 remote-as 100 command and accepts the attempt. If R2 had multiple paths to reach R4, R2 would need to make sure the update-source is set to the proper interface. This concept is elaborated upon in the next section.
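If R2 did have more than one path toward R4, the source address could be pinned so that R4 always sees the address in its neighbor statement. The sketch below assumes, as the show ip route output above indicates, that 23.1.1.2 is the address on R2's Ethernet0/2 interface.

```
! R2: always source the session to R4 from Ethernet0/2 (23.1.1.2),
! the address R4 has configured in its neighbor statement
router bgp 100
 neighbor 43.1.1.4 remote-as 100
 neighbor 43.1.1.4 update-source Ethernet0/2
```

Pinning the source to a physical interface still ties the session to that interface's state, which is the limitation that loopback peering removes.
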

iBGP Session Over Loopback Interfaces

 

The reference topology above has one major flaw. If a link between one of the edge routers and R3 fails, iBGP peering sessions go down, and AS 100 can no longer transit traffic between AS 200 and AS 300. For high availability, a direct link is added between R2 and R4. This leads to a situation where there are multiple paths to reach each iBGP peer. In the eBGP section of the blog, loopback interfaces were used to take advantage of these multiple paths and allow high availability for the peering session. The same concepts apply to iBGP.

 

In fact, it is more common for iBGP peering sessions to be formed using loopback interfaces because the internal AS typically has multiple redundant links between internal routers. The same considerations must be made to ensure each iBGP peer chooses the correct interface address to source the control traffic needed to establish the iBGP peering session. Just like in the eBGP example, this is designated using the neighbor x.x.x.x update-source [interface-id] command.

 

The topology has been updated to include a link between R2 and R4. The BGP configuration on R2, R3 and R4 in AS 100 has been modified to peer using their loopback interfaces to provide high availability:

11.png

Looking at R2’s configuration, for each iBGP peer R2 changes its update-source to its loopback interface:

R2:

router bgp 100

bgp log-neighbor-changes

neighbor 3.3.3.3 remote-as 100

neighbor 3.3.3.3 update-source Loopback1

neighbor 4.4.4.4 remote-as 100

neighbor 4.4.4.4 update-source Loopback1

neighbor 21.1.1.1 remote-as 200


The configurations on R3 and R4 have also been modified in a similar fashion.

R3:

router bgp 100

bgp log-neighbor-changes

neighbor 2.2.2.2 remote-as 100

neighbor 2.2.2.2 update-source Loopback1

neighbor 4.4.4.4 remote-as 100

neighbor 4.4.4.4 update-source Loopback1


R4:

router bgp 100

bgp log-neighbor-changes

neighbor 2.2.2.2 remote-as 100

neighbor 2.2.2.2 update-source Loopback1

neighbor 3.3.3.3 remote-as 100

neighbor 3.3.3.3 update-source Loopback1

neighbor 45.1.1.5 remote-as 300


As a result, the topology is now more resilient to link failures. If the link between R2 and R4 fails, the IGP reconverges on the path through R3 (over the R2-R3 link), so the session between R2 and R4 stays up.
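
This failover only works if each router's loopback address is reachable through the IGP over every surviving path. The following is a minimal sketch of that prerequisite on R2, not the configuration used in the tested topology: the /32 mask, the choice of OSPF, the process ID 1 and area 0 are all assumptions made for illustration, since this section does not state which IGP runs inside AS 100.

```
! Sketch for R2 only. The /32 mask, OSPF, process ID 1 and area 0 are
! assumptions for illustration; the IGP actually used is not stated here.
interface Loopback1
 ip address 2.2.2.2 255.255.255.255
!
router ospf 1
 ! Advertise the loopback so R3 and R4 can reach 2.2.2.2 over any surviving link.
 ! The links toward R3 and R4 must also run the IGP; their network statements
 ! are omitted because they depend on the link addressing.
 network 2.2.2.2 0.0.0.0 area 0
```

R3 and R4 would need the equivalent for 3.3.3.3 and 4.4.4.4. Without a route to the peer's loopback, the TCP connection cannot be opened no matter how update-source is set.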

Again, the show ip bgp neighbor output on R2 verifies the sessions are established:

R2#sh ip bgp neighbor | include neighbor|Established

BGP neighbor is 3.3.3.3,  remote AS 100, internal link

  BGP state = Established, up for 00:29:30

  BGP table version 3, neighbor version 3/0

BGP neighbor is 4.4.4.4,  remote AS 100, internal link

  BGP state = Established, up for 00:29:30

  BGP table version 3, neighbor version 3/0
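
To confirm that a session is sourced from the loopback rather than from a physical interface, the TCP endpoints can be filtered out of the same command. This is only a sketch of the check, and the output is deliberately omitted. With the configuration shown above, the local host for the 4.4.4.4 session would be expected to be 2.2.2.2.

```
R2#show ip bgp neighbors 4.4.4.4 | include Local host|Foreign host
```

If the local host shows a physical interface address instead, the update-source command is missing or points at the wrong interface.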


As the output shows, basic iBGP sessions are easier to establish than eBGP sessions because iBGP neighbors are not subject to the "samecable" test and do not run into insufficient TTL values.

Conclusion

 

This blog walked the reader through the different kinds of peering sessions that can be established between BGP speakers along with how BGP leverages TCP to provide a channel to relay BGP-related messages. It then detailed the specific configuration requirements for eBGP and iBGP sessions across different configuration scenarios. Finally, the blog also explained some misconceptions about the requirements of certain configuration commands.

 

Special thanks go to Peter Paluch, who took the time to provide his input on the samecable test and revealed the hidden show ip cef x.x.x.x samecable command.

 

The configurations and solutions were engineered to be as simple as possible for demonstration purposes and may not follow exact best-practice standards.