Introduction to Segment Routing
Segment Routing is one of the new kids on the block in the routing area, and perfectly fits both the paradigm of autonomous networking, and the SDN approach with centralized and application-driven path control. In this document, I would like to share an introduction to Segment Routing, its roots, and its applications.
Segment Routing heavily reuses concepts from two other major technologies - Multi Protocol Label Switching (MPLS), and Source Routing. It is therefore advantageous to revisit their most important properties.
MPLS - A Quick Review
This article assumes a basic understanding of the inner mechanisms and forwarding process of MPLS. For a manageable, not-burdensome introduction to MPLS, I invite you to read a blog written earlier: MPLS History and building blocks.
As a starter, let's summarize some of the key MPLS characteristics:
- Transport technology able to carry a wide range (if not any kind) of protocols
- Does not have its own datagram format; its shim labels are always inserted between the Layer 2 and Layer 3 headers of existing protocols
- Its transport mechanism is based on operations with locally significant labels and their values
- Label operations are PUSH, SWAP and POP
- Relies mainly on Label Distribution Protocol (LDP) to advertise and distribute label bindings (although other protocols were extended later to support label advertisement)
- Labels are typically bound to destination IP prefixes or virtual circuits - in general, to paths toward a specific endpoint and a specific operation to be performed after unlabeling
- Available label range is <0 - 1,048,575>
- Reserved label range is <0 - 15>
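To make the label ranges above more tangible, here is a minimal Python sketch of a single MPLS shim entry. The 32-bit layout it assumes (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL) comes from the MPLS label stack encoding and is not discussed in this article; the function names are my own and purely illustrative.

```python
import struct

LABEL_MAX = 2**20 - 1   # 1,048,575: the label field is 20 bits wide
RESERVED_MAX = 15       # labels 0-15 are reserved


def encode_shim(label, tc=0, bottom=True, ttl=64):
    """Pack one 32-bit shim entry: label(20) | TC(3) | S(1) | TTL(8)."""
    if not 0 <= label <= LABEL_MAX:
        raise ValueError("label does not fit into 20 bits")
    if not 0 <= tc <= 7 or not 0 <= ttl <= 255:
        raise ValueError("TC or TTL out of range")
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)   # network byte order


def decode_shim(data):
    """Unpack the first 4 bytes of data into the shim fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "tc": (word >> 9) & 0x7,
        "bottom": bool((word >> 8) & 0x1),
        "ttl": word & 0xFF,
    }


def is_reserved(label):
    """True for labels in the reserved range."""
    return 0 <= label <= RESERVED_MAX
```

A label stack is simply several of these 4-byte entries placed back to back, with the bottom-of-stack flag set only on the last one.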
LDP is responsible for creating and advertising label mappings for the network entries in a router’s RIB. More precisely, LDP creates and advertises label bindings for RIB entries learned from all routing information sources except BGP. BGP-learned routes are exempted from LDP operations because BGP has its own mechanism of advertising label mappings along with prefixes. Therefore, the RIB has to be populated so that LDP has something to work on. Typically, link-state protocols like OSPF and IS-IS are used for this purpose, not only to populate the RIB, but also to advertise topology information and build a shared topology database, allowing each node in the network to have a complete and consistent view of the topology. Other routing protocols such as EIGRP (distance-vector) or BGP (path-vector) can of course be used, too, but some advanced MPLS applications actually require the routers to know the detailed network topology, so the link-state IGPs are generally preferred.
Each router creates and advertises label bindings based on its own rules, and because each label value only has a particular meaning to the router that has advertised it (the same label value can have an entirely different meaning to a different router), the label values along the same path towards a particular destination are generally different on each hop.
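The hop-by-hop behavior described above can be sketched in a few lines of Python. The router names, label values and table layout below are all made up for illustration; the point is only that the same label value (24 here) can be advertised by two different routers with two unrelated meanings.

```python
# Hypothetical label tables (LFIBs) of four routers along one path.
# Each entry maps the incoming top label to (operation, outgoing label, next hop).
# The key None stands for an unlabeled packet arriving at the ingress router.
LFIB = {
    "R1": {None: ("PUSH", 24, "R2")},   # ingress: impose the label R2 advertised
    "R2": {24: ("SWAP", 31, "R3")},     # R2's own label 24 -> R3's label 31
    "R3": {31: ("SWAP", 24, "R4")},     # 24 again, but this one belongs to R4
    "R4": {24: ("POP", None, None)},    # egress: remove the label
}


def forward(ingress):
    """Walk a packet through the LFIBs and record the label stack after each hop."""
    router, stack, trace = ingress, [], []
    while router is not None:
        top = stack[-1] if stack else None
        operation, out_label, next_hop = LFIB[router][top]
        if operation == "PUSH":
            stack.append(out_label)
        elif operation == "SWAP":
            stack[-1] = out_label
        elif operation == "POP":
            stack.pop()
        trace.append((router, operation, list(stack)))
        router = next_hop
    return trace


if __name__ == "__main__":
    for hop in forward("R1"):
        print(hop)
```

Note that the forwarding loop never needs to know which protocol produced the table entries; it only performs PUSH, SWAP and POP.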
MPLS has a genuinely unique property: Regardless of what logic has been used to set up the labels between consecutive routers for a given path, the mechanism of forwarding labeled packets does not change. This allows MPLS to bring diverse mechanisms into the control plane that are responsible for setting up the labels, and achieve very specific path setup and selection without changing the underlying data plane operations. This flexibility paved the way to one of the most important - and advanced - MPLS applications: MPLS Traffic Engineering (MPLS-TE). MPLS-TE is an independent set of features and technologies that allows leveraging paths in the network that would not usually be chosen by routing protocols since they do not correspond with the usual shortest path criteria, but which may have specific properties, such as available bandwidth. Essentially, MPLS-TE allows transporting packets along paths that meet criteria beyond just the shortest distance. Doing this helps to utilize the overall capacity of the network more efficiently, and allows providing specific SLAs to specific data flows.
MPLS-TE has multiple building blocks itself. First, it comprises a set of traffic engineering extensions for link-state IGPs. Their purpose is to, among other details, carry information about the total and reservable bandwidth on each link between routers in a network. Second, MPLS-TE reuses the RSVP protocol from the Integrated Services (IntServ) architecture. In IntServ, the purpose of RSVP was to provide comprehensive signalling by which an application running on end hosts could tell the network what requirements it had on the transport path for a particular flow (bandwidth, queuing). The network would either set up such a path, or let the application know that the required path was not available. In MPLS-TE, RSVP only runs between MPLS routers, not towards end hosts, and has slightly different responsibilities: It performs the advertisement and accounting of the bandwidth used up by individual TE tunnels on the individual links, and advertises MPLS labels for these TE tunnels. The link-state IGP and RSVP in MPLS-TE work together: The IGP is responsible for finding the shortest path that still meets the bandwidth requirement of a particular tunnel, and then hands off the exact step-by-step router and link sequence to RSVP, which performs the accounting of the used and remaining bandwidth along this path and advertises the MPLS labels for this path.
Traffic-engineered paths in MPLS-TE are called TE tunnels, or simply tunnels. A TE tunnel would always be configured on the starting router (the headend router) of the tunnel, and the configuration would typically contain the destination of the tunnel, the required bandwidth, and a list of path options, referring either to the routing protocol to do the path computation, or to an explicit list of next hops comprising the tunnel’s path. OSPF/IS-IS and RSVP would then either set up the tunnel including the MPLS labels, or inform the headend router that a path towards the configured destination with the requested bandwidth does not exist. TE tunnels in MPLS are always unidirectional - from the headend to the tailend router. A bidirectional traffic-engineered path between two tunnel endpoints would require setting up two TE tunnels.
MPLS-TE is one of the major applications of MPLS, but there are certain concerns about its scalability. One of the obvious challenges is the configuration and operational complexity: Each tunnel with unique properties (destination, requested guaranteed bandwidth, and others) has to be configured on its headend explicitly. With tens, perhaps hundreds of tunnel endpoints, and considering that there can be multiple tunnels with different requirements between the same endpoints, the amount of configuration increases considerably. Another concern is the presence of RSVP itself. In IntServ times, the RSVP signalling and the resulting control plane state about all flows posed a major scalability burden. In MPLS-TE, the situation is much better since RSVP only carries information about TE tunnels, not about individual flows anymore, but RSVP is still an added component with its own set of messages to be periodically exchanged, and state to be stored. Finally, another aspect of MPLS-TE is the added responsibility of the link-state IGP - OSPF or IS-IS. These IGPs had to be extended with an additional feature called Constrained Shortest Path First (CSPF), which is a fancy name for the shortest path computation that also takes available bandwidths into account, to make sure that the computed shortest path can carry a tunnel with a particular requested bandwidth. Although this feature is very simple in its nature - ignore all links whose current available bandwidth is lower than the tunnel’s requested bandwidth and just compute the shortest path using the SPF algorithm over whatever links are left - it needs to be invoked on the headend router for every tunnel. As network administrators are always cautious about extra SPF runs, the CSPF executions do raise some concerns.
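The CSPF idea of "prune first, then run plain SPF" can be sketched as follows. This is a simplified illustration under my own assumptions, not how any particular router implements it: links are treated as bidirectional with a single available-bandwidth figure, the only constraint is bandwidth, and the topology in the usage example is hypothetical.

```python
import heapq


def cspf(links, source, destination, bandwidth):
    """Return (cost, path) of the shortest path that can carry `bandwidth`.

    links: iterable of (node_a, node_b, igp_cost, available_bandwidth).
    Returns None if no path satisfies the constraint.
    """
    # Step 1: ignore every link whose available bandwidth is too low.
    graph = {}
    for a, b, cost, available in links:
        if available >= bandwidth:
            graph.setdefault(a, []).append((b, cost))
            graph.setdefault(b, []).append((a, cost))

    # Step 2: ordinary Dijkstra SPF over whatever links are left.
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        distance, node, path = heapq.heappop(queue)
        if node == destination:
            return distance, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (distance + cost, neighbor, path + [neighbor]))
    return None


if __name__ == "__main__":
    # Hypothetical topology: the short path A-B-D has little bandwidth left on B-D.
    links = [
        ("A", "B", 10, 100),
        ("B", "D", 10, 40),
        ("A", "C", 10, 100),
        ("C", "D", 20, 100),
    ]
    print(cspf(links, "A", "D", 10))    # small tunnel fits on the shortest path
    print(cspf(links, "A", "D", 50))    # larger tunnel has to detour via C
    print(cspf(links, "A", "D", 200))   # no path can carry this tunnel
```

Each tunnel with a different bandwidth request needs its own run of this computation on the headend, which is exactly the extra SPF load mentioned above.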
In summary, MPLS-TE imposes a considerable amount of extra state about the maximum, used, and available reservable bandwidths on the links in the network, and about the TE tunnels themselves, that the routers have to exchange, maintain, and update. While the TE capability is greatly desired, the resulting control plane overhead is not that attractive.
Networks based on packet switching offer a unique quality, however: Instead of just transporting plain "vanilla" packets and storing all the forwarding state about their specific handling on all the routers along the path, we can encode the state - or the delivery instructions - somewhere into the packet itself. The instructions would be inserted into the packet by an ingress router, and other routers would simply follow these instructions. To an extent, this is what MPLS is already doing: Since the routers inside an MPLS network perform forwarding decisions based on the top label alone, they do not need to understand anything below that label. This is exactly what the MPLS VPNs are all about. Even with this logic, though, MPLS-TE still needs quite some overhead in terms of control plane operations until it sets up a particular sequence of labels on routers that mark out the particular TE tunnel path in a hop-by-hop fashion. Would it be possible to encode the complete TE tunnel path into a labeled packet, and if so, how to do this in the most efficient way?
This question has actually been answered - using the technically available means of those times - over 35 years ago, when a technology capable of traffic engineering called Source Routing made its appearance on the stage. Let’s take a short step sideways because source routing - in a reinvented way - is what makes the core of Segment Routing.
IP Source Routing
The specification of IPv4 was published as RFC 791 in 1981. This document specified the format of IPv4 packets, the addressing architecture, and the operations of hosts and routers when originating, routing, forwarding, and receiving packets. The basic routing mode of IPv4 packets was based on the destination routing paradigm: Only the destination address is used to select a route for a given packet; the sender or other properties of the packet do not influence the path selection. However, the IPv4 specification also came with two routing modes (in the options field) where the sending host had more control over the path of the packet: Loose Source and Record Route (LSRR), and Strict Source and Record Route (SSRR).
What was the use case for them? They were meant to allow a sender to specify the path a packet should take - in other words, what hops the packet needs to traverse on its way towards its destination. So, the source of the traffic dictated, partially or totally, the path for those packets.
Let’s make a quick comparison.
In the case of common IP routing, for every packet individually, each router would determine the proper next-hop for the packet based specifically and only on the destination address found in the packet. Figure 1 shows the classical example of a destination-based routing along the common shortest path:
Figure 1: Destination-based routing along the shortest path
In Source Routing, on the other hand, the sender of a packet can prescribe an arbitrary path for the packet which is entirely independent of the typical shortest path to the destination. One possible path is shown in Figure 2:
Figure 2: An arbitrary path defined by Source Routing
As shown above, the path can be completely different, and it depends on the source of the packets to decide what should be the particular delivery path.
There are two IPv4 variants of Source Routing: Loose Source and Record Route (LSRR) and Strict Source and Record Route (SSRR). Both variants are based on storing a list of hops (waypoints) in the Options field of the IP header; the list is also known as "route data" and can hold a maximum of 9 addresses.
Loose Source and Record Route (LSRR) works in a similar way as a GPS does: Set a route towards a city, and define a couple of waypoints you want to pass through. The GPS will compute a route that consists of shortest paths between consecutive waypoints, without requiring you to specify every single crossroad along the path. Loose source routing works in the same way. The adjective Loose means that the consecutive waypoints - the next hops - do not need to be directly connected to each other. It is up to the router receiving the packet to decide which path to take to reach the upcoming waypoint - naturally, it will be the shortest path towards it.
Strict Source and Record Route (SSRR) works similarly but imposes an additional constraint: the consecutive next hops in the route data field must represent a sequence of directly connected routers. If the router handling a packet finds out that the explicit next hop is not directly connected, it will drop the packet. Literally, whenever SSRR “dictates” the way, the packet must follow the path described by the route data information in the options field.
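The limit of 9 addresses follows from simple arithmetic on the IPv4 header, which the short Python sketch below spells out. It assumes the option layout from RFC 791 (one byte each for type, length and pointer, followed by the 4-byte addresses) and the option type values 131 for LSRR and 137 for SSRR; the function name and the example addresses are hypothetical.

```python
import ipaddress

LSRR = 131               # Loose Source and Record Route option type (RFC 791)
SSRR = 137               # Strict Source and Record Route option type (RFC 791)
MAX_OPTIONS_BYTES = 40   # 60-byte maximum IPv4 header minus the 20-byte fixed part

# 3 bytes of option header (type, length, pointer), then 4 bytes per address.
MAX_HOPS = (MAX_OPTIONS_BYTES - 3) // 4


def build_source_route_option(option_type, hops):
    """Build the raw bytes of an LSRR or SSRR option for the given hop list."""
    if option_type not in (LSRR, SSRR):
        raise ValueError("not a source routing option type")
    length = 3 + 4 * len(hops)
    if length > MAX_OPTIONS_BYTES:
        raise ValueError("route data does not fit into the IPv4 options field")
    route_data = b"".join(ipaddress.IPv4Address(hop).packed for hop in hops)
    pointer = 4   # points at the first address in the route data
    return bytes([option_type, length, pointer]) + route_data


if __name__ == "__main__":
    option = build_source_route_option(LSRR, ["192.0.2.1", "198.51.100.1"])
    print(MAX_HOPS, option.hex())
```

The two variants share this encoding; they differ only in the option type and in how routers treat a listed hop that is not directly connected.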
Source Routing and its flavors had two purposes: to provide a form of host-based Traffic Engineering, and to extend the troubleshooting toolkit with a tool that allowed administrators to track and pinpoint failures along a specific path to a destination reachable over multiple paths. Figure 3 shows a set of possible paths to the same destination that can all be used - or troubleshot - thanks to Source Routing. An important fact to realize is that the routers shown below do not store any state about these explicit paths - the explicit path is stored in the packet itself.
Figure 3: Multiple independent paths defined by Source Routing
Having a powerful tool like this one, able to bypass regular and expected forwarding patterns in a given network, wasn't only attractive for administrators, but also for individuals trying to achieve malicious goals. For example, Source Routing could have been used to send packets to a private network that was not advertised in routing protocols, as long as the attacker knew which routers to traverse before reaching the one that had the target network directly connected. Source Routing proved to be a significant security concern and brought with it a considerable risk that no network should be exposed to. Since with great power comes great responsibility, the IETF published a document recommending that Source Routing capabilities be disabled in routers to avoid any kind of exposure to such peril. It eventually got deprecated, and nowadays devices are not source-routing-enabled out of the box.
Let's reiterate where we have arrived so far. We have discussed MPLS-TE and noted that it is a very useful and widely deployed application of MPLS, but it has issues with scalability and manageability, both referring to the control and management plane aspects. We have also discussed IP Source Routing, which can be seen as a rudimentary traffic engineering technique in which the network itself has no added burden on the control plane because the explicit path is encoded into each packet’s header, but which has security implications because it gives too much power to the end hosts sending the IP packets.
But what if the responsibility of encoding the explicit path into the packet was given to an ingress router instead of the sending host? The security issues would no longer be a concern. And if there was a way to encode this explicit path information into labeled packets so that an MPLS-enabled network could process them without needing to store additional state on all routers along the desired path, it would resolve the issues with MPLS-TE scalability. These are the two key ideas of Segment Routing, which combines the best from MPLS and Source Routing.
Encoding an explicit path into a packet can be seen as putting a sequence of instructions into the packet. With a grain of salt, it is almost like "turn left, then go straight, then turn right, then right again, and then straight for the next 10 kilometers". Segment Routing leverages this idea: Its explicit path is an ordered set of instructions placed into the packet, with the routers executing these instructions as they forward it. Each instruction in Segment Routing is called a segment, has its own number called the Segment ID (SID), and as we will learn later, there are multiple segment types. To represent these instructions in a packet, Segment Routing needs to choose a suitable encoding - and for MPLS-enabled networks, the natural encoding is nothing else than a label stack, with each label representing one particular segment. The MPLS label values would carry the Segment IDs of individual segments.
From a pure MPLS forwarding perspective, Segment Routing again builds on top of the basic MPLS forwarding paradigm and does not change how the labeled packets are forwarded, similar to other MPLS applications. Regarding control plane operations, there are two significant changes to the well-used MPLS control plane policies that deserve to be mentioned:
- For certain segment types, the labels have preferably identical values on all routers in the SR domain and so have global significance
- Label bindings to segments are advertised by OSPF or IS-IS; LDP is not used
To summarize: In Segment Routing, the path a packet follows is represented by a stack of labels pushed down to the packet by an edge router. Each label represents a segment - a particular forwarding instruction that determines how the packet will be forwarded.
So far, we have established that segments are instructions; now we need to determine how the routers identify them. Let’s define the classes of segments we can encounter.
In Segment Routing, there are two segment classes:
- Global Segment
- Local Segment
A global segment is an ID value bearing significance inside the entire SR domain. This means that every node in the SR domain knows about this value and assigns the same action to the associated instruction in its LFIB. The label range reserved for this purpose is called the Segment Routing Global Block (SRGB). The range used here, <16000 - 23999>, is vendor-specific, so other vendors may use a different range.
A local segment, on the other hand, is an ID value holding local significance, and only the originating node (the router advertising it) can execute the associated instruction. As this range is only relevant for that particular node, these values are not in the SRGB range but in the locally configured label range.
Segment Routing recognizes many particular types of segments that belong either to the global or the local segment class. Let’s have a look at some of them:
IGP Prefix Segment: A globally significant segment which is distributed by IGPs (IS-IS/OSPF) and whose path is computed as the shortest path towards that specific prefix. This also allows it to be ECMP-aware. The actual SID value of an IGP Prefix Segment is configured by the administrator on a per-interface basis, and it is also the administrator’s responsibility to make sure that this value is unique in the entire SR domain. Typically, the SID would be configured on loopback interfaces to identify nodes in the cloud. An IGP Prefix Segment is very similar to a loose source routing hop. This is shown in Figure 4:
Figure 4: IGP Prefix Segment
IGP Adjacency Segment: A locally significant segment distributed by IGPs (IS-IS/OSPF) which describes a particular link - or better put, an IGP adjacency between two neighboring routers. As opposed to IGP Prefix Segments, the SID for an Adjacency Segment would be assigned by the router itself, and does not require an administrator’s intervention. The instruction related to this segment can be explained as “Pop label and forward on the IGP adjacency”. An IGP Adjacency Segment is very similar to a strict source routing hop, as shown in Figure 5:
Figure 5: IGP Adjacency Segment
Pushing multiple labels representing segments of the same type onto a packet essentially provides exactly the same functionality as IP Source Routing does: Multiple IGP Prefix Segments are nothing else than Loose Source Routing; multiple IGP Adjacency segments are nothing else than Strict Source Routing - but here, based on MPLS labeling, and, provided with a sufficient MTU reserve, not limited anymore to just 9 explicit hops.
What might not be obvious is that labels for both segment types can be freely combined and pushed onto a packet! Their combination is a superset of what plain IP Source Routing was able to accomplish, and provides ample space for more complex source routing scenarios including backup paths and fast-reroute-alike detours where traffic can be steered through the network routing around a failure. A simple scenario is shown in Figure 6:
Figure 6: Combining Prefix and Adjacency segments
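To illustrate how a mixed label stack steers a packet, here is a small Python simulation. Everything in it is hypothetical: the five-router topology with equal link costs, the SID values, and the simplification that a Prefix Segment label is removed by the router that owns the prefix (penultimate hop popping is ignored). It is a sketch of the idea, not of a real forwarding plane.

```python
from collections import deque

# Hypothetical topology with equal link costs; A-B-D is the shortest path from A to D.
LINKS = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "E"), ("E", "D")]
NEIGHBORS = {}
for a, b in LINKS:
    NEIGHBORS.setdefault(a, []).append(b)
    NEIGHBORS.setdefault(b, []).append(a)

# Global Prefix SIDs: label -> router that owns the prefix.
PREFIX_SID = {16001: "A", 16002: "B", 16003: "C", 16004: "D", 16005: "E"}
# Local Adjacency SIDs: (router, label) -> neighbor at the other end of the link.
ADJ_SID = {("C", 24001): "E"}


def next_hop(source, destination):
    """First hop on a shortest path, found with a breadth-first search."""
    first_hop = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in NEIGHBORS[node]:
            if neighbor not in first_hop:
                first_hop[neighbor] = neighbor if node == source else first_hop[node]
                if neighbor == destination:
                    return first_hop[neighbor]
                queue.append(neighbor)
    raise LookupError("destination unreachable")


def forward(ingress, label_stack):
    """Return the routers visited; label_stack[0] is the top label."""
    node, stack, visited = ingress, list(label_stack), [ingress]
    while stack:
        top = stack[0]
        if top in PREFIX_SID:
            if node == PREFIX_SID[top]:
                stack.pop(0)                           # segment completed
            else:
                node = next_hop(node, PREFIX_SID[top])  # follow the shortest path
                visited.append(node)
        elif (node, top) in ADJ_SID:
            stack.pop(0)                               # pop and use that exact link
            node = ADJ_SID[(node, top)]
            visited.append(node)
        else:
            raise LookupError("no instruction for this label on this router")
    return visited


if __name__ == "__main__":
    print(forward("A", [16004]))                 # plain shortest path to D
    print(forward("A", [16003, 24001, 16004]))   # via C, over the C-E link, then to D
```

Only the ingress router A decides on the detour; routers C and E hold no per-path state and simply execute the instruction at the top of the stack.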
BGP Prefix Segment: Similar to IGP Prefix segment and holding global significance, BGP Prefix Segment represents the shortest path to a specific BGP prefix and, of course, is ECMP-aware. As opposed to IGP Prefix Segment that is advertised by an IGP, this segment is signaled by BGP.
Figure 7: BGP Segment
Since the Prefix segments (IGP Prefix and BGP Prefix segment types) have a global significance, it was necessary to consider that MPLS routers might not all be able to reserve the same range of label values for SR deployment, so it cannot be expected that all routers will use the same label for the same segment. There are various reasons for that: Different vendors might allocate different default ranges; gradual SR deployment into an existing MPLS network may face the obvious issue of the label range being already partially used, or of label ranges configured differently on different routers. Therefore, Prefix segments introduce a level of indirection: Each router advertises its own range of labels reserved for Prefix segments in its link-state packets, and this range is called the Segment Routing Global Block (SRGB). Individual Prefix segment IDs are then advertised as offsets, or indexes, from the beginning of the label range, instead of absolute values. Typically, the SRGB range starts at 16,000, and this is what we call the default SRGB.
How does this help? Check Figure 4 again. The rightmost router is shown to advertise the prefix segment for its /32 prefix as 16,005. In reality, though, the router would advertise that its own SRGB starts at 16,000, and that the index for this prefix is 5 (16,005 = 16,000 + 5). If all routers in the SR domain use the same SRGB, they will all arrive at the same label of 16,005 when forwarding packets along the path toward this prefix. However, if the top middle router used an SRGB that starts at 20,000, its own SID for this prefix would be 20,005 (20,000 + 5). Every neighbor of this router would know that, too, since each router’s SRGB is advertised in its link-state packets. So when a neighbor forwards packets toward this prefix through the top middle router, knowing that the index of this prefix is 5 and that the router uses an SRGB range starting at 20,000, it uses a label of 20,005 instead. Again, with global SIDs, their originating routers advertise their index rather than their absolute value; the actual value to be used in the label is computed as the index plus the SRGB base of the next hop.
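The label computation just described is a one-line formula, shown here as a small Python sketch. The function name is my own, and the SRGB sizes are assumptions: 8,000 labels matches the <16000 - 23999> range mentioned earlier, while the size of the range starting at 20,000 is made up for the example.

```python
def outgoing_label(sid_index, next_hop_srgb):
    """Label to impose toward a next hop: its SRGB base plus the advertised index.

    next_hop_srgb is a (base, size) pair as advertised by the next hop.
    """
    base, size = next_hop_srgb
    if not 0 <= sid_index < size:
        raise ValueError("index falls outside the next hop's SRGB")
    return base + sid_index


DEFAULT_SRGB = (16000, 8000)      # labels 16000-23999
TOP_MIDDLE_SRGB = (20000, 8000)   # hypothetical router with a different base

if __name__ == "__main__":
    index = 5   # advertised by the router that owns the prefix
    print(outgoing_label(index, DEFAULT_SRGB))      # toward a default-SRGB next hop
    print(outgoing_label(index, TOP_MIDDLE_SRGB))   # toward the top middle router
```

The index stays the same across the whole SR domain; only the base changes from one next hop to another.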
As a summary: Segment Routing is able to accomplish exactly what MPLS itself can, and brings with itself a new paradigm of encoding the forwarding state into the packet itself as a label stack, opening a whole new area of possible applications. From a control plane perspective, Segment Routing relies on extensions made for link-state routing protocols to advertise the segment IDs, and to provide the detailed knowledge about the network topology required to accomplish the source routing operations. Each segment represents a forwarding instruction carried as a label. At each hop, only the segment at the top of the MPLS label stack is taken into account; once its task has been carried out, its label is discarded, and the process repeats until the packet reaches its destination. Reducing operational complexity while simplifying the forwarding process has earned Segment Routing a favorable position among ISPs, which see it as an attractive technology for complex environments where simplification can make a difference in daily operations and in meeting the tight service level agreements contracted by demanding customers.
This article was intentionally meant to be a light introduction to the topic, focusing on the roots of Segment Routing rather than on its advanced features. It does not cover more advanced topics or deployments such as Path Computation Element (where Segment Routing demonstrates its capacity to be SDN-ready) deployment in conjunction with BGP-LS, or data plane related features like Topology Independent LFA and several others. These are coming, though - stay tuned!
Any feedback, comments, questions and corrections are welcome!
Spanish version: Introducción a Segment Routing