Design Considerations: Speed and Availability


In the first two blogs of the series “Pillars of the Earth”, I covered cost and scalability. In this blog I’m covering two design considerations: speed and availability.

Blog18 - Pillars - speed and availability.PNG

How fast is fast enough for the network to do its job of enabling applications to run smoothly?



To answer the question “how fast is fast enough?”, you’ll have to consider the applications you currently run and those you plan to run on the network in the short and long term. Different applications (data, voice, and video) and routing protocols have different requirements for end-to-end bandwidth, latency and jitter. Also consider where the end users and the servers hosting these applications reside on the network. You need to know these requirements to create a network that is right for your customer.


(High) Availability

A critical component to consider when making the network application-aware is how to make it highly available. But what does “availability” really mean? It’s a metric that conveys how much downtime per year is acceptable from the business point of view. This varies from company to company, so don’t shoot for five 9s just because “it’s cool”: high availability doesn’t come for free. Pursue it only when it’s a real customer requirement, for example when the cost of downtime is high, the cost of SLA breaches is high, or outages impact productivity or business results. You can calculate availability using Mean Time Between Failures (MTBF) (i.e., what fails, and when, why and how does it fail?) and Mean Time to Repair (MTTR) (i.e., how long does it take to fix?). To increase availability you can reduce MTTR, increase MTBF, or both. The figure below shows the components you can consider to ultimately make the network more available.


Blog18 - Pillars - high availability.PNG
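
To make the MTBF/MTTR relationship concrete, here is a minimal Python sketch. It assumes the commonly used steady-state formula, availability = MTBF / (MTBF + MTTR), which this blog doesn’t spell out; the function names and the sample numbers are purely illustrative.

```python
# Minimal sketch, assuming availability = MTBF / (MTBF + MTTR).
# Function names and sample values are illustrative, not from any tool.

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the network is up, given MTBF and MTTR in hours."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected downtime per year, in minutes, for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    # Hypothetical network: fails every 1000 hours on average.
    slow_repair = availability(mtbf_hours=1000, mttr_hours=4)
    fast_repair = availability(mtbf_hours=1000, mttr_hours=2)
    print(f"MTTR 4h: {slow_repair:.5f}")
    print(f"MTTR 2h: {fast_repair:.5f}")

    # What "five 9s" allows per year.
    print(f"Five 9s: {downtime_minutes_per_year(0.99999):.2f} minutes/year")
```

Under these assumptions, halving MTTR raises availability without touching MTBF, and five 9s leaves only a little over five minutes of downtime per year, which is why it’s worth confirming it’s a real requirement before paying for it.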



Reliability means that packets sent are received at the intended destination consistently, on time, predictably and accurately. How do you achieve network reliability? Consider symmetrical (vs. asymmetrical) routing across alternate paths, traffic engineering, and proper route summarization.



Redundancy refers to the duplication of components so that the failure of a given component doesn’t affect the rest of the network. How do you achieve redundancy? Avoid components (hardware/software and links) in series, single points of failure and fate sharing, and consider using components in parallel, distributed functionality, and smaller fault domains.
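
A quick way to see why components in series hurt and components in parallel help is to combine their individual availabilities. The Python sketch below assumes that components fail independently of each other (that is, no fate sharing), which is a simplification; the function names and the 99% figure are illustrative only.

```python
# Minimal sketch, assuming independent component failures (no fate sharing).
from math import prod
from typing import Iterable


def series_availability(components: Iterable[float]) -> float:
    """All components must be up: multiply the individual availabilities."""
    return prod(components)


def parallel_availability(components: Iterable[float]) -> float:
    """At least one component must be up: 1 minus the chance all are down."""
    return 1.0 - prod(1.0 - a for a in components)


if __name__ == "__main__":
    link = 0.99  # hypothetical availability of a single link

    # Two links in series: a failure of either one breaks the path.
    print(f"Series:   {series_availability([link, link]):.4f}")

    # Two links in parallel: the path survives a single failure.
    print(f"Parallel: {parallel_availability([link, link]):.4f}")
```

If the two parallel links share a conduit, a power feed or a line card, the independence assumption no longer holds and the real number will be lower, which is exactly the fate sharing to avoid.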


(Fast) Convergence

There are many technologies that help achieve high availability through shorter convergence time, and they normally fall into one or more of these categories: failure detection and propagation, processing to determine the best alternate path(s), using these best alternate paths while taking the dampening effect into account, and reverting once the failure has been resolved.
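
One way to reason about these categories is as a time budget: each one adds to the total time traffic is disrupted. The Python sketch below is only an illustration. It assumes a simple hello-based detection model (a failure is declared after a number of consecutive missed hellos) and treats the phases as strictly sequential; the class, field names and millisecond values are all hypothetical.

```python
# Minimal sketch of a convergence time budget.
# Assumptions: hello-based failure detection and strictly sequential phases.
# All names and numbers are hypothetical.
from dataclasses import dataclass


def detection_time_ms(hello_interval_ms: float, missed_hellos: int) -> float:
    """Time to declare a neighbor down after N consecutive missed hellos."""
    return hello_interval_ms * missed_hellos


@dataclass
class ConvergenceBudget:
    detection_ms: float    # noticing the failure
    propagation_ms: float  # telling the rest of the network
    computation_ms: float  # determining the best alternate path(s)
    switchover_ms: float   # starting to use the alternate path(s)

    def total_ms(self) -> float:
        """Total disruption if the phases run one after another."""
        return (self.detection_ms + self.propagation_ms
                + self.computation_ms + self.switchover_ms)


if __name__ == "__main__":
    budget = ConvergenceBudget(
        detection_ms=detection_time_ms(hello_interval_ms=50, missed_hellos=3),
        propagation_ms=20,
        computation_ms=30,
        switchover_ms=50,
    )
    print(f"Estimated convergence: {budget.total_ms():.0f} ms")
```

A budget like this makes it easier to see which phase dominates before reaching for more aggressive timers.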



Resiliency is the ability of a network to self-heal and keep running (business continuity) without manual intervention from operations personnel. How do you achieve resiliency? Consider using the technologies described above together with the Layer 2 and/or Layer 3 protocols you already use or plan to use in your network design, and/or fine-tuning protocol timers (which should be done with caution).



What typically fails on a stable network? Cables, telco circuits, power supplies and hardware (failures that can be addressed with redundancy), but mainly the causes are attacks, human error and changes. How do you achieve serviceability? Keep your network designs simple, keep the network documentation updated, and follow the established change management process.



Do you consider rightsizing speed and availability in your network designs? Is there a topic you want to hear about in my upcoming blogs? Add it to the comments field!





Elaine Lopes is the CCDE and CCAr Certifications Program Manager and Team Lead for the CCIE program team, and she’s passionate about how lives can change for the better through education and certification.



