It's 8:00 AM on the east coast of the United States, and the support team is beginning to get a lot of calls from users who claim some critical network resources are unavailable. The support staff manager calls you for help. After about 30 minutes of troubleshooting, your phone begins to ring off the hook. You are getting more and more reports that some users can't work. Your phone continues to ring over the next half hour, and you do your best to field as many calls as you can. It's now 8:00 AM in the central time zone, and offices there are beginning to come online. The support staff is now completely overwhelmed with calls from users who can't work. Your manager just sent you a text message asking about the issue. A flood of instant messages from users is filling your laptop screen. People in the building are starting to congregate at your office door, and a line is beginning to form. You now have people from every department and every managerial level asking for an update on when the network will be available. The network is down. Now what?
That's a question that strikes fear into the hearts of many network engineers, new and old. When the network is functioning as it should, everyone is happy and life moves at the speed of business. When the network is not operating, stress levels and blood pressures begin to rise. If the above scenario hasn't happened to you yet in your career, it will. As much as we plan for high availability, there is always the possibility that something was overlooked in the design phase of the network. This post will attempt to shed some light on five things you can do when the network does fail.
1. Stay calm - Your initial reaction to the situation is going to be vital. Keep your emotions under control even if you really are beginning to smell smoke coming from the datacenter. When people sense that others are panicking, they begin to panic too. Take a step back and use logic and reason instead of emotion. The pressure will be there, but learning to deal with it when it comes is part of growing as an engineer.
2. Rely on your training - Network engineers are in a skilled technical role and should show the ability to reason through problems with logic and critical thinking. Network engineers spend a lot of time reading about technology to get an understanding of how things work. Use your training to your advantage. Now is your time to shine. On the other hand, if you decided to use brain dumps for your training, now would be a good time to ponder your next career move.
3. Call for support - That's right. Call on all the additional support you may need. If you have a team of engineers, lean on them and ask them for help, even if their only role is to give the support staff and managers the updates they need. If you don't have a team of engineers to lean on, call Cisco TAC. You did renew those SMARTnet contracts, didn't you? Cisco has some very skilled people who can help in just about any situation. Don't be afraid to ask for help.
4. Have a "Plan B" - Sometimes, as engineers, our goal will be to do whatever is necessary to get the network back to an operational state. That can mean using a workaround instead of an actual solution to the problem at hand. In pressure situations, the main goal may be to get people working again and to find a solution to the actual problem after hours. The ability to think outside of the box will serve you well here. In just about every situation, there is more than one way to solve a problem. It's your job to engineer a way around the problem if necessary.
5. Use time management - Don't spend too much time focusing on one possible solution. If you can't get a proposed solution to work after three hours, you've probably spent too much time on it. This relates closely to having a Plan B. Try to find a workaround to the problem as soon as possible, and budget a finite amount of time to work on a final solution. If a solution hasn't been found within the budgeted time, implement the workaround to restore network resources.
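To make the time budget idea concrete, here is a minimal Python sketch of the decision described in point 5. Everything in it is an assumption made for illustration: the function name `next_action`, the three-hour budget, and the timestamps are hypothetical, and a real outage will never be this tidy.

```python
from datetime import datetime, timedelta


def next_action(started_at, now, budget, fix_found):
    """Decide the next step in a time-boxed troubleshooting effort (hypothetical helper)."""
    if fix_found:
        return "apply fix"
    # Budget used up without a fix: fall back to Plan B.
    if now - started_at >= budget:
        return "implement workaround"
    return "keep troubleshooting"


# Hypothetical outage that started at 8:00 AM with a three-hour budget.
start = datetime(2024, 1, 1, 8, 0)
budget = timedelta(hours=3)

print(next_action(start, datetime(2024, 1, 1, 9, 30), budget, False))  # keep troubleshooting
print(next_action(start, datetime(2024, 1, 1, 11, 0), budget, False))  # implement workaround
```

The point of writing it down, even this simply, is that the deadline is set before the pressure starts, so the switch to the workaround is a decision you already made rather than one you have to argue yourself into.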
While these are only a few key points to help in a pressure situation when the network is down, there are also some proactive things that can be done to prepare for such an event. I plan to discuss a few of them in my next blog entry. They will address things that can be done in the design stage and during daily business operation.