Things break. It is just a reality. In IT, we have support teams to handle this, but are they prepared?
Competition requires us to change things, which increases the chances of breakage. On top of that, we have to worry about security and vulnerability threats, since having a strong online presence is a must! Physical viruses aside, we also need to worry about the e-viruses!
With that said, we can't anticipate everything, but we can prepare for problems. Firefighters do, and so should we.
Many kids want to be firefighters when they are really young. I went to the extreme and looked into becoming a smokejumper during my college years, even though I was on the engineering/IT and supply chain track with no desire for a long-term career in firefighting. However, I found that my career has been filled with firefighting, just different from what I expected! I am sure most readers can relate, as many of our day-to-day plans are put aside to help our companies with present-day problems and needs. A.K.A. firefighting, from an IT perspective!
"Any disaster is a learning process."
- Julia Child
What is a "disaster"? Most people think of something tremendous like a hurricane, earthquake, or fire. Others would use the word for how their presentation went or how they scored on a test. So putting a disaster in perspective is important, as is defining the different kinds of disasters when planning how many people and how much software and equipment are needed to handle IT disasters. These can and should range from someone's email not working to a server outage to larger issues. Defining them produces a list that shows the impact on the business, both on internal operations and on customers, so an approximate cost can be attached to each outage.
From here, plans can be made for how fast each kind of disaster needs to be handled and how much to invest in levels of backups, monitoring, and outage protection. This can include cloud-based solutions, additional but separated servers, levels of malware protection, etc. For these plans, I would highly recommend covering everything from a bad sector on a disk to the complete wipe-out of a data center (I've known one that flooded, so these do happen). However, the more common issues, such as a server needing a reboot, a power outage, database issues, deployment back-outs, and ISP outages, are the ones that most need a plan.
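As a minimal sketch of what such a list might look like, the Python below attaches an approximate cost to each kind of disaster. Everything in it is an assumption made for illustration: the scenario names echo the examples above, but the hourly costs, the expected hours down, and the names `DisasterScenario` and `rank_by_cost` are hypothetical and should be replaced with your own business figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisasterScenario:
    """One entry in the disaster list (all figures are hypothetical)."""
    name: str
    internal_cost_per_hour: float   # lost productivity inside the company
    customer_cost_per_hour: float   # lost sales, penalties, goodwill
    expected_hours_down: float      # rough estimate of outage length

    def estimated_cost(self) -> float:
        # Approximate impact = (internal + customer cost per hour) * hours down
        hourly = self.internal_cost_per_hour + self.customer_cost_per_hour
        return hourly * self.expected_hours_down


# Placeholder numbers only; they do not describe any real business.
SCENARIOS = [
    DisasterScenario("Single user's email not working", 50.0, 0.0, 2.0),
    DisasterScenario("Server outage", 1_000.0, 2_500.0, 4.0),
    DisasterScenario("Data center loss", 5_000.0, 20_000.0, 72.0),
]


def rank_by_cost(scenarios):
    """Return the scenarios ordered from most to least costly."""
    return sorted(scenarios, key=lambda s: s.estimated_cost(), reverse=True)


if __name__ == "__main__":
    for scenario in rank_by_cost(SCENARIOS):
        print(f"{scenario.name}: ~${scenario.estimated_cost():,.0f}")
```

Even a rough ranking like this gives the business a starting point for deciding where recovery speed is worth paying for.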
"In preparing for battle I have always found that plans are useless, but planning is indispensable." - Dwight D. Eisenhower.
Having the right plans documented is more than just sensible when you run a business whose systems need to work for both external and internal customers. Disaster recovery plans are a fantastic way to explore what needs to be done to prepare a network for the worst, but they should also cover the bad-but-not-worst situations. The many reasons include:
- Teams are better prepared to act quickly and do not need to think about the different steps to take, but rather just execute the steps.
- Communication plans ensure the right people are notified with the right timing.
- Alternative processes can be activated to keep the business running.
- The budget can be properly used so the business does not get fatally impacted.
On this last point, the act of planning recovery steps involves many budgetary aspects - hardware for backup vs. cloud services, software to capture backups securely, additional software and/or network setup to switch systems over to a backup quickly, people requirements, monitoring requirements to make sure backups are being taken, and ultimately, the cost of features vs. business needs. It is difficult, if not impossible, to implement a disaster recovery plan that provides complete, seamless business continuity immediately. However, by separating critical systems from secondary or lower tiers, the budget and resources can be laid out in a roadmap that implements the critical pieces first in building a robust recovery system. Asking how quickly a system needs to be back up and running is key, and it requires many cost vs. timing discussions and decisions. A recovery plan may need to be framed up front and then implemented over the course of several years to cover all aspects.
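To make the tiering idea concrete, here is a small Python sketch of a multi-year roadmap that funds the most critical systems first. It is one simple way to do it under stated assumptions, not a prescribed method: the system names, tiers, restore targets, costs, and annual budget are all hypothetical, as are the names `SystemPlan` and `build_roadmap`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemPlan:
    """A system to protect (all values are hypothetical)."""
    name: str
    tier: int                       # 1 = critical, higher = less critical
    target_hours_to_restore: float  # how quickly it must be back up
    recovery_cost: float            # cost to implement its recovery setup


def build_roadmap(systems, annual_budget):
    """Assign systems to budget years in strict priority order.

    Systems are taken by tier, then by how quickly they must be restored.
    When the next system does not fit in the current year's remaining
    budget, a new year is started.
    """
    roadmap = [[]]
    remaining = annual_budget
    ordered = sorted(systems, key=lambda s: (s.tier, s.target_hours_to_restore))
    for system in ordered:
        if system.recovery_cost > annual_budget:
            raise ValueError(f"{system.name} exceeds the annual budget on its own")
        if system.recovery_cost > remaining:
            roadmap.append([])
            remaining = annual_budget
        roadmap[-1].append(system.name)
        remaining -= system.recovery_cost
    return roadmap


if __name__ == "__main__":
    # Placeholder systems and costs for illustration only.
    systems = [
        SystemPlan("Reporting", tier=3, target_hours_to_restore=48, recovery_cost=20_000),
        SystemPlan("Order processing", tier=1, target_hours_to_restore=1, recovery_cost=60_000),
        SystemPlan("Email", tier=2, target_hours_to_restore=8, recovery_cost=30_000),
        SystemPlan("Customer portal", tier=1, target_hours_to_restore=4, recovery_cost=50_000),
    ]
    for year, names in enumerate(build_roadmap(systems, annual_budget=100_000), start=1):
        print(f"Year {year}: {', '.join(names)}")
```

The sketch never skips ahead to a cheaper, lower-tier system just to use up leftover budget. That keeps the priority order intact, at the price of possibly leaving money unspent in a given year, which is exactly the kind of cost vs. timing trade-off to discuss with the business.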
"The best-laid plans of mice and men often go awry" - Robert Burns
However, plans won't mean much unless they are understood by those who need to use them. General statements won't help unless there is a procedure to execute behind the plans. And there will be bigger impacts if the teams involved try to run these plans for the first time when a disaster strikes. You may find out that there are missing pieces that weren't anticipated, so now the disaster is becoming an apocalypse!
So allocate time and budget to test disasters. Set up test users whose tools and email you can safely break, so the support team learns what to do and how to communicate on the way to the final fix. Invest in development and QA environments so recovery from larger outages can be tested. An annual test is OK for compliance, but it is not enough to build a robust, trained team that can handle disasters and also help identify gaps or risks that can be budgeted for and improved in the future.
Don't forget to plan for the common issues - deployment back-outs should rarely happen, but you still need the proper systems and procedures in place to keep problems from escalating. This is one of the main intentions of the first point above - teams need to be practiced at handling outages and problems rather than having to refer to a process document. Think of a fire emergency - do you want experienced and practiced professionals or a poorly trained volunteer or contracted force handling a fire at your home?
The main emphasis in disaster recovery is to treat disasters as personal. Make sure the support resources practice often on disasters so they will be ready when a real one happens. This can be achieved by getting support involved in refreshing QA systems with a disaster recovery procedure on a monthly or quarterly basis. Also, think of minor disasters - phone outages, local internet outages for companies with many offices, viruses on a few computers, hardware failures for both users and the network. Do you have redundancy where needed? Appropriate backups? Are your choices the most cost-effective compared to the industry?
It is common to describe what IT teams do as putting out fires. However, firefighters train and practice often, and so should IT teams. It is well worth the resources and budget to make sure the plans made are truly executable, and that teams are trained to do their jobs properly. Think of NASCAR pit crews, sports teams, etc., when putting proper practice techniques in place, and of how much they value training for these situations.
"If you have an important point to make, don't try to be subtle or clever. Use a pile driver. Hit the point once. Then come back and hit it again. Then hit it a third time - a tremendous whack." - Winston Churchill
One of the critical but often overlooked pieces of responding to a disaster is effective communication. I use the word "effective" deliberately. Some situations involve a minor delay or a performance impact that is mitigated internally, while others require involving customers, using the right language to tell them what is happening, how it affects them, and when a resolution is anticipated.
Liberty Technology Advisors is very experienced in these situations and can help prepare your company for hard situations. We have handled a variety of situations, ranging from saving specific IT projects from disaster to hands-on leadership of companies through complete ransomware outage recoveries. We have experienced CIOs who can help lead or advise your organization through this process and prepare your organization as needed.
“For the things we have to learn before we can do them, we learn by doing them.” - Aristotle
I played sports throughout high school and college and fortunately was not injured seriously or often. However, I did play contact sports and knew I was not indestructible; it was a question of when injuries would happen and how serious they would be. Cyber-attacks will happen, hardware will break or get old, and tools from outside organizations that we rely upon will have their own issues that may affect us. So it is always good to follow the Boy Scout motto and Be Prepared!
"I hated every minute of training, but I said, 'Don't quit. Suffer now and live the rest of your life as a champion.'" - Muhammad Ali
With injuries, it is also good to have healing, followed by a physical therapy recovery plan. Part of the disaster recovery plan needs to be how to return to normal business seamlessly. LTA has expertise in helping your company develop these plans to suit your company's requirements and budget. Contact us today to get your company prepared!