Lately, I have had several conversations with customers regarding Business Continuity and Disaster Recovery (BC/DR). I am finding that, in the middle market, companies tend to sit at one extreme or the other when it comes to BC/DR: they either have excessive BC/DR resources or little to none, and they need help.
Business continuity and disaster recovery is hard. BC/DR is viewed as remedial work, usually pawned off to the new guy. 364 days a year, BC/DR brings little value to the business. No one ever got promoted for putting together a good DR plan. Most companies are doing the same thing they did 15 years ago – sure, you are backing up to disk now, but it still goes to tape, and that tape still goes somewhere off-site.
What a lot of people don’t realize is that this is perhaps the fastest-changing discipline in IT today. In the last year there has been a tremendous amount of movement in this space. Microsoft, EMC, Cisco, and VMware have all made acquisitions or developed their own solutions, and are building BC/DR directly into their products. Then there is the Cloud, whatever your definition of it is. The reality is that whatever my BC/DR plan is today, it won’t be the same in five years, just as it shouldn’t be the same as it was five years ago.
Business continuity and disaster recovery is expensive. Many companies spend upwards of a million dollars a year on infrastructure just to ensure they can get systems back up and running, or even worse, they spend that money on a service that they “demo” once or twice a year and it consistently fails. You have business owners saying they want zero downtime and zero data loss for zero dollars. That simply is not the way it works. The quicker you want systems up, the more it costs. The less data you are willing to lose, the more it costs. And the cost curve is a HUGE hockey stick: it climbs steeply as you push downtime and data loss toward zero.
Here is an exercise, based on a managed services IT model, that I have used successfully over the years to discuss business continuity and disaster recovery with my customers. It really helps us get to the meat of the conversation: reality, in terms of both cost and effort.
Before I get into it, you need to understand two important concepts, RTO and RPO:
- RTO – Recovery Time Objective – When a system goes down, how long can it be down? This is usually measured in seconds/minutes/hours/days.
- RPO – Recovery Point Objective – When I have a data incident, how much data can I lose? This is usually measured as whatever is in cache (in memory), or in seconds/minutes/hours/days.
Sure, there are other considerations that need to go into a good plan, but these two concepts will get you started.
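To make the two definitions concrete, here is a minimal Python sketch that scores a single incident against an RTO and an RPO. The function name `meets_objectives`, its parameters, and the incident timestamps are all hypothetical and chosen only for illustration; the sketch assumes downtime is measured from the start of the outage to restoration, and data loss from the last good copy to the start of the outage.

```python
from datetime import datetime, timedelta

def meets_objectives(last_good_copy, outage_start, service_restored, rto, rpo):
    """Return (rto_met, rpo_met) for one incident.

    Assumption: downtime runs from outage start to restoration, and the
    data-loss window runs from the last good copy to the outage start.
    """
    downtime = service_restored - outage_start
    data_loss_window = outage_start - last_good_copy
    return downtime <= rto, data_loss_window <= rpo

# Hypothetical incident: last good copy 10 minutes before the outage,
# service restored 20 minutes after it began.
result = meets_objectives(
    last_good_copy=datetime(2024, 1, 10, 8, 50),
    outage_start=datetime(2024, 1, 10, 9, 0),
    service_restored=datetime(2024, 1, 10, 9, 20),
    rto=timedelta(minutes=15),
    rpo=timedelta(minutes=15),
)
print(result)  # (False, True): RTO missed by 5 minutes, RPO met
```

In this made-up incident the data loss is within the objective, but the system was down longer than the RTO allows, which is exactly the kind of gap the exercise is meant to surface.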
My business continuity and disaster recovery strategy has two components at its core: the application tiers and the application matrix. The tiers are where we lay down, in broad strokes, what your standards will be for RTO and RPO. You will generally have 3-4 tiers for a good BC/DR plan. Creating tiers should not take into account any specific applications or workloads. The tiers should reflect the capabilities the business needs. You won’t have the infrastructure to meet these tiers on Day 1, and that is OK.
Here is an example:
Tier 1 – Business Critical
| Tier | Recovery Time Objective | Recovery Point Objective |
|------|-------------------------|--------------------------|
| 1    | 15 Minutes              | 15 Minutes               |
| 2    | 2 Hours                 | 2 Hours                  |
| 3    | 4 Hours                 | 4 Hours                  |
| 4    | 24 Hours                | 24 Hours                 |
Many companies today will have a tier for SaaS or Cloud solutions. This tier is a bit different, as RPO and RTO are dictated by the provider and enforced via financially backed SLAs. So you may not be worrying about backup tapes, but you should be worrying about SLAs and how you will monitor them.
Once you have your tiers, you create your matrix. You simply create a table with your applications in the first column and assign those applications to the appropriate tier – kind of like this:
|Application||Tier 1||Tier 2||Tier 3||Tier 4||Cloud|
|Apps not Listed||X|
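As a rough illustration, the two tables can be held as plain data so that any application can be looked up against its objectives. This is only a sketch: the tier values come from the example tiers above, but the application-to-tier assignments, the names `TIERS`, `APP_MATRIX`, `DEFAULT_TIER`, and `objectives_for`, and the choice of Tier 4 as the catch-all for unlisted applications are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass(frozen=True)
class Tier:
    rto: timedelta  # how long the system may be down
    rpo: timedelta  # how much data may be lost

# Tier standards, defined without reference to any specific application.
TIERS = {
    1: Tier(rto=timedelta(minutes=15), rpo=timedelta(minutes=15)),
    2: Tier(rto=timedelta(hours=2), rpo=timedelta(hours=2)),
    3: Tier(rto=timedelta(hours=4), rpo=timedelta(hours=4)),
    4: Tier(rto=timedelta(hours=24), rpo=timedelta(hours=24)),
}

# Application matrix. These assignments are hypothetical placeholders.
APP_MATRIX = {"Lawson": 1, "Email": 2, "SharePoint": 3}

# Assumption: anything not listed falls into the slowest tier.
DEFAULT_TIER = 4

def objectives_for(app):
    """Look up the RTO/RPO standard that applies to an application."""
    return TIERS[APP_MATRIX.get(app, DEFAULT_TIER)]

print(objectives_for("Lawson").rto)       # 0:15:00
print(objectives_for("Time Clock").rpo)   # 1 day, 0:00:00
```

Keeping the tiers separate from the assignments mirrors the exercise: the standards are agreed first, and moving an application between tiers is then a one-line change to the matrix.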
Once you have these two tables, socializing them with the business is the final step. Take them for a spin and see what people think. See how they map to what you do today.
Going through this exercise, a few things might happen:
- You will have all your applications in Tier 1 or Tier 2 on the first pass
- You will always have 1 more application to add
- You might be surprised at what ends up in Tier 3 or Tier 4
- The real difference between Tiers 1 and 2 and Tiers 3 and 4 is typically not technology but priority – You can only bring up so many systems at one time.
- You will miss some application dependencies that you didn’t know about. For example, for Lawson to work, SharePoint and Email have to be up.
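The dependency problem in the last point can be checked mechanically once the matrix exists. The sketch below flags any application whose dependency sits in a slower tier than the application itself. The tier assignments, the dependency map, and the function name `tier_conflicts` are hypothetical, and the sketch assumes a lower tier number means a faster recovery.

```python
# Hypothetical tier assignments (lower number = faster recovery).
APP_TIER = {"Lawson": 1, "SharePoint": 3, "Email": 2}

# Hypothetical dependency map: application -> systems it needs to be up.
DEPENDS_ON = {"Lawson": ["SharePoint", "Email"]}

def tier_conflicts(app_tier, depends_on):
    """Return (app, dependency) pairs where the dependency is in a slower tier."""
    conflicts = []
    for app, deps in depends_on.items():
        for dep in deps:
            if app_tier[dep] > app_tier[app]:
                conflicts.append((app, dep))
    return conflicts

print(tier_conflicts(APP_TIER, DEPENDS_ON))
# [('Lawson', 'SharePoint'), ('Lawson', 'Email')]
```

With these made-up assignments, Lawson cannot meet a Tier 1 objective because both of its dependencies recover later, so either they move up a tier or Lawson moves down.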
Whether you actually implement these tiers or not, the most important outcome of this exercise will be the discussions you have with the business around service levels and costs.