Large and small disasters happen all the time.
Events ranging from purely local disasters, such as flooding caused by a
fire down the block, to a city-wide flu epidemic or a region-wide blizzard
all have the potential to put companies out of business.
In our disaster planning work with many
different types of organizations, we have seen that too many of them make the
recovery process harder for themselves - or even impossible - by not planning
ahead for disaster recovery. While they may take steps to try to prevent
disasters, they ignore the reality that prevention won't always work.
As a result, these organizations fail to take
prudent and inexpensive preparatory actions to facilitate the recovery of
their business operations.
DISASTER CATEGORIES
The first of two fundamental hurdles to
overcome when planning for disaster recovery is to realize that the seemingly
large variety of possible disasters can actually be reduced to a manageable
number. In fact, all disasters can be grouped into one or more of
only three categories. These are:
- loss of information,
- loss of access,
- loss of personnel.
RECOVERY TIME PERIODS
The second hurdle to overcome is accepting
that "business-as-usual" will be suspended at the time of
the disaster. In fact, the people who are usually in charge may not even be
available! For example, several years ago, there was a gas line explosion at a
bank in the Midwest. Every employee was either killed or injured; the
president was among those killed, and the executive vice president was left
to try to manage the recovery from his hospital bed.
What you have to accept is that there will be
two time periods which must be planned for following a disaster. First will be
the immediate, disorganized, "limited-operation" time span, which
will then be followed by a period of "makeshift-operations," which
can be quite lengthy until normal operations can be resumed.
Typically, following a physical disaster, the
limited-operations time span can extend for up to a week or more, while the
makeshift-operations time span can last for several months until normal
operations are restored.
This need to recover in phases is typically
very difficult for management to accept. Often, when asked to prioritize among
the organization's services or products, our clients' first reactions are to
consider them all equal. Following that, people are often unrealistic in their
estimation of how fast departments can accomplish their tasks. In one of our
client situations, the organization had planned to relocate a key department
to a hotsite four hours away - without realizing that most of the affected
people were single parents, who couldn't possibly go there!
Once management has the proper mind-set to build
upon, the objective of the planning process is to systematically sort out the
various issues and priorities so that a cost-effective plan can be developed,
one that is in proportion to the level of loss exposure the organization
faces.
The process itself can be summarized in the
following steps:
- provide top-management guidelines,
- identify serious risks,
- prioritize the operations to be maintained
and how to maintain them,
- assign the disaster team,
- take a complete inventory,
- know where to get help,
- document the plan,
- review with key employees, test the plan,
and train all employees.
Each of these is discussed below.
Top management guidelines:
Input from top management is required to keep
the planning process in perspective and to ensure participation by everyone
within the organization. Top management also has to indicate the length of
time during which the organization is willing to accept disrupted service
and the amount of money the organization is willing to invest in procuring
standby equipment, paper forms, testing, etc. as part of being prepared for an
emergency. Input from management is also important in assigning priorities to
which operations will be maintained during the limited-operations time span
and which will be recovered later.
Our experience has been that even though
employees think they know the answers management will give regarding
these priorities, this stage in planning invariably produces the most
surprises and shows how little communication often occurs between management
levels.
Identifying serious risks:
This is a "brainstorming" process,
which is best accomplished working with the employees themselves during
department or group meetings. It serves the dual role of building
employees' awareness of disaster planning and
surfacing potential risk areas of which management may not have been aware.
For example, one of our clients, which performs extensive money wire transfers,
discovered that in the event of a telephone service interruption, the emergency
"callback" number they had given to their wire-transfer service
agency was in the same building as their normal telephone number. Obviously,
in a disaster, neither line would be available. The client immediately had the
number changed to one in another building - but would never
have known of the problem without going
through the process with lower-level employees.
Prioritize the operations:
Most managers never think about it, but for
the typical organization, the highest priority is payroll. Even if this is
performed by an outside service, there is usually a terminal for remote input
of the payroll data. So, in the event that there is a disruption, either at
the source of the data, or at the payroll processor, there must be a
delegation of authority to someone (remember, the president, owner, etc. may
well not be available) to be able to issue substitute manual advance checks.
In general, top management will have to
decide, depending on the kind of organization, how long they are willing to
operate without being able to perform each of their daily operations, such as
accepting customer credit applications, receiving deliveries, etc., in
addition to their more obvious operations such as buying and selling. Banks
need policies on accessing safe deposit boxes, sending out mortgage bills,
commercial night depository, etc., in addition to just worrying about
deposits and withdrawals.
Based on these priorities, the organization
can plan out how long to suspend each operation, and designate either a manual
backup mode or a longer lead-time approach for each function.
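As a rough illustration of this sorting step, the sketch below records a tolerable outage for each operation and labels it for a manual backup mode or a longer lead-time approach. The outage figures, the 7-day limited-operations period, and every name in the code are assumptions made for the example, not recommended values.

```python
from dataclasses import dataclass

# Assumed length of the limited-operations period, in days. Illustrative only;
# each organization would set its own figure.
LIMITED_OPERATIONS_DAYS = 7


@dataclass(frozen=True)
class Operation:
    name: str
    max_outage_days: float  # hypothetical figure supplied by top management


def recovery_mode(op, limited_days=LIMITED_OPERATIONS_DAYS):
    """Label an operation 'manual backup' if it cannot wait out the
    limited-operations period, otherwise 'longer lead-time'."""
    if op.max_outage_days < limited_days:
        return "manual backup"
    return "longer lead-time"


def recovery_schedule(operations):
    """Return (name, max_outage_days, mode) tuples, most urgent first."""
    ordered = sorted(operations, key=lambda op: op.max_outage_days)
    return [(op.name, op.max_outage_days, recovery_mode(op)) for op in ordered]


# Example operations taken from the text; the day counts are made up.
operations = [
    Operation("customer credit applications", 14),
    Operation("payroll", 1),
    Operation("receiving deliveries", 3),
]

for name, days, mode in recovery_schedule(operations):
    print(f"{name}: restore within {days} day(s) via {mode}")
```

The point of writing it down this way is that the priorities become an explicit, reviewable list rather than something each manager carries in his or her head.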
These priorities also guide the organization
in setting the frequency of off-site storage of backup files. For example, in
order to meet emergency requirements, some files which might normally be
stored off-site on a weekly basis might instead be stored on a more frequent
basis.
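One way to reason about storage frequency is that the interval between off-site copies bounds how many days of work could be lost. The sketch below picks the least frequent schedule that still fits a stated tolerance. The schedule names, the intervals, and the tolerance parameter are all assumptions for illustration.

```python
# Candidate off-site storage schedules as (name, interval in days),
# listed from least frequent to most frequent. Illustrative values only.
SCHEDULES = [("monthly", 30), ("weekly", 7), ("daily", 1)]


def offsite_schedule(max_loss_days):
    """Pick the least frequent schedule whose interval does not exceed the
    number of days of work the organization can afford to lose."""
    for name, interval in SCHEDULES:
        if interval <= max_loss_days:
            return name
    # None of the listed schedules is tight enough for this file.
    return "more often than daily"


print(offsite_schedule(7))    # weekly
print(offsite_schedule(2))    # daily
print(offsite_schedule(0.5))  # more often than daily
```

A file normally stored weekly would move to daily storage as soon as management decides that no more than two days of its data may be lost.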
Assign the disaster team:
Disasters always seem to happen at the worst
possible times, when the fewest personnel are available. Therefore, it is
crucial that as part of the disaster plan, management appoint one person in
charge of recovery, and one person as second-in-command. Following this, as
many specific tasks as possible within the plan should be pre-assigned. In the
wake of Hurricane Hugo, with most telephone service knocked out, one company
in South Carolina that had not preassigned tasks reported that it took four
days just to assemble their key personnel. That is certainly not the way to
endear yourself to your customers or clients! The best basic rule of thumb is
that when disaster occurs, employees should know what they are and are not
responsible for, who is in charge, and who is the designated
alternate.
Inventory:
While most organizations have records covering
the make and model numbers of their equipment as of the time of purchase,
these records are usually not updated and almost never kept off-site. Taking inventory
should include emergency vendor contacts for all equipment (including
microfilmers, specialty mailing and other equipment - not just computer
hardware and software), descriptions and formats of all data files, and copies
of all business forms used, along with the vendor contact for each.
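For organizations that keep the inventory electronically, a minimal sketch of such a record and a staleness check might look like the following. The field names, the one-year review threshold, and the sample entries are hypothetical and would need to be adapted to the organization's own records.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class InventoryItem:
    description: str
    make: str
    model: str
    vendor_contact: str  # emergency vendor contact; empty string if unknown
    last_verified: date  # when the record was last checked against reality


def needs_attention(items, today, max_age_days=365):
    """List (description, problem) pairs for records that lack a vendor
    contact or have not been verified within max_age_days (an assumed limit)."""
    problems = []
    for item in items:
        if not item.vendor_contact:
            problems.append((item.description, "no emergency vendor contact"))
        if (today - item.last_verified).days > max_age_days:
            problems.append((item.description, "record out of date"))
    return problems


# Made-up sample entries for illustration.
items = [
    InventoryItem("microfilmer", "ExampleCo", "MF-100", "", date(2023, 1, 15)),
    InventoryItem("mail inserter", "ExampleCo", "MI-20", "555-0100", date(2024, 5, 1)),
]

for description, problem in needs_attention(items, today=date(2024, 6, 1)):
    print(f"{description}: {problem}")
```

Whatever form the inventory takes, a copy of it should be kept off-site along with the backup files.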
Know where to get help:
Actively collect any additional names of
service or equipment providers as you come across them.
Documentation:
The plan should be written down, along with
the various assignments, the updated inventory, and all key phone numbers.
Remember that if the core document is longer than 15-20 pages, it will never
be read or used. Key personnel should have a copy of this documentation at home.
Review, Training, and Testing:
After completion, the plan needs to be
reviewed with all employees on a regular basis. This does not have to be a
lengthy procedure, and it offers a first-level "blink test" as to
the reasonableness of the plan - as our client with the staff of single
parents found out. Basic training also does not have to be time-consuming,
although employees should at least know where the fire extinguishers are
located and have seen a demo on how to use them.
More extensive training may be required in the
event that there has not been enough cross-training to allow employees to
replace a missing co-worker.
With respect to testing, a full-blown test of
the plan may not be feasible, although moves, relocations, or unplanned
shutdowns should be treated and evaluated as tests of your recovery ability.
To conclude, all of these activities can
basically be characterized as "in-advance decision-making." Their
cost is very little, yet they yield the immediate benefits of:
- improving communications within the
organization,
- highlighting vulnerable points in the
organization's operations,
- ensuring that the organization has its best
possible chances of surviving disaster.
Finally, the underlying philosophy in our
approach to disaster recovery planning is that you can get a lot done
without a lot of expense, that you can benefit greatly by thinking through as
much as possible beforehand, and that you should assign responsibilities and
make management decisions now - rather than wait until you're in your parking
lot, leaning against a fire engine in the middle of the night!