Business Continuity Planning Domain

Chris Hare (chare@nortelnetworks.com)

Nortel Networks

Version 1.0 - March 1999

This simple study booklet is based directly on the ISC2 CBK document.

This guide does not in any way replace the outstanding value of the CISSP Seminar, nor the requirement that you have worked in the security field for at least a few years if you intend to take the CISSP exam. This booklet simply intends to make your life easier and to provide you with a centralized resource for this particular domain of expertise.

 

WARNING:

As with any security related topic, this is a living document that will and must evolve as other people read it and technology evolves. Please feel free to send me comments or input to be added to this document. Any comments, typo corrections, etc. are most welcome and can be sent directly to: chare@nortelnetworks.com

This is NOT a Nortel Networks sponsored document, nor is it intended as a representation of Nortel Networks operating practices.

 

DISTRIBUTION AGREEMENT:

This document may be freely read, stored, reproduced, disseminated, translated or quoted by any means and on any medium provided the following conditions are met:

 

 

 

CBK - Business Continuity Planning

Description

The Business Continuity Planning and Disaster Recovery Planning domain addresses the preparation of specific actions to preserve the business in the face of major disruptions to normal business operations. It deals with natural and man-made events and their consequences if they are not dealt with promptly and effectively.

Expected Knowledge

The professional should fully understand:

The CISSP can meet the expectations defined for Business Continuity Planning as:

Examples of Knowledgeability

Compare and Contrast Business Continuity Planning (BCP) and Disaster Recovery Planning (DRP)

Define What is Involved in Business Continuity Planning

Define Who is involved in Business Continuity Planning

Define and Describe Legal & Regulatory Reasons for Business Continuity Planning

Define What Disasters Must be Prepared For in BCP and DRP

Define Information Security Goals and their relationship to BCP and DRP

Describe the Generic Recovery Planning Methodology

Define Disaster Recovery

Identify the Actions that must be Taken in Disaster Recovery Planning (DRP)

Identify the Objectives of Disaster Recovery Planning (DRP)

Define What is Involved In Disaster Recovery Planning

Define Who Is Involved in Disaster Recovery Planning

Define and Describe the Role of Senior Management in Disaster Recovery Planning

Define Probable Complications Beyond the Actual Disaster

Define Methods of Dealing with Media and Others

Compare and Contrast Possible Organizational Placement of Planner

Define Disaster Recovery Planning Organization Guidelines

Describe Methodology for Identifying Critical Business Units

Describe Methodology for Identifying Critical Support Units

Identify the Steps to be Taken In Performance of a Criticality Survey

Identify Objectives, Major Activities, and End Products of a Vulnerability Assessment

Identify Factors that must be Considered when Analyzing and Summarizing Potential Financial Impacts of Successful Disaster Attacks

Identify Areas which must be Considered when Writing Disaster Recovery Plan

Define Areas of End-User Disaster Contingency Planning

Compare and Contrast Strategies for Data Center Backup Planning

Identify the Advantages and Disadvantages of Mutual Aid Agreements

Compare and Contrast the Advantages and Disadvantages of Hot Sites and Cold Sites

Compare and Contrast the Advantages and Disadvantages of Using Service Bureaus

Identify Disaster Recovery Contingency Planning Events

Define and Describe Required Steps to Perform a Business Impact Analysis (BIA)

Define Roles of Maintenance in Disaster Recovery Planning

Identify Possible Preventive Measures in Disaster Recovery Planning

Identify Steps in Assessing Recovery Capability

Define Characteristics of a Test of Disaster Recovery Plan

References

[HUTT95] Hutt, Arthur, Seymour Bosworth, Douglas Hoyt. The Computer Security Handbook: Third Edition. John Wiley & Sons, 1995.

[ISC991] (ISC)2 CISSP Week 1 Review Material

[KRAU99] Krause, Mikki, Harold Tipton, Editors. The Handbook of Information Security Management 1999. Auerbach, 1999.

 

 

Knowledge Areas

Important Definitions

Business Continuity Planning is defined as planning that facilitates the rapid recovery of business operations to reduce the overall impact of the disaster, while ensuring continuity of the critical business functions.

Disaster Recovery Planning is defined as the procedures for emergency response, extended backup operations and post-disaster recovery when the computer installation suffers loss of computer resources and physical facilities.

Recovery Planning is defined as the advance planning and preparation which are necessary to minimize loss and ensure continuity of the critical business functions of an organization.

Disaster is defined as an interruption affecting user operation significantly.

Disaster Recovery Planning is defined as a process to identify the critical computing resources in an organization, determine what potential events could affect or interrupt those services, and plan the response to those events should they occur.

A Disaster Recovery Plan is defined as a comprehensive statement of consistent actions to be taken before, during and after a disaster.

The First Recovery team is responsible for determining if and when the building can be occupied again, determining the status of assigned people, initiating the alternate site if needed, and activating emergency recovery procedures.

The Second Recovery team sets up and activates the alternate-processing site, retrieves needed materials from backups, installs needed equipment, and resumes critical work.

The Third Recovery team performs salvage and repair operations at the primary site. It is their job to get the primary site back into normal operation.

The objectives of Recovery Procedure Development include

The major activities in Recovery Procedure Development include

The End Products of Recovery Procedure Development are

The Structured Walk-Through Test occurs when the functional representatives meet to review the plan in detail. This involves a thorough look at each of the plan steps, and the procedures that are invoked at that point in the plan. This ensures that the actual planned activities are accurately described in the plan.

The Checklist test is a method of testing the plan by distributing copies to each of the functional areas. Each area reviews the plan and checks off the points that are listed. This process ensures that the plan addresses all concerns and activities.

The Simulation Test is where all operational and support functions meet to practice execution of the plan based on a scenario that is played out to test the reaction of all functions to various situations. Only those materials and information available in a real disaster are allowed to be used during the simulation, and the simulation continues up to the point of actual relocation to the alternate site and shipment of replacement equipment.

The Parallel Test is essentially an operational test. In this test, the critical systems are placed into operation at the alternative site to see if things run as expected. The results can be compared with the real operational output and differences noted.

The Full Interruption Test is when full normal operations are completely shut down, and the processing is conducted at the alternate site using the materials that are available in the offsite storage location and personnel that are assigned to the recovery teams.

Compare and Contrast Business Continuity Planning (BCP) and Disaster Recovery Planning (DRP)

From [ISC991], Section 5, page 2,

Business Continuity Planning is defined as planning that facilitates the rapid recovery of business operations to reduce the overall impact of the disaster, while ensuring continuity of the critical business functions.

Disaster Recovery Planning is defined as the procedures for emergency response, extended backup operations and post-disaster recovery when the computer installation suffers loss of computer resources and physical facilities.

The major difference between them is that business continuity planning involves more than just the computer facility, while DRP is focused on maintaining business operations when the computing facilities are lost.

Define what is involved in Business Continuity Planning

From [ISC991] Section 5, page 3,

Business depends upon a variety of technology including centralized and decentralized computing systems, special-purpose support systems and communications systems.

Page 7-4 of [HUTT95] defines the elements of Contingency Planning as:

Ensuring the identification and protection of the organization’s vital records.

This represents the set of prepared actions that are meant to cope initially with disruption.

These are step by step actions to be taken by specific personnel when there is an emergency. (See [HUTT95] page 7-9.)

[KRAU99] on page 274 presents it as follows:

Define who is involved in Business Continuity Planning

In order for BCP to be effective, [ISC991] recommends that the following people must be involved:

Executive management must provide consistent support throughout the planning process and must put the final approval on the business continuity plan.

It is the responsibility of these groups to identify and prioritize the mission-critical systems within the organization.

This committee is composed of coordinators representing all functional units within the organization that are involved in planning, implementing and monitoring BCP activities. This team should include an Information Security Task Force, a group of people who have a major interest in Information Security for the organization. It should also include the corporate auditors due to the legal and regulatory issues associated with BCP.

Define and Describe Legal & Regulatory Reasons for Business Continuity Planning

There are several pieces of US legislation and regulatory agencies that require organizations to take appropriate care in safeguarding their information. These include:

Define What Disasters must be prepared For in BCP and DRP

All of the following disasters should be planned for, as well as anything that is specific to the organization itself or the locations where the organization is situated. These are found in [ISC991] Section 5, pages 5-6.

Define Information Security Goals and their relationship to BCP and DRP

The Information Security goals are

The intent is to continue to prevent accidental or intentional

Throughout the planning and recovery process, the intent is to minimize the disaster impact as much as possible on these goals.

Describe the Generic Recovery Planning Methodology

From [ISC991] Section 5, page 12, here is the diagram.

The project initiation phase involves the following steps:

The business impact analysis generally uses materials (e.g. a survey) that are customized for the organization. The purpose is to gather qualitative and quantitative impact information that is used to conduct an analysis of the impacts and document the findings. Recommendations are then prepared for management to approve.

Developing the recovery strategy includes compiling the resource requirements and identifying the alternatives that are available during recovery. With that information available, a strategy is then developed and documented for the recovery efforts, and submitted for approval.

With the strategy complete, the recovery plan must be documented according to the organization’s documentation format.

The recovery plan must be implemented, tested and maintained over time to be able to handle the organization’s changing requirements. With the completion of the implementation procedures, maintenance and testing strategy, the plans are submitted for management approval.

Define Disaster Recovery

From [ISC991] Section 5, page 12,

Disaster is defined as an interruption affecting user operation significantly.

Disaster Recovery Planning is defined as a process to identify the critical computing resources in an organization, determine what potential events could affect or interrupt those services, and plan the response to those events should they occur.

Identify the Actions that must be taken in Disaster Recovery Planning (DRP)

Disaster Recovery planning, according to [ISC991] Section 5, page 13, involves

Preparing a full-scale disaster recovery plan can take as long as three years. However, you will not see any direct payback from the money invested. If you do not use it, it is like throwing money down the toilet. If you need it, it is priceless.

Identify the Objectives of Disaster Recovery Planning (DRP)

The objectives of Disaster Recovery Planning as outlined in [ISC991] Section 5 page 13 are

Define what is involved In Disaster Recovery Planning

From http://www.netoffice.u-net.com/bcp.htm

There are six steps:

Define Who Is Involved in Disaster Recovery Planning

The disaster recovery planning team should include representatives (or teams depending upon the size of your organization) from

Define and describe the Role of Senior Management in Disaster Recovery Planning

As in [ISC991] Section 5 page 15, Senior management have an important role in the Disaster Recovery Planning process. It is expected that senior management will demonstrate commitment and support to the program. This is the most critical element of their involvement. Senior management demonstrates this through

Typically, a business case is required to obtain their support. The business case should include

If senior management will not buy into the program, then it is unlikely that you will be able to gain any acceptance at the grass roots level. The implementation of a disaster recovery plan is not something that is done from a bottom up approach. It absolutely requires top down acceptance and motivation.

Consequently, senior management are expected to "sell" the program to the user community, and work to build and maintain user awareness for the disaster recovery program. They are expected to assist in ensuring that ongoing resources are allocated to maintain and test the plan.

Define Probable Complications beyond the Actual Disaster

[ISC991] Section 5, page 16 identifies the following complications beyond the disaster

Define Methods of Dealing with Media and others

Dealing with media can be a challenge. They will be present on your site as soon as it is known that there has been a disaster of some kind, looking for the "Scoop". The disaster recovery plan must include how the media is to be handled during the disaster, in order to keep things under control. While the CEO is generally the spokesperson of the company, it is not advisable for the CEO to talk to the press during a disaster.

Rather, [ISC991] recommends the following action plan:

Establish a central focal point for the media to convey to them credible, informed responses by a trained spokesperson. Furthermore, give employees the ability to tell the press who to talk to. This alleviates the problem of getting misleading information from various sources.

This means they will not go looking to other people for responses and information.

While no company wants to publicize bad news, it is far better to report it to the press and public than to have someone come after the fact demanding an explanation. The appearance will be that the organization was trying to cover up the event, and this leads to mistrust from the public.

This avoids the likelihood of suspicions and rumors running rampant during the disaster. This would give you an additional major problem to address in the middle of the first one.

This allows for quick information transfer to customers, suppliers and shareholders regarding the disaster and how it is being handled. This should include the preparation of background releases on the company to prevent them from having to be done during the disaster.

This allows you to hold the conference where you want, and not necessarily with the disaster looming in the background. It helps to demonstrate the appearance that you have the situation under control.

Use photographs, video, etc. as a method of recording what happened and how things were handled. This information will be useful during any civil or criminal trials, as well as in a review of how the disaster plan really worked.

Compare and Contrast Possible Organizational Placement of Planner

This can be tricky. The planner must be in a position within the organization to balance the needs of the corporation with those of the individual business units that would be affected. They must be able to review the corporation charter and viewpoint. They must have the knowledge of the business, which enables them to understand how the disaster can affect the corporation. They must have easy access to executive management and have the credibility and ability to influence senior management when there is a decision to be made.

See [ISC991] Section 5, page 17.

Define Disaster Recovery Planning Organization Guidelines

The disaster plan must be developed by the same people who are going to execute it if the time comes. By doing so, it is possible for them to see where they have made an error, and hopefully they account for all eventualities. Even if they did miss something, they may be able to extrapolate the correct course of action from the plan since they understand it on an intimate level.

The planning organization will include representatives from all of the critical business and support units.

The business unit planners develop plans to restore the critical product and service delivery capability as quickly as possible. They must obtain management approval of the milestone deliverables before moving to the next item. This ensures that management is fully aware of the plan development and the progression of the project. Finally, the business unit planners are in the position to be able to test the plan within their business units, or to test the plan and monitor the progress that their business unit makes through the implementation of the plan.

The support unit planners are responsible for the development of plans to support the critical business functions. Their development is dependent on the business unit planners forwarding the requirements to the support teams. Otherwise, their responsibilities are similar to those of the business unit planners.

The planning organization must also include consolidated area representatives. They will be required based upon the size and/or complexity of the organization. Smaller organizations may not need people in these roles. The need is established by the span of control guidelines for the organization. For example, generally 8-10 people is the optimum number for a single level in a reporting chain to handle. If there are more, then a reporting level is established to properly handle issues and control of the project. These representatives have a similar charter to a corporate program manager but they operate at a division/group or department level.
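As a rough illustration of the span of control arithmetic, the sketch below estimates how many reporting levels a planning organization would need. It assumes, purely for illustration, that every level handles the same maximum number of direct reports; the function name `reporting_levels` and the default span of 10 are hypothetical and do not come from [ISC991].

```python
def reporting_levels(people: int, span: int = 10) -> int:
    """Estimate reporting levels needed so no level exceeds the span of control.

    Assumption (not from the source): a uniform hierarchy in which each
    coordinator handles at most `span` direct reports.
    """
    if people < 1 or span < 2:
        raise ValueError("people must be >= 1 and span must be >= 2")
    levels = 1
    capacity = span  # number of people a hierarchy with `levels` levels can hold
    while capacity < people:
        capacity *= span
        levels += 1
    return levels


# Hypothetical organization with 45 planners and a span of control of 10.
print(reporting_levels(45))  # 2
```

With 8 planners a single level is enough; with 45, one additional level of consolidated area representatives would be needed under these assumptions.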

See [ISC991] section 5, page 18

Describe Methodology for Identifying Critical Business Units

The critical business units are identified through an examination of the "loss criteria". These criteria are:

From [ISC991] Section 5, page 26

It is important to note that the essential business units are identified as those business functions that are necessary to support an acceptable level of business continuity. Some example business functions are:

Describe Methodology for Identifying Critical Support Units

From [ISC991] Section 5, page 22:

The critical support units are those that provide a critical service to the business units. They support the local emergency management team, and are involved in the planning process. The following groups are examples of critical support units:

Identify the Steps to be taken In Performance of a Criticality Survey

From [ISC991] Section 5, page 23:

The Criticality Survey is implemented through a standard questionnaire, a tool to gather input from the most knowledgeable people. The intent is to find out what services and systems are critical to keeping the organization in business.

When developing the survey, it is essential that the purpose of the survey is clearly stated in order to avoid any misinterpretations. It is also important to get management approval of the survey before distributing it.

Identify Objectives, Major Activities, and End Products of a Vulnerability Assessment

From [ISC991] Section 5, page 24,

The objectives of a Vulnerability Assessment are:

The major activities of the assessment are

This involves determining what services are dependent upon other services being operational. For example, the data network may depend upon connectivity with outside network carriers.

Establish what the scenario is for the outage, e.g. a building blows up, a major equipment failure, etc.
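The service dependency analysis described above can be sketched as a simple graph walk. This is a minimal sketch, not a method taken from [ISC991]: the function name `all_dependencies`, the dictionary layout and the service names are all hypothetical examples.

```python
def all_dependencies(service, depends_on):
    """Return every service that `service` depends on, directly or indirectly.

    `depends_on` maps a service name to the services it needs directly.
    """
    seen = set()
    stack = [service]
    while stack:
        current = stack.pop()
        for dep in depends_on.get(current, ()):
            if dep not in seen:  # the `seen` set also protects against cycles
                seen.add(dep)
                stack.append(dep)
    return seen


# Hypothetical dependency map for illustration only.
depends_on = {
    "order entry": ["data network", "database"],
    "data network": ["outside network carrier"],
    "database": ["electrical power"],
}

print(sorted(all_dependencies("order entry", depends_on)))
# ['data network', 'database', 'electrical power', 'outside network carrier']
```

Walking the map this way surfaces indirect dependencies, such as the outside network carrier, that a review of direct dependencies alone would miss.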

The products of the assessment are:

Identify Factors that must be Considered when Analyzing and Summarizing Potential Financial Impacts of Successful Disaster Attacks

From [ISC991] Section 5, page 27

When analyzing and summarizing potential financial impacts, it is important to obtain data from approved outage impact scenarios, with an emphasis on including outages of varying duration. This will improve the reporting since the longer the outage, the greater the cost. Including different time lengths will also illustrate the projected rate of cost increase.
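To illustrate how outages of varying duration expose the rate of cost increase, here is a minimal sketch. The function name `cost_increase_rates` and all durations and cost figures are hypothetical; real figures would come from the approved outage impact scenarios.

```python
def cost_increase_rates(impact_by_hours):
    """Compute the cost increase per hour between successive outage durations.

    `impact_by_hours` maps an outage duration in hours to the estimated
    cumulative cost of an outage of that length.
    """
    points = sorted(impact_by_hours.items())
    rates = []
    for (h0, c0), (h1, c1) in zip(points, points[1:]):
        rates.append((h0, h1, (c1 - c0) / (h1 - h0)))
    return rates


# Hypothetical scenarios: cumulative cost after 4, 24 and 72 hours of outage.
scenarios = {4: 10_000, 24: 90_000, 72: 450_000}

for start, end, rate in cost_increase_rates(scenarios):
    print(f"{start}h to {end}h: {rate:,.0f} per hour")
# 4h to 24h: 4,000 per hour
# 24h to 72h: 7,500 per hour
```

In this made-up example the hourly cost nearly doubles after the first day, which is the kind of trend that single-duration estimates would hide.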

It is also required that the direct costs associated with the outage be included. Again, outages of varying duration must be included in order to accurately estimate

The analysis must report on the direct financial impact to all of the essential business functions. Each business function will incur specific costs for outages of varying duration, and their direct costs must be included.

The estimated indirect costs associated with lost productivity, error, and other non-quantifiable costs are important. Although you have experienced an outage, the organization still had to pay its employees while they were being non-productive.

Finally, the single event losses must be estimated. This is done on an annualized basis using the Annualized Loss Expectancy (ALE) calculation. ALE is calculated using the formula

Single loss expectancy * annualized rate of occurrence
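The ALE formula above can be worked through with a short sketch. The function name and the sample figures are hypothetical and chosen only to show the arithmetic.

```python
def annualized_loss_expectancy(single_loss_expectancy: float,
                               annualized_rate_of_occurrence: float) -> float:
    """ALE = single loss expectancy * annualized rate of occurrence."""
    if single_loss_expectancy < 0 or annualized_rate_of_occurrence < 0:
        raise ValueError("both inputs must be non-negative")
    return single_loss_expectancy * annualized_rate_of_occurrence


# Hypothetical example: an event costing 200,000 that is expected
# once every four years (annualized rate of occurrence = 0.25).
print(annualized_loss_expectancy(200_000, 0.25))  # 50000.0
```

An event expected less than once a year has a rate of occurrence below 1, so its annualized loss is a fraction of the single event loss.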

Identify Areas which must be Considered when Writing Disaster Recovery Plan

From [ISC991] Section 5, page 30:

There are several areas to be included in the disaster recovery plan. These are:

The considerations in this area surround

The facility area is concerned particularly with the main building, remote, off-site and backup facilities.

The availability of people to operate the plan is essential. All of the essential people and methods of reaching them should be identified prior to the need to activate the plan. Particularly, personnel in the operations and technical areas, along with vendor and other support groups, are to be considered.

It will be a requirement to ensure that there is adequate hardware to support operations during the disaster. This means access to mainframe, mini and microcomputers, disk storage (DASD), tape units, printers and communications equipment.

It will be necessary to ensure that you have access to the various software components including operating system loads, application software and subsystem drivers.

It will be impossible to operate your regular business, much less operate during a disaster, if appropriate measures are not taken to provide supplies such as paper, microcomputers and typewriters.

If your organization requires the use of special forms in its normal business operations, arrangements to have them available during the disaster are important. Some forms may include certificates, checks and other special forms.

The documentation of your communications will provide information on the circuits in use, phone configurations and location, modems and test equipment to support the communications network.

The organization must have documentation available to support it during the disaster. There may be the requirement for personnel who do not normally carry out certain functions to perform them during this time. The available documentation must include

The availability of the data to work with is essential. Procedures to recover backup data, if necessary, from the off-site storage location are essential.

There may be the need to move personnel from one location to another very quickly during the disaster. Suitable transportation to move them must be accounted for. For example, how employees get access to a car, truck or aircraft while on disaster duty must be included in the plan.

Arrangement for adequate Air Conditioning, Power Conditioning (UPS), safety and security equipment and services is necessary. There is little point in assembling new computing facilities for the disaster if they fail due to the lack of air conditioning.

Define Areas of End-User Disaster Contingency Planning

From [ISC991] Section 5, pages 32-33:

End-user recovery planning is important, as end users still expect to be operational while the disaster recovery is underway. Data processing capability is critical to most users on a day-to-day basis. When a disaster occurs, it may impact a number of services where manual procedures are difficult to implement or maintain on an ongoing basis. Many data processing failures can be significant, and the recovery can be complex, so the resulting service disruption can threaten the organization’s survival.

The end user must be involved in the disaster planning. They can be involved by recognizing the effect that a disaster can have on their ability to work. In the event of a situation serious enough to force the data processing center to move, user procedures should be able to address lost data or transactions, continuation of critical functions until the alternate site is ready, and any procedural changes to account for the alternate site operation.

It may also be a requirement that the user plans be able to change the ways information is input to the system if there is a loss of communications, or specialized equipment.

Compare and Contrast Strategies for Data Center Backup Planning

There are several major strategies for Data Center Backup Planning. These include

See the following sections for a discussion of these topics.

Identify the Advantages and Disadvantages of Mutual Aid Agreements

From [ISC991] Section 5 page 35:

Mutual aid or reciprocal agreements are arrangements between two companies to provide facilities to each other in the event of a disaster. They require that the two companies have similar hardware and software computing environments.

The advantages of this form of arrangement are

However, there are a number of disadvantages, which often make the use of a reciprocal agreement tenuous at best. These are:

Typically, reciprocal agreements are dismissed in practice because few data processing installations have the extra capacity needed to run both their own and another organization’s jobs.

Compare and Contrast the Advantages and Disadvantages of Hot Sites and Cold Sites

From [ISC991] Section 5 page 36-37:

Hot, warm and cold sites are generally run through a subscription service. The hot site is defined as a fully configured site with complete hardware and software compatible with the client. This is generally available in hours, hence the term hot site.

The warm site is similar to a hot site, although the expensive equipment that can be reliably and quickly obtained is not available on site. The site is ready in hours after the needed equipment arrives.

A cold site is one that is air conditioned and wired, but there is no equipment on site. This is ready for use as soon as the customer’s equipment arrives.

The hot site is preferable because you can be operational in a very short period. It is highly available and the service bureau assures exclusive use of the facility. The site is available immediately when disaster strikes, and supports both short and long term outages. The contract for the hot site includes annual test time to ensure that everything is compatible and can be made operational as expected. Typically, a subscription service will have multiple sites available so you can go to the site that is closest to the disaster location.

However, the hot site is the most expensive to maintain. If there is a regional disaster, and someone else uses the same subscription service, there is the possibility of contention for the hot site between two or more companies. If the organization has special or unusual hardware, there are limited choices available to provide support.

The cold site is designed to simply provide a room, air conditioning and power. There is no equipment provided; that is the organization’s responsibility. The cold site is assured of availability for a long period of time. Cold sites are available in a variety of locations across the country, and the subscription service assures exclusivity of use. It is the least expensive of the three options and practical for situations where the hardware involved is less popular.

The disadvantage associated with the cold site is that testing the plan is not possible, since you do not really want to go through the expense of procuring a great deal of equipment for a short test period.

Compare and Contrast the Advantages and Disadvantages of Using Service Bureaus

It is also possible to obtain hot site services through a service bureau. Remember that a service bureau gains its advantage by offering services to a larger customer base. By obtaining hot site services through the service bureau, there is a quick response time and availability of service. It is possible to test the configuration to make sure that it and your plan are effective.

However, few service bureaus offer this as a service. It costs as much as a hot site, and there is a configuration management problem. As they will continue to configure and modify their systems to suit their ongoing business needs, the implementation of your plan may be a problem.

Identify Disaster Recovery Contingency Planning Events

From [ISC991] Section 5, page 43:

The recovery contingency planning events are:

Define and describe Required Steps to Perform a Business Impact Analysis (BIA)

From [ISC991] Section 5, page 45:

There is a requirement as part of the BIA to develop customized materials for your organization. The process involves collecting both qualitative and quantitative impact information.

Define Roles of Maintenance in Disaster Recovery Planning

Identify Possible Preventive Measures in Disaster Recovery Planning

From [ISC991], Section 5, page 47,

There are several preventative measures available to the Disaster Recovery Planner. These relate to specific types of threats, including the following:

Fire: Detection, Suppression and Protection

Water: Leaks, equipment covers

Electrical: UPS, backup generators

Environmental: Air Conditioning

Backup Procedures: Systems, applications, data, documentation

Other preventative measures include

Identify Steps in Assessing Recovery Capability

Recovery capability is assessed through the following:

Define Characteristics of a Test of Disaster Recovery Plan

From [ISC991], Section 5, pages 52-54,

There are a number of benefits to testing the plan on a regular basis. These benefits include

When the plan is being tested,

The Structured Walk-Through Test occurs when the functional representatives meet to review the plan in detail. This involves a thorough look at each of the plan steps, and the procedures that are invoked at that point in the plan. This ensures that the actual planned activities are accurately described in the plan.

The Checklist test is a method of testing the plan by distributing copies to each of the functional areas. Each area reviews the plan and checks off the points that are listed. This process ensures that the plan addresses all concerns and activities.

The Simulation Test is where all operational and support functions meet to practice execution of the plan based on a scenario that is played out to test the reaction of all functions to various situations. Only those materials and information available in a real disaster are allowed to be used during the simulation, and the simulation continues up to the point of actual relocation to the alternate site and shipment of replacement equipment.

The Parallel Test is essentially an operational test. In this test, the critical systems are placed into operation at the alternative site to see if things run as expected. The results can be compared with the real operational output and differences noted.

The Full Interruption Test is when full normal operations are completely shut down, and the processing is conducted at the alternate site using the materials that are available in the offsite storage location and personnel that are assigned to the recovery teams.