Difference between revisions of "Disaster Preparedness"

From EITBOK
 
(21 intermediate revisions by 2 users not shown)
Line 1: Line 1:
<p style="color: red">'''Note: This wiki is a work in progress, and may contain missing content, errors, or duplication.'''</p>
+
<table border="3">
----
+
<tr><td>
 +
<table>
 +
<tr>
 +
<td width="60%"><font color="#246196">'''Welcome to the initial version of the EITBOK wiki. Like all wikis, it is a work in progress and may contain errors. We welcome feedback, edits, and real-world examples. [[Main_Page#How to Make Comments and Suggestions|Click here]] for instructions about how to send us feedback.''' </font></td>
 +
<td width="20%">[[File:Ieee logo 1.png|100px|center]]</td>
 +
<td width="20%"> [[File:Acm_logo_3.png|175px|center]]</td>
 +
</tr></table>
 +
</td></tr></table>
 +
<p>&nbsp;</p>
 
<h2>Introduction</h2>
 
<h2>Introduction</h2>
<p>Disaster preparedness and [http://eitbokwiki.org/Glossary#dr disaster recovery (DR)] supports business-continuity planning and includes planning for [http://eitbokwiki.org/Glossary#eit enterprise information technology (EIT)] resiliency, as well as recovery from adversity, so that critical business services affected are restored to a satisfactory working state within an [http://eitbokwiki.org/Glossary#acceptable acceptable] timeframe after the event. </p>
+
<p>Disaster preparedness and [http://eitbokwiki.org/Glossary#dr disaster recovery (DR)] support business-continuity planning and include planning for [http://eitbokwiki.org/Glossary#eit enterprise information technology (EIT)] resiliency, as well as recovery from adversity, so that critical business services affected are restored to a satisfactory working state within an [http://eitbokwiki.org/Glossary#acceptable acceptable] timeframe after an event. </p>
<p>[http://eitbokwiki.org/Glossary#dr DR] can be defined as “in computer system operations, the return to normal operation after a hardware or software failure.&nbsp;[[#One|[1]]] Also, the “activities and programs designed to return the organization to an [http://eitbokwiki.org/Glossary#acceptable acceptable] condition. The ability to respond to an interruption in services by implementing a disaster recovery plan to restore an organization’s critical business functions.&nbsp;[[#Two|[2]]] </p>
+
<p>DR can be defined as "in computer system operations, the return to normal operation after a hardware or software failure."&nbsp;[[#One|[1]]] Also, the "activities and programs designed to return the organization to an acceptable condition. And the ability to respond to an interruption in services by implementing a disaster recovery plan to restore an organization's critical business functions."&nbsp;[[#Two|[2]]] </p>
 
<p>This chapter defines these processes and deliverables, and who should be responsible for planning, creating the documents, and communicating if a disaster occurs. The following are some examples for context: </p>
 
<p>This chapter defines these processes and deliverables, and who should be responsible for planning, creating the documents, and communicating if a disaster occurs. The following are some examples for context: </p>
 
<ul>
 
<ul>
Line 9: Line 17:
 
<ul>
 
<ul>
 
<li>Natural disaster affecting datacenters or EIT service operations (flood, fire, earthquake, wind)</li>
 
<li>Natural disaster affecting datacenters or EIT service operations (flood, fire, earthquake, wind)</li>
<li>Security breach resulting in disaster (destruction of data, admin password changes, virus/malware installation, sabotage)</li>
+
<li>Security breach resulting in a disaster (destruction of data, admin password changes, virus/malware installation, sabotage)</li>
 
<li>Usage error (accidental deletion, unplug/turn off system resulting in corruption)</li>
 
<li>Usage error (accidental deletion, unplug/turn off system resulting in corruption)</li>
 
<li>Utility failure affecting datacenters (loss of power even after UPS)</li>
 
<li>Utility failure affecting datacenters (loss of power even after UPS)</li>
Line 18: Line 26:
 
<ul>
 
<ul>
 
<li>Requiring use of computers or printers when power is out</li>
 
<li>Requiring use of computers or printers when power is out</li>
<li>Requiring use of internet when power or connectivity is out</li>
+
<li>Requiring use of Internet when power or connectivity is out</li>
 
<li>Single point of knowledge/control for administration access</li>
 
<li>Single point of knowledge/control for administration access</li>
 
<li>Lack of offsite backup storage</li>
 
<li>Lack of offsite backup storage</li>
 
<li>Lack of working restoration from backups</li>
 
<li>Lack of working restoration from backups</li>
 
<li>Lack of failover datacenters in separate locations</li>
 
<li>Lack of failover datacenters in separate locations</li>
<li>Undocumented or out of date documentation for system interfaces</li>
+
<li>Undocumented or out-of-date documentation for system interfaces</li>
 
<li>Requiring use of phones that are out of power</li>
 
<li>Requiring use of phones that are out of power</li>
 
<li>Lack of designated leaders for restoration efforts (someone who is in charge of restoring service and knows that they are in charge)</li>

<li>Lack of designated leaders for restoration efforts (someone who is in charge of restoring service and knows that they are in charge)</li>
Line 32: Line 40:
 
<li>To document and plan for appropriate backup and recovery processes for all systems, and priority of systems for restoration. </li>
 
<li>To document and plan for appropriate backup and recovery processes for all systems, and priority of systems for restoration. </li>
 
<li>To create and deploy an EIT disaster recovery plan. </li>
 
<li>To create and deploy an EIT disaster recovery plan. </li>
<li>To ensure that the business has business continuity processes in place in case of a disaster.</li></ul>
+
<li>To ensure that the business has business-continuity processes in place in case of a disaster.</li></ul>
<p>Fundamental principles of disaster recovery depends on the business functions within the enterprise, and how critical each is to the health of the business. There are several methods for determining criticality of functions. </p>
+
<p>The fundamental principles of disaster recovery depend on the business functions within the enterprise, and how critical each is to the health of the business. There are several methods for determining criticality of functions:</p>
<ul><li>Hierarchy of need and stated in [http://eitbokwiki.org/Glossary#sla SLAs], which is that the most critical business functions should be restored first, or in the first phase of disaster recovery. </li>
+
<ul><li>Hierarchy of need as stated in [http://eitbokwiki.org/Glossary#sla SLAs], which is that the most critical business functions should be restored first, or in the first phase of disaster recovery. </li>
<li>Keep the lights on ([http://eitbokwiki.org/Glossary#ktlo KTLO]) or keep the business running ([http://eitbokwiki.org/Glossary#ktbr KTBR]), which are not the same thing</li>
+
<li>Keep the lights on ([http://eitbokwiki.org/Glossary#ktlo KTLO]) or keep the business running ([http://eitbokwiki.org/Glossary#ktbr KTBR]), which are not the same thing.</li>
 
<li>All non-critical services are in the final phase of recovery.</li>
 
<li>All non-critical services are in the final phase of recovery.</li>
<li>Industry-specific, so all systems delivering lifesaving functions are the highest priority for recovery efforts, whereas administration systems wait for second or third wave of recovery. </li>
+
<li>Industry-specific, so all systems delivering lifesaving functions are the highest priority for recovery efforts, whereas administration systems wait for the second or third wave of recovery.</li>
 
</ul>
 
</ul>
<p>However, a fundamental recovery principle is that all systems to be recovered should be attended to within the specifications for recovery time objectives ([http://eitbokwiki.org/Glossary#rto RTO]) and recovery point objectives ([http://eitbokwiki.org/Glossary#rpo RPO]) laid out by the business in the DR plan.</p>
+
<p>However, a fundamental recovery principle is that all systems to be recovered should be attended to within the specifications for recovery time objectives ([http://eitbokwiki.org/Glossary#rto RTOs]) and recovery point objectives ([http://eitbokwiki.org/Glossary#rpo RPOs]) laid out by the business in the DR plan.</p>
 
<h2>Context Diagram</h2>
 
<h2>Context Diagram</h2>
[[File:ContextDiagram_DisasterRecovery.jpg|700px]]
+
<p>[[File:07 Disaster Preparedness CD.png|700px]]<br />'''Figure 1. Context Diagram for Disaster Preparedness and Recovery'''</p>
<p>'''Figure 1. Context Diagram for Disaster Preparedness and Recovery'''</p>
+
 
<h3>Gather Inputs</h3>
 
<h3>Gather Inputs</h3>
 
<p>The following inputs are necessary for this process to initiate or continue:</p>
 
<p>The following inputs are necessary for this process to initiate or continue:</p>
Line 52: Line 59:
 
<li>[http://eitbokwiki.org/Glossary#cmdb Configuration management database (CMDB)] and asset inventory (see the [http://eitbokwiki.org/Operations_and_Support Operations and Support chapter])</li>
 
<li>[http://eitbokwiki.org/Glossary#cmdb Configuration management database (CMDB)] and asset inventory (see the [http://eitbokwiki.org/Operations_and_Support Operations and Support chapter])</li>
 
<li>Current enterprise architecture artifacts/source code/document management systems</li>
 
<li>Current enterprise architecture artifacts/source code/document management systems</li>
<li>EIT [http://eitbokwiki.org/Glossary#service_cat service catalogue] </li>
+
<li>EIT [http://eitbokwiki.org/Glossary#service_cat service catalog] </li>
 
<li>EIT staff capabilities</li>
 
<li>EIT staff capabilities</li>
 
<li>Vendor service agreements/maintenance agreements</li>
 
<li>Vendor service agreements/maintenance agreements</li>
 
</ul>
 
</ul>
<p>The obvious business driver is to reduce [http://eitbokwiki.org/Glossary#risk risk] for the business, by providing both mitigation strategies and contingency plans. High-risk projects or operational inefficiencies can lead to lost business, which ultimately causes lost income for the business &mdash; this can be the high price of risk.</p>
+
<p>The obvious business driver is to reduce [http://eitbokwiki.org/Glossary#risk risk] for the business, by providing both mitigation strategies and contingency plans. High-risk projects or operational inefficiencies can lead to lost business, which ultimately causes lost income for the business—this can be the high price of risk.</p>
 
<p>Another business driver for formal DR processes may be to meet regulatory (e.g., [http://eitbokwiki.org/Glossary#sox SOX]) or sustainability objectives. Part of the information gathering includes conducting workshops or interviews to document the drivers to ensure that deliverables meet these requirements.</p>

<p>Another business driver for formal DR processes may be to meet regulatory (e.g., [http://eitbokwiki.org/Glossary#sox SOX]) or sustainability objectives. Part of the information gathering includes conducting workshops or interviews to document the drivers to ensure that deliverables meet these requirements.</p>
 
<p>Another related information-gathering effort is to define and document the technical drivers for DR, including aging technology and lack of application-support capabilities. </p>

<p>Another related information-gathering effort is to define and document the technical drivers for DR, including aging technology and lack of application-support capabilities. </p>
Line 62: Line 69:
 
<h3>Business Impact Analysis</h3>
 
<h3>Business Impact Analysis</h3>
 
<h4>Define Critical Business Services</h4>
 
<h4>Define Critical Business Services</h4>
<p>The first activity is to define services critical to operations. Critical services are those that, if missing, would mean that the enterprise could no longer meet commitments and deliver business products or services. Use business impact analysis, and get input from the business, such as the risk management group, the business continuity management, audit departments, and executives. Use business process diagrams to assist with analysis. </p>
+
<p>The first activity is to define services critical to operations. Critical services are those that, if missing, would mean that the enterprise could no longer meet commitments and deliver business products or services. Use business impact analysis, and get input from the business, such as the risk management group, the business continuity management, audit departments, and executives. Use business process diagrams to assist with analysis. </p>
<p>The following list is a suggested structure for determining the ''service categories'' and corresponding criticality of organizations services (for definitions of the categories, refer to the [[#StandardServiceDefinitions|Standard Service Definitions]] section***where should ref go?***):</p>
+
<p>The first activity is to define services critical to operations. Critical services are those that, if missing, would mean that the enterprise could no longer meet commitments and deliver business products or services. Use business impact analysis, and get input from the business, such as the risk management group, the business continuity management, audit departments, and executives. Use business process diagrams to assist with analysis. </p>  
 +
<p>The following list is a suggested structure for determining the ''service categories'' and corresponding criticality of the organization's services (for definitions of the categories, refer to&nbsp;[[#Three|[3]]]):</p>
 
<ul>
 
<ul>
 
<li>Mission critical</li>
 
<li>Mission critical</li>
Line 72: Line 80:
 
<p>Examples of typical critical services within an enterprise are safety processes, safety documentation management, communication policies and processes, and financial data and processes.</p>

<p>Examples of typical critical services within an enterprise are safety processes, safety documentation management, communication policies and processes, and financial data and processes.</p>
 
<h4>Map Critical Business Services to EIT Services</h4>
 
<h4>Map Critical Business Services to EIT Services</h4>
<p>This function is often referred to as ''building an EIT service catalog'', which is an important input to disaster recovery planning. A service catalog is “a database or structured document with information about all live EIT services, including those available for deployment…The service catalog includes information about deliverables, prices, contact points, ordering, and request processes.&nbsp;[[#Four|[4]]] Templates exist to assist with this mapping.&nbsp;[[#Five|[5]]] See the [http://eitbokwiki.org/Operations_and_Support Operations and Support chapter] for more information on service catalogs.</p>
+
<p>This function is often referred to as ''building an EIT service catalog'', which is an important input to disaster recovery planning. A service catalog is "a database or structured document with information about all live EIT services, including those available for deployment…The service catalog includes information about deliverables, prices, contact points, ordering, and request processes."&nbsp;[[#Four|[4]]] There are templates to assist with this mapping.&nbsp;[[#Five|[5]]] See the [http://eitbokwiki.org/Operations_and_Support Operations and Support chapter] for more information on service catalogs.</p>
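<p>As a rough illustration of this mapping, the following Python sketch models a very small service catalog and looks up which EIT services underpin a given set of critical business services. The <code>CatalogEntry</code> fields, the service names, and the team names are hypothetical and are not taken from any EITBOK template; a real catalog would also carry the deliverables, prices, and ordering information mentioned above.</p>

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One EIT service in a hypothetical, simplified service catalog."""
    name: str
    contact: str
    # Business services that this EIT service underpins (illustrative values).
    supports: set = field(default_factory=set)


def eit_services_for(critical_business_services, catalog):
    """Return the names of EIT services that support any critical business service."""
    critical = set(critical_business_services)
    return sorted(entry.name for entry in catalog if entry.supports & critical)


# Hypothetical catalog contents, for illustration only.
catalog = [
    CatalogEntry("email", "messaging-team", {"customer communication"}),
    CatalogEntry("erp-database", "dba-team", {"financial reporting", "shipping"}),
    CatalogEntry("intranet-wiki", "web-team", {"internal knowledge sharing"}),
]

critical_eit = eit_services_for(
    ["financial reporting", "customer communication"], catalog
)
print(critical_eit)
```

<p>Keeping the mapping as data rather than prose makes it easier to re-run the lookup whenever the business changes its list of critical services.</p>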
 
<h4>Define Relevant Disaster Scenarios and Responsible Parties</h4>
 
<h4>Define Relevant Disaster Scenarios and Responsible Parties</h4>
 
<p>Clearly define criteria for who declares a disaster, including when and how. Mature organizations have assigned who is in charge during disasters so that there is a clear leader who can decide which processes and procedures to implement, and who knows to follow the communication plan. Without a plan, people make invalid assumptions about who is in charge: either no one takes responsibility or multiple parties compete to be in charge, and neither outcome helps resolve the disaster and recover service.</p>

<p>Clearly define criteria for who declares a disaster, including when and how. Mature organizations have assigned who is in charge during disasters so that there is a clear leader who can decide which processes and procedures to implement, and who knows to follow the communication plan. Without a plan, people make invalid assumptions about who is in charge: either no one takes responsibility or multiple parties compete to be in charge, and neither outcome helps resolve the disaster and recover service.</p>
 
<h4>Define Successive Waves for Extending Recovery Across the Business</h4>
 
<h4>Define Successive Waves for Extending Recovery Across the Business</h4>
<p>Due to the complex nature of EIT systems within the enterprise today, it is unrealistic to provide recovery for all services in the initial recovery phase. There are different levels of recovery for different tiers of business services, and a corresponding, agreed-to timeframe for recovery of each service within the enterprise. These waves of recovery begin with the most critical services, and move through to the least critical in an [http://eitbokwiki.org/Glossary#acceptable acceptable] timeframe based on a risk-mitigation process. For example, level one (i.e., Tier 1) recovery may take place within 72 hours of a disaster and would include services such as product production, shipping, and customer-service applications. Note: A non-critical service may be recovered in the first pass of recovery based solely on a critical service having it as a dependency. </p>
+
<p>Due to the complex nature of EIT systems within the enterprise today, it is unrealistic to provide recovery for all services in the initial recovery phase. There are different levels of recovery for different tiers of business services, and a corresponding, agreed-to timeframe for recovery of each service within the enterprise. These waves of recovery begin with the most critical services, and move through to the least critical in an acceptable timeframe based on a risk-mitigation process. For example, level one (i.e., Tier 1) recovery may take place within 72 hours of a disaster and would include services such as product production, shipping, and customer-service applications. '''Note:''' A non-critical service may be recovered in the first pass of recovery based solely on a critical service having it as a dependency. </p>
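<p>The note about dependencies can be sketched in code. The following Python example is a minimal sketch, not a prescribed method: it starts from business-assigned tiers and pulls a service forward into an earlier wave whenever a more critical service depends on it. The function name, the service names, and the tier numbers are assumptions made for illustration, and every dependency is assumed to appear in the tier table.</p>

```python
def recovery_waves(tiers, depends_on):
    """Assign each service to a recovery wave (1 = recovered first).

    tiers:      service name -> business-assigned tier (1 is most critical)
    depends_on: service name -> set of services it needs in order to run

    A dependency is promoted to the wave of the most critical service
    that needs it, directly or through a chain of dependencies.
    """
    waves = dict(tiers)
    changed = True
    while changed:  # repeat until no dependency is scheduled after its dependent
        changed = False
        for service, deps in depends_on.items():
            for dep in deps:
                if waves[dep] > waves[service]:
                    waves[dep] = waves[service]
                    changed = True
    return waves


# Hypothetical services and tiers, for illustration only.
tiers = {"order-entry": 1, "shipping": 1, "reporting": 2, "dns": 3, "hr-portal": 3}
depends_on = {"order-entry": {"dns"}, "reporting": {"order-entry"}}

waves = recovery_waves(tiers, depends_on)
print(waves)
```

<p>In this example the non-critical <code>dns</code> service moves into the first wave because the Tier 1 <code>order-entry</code> service cannot run without it, while <code>hr-portal</code> stays in the last wave.</p>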
<p>Critical systems management is a useful process in the identification and documentation of critical systems.&nbsp;[[#Six|[6]]] Also, it ensures that proper application life-cycle management is occurring for these EIT services.&nbsp;[[#Seven|[7]]] </p>
+
<p>Critical systems management is a useful process in the identification and documentation of critical systems.&nbsp;[[#Six|[6]]] Also, it ensures that proper application lifecycle management is occurring for these EIT services.&nbsp;[[#Seven|[7]]] </p>
 
<p>Use risk-assessment techniques to analyze how disaster scenarios could adversely affect the business. One such process would be to tier possible risks into levels such as:</p>
 
<p>Use risk-assessment techniques to analyze how disaster scenarios could adversely affect the business. One such process would be to tier possible risks into levels such as:</p>
 
<ul>
 
<ul>
Line 87: Line 95:
 
<h3>Recovery Objectives and DR Plan</h3>
 
<h3>Recovery Objectives and DR Plan</h3>
 
<h4>Determine Recovery Objectives and Develop Plan</h4>
 
<h4>Determine Recovery Objectives and Develop Plan</h4>
<p>In cooperation with the business, define the [http://eitbokwiki.org/Glossary#rpo recovery point objectives (RPOs)] and [http://eitbokwiki.org/Glossary#rto recovery time objectives (RTOs)]. </p>
+
<p>In cooperation with the business, define the [http://eitbokwiki.org/Glossary#rpo recovery point objective (RPO)] and [http://eitbokwiki.org/Glossary#rto recovery time objective (RTO)]. </p>
 
<p>RPO is the point in time to which all integrated systems are recovered, taking into account backup schedules, sync points, and data-transfer points to ensure data quality and integrity. </p>
 
<p>RPO is the point in time to which all integrated systems are recovered, taking into account backup schedules, sync points, and data-transfer points to ensure data quality and integrity. </p>
 
<p>RTO is how long it will take to return an EIT service to active duty. This varies depending on the criticality of the service as well as how integrated the service is with other services. </p>
 
<p>RTO is how long it will take to return an EIT service to active duty. This varies depending on the criticality of the service as well as how integrated the service is with other services. </p>
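<p>One way to make these objectives testable is to compare them with timestamps recorded during a drill. The Python sketch below assumes that the RPO is expressed as the maximum tolerable age of the last usable backup at the moment of failure, and that the RTO is expressed as a maximum duration from failure to restored service. Both are common simplifications rather than definitions taken from this chapter, and the dates and the 24-hour targets are hypothetical.</p>

```python
from datetime import datetime, timedelta


def meets_rpo(last_good_backup, failure_time, rpo):
    """True if the data-loss window (failure minus last usable backup) is within the RPO."""
    return failure_time - last_good_backup <= rpo


def meets_rto(failure_time, service_restored, rto):
    """True if the outage duration (restoration minus failure) is within the RTO."""
    return service_restored - failure_time <= rto


# Hypothetical drill timestamps, for illustration only.
failure = datetime(2024, 1, 10, 14, 0)
last_backup = datetime(2024, 1, 10, 2, 0)   # 12 hours before the failure
restored = datetime(2024, 1, 11, 20, 0)     # 30 hours after the failure

rpo_ok = meets_rpo(last_backup, failure, rpo=timedelta(hours=24))
rto_ok = meets_rto(failure, restored, rto=timedelta(hours=24))
print(rpo_ok, rto_ok)
```

<p>Here the backup schedule satisfies the assumed 24-hour RPO, but the 30-hour restoration misses the assumed 24-hour RTO, which is the kind of gap a DR test is meant to expose.</p>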
[[File:RecoveryTimeline.jpg|700px]]
+
<p>[[File:RecoveryTimeline.jpg|700px]]<br />'''Figure 2. Recovery Timeline'''</p>
<p>'''Figure 2. Recovery Timeline'''</p>
+
 
<p>Configuration management is a process that helps document the business impact of a service, as well as documenting the backup and recovery requirements. Also, it provides an inventory of the applications and supporting infrastructure needed in the restoration processes.</p>
 
<p>Configuration management is a process that helps document the business impact of a service, as well as documenting the backup and recovery requirements. Also, it provides an inventory of the applications and supporting infrastructure needed in the restoration processes.</p>
 
<p>'''Organization and Culture'''</p>
 
<p>'''Organization and Culture'''</p>
<p>The risk tolerance and depth of capabilities within the organization have a large impact on the organization’s disaster preparedness level. In other words, the business’s disaster tolerance is the “the time gap the business can accept the non-availability of EIT facilities.&nbsp;[[#Two|[2]]] The lower the tolerance, the more extensive and costly DR practices and techniques are deployed. </p>
+
<p>The risk tolerance and depth of capabilities within the organization have a large impact on the organization's disaster preparedness level. In other words, the business's disaster tolerance is "the time gap the business can accept the non-availability of EIT facilities."&nbsp;[[#Two|[2]]] The lower the tolerance, the more extensive and costly the DR practices and techniques that must be deployed. </p>
 
<p>Also, the business product deliveries determine the requirements of the planning effort and metrics. </p>
 
<p>Also, the business product deliveries determine the requirements of the planning effort and metrics. </p>
 
<h4>Develop Communications Plan</h4>
 
<h4>Develop Communications Plan</h4>
Line 101: Line 108:
 
<li>How to deliver communications when standard communication systems are unavailable (such as email or phone systems)</li>
 
<li>How to deliver communications when standard communication systems are unavailable (such as email or phone systems)</li>
 
<li>Who to contact in a disaster situation, including specific lists for specific situations or systems affected</li>
 
<li>Who to contact in a disaster situation, including specific lists for specific situations or systems affected</li>
<li>What the information each communication should and shouldn’t include</li>
+
<li>What information each communication should and shouldn't include</li>
 
</ul>
 
</ul>
 
<p>Contact information lists should include the following [http://eitbokwiki.org/Glossary#stakeholder stakeholders]:</p>
 
<p>Contact information lists should include the following [http://eitbokwiki.org/Glossary#stakeholder stakeholders]:</p>
Line 118: Line 125:
 
</ul>
 
</ul>
 
<h4>Develop and Document DR Plan</h4>
 
<h4>Develop and Document DR Plan</h4>
<p>A ''disaster recovery plan ([http://eitbokwiki.org/Glossary#drp DRP])'' is “a set of human, physical, technical, and procedural resources to recover, within a defined time and cost, an activity interrupted by an emergency or disaster.&nbsp;[[#Two|[2]]]</p>
+
<p>A ''disaster recovery plan ([http://eitbokwiki.org/Glossary#drp DRP])'' is "a set of human, physical, technical, and procedural resources to recover, within a defined time and cost, an activity interrupted by an emergency or disaster."&nbsp;[[#Two|[2]]]</p>
<p>The DR plan document needs to include all of the information required to recovery all critical systems that a business needs to operate. EIT must work with the business to develop and document a DR plan. See the [[#template|template at the end of this chapter]] for recommended sections of a DR plan.</p>
+
<p>The DR plan document needs to include all the information required to recover all critical systems that a business needs to operate. EIT must work with the business to develop and document a DR plan. See the [[#template|template at the end of this chapter]] for recommended sections of a DR plan.</p>
 
<p>Data collection techniques are critical to the development of a meaningful DR plan that meets the needs of the business. </p>
 
<p>Data collection techniques are critical to the development of a meaningful DR plan that meets the needs of the business. </p>
 
<h4>Interface with Business Continuity</h4>
 
<h4>Interface with Business Continuity</h4>
<p>The EIT team must communicate their processes to the business, and make consistent updates to the [http://eitbokwiki.org/Glossary#bcp business-continuity plan (BCP)]. As new business components or services are added, the business assigns a criticality level, which then needs to be translated into EIT services, that are assigned internally to a tier to determine the disaster recovery requirements. The relationship between business continuity and EIT disaster recovery is symbiotic and is critical to the success of both functions within the enterprise.</p>
+
<p>The EIT team must communicate their processes to the business, and make consistent updates to the [http://eitbokwiki.org/Glossary#bcp business-continuity plan (BCP)]. As new business components or services are added, the business assigns a criticality level, which then needs to be translated into EIT services that are assigned internally to a tier to determine the disaster recovery requirements. The relationship between business continuity and EIT disaster recovery is symbiotic and is critical to the success of both functions within the enterprise.</p>
 
<h3>Implement and Test DR Plan (Drill or Simulation)</h3>

<h3>Implement and Test DR Plan (Drill or Simulation)</h3>
<p>The first step to implementing a DR plan is to allocate resources and assign responsibilities. The DR team needs to be assigned early in the process to ensure accountability and understanding of roles at the time of a disaster. Many different roles are needed to define and execute a successful DR plan. The DR test is an opportunity to cross train roles, to mitigate the risk of key roles not being available should a disaster occur. It is likely that no one from the business DR team will be available for the recovery of the systems, so documentation, testing, and assigning a strategic partner is important to the recovery of business services.</p>
+
<p>The first step to implementing a DR plan is to allocate resources and assign responsibilities. The DR team needs to be assigned early in the process to ensure accountability and an understanding of roles at the time of a disaster. Many different roles are needed to define and execute a successful DR plan. The DR test is an opportunity to cross-train roles, to mitigate the risk of key roles not being available if a disaster occurs. It is likely that no one from the business DR team will be available for the recovery of the systems, so documentation, testing, and assigning a strategic partner are important to the recovery of business services.</p>
 
<h4>Roles and Responsibilities</h4>
 
<h4>Roles and Responsibilities</h4>
<p>Input supplier roles are roles and teams that supply the inputs to the process:</p>
+
<p>''Input supplier roles'' are roles and teams that supply the inputs to the process:</p>
 
<ul>
 
<ul>
 
<li>Enterprise risk-management team</li>
 
<li>Enterprise risk-management team</li>
Line 134: Line 141:
 
<li>Solution management team</li>
 
<li>Solution management team</li>
 
</ul>
 
</ul>
<p>Key roles are the responsible individuals or teams that perform the process:</p>
+
<p>''Key roles'' are the responsible individuals or teams that perform the process:</p>
 
<ul><li>DR team leads</li>
 
<ul><li>DR team leads</li>
 
<li>Test team</li>
 
<li>Test team</li>
Line 144: Line 151:
 
<li>Service manager</li>
 
<li>Service manager</li>
 
</ul></li></ul>
 
</ul></li></ul>
<p>User roles expect and receive the deliverables:</p>
+
<p>''User roles'' expect and receive the deliverables:</p>
 
<ul>
 
<ul>
 
<li>Operations management team<br />
 
<li>Operations management team<br />
Line 152: Line 159:
 
<li>Business management team</li>
 
<li>Business management team</li>
 
</ul>
 
</ul>
<p>Stakeholder roles are informed or consulted on the process execution:</p>
+
<p>''Stakeholder roles'' are informed or consulted on the process execution:</p>
 
<ul>
 
<ul>
 
<li>Enterprise risk-management team</li>

<li>Enterprise risk-management team</li>
Line 162: Line 169:
 
<li>Contract manager</li></ul></li></ul>
 
<li>Contract manager</li></ul></li></ul>
 
<h4>Document Recovery Strategies</h4>
 
<h4>Document Recovery Strategies</h4>
<p>As mentioned above, there are many strategies to recover the services that the business needs to function. There is a different [http://eitbokwiki.org/Glossary#solution solution] for every different service out there. The most important element is to choose a strategy, then document and communicate it.</p>
+
<p>As mentioned above, there are many strategies to recover the services that the business needs to function. There is a different [http://eitbokwiki.org/Glossary#solution solution] for every service. The most important element is to choose a strategy, then document and communicate it.</p>
 
<ul>
 
<ul>
<li>Use a '''third-party hot recovery site'''. The hot site should be in a geographically separate location to ensure that a natural disaster does not take out both the primary production location as well as the backup site location. These distances vary depending on geographic as well as infrastructure dependences (such as power, water, and network commonalities). </li>
+
<li>Use a '''third-party hot recovery site'''. The hot site should be in a geographically separate location to ensure that a natural disaster does not take out both the primary production location and the backup site location. The required separation distance varies depending on geographic and infrastructure dependencies (such as power, water, and network commonalities). </li>
<li>'''Real-time mirroring''' is a technique used to replicate data to a geographically separate location to ensure data is available if a restore processes is needed.</li>
+
<li>'''Real-time mirroring''' is a technique used to replicate data to a geographically separate location to ensure that data is available if a restore process is needed.</li>
<li>'''Manual, non-standard, or ad hoc/on-demand/unscheduled procedures''' are an important aspect that is a responsibility of the business units to ensure business continuity while EIT is rebuilding system services. Recommend to business management that manual processes either be automated, or have testing be completed on a regular basis. Document the methods used to mitigate problems caused by aging technology, such as having parts inventories, and redundant or cold standby equipment.</li>
+
<li>'''Manual, non-standard, or ad hoc/on-demand/unscheduled procedures''' are the responsibility of the business units and help ensure business continuity while EIT is rebuilding system services. Recommend to business management that manual processes either be automated or be tested on a regular basis. Document the methods used to mitigate problems caused by aging technology, such as having parts inventories, and redundant or cold-standby equipment.</li>
<li>'''Offsite data archiving''' ensures that backups are available if a disaster makes the primary site unavailable. Offsite services are available through many service provides. Due diligence by the DR team is important to ensure that the offsite facilities can guarantee secure and proper handling of backup data, which is an important enterprise asset.</li>
+
<li>'''Offsite data archiving''' ensures that backups are available if a disaster makes the primary site unavailable. Offsite services are available through many service providers. Due diligence by the DR team is important to ensure that the offsite facilities can guarantee secure and proper handling of backup data, which is an important enterprise asset.</li>
 
<li>'''Action plans and recovery processes''' differ depending on what type of disaster has occurred. A single-component failure results in a standalone recovery of the failing component (such as an application, server, or appliance). An enterprise-wide disaster results in a disaster declaration event with a full DR plan being executed with the full DR team being mobilized.</li>
 
<li>'''Action plans and recovery processes''' differ depending on what type of disaster has occurred. A single-component failure results in a standalone recovery of the failing component (such as an application, server, or appliance). An enterprise-wide disaster results in a disaster declaration event with a full DR plan being executed with the full DR team being mobilized.</li>
 
<li>'''Identify and document potential disaster scenarios''' that have a high probability. For example, intrusion or denial of service attacks could have adverse effects on a technology company, whereas adverse environment conditions create higher risks to a construction company. </li>
 
<li>'''Identify and document potential disaster scenarios''' that have a high probability. For example, intrusion or denial of service attacks could have adverse effects on a technology company, whereas adverse environment conditions create higher risks to a construction company. </li>
 
</ul>
 
</ul>
 
<h4>Define a Schedule for Service Continuity Testing</h4>
 
<h4>Define a Schedule for Service Continuity Testing</h4>
<p>For the success of the recovery plan it is critical to define a schedule for the disaster recovery testing. One method often used is to simulate a disaster to test system recovery. Another process strongly recommended is to have production support test refreshes on a regular (i.e., monthly) basis. This not only ensures that backups are usable, but also that processes are well documented and functional, ensuring data quality and integration integrity. </p>
+
<p>For the success of the recovery plan, it is critical to define a schedule for disaster recovery testing. One method often used is to simulate a disaster to test system recovery. Another strongly recommended practice is to have production support test refreshes on a regular (for example, monthly) basis. This not only ensures that backups are usable, but also that processes are well documented and functional, ensuring data quality and integration integrity. </p>
 
<h4>Implement and Test DR Plan (Drill or Simulation)</h4>
 
<h4>Implement and Test DR Plan (Drill or Simulation)</h4>
 
<ul>
 
<ul>
<li>Implementation can take many forms. A hot-site contract is an agreement with a third-party vendor to provide facilities and infrastructure needed to restore agreed to services in the timeframe specified. There are many variants to this type of contract depending on the dollar value of the [http://eitbokwiki.org/Glossary#contract contract] and the expected availability of internal staff at the time of a disaster. If the hot sites are geographically distant from the enterprise offices, it is likely the contract includes staff to perform the recovery as well. </li>
+
<li>Implementation can take many forms. A hot-site contract is an agreement with a third-party vendor to provide the facilities and infrastructure needed to restore agreed-to services in the timeframe specified. There are many variants to this type of contract depending on the dollar value of the [http://eitbokwiki.org/Glossary#contract contract] and the expected availability of internal staff at the time of a disaster. If the hot sites are geographically distant from the enterprise offices, it is likely the contract includes staff to perform the recovery as well. </li>
<li>Due to the size and complexity of many enterprises, in-house DR facilities are often the norm, meaning these are secondary facilities used as recovery centers for primary facilities if needed.</li>
+
<li>Due to the size and complexity of many enterprises, in-house DR facilities are often the norm, meaning these are secondary facilities used as recovery centers for primary facilities, if needed.</li>
 
<li>A useful metric from testing processes is the timing of the actual recovery procedures, as well as a measure of the capabilities of the DR team, third party, or secondary facilities, and the level of maturity of both staff knowledge and process accuracy. </li>
 
<li>A useful metric from testing processes is the timing of the actual recovery procedures, as well as a measure of the capabilities of the DR team, third party, or secondary facilities, and the level of maturity of both staff knowledge and process accuracy. </li>
 
</ul>
 
</ul>
<h3>DR Plan &mdash; Change Management</h3>
+
<h3>DR Plan—Change Management</h3>
 
<p>Regular verification and updates to backup processes are necessary to ensure that accurate and usable backups are delivered. This change-management process needs to provide updates to the documentation of the backup and recovery processes. For example:</p>
 
<p>Regular verification and updates to backup processes are necessary to ensure that accurate and usable backups are delivered. This change-management process needs to provide updates to the documentation of the backup and recovery processes. For example:</p>
 
<ul>
 
<ul>
 
<li>DR testing cycle changes as services change or risk tolerances change.</li>
 
<li>DR testing cycle changes as services change or risk tolerances change.</li>
 
<li>DR test results always cause process improvements and lessons learned to be added to the documentation.</li>
 
<li>DR test results always cause process improvements and lessons learned to be added to the documentation.</li>
<li>Updates and changes to [http://eitbokwiki.org/Glossary#bcp business-continuity plan (BCP)] go hand in hand with the changes to systems and services.</li>
+
<li>Updates and changes to the [http://eitbokwiki.org/Glossary#bcp business-continuity plan (BCP)] go hand in hand with the changes to systems and services.</li>
 
</ul>
 
</ul>
 
<p>Mature organizations build continual improvement evaluation and activities into all processes.</p>
 
<p>Mature organizations build continual improvement evaluation and activities into all processes.</p>
Line 197: Line 204:
 
<h2>Summary</h2>
 
<h2>Summary</h2>
 
<p>Like most processes, DR processes are a closed loop of plan > build > test > review with action. Continuous improvement and maturity of these processes are obtained through the regular execution of DR tests, measuring results, and then revising the DR plan as necessary. Stakeholder involvement with setting requirements is critical to the success of DR processes. </p>
 
<p>Like most processes, DR processes are a closed loop of plan > build > test > review with action. Continuous improvement and maturity of these processes are obtained through the regular execution of DR tests, measuring results, and then revising the DR plan as necessary. Stakeholder involvement with setting requirements is critical to the success of DR processes. </p>
 
+
<h2>Key Maturity Frameworks</h2>
 +
<p>Capability maturity for EIT refers to its ability to reliably perform. Maturity is measured by an organization's readiness and capability expressed through its people, processes, data, technologies, and the consistent measurement practices that are in place. See [http://eitbokwiki.org/Enterprise_IT_Maturity_Assessments Appendix F] for additional information about maturity frameworks.</p>
 +
<p>Many specialized frameworks have been developed since the original Capability Maturity Model (CMM) that was developed by the Software Engineering Institute in the late 1980s. This section describes how some of those apply to the activities described in this chapter. </p>
 +
<h3>IT-Capability Maturity Framework (IT-CMF) </h3>
 +
<p>The IT-CMF was developed by the Innovation Value Institute in Ireland. This framework helps organizations to measure, develop, and monitor their EIT capability maturity progression. It consists of 35 EIT management capabilities that are organized into four macro capabilities: </p>
 +
<ul>
 +
<li>Managing EIT like a business</li>
 +
<li>Managing the EIT budget</li>
 +
<li>Managing the EIT capability</li>
 +
<li>Managing EIT for business value</li>
 +
</ul>
 +
<p>The three most relevant critical capabilities are technical infrastructure management (TIM), information security management (ISM), and enterprise information management (EIM). </p>
 +
<h4>Technical Infrastructure Management Maturity</h4>
 +
<p>The following statements provide a high-level overview of the technical infrastructure management (TIM) capability at successive levels of maturity.</p>
 +
<table>
 +
<tr valign="top"><td width="10%">Level 1</td><td>Management of the EIT infrastructure is reactive or ad hoc. </td></tr>
 +
<tr valign="top"><td>Level 2</td><td>Documented policies are emerging relating to the management of a limited number of infrastructure components. Predominantly manual procedures are used for EIT infrastructure management. Visibility of capacity and utilization across infrastructure components is emerging. </td></tr>
 +
<tr valign="top"><td>Level 3</td><td>Management of infrastructure components is increasingly supported by standardized tool sets that are partly integrated, resulting in decreased execution times and improving infrastructure utilization.</td></tr>
 +
<tr valign="top"><td>Level 4</td><td>Policies related to EIT infrastructure management are implemented automatically, promoting execution agility and achievement of infrastructure utilization targets. </td></tr>
 +
<tr valign="top"><td>Level 5</td><td>The EIT infrastructure is continually reviewed so that it remains modular, agile, lean, and sustainable.</td></tr>
 +
</table>
 +
<h4>Information Security Management Maturity</h4>
 +
<p>The following statements provide a high-level overview of the information security management (ISM) capability at successive levels of maturity.</p>
 +
<table>
 +
<tr valign="top"><td width="10%">Level 1</td><td>The approach to information security tends to be localized. Incidents are typically not responded to in a timely manner. </td></tr>
 +
<tr valign="top"><td>Level 2</td><td>Defined security approaches, policies, and controls are emerging, primarily focused on complying with regulations. </td></tr>
 +
<tr valign="top"><td>Level 3</td><td>Standardized security approaches, policies, and controls are in place across the EIT function, dealing with access rights, business continuity, budgets, toolsets, incident response management, audits, non-compliance, and so on. </td></tr>
 +
<tr valign="top"><td>Level 4</td><td>Comprehensive security approaches, policies, and controls are in place and are fully integrated across the organization. </td></tr>
 +
<tr valign="top"><td>Level 5</td><td>Security approaches, policies, and controls are regularly reviewed to maintain a proactive approach to preventing security breaches. </td></tr>
 +
</table>
 +
<h4>Enterprise Information Management Maturity</h4>
 +
<p>The following statements provide a high-level overview of the enterprise information management (EIM) capability at successive levels of maturity.</p>
 +
<table>
 +
<tr valign="top"><td width="10%">Level 1</td><td>Management has limited awareness of information management opportunities. </td></tr>
 +
<tr valign="top"><td>Level 2</td><td>Basic and discrete information management approaches are in place, typically by function or line of business. </td></tr>
 +
<tr valign="top"><td>Level 3</td><td>Standardized information management policies, standards, and controls are in place across the EIT function, enabling formal oversight of all aspects of information management. </td></tr>
 +
<tr valign="top"><td>Level 4</td><td>Comprehensive information management policies, standards, and controls are in place across the organization. Business intelligence and analysis are recognized as key to organizational success. </td></tr>
 +
<tr valign="top"><td>Level 5</td><td>Information management policies, standards, and controls are continually reviewed based on agreed risk tolerance factors. Their scope effectively extends to key business ecosystem partners. </td></tr>
 +
</table>
 
<h2> Key Competence Frameworks</h2>
 
<h2> Key Competence Frameworks</h2>
<p>While many large companies have defined their own sets of skills for purposes of talent management (to recruit, retain, and further develop the highest quality staff members that they can find, afford and hire), the advancement of EIT professionalism will require common definitions of EIT skills that can be used not just across enterprises, but also across countries. We have selected 3 major sources of skill definitions. While none of them is used universally, they provide a good cross-section of options. </p>
+
<p>While many large companies have defined their own sets of skills for purposes of talent management (to recruit, retain, and further develop the highest-quality staff members that they can find, afford, and hire), the advancement of EIT professionalism will require common definitions of EIT skills that can be used not just across enterprises, but also across countries. We have selected three major sources of skill definitions. While none of them is used universally, they provide a good cross-section of options. </p>
 
+
<p>Creating mappings between these frameworks and our chapters is challenging, because they come from different perspectives and have different goals. There is rarely a 100 percent correspondence between the frameworks and our chapters, and, despite careful consideration, some subjectivity was used to create the mappings. Please take that into consideration as you review them.</p>
 
<h3>Skills Framework for the Information Age</h3>
 
<h3>Skills Framework for the Information Age</h3>
<p> The Skills Framework for the Information Age (SFIA) has defined nearly 100 skills. SFIA describes 7 levels of competency which can be applied to each skill. Not all skills, however, cover all seven levels. Some reach only partially up the seven step ladder. Others are based on mastering foundational skills, and start at the fourth or fifth level of competency. It is used in nearly 200 countries, from Britain to South Africa, South America, to the Pacific Rim, to the United States. (http://www.sfia-online.org)</p>
+
<p>The Skills Framework for the Information Age (SFIA) has defined nearly 100 skills. SFIA describes seven levels of competency that can be applied to each skill. However, not all skills cover all seven levels. Some reach only partially up the seven-step ladder. Others are based on mastering foundational skills, and start at the fourth or fifth level of competency. SFIA is used in nearly 200 countries, from Britain to South Africa, South America, to the Pacific Rim, to the United States. (http://www.sfia-online.org)</p>
<p>SFIA skills have not yet been defined for the this chapter.</p>
+
<p>SFIA skills have not yet been defined for this chapter.</p>
 
+
 
<!--  
 
<!--  
<table cellpadding="5" border="1">
+
<table cellpadding="5" border="1">
<tr>
+
<tr><th style="background-color: #58ACFA;"><font color="white">Skill</font></th><th style="background-color: #58ACFA;"><font color="white">Skill Description</font></th><th width="10%" style="background-color: #58ACFA;"><font color="white">Competency Levels</font></th></tr>
<th>Skill</th>
+
<th>Skill Description</th>
+
<th width="10%">Competency Levels</th>
+
</tr>
+
 
<tr>
 
<tr>
 
<td valign="top">Skill</td>
 
<td valign="top">Skill</td>
 
<td>description</td>
 
<td>description</td>
<td valign="top" >levels</td>
+
<td valign="top">levels</td>
 
</tr>
 
</tr>
 
</table>
 
</table>
 
-->
 
-->
 
 
<h3>European Competency Framework</h3>
 
<h3>European Competency Framework</h3>
<p> The European Union’s European e-Competence Framework (e-CF) has 40 competences and is used by a large number of companies, qualification providers and others in public and private sectors across the EU. It uses five levels of competence proficiency (e-1 to e-5). No competence is subject to all five levels.</p>
+
<p>The European Union's European e-Competence Framework (e-CF) has 40 competences and is used by a large number of companies, qualification providers, and others in public and private sectors across the EU. It uses five levels of competence proficiency (e-1 to e-5). No competence is subject to all five levels.</p>
<p>The e-CF is published and legally owned by CEN, the European Committee for Standardization, and its National Member Bodies (www.cen.eu). Its creation and maintenance has been co-financed and politically supported by the European Commission, in particular, DG (Directorate General) Enterprise and Industry, with contributions from the EU ICT multi-stakeholder community, to support competitiveness, innovation, and job creation in European industry. The Commission works on a number of initiatives to boost ICT skills in the workforce.
+
<p>The e-CF is published and legally owned by CEN, the European Committee for Standardization, and its National Member Bodies (www.cen.eu). Its creation and maintenance have been co-financed and politically supported by the European Commission, in particular, DG (Directorate General) Enterprise and Industry, with contributions from the EU ICT multi-stakeholder community, to support competitiveness, innovation, and job creation in European industry. The Commission works on a number of initiatives to boost ICT skills in the workforce. Versions 1.0 to 3.0 were published as CEN Workshop Agreements (CWA). The e-CF 3.0 CWA 16234-1 was published as an official European Norm (EN), EN 16234-1. For complete information, see http://www.ecompetences.eu. </p>
 
+
<table cellpadding="5" border="1">
Version 1.0 to 3.0 were published as CEN Workshop Agreements (CWA).  
+
<tr><th width="85%" style="background-color: #58ACFA;"><font color="white">e-CF Dimension 2</font></th><th width="15%" style="background-color: #58ACFA;"><font color="white">e-CF Dimension 3</font></th></tr>
The e-CF 3.0 CWA 16234-1 was published as an official European Norm (EN), EN 16234-1.
+
<tr><td valign="top"><strong>E.3. Risk Management (MANAGE)</strong><br />Implements the management of risk across information systems through the application of the enterprise-defined risk management policy and procedure. Assesses risk to the organization's business, including web, cloud, and mobile resources. Documents potential risk and containment plans. </td><td valign="top">Level 2-4</td></tr>
For complete information, please see http://www.ecompetences.eu. </p>
+
 
+
<table cellpadding="5" border="1">
+
<tr>
+
<th width="20%">e-CF Dimension 1</th><th width="40%">e-CF Dimension 2</th><th width="40%">e-CF Dimension 3</th>
+
</tr>
+
<tr>
+
<td valign="top"><em><strong>C. Run</strong></em></td>
+
<td valign="top"><strong>C.4. Problem Management</strong><br />Identifies and resolves the root cause of incidents. Takes a proactive approach to avoidance or identification of root cause of ICT problems. Deploys a knowledge system based on recurrence of common errors. Resolves or escalates incidents. Optimises system or component performance.</td>
+
<td valign="top"><ul>
+
<li>Level 2: Identifies and classifies incident types and service interruptions. Records incidents cataloguing them by symptom and resolution.
+
</li>
+
<li>Level 3:  Exploits specialist knowledge and in-depth understanding of the ICT infrastructure and problem management process to identify failures and resolve with minimum outage. Makes sound decisions in emotionally charged environments on appropriate action required to minimise business impact. Rapidly identifies failing component, selects alternatives such as repair, replace or reconfigure. </li>
+
<li>Level 4: Provides leadership and is accountable for the entire problem management process. Schedules and ensures well trained human resources, tools, and diagnostic equipment are available to meet emergency incidents. Has depth of expertise to anticipate critical component failure and make provision for recovery with minimum downtime. Constructs escalation processes to ensure that appropriate resources can be applied to each incident.</li></ul></td>
+
</tr>
+
<tr><td valign="top"><em><strong>E. Manage</strong></em></td>
+
<td valign="top"><strong>E.3. Risk Management</strong><br />Implements the management of risk across information systems through the application of the enterprise defined risk management policy and procedure. Assesses risk to the organisation’s business, including web, cloud and mobile resources. Documents potential risk and containment plans. </td>
+
<td valign="top"><ul>
+
<li>Level 2: Understands and applies the principles of risk management and investigates ICT solutions to mitigate identified risks.</li>
+
<li>Level 3: Decides on appropriate actions required to adapt security and address risk exposure. Evaluates, manages and ensures validation of exceptions; audits ICT processes and environment.</li>
+
<li>Level 4: Provides leadership to define and make applicable a policy for risk management by considering all the possible constraints, including technical, economic and political issues. Delegates assignments.</li></ul></td>
+
</tr>
+
 
+
 
</table>
 
</table>
 
+
<h3>i&nbsp;Competency Dictionary </h3>
<h3>i-Competency Dictionary </h3>
+
<p>The Information Technology Promotion Agency (IPA) of Japan has developed the i&nbsp;Competency Dictionary (iCD) and translated it into English, and describes it at https://www.ipa.go.jp/english/humandev/icd.html. The iCD is an extensive skills and tasks database, used in Japan and Southeast Asian countries. It establishes a taxonomy of tasks and the skills required to perform the tasks. The IPA is also responsible for the Information Technology Engineers Examination (ITEE), which has grown into one of the largest-scale national examinations in Japan, with approximately 600,000 applicants each year.</p>
<p>The Information Technology Promotion Agency (IPA) of Japan has developed the i-Competency Dictionary (iCD), translated it into English, and describes it at https://www.ipa.go.jp/english/humandev/icd.html. It is an extensive skills and tasks database, used in Japan and southeast Asian countries. It establishes a taxonomy of tasks and the skills required to perform the tasks. The IPA is also responsible for the Information Technology Engineers Examination (ITEE), which has grown into one of the largest scale national examinations in Japan, with approximately 600,000 applicants each year. </p>
+
<p>The iCD consists of a Task Dictionary and a Skill Dictionary. Skills for a specific task are identified via a "Task x Skill" table. (See [http://eitbokwiki.org/Glossary Appendix A] for the task layer and skill layer structures.) EITBOK activities in each chapter require several tasks in the Task Dictionary. </p>
 
+
<p>The table below shows a sample task from iCD Task Dictionary Layer 2 (with Layer 1 in parentheses) that corresponds to activities in this chapter. It also shows the Layer 2 (Skill Classification), Layer 3 (Skill Item), and Layer 4 (knowledge item from the IPA Body of Knowledge) prerequisite skills associated with the sample task, as identified by the Task x Skill Table of the iCD Skill Dictionary. The complete iCD Task Dictionary (Layer 1-4) and Skill Dictionary (Layer 1-4) can be obtained by returning the request form provided at http://www.ipa.go.jp/english/humandev/icd.html. </p>
<p>The iCD consists of a Task Dictionary and a Skill Dictionary. Skills for a specific task are identified via a “Task x Skill” table. (Please see Appendix A for the task layer and skill layer structures.) EITBOK activities in each chapter require several tasks in the Task Dictionary. </p>
+
 
+
<p>The table below shows a sample task from iCD Task Dictionary Layer 2 (with Layer 1 in parentheses) that correspond to activities in this chapter. It also shows the Layer 2 (Skill Classification), Layer 3 (Skill Item), and Layer 4 (knowledge item from the IPA Body of Knowledge) prerequisite skills associated with the sample task, as identified by the Task x Skill Table of the iCD Skill Dictionary. The complete iCD Task Dictionary (Layer 1-4) and Skill Dictionary (Layer 1-4) can be obtained by returning the request form provided at http://www.ipa.go.jp/english/humandev/icd.html.
+
 
+
 
<table cellpadding="5" border="1">
 
<table cellpadding="5" border="1">
 +
<tr><th width="15%" style="background-color: #58ACFA;" font-size="14pt"><font color="white">Task Dictionary</font></th><th colspan="3" style="background-color: #58ACFA;" font-size="14pt"><font color="white">Skill Dictionary</font></th></tr>
 +
<tr><th width="30%" style="background-color: #58ACFA;"><font color="white">Task Layer 1 (Task Layer 2)</font></th><th width="15%" style="background-color: #58ACFA;"><font color="white">Skill Classification</font></th><th width="15%" style="background-color: #58ACFA;"><font color="white">Skill Item</font></th><th width="40%" style="background-color: #58ACFA;"><font color="white">Associated Knowledge Items</font></th></tr>
 
<tr>
 
<tr>
<th font-size="14pt">Task Dictionary</th><th colspan="3">Skill Dictionary</th>
+
<td valign="top"><em><strong>Formulation of business continuity plan <br />(business continuity management)</strong></em></td>
</tr>
+
<td valign="top">Business continuity planning (BCP)</td>
<tr>
+
<td valign="top">BCP formulation methods</td>
<th width="30%">Task Layer (Task Area)</th><th  width="15%">Skill Classification</th><th  width="15%">Skill Item</th><th width="40%">Associated Knowledge Items</th>
+
</tr>
+
 
+
<tr>
+
<td valign="top"><em><strong>Formulation of business continuity plan <br />(Business continuity management)</strong></em></td>
+
<td valign="top">Business continuity planning (BCP)
+
</td>
+
<td valign="top">BCP formulation methods
+
 
+
 
<td> <ul>
 
<td> <ul>
 
<li>Risk analysis</li>
 
<li>Risk analysis</li>
Line 277: Line 282:
 
<li>Clarification of implementation standards</li>
 
<li>Clarification of implementation standards</li>
 
<li>Recovery prioritization</li>
 
<li>Recovery prioritization</li>
<li>Setting of target recovery time</li>
+
<li>Setting target recovery time</li>
 
</ul>
 
</ul>
 
</td>
 
</td>
 
</tr>
 
</tr>
 
</table>
 
</table>
 
 
<h2>Key Roles</h2>
 
<h2>Key Roles</h2>
<p>These roles are are common to ITSM:</p>
+
<p>These roles are common to ITSM:</p>
 
<ul>
 
<ul>
<li>IT Service Continuity Manager</li>
+
<li>Financial Manager</li>
 +
<li>Facilities Manager</li>
 +
<li>EIT Service Continuity Manager</li>
 
<li>Risk Manager</li>
 
<li>Risk Manager</li>
 +
</ul>
 +
<p>Other roles include:</p>
 +
<ul>
 +
<li>Disaster recovery team</li>
 
<li>Information Security Manager</li>
 
<li>Information Security Manager</li>
<li>Financial Manager</li>
+
<li>Operations management team</li>
 +
<li>Service Manager</li>
 +
<li>System specialists</li>
 +
<li>Test team</li>
 
</ul>
 
</ul>
 
 
<h2>Standards</h2>
 
<h2>Standards</h2>
<p> ANSI/ASIS SPC.1-2009. Organizational Resilience: Security, Preparedness and Continuity Management Systems—Requirements with Guidance for Use</p>
+
<p>ANSI/ASIS SPC.1-2009. Organizational Resilience: Security, Preparedness and Continuity Management Systems—Requirements with Guidance for Use</p>
<p>ISO 22301:2012, Societal security -- Business continuity management systems --- Requirements </p>
+
<p>ISO 22301:2012, Societal security—Business continuity management systems—Requirements </p>
<p> ISO/IEC 20000-1:2011, (IEEE Std 20000-1:2013) Information technology – Service management – Part 1: Service management system requirements</p>
+
<p>ISO/IEC 20000-1:2011, (IEEE Std 20000-1:2013) Information technology—Service management—Part 1: Service management system requirements</p>
<p>ISO/IEC 27031:2011, Information technology -- Security techniques -- Guidelines for information and communication technology readiness for business continuity </p>
+
<p>ISO/IEC 27031:2011, Information technology—Security techniques—Guidelines for information and communication technology readiness for business continuity </p>
 
+
 
+
 
<h2>References</h2>
 
<h2>References</h2>
 
<div id="One"></div><p>[1] Systems and Software Engineering Vocabulary. (2009). ISO/IEC 24765 </p>
 
<div id="One"></div><p>[1] Systems and Software Engineering Vocabulary. (2009). ISO/IEC 24765 </p>
Line 310: Line 320:
 
<h2>Related and Informing Disciplines</h2>
 
<h2>Related and Informing Disciplines</h2>
 
<ul>
 
<ul>
 +
<li>Application lifecycle management</li>
 
<li>Business continuity management</li>
 
<li>Business continuity management</li>
<li>Application life-cycle management</li>
 
<li>Risk management</li>
 
 
<li>Change management</li>
 
<li>Change management</li>
<li>Enterprise and business architecture</li>
 
 
<li>Configuration management</li>
 
<li>Configuration management</li>
 +
<li>Enterprise and business architecture</li>
 +
<li>Risk management</li>
 
<li>Testing and validation</li>
 
<li>Testing and validation</li>
 
</ul>
 
</ul>
Line 323: Line 333:
 
<ul>
 
<ul>
 
<li>Scope of the plan</li>
 
<li>Scope of the plan</li>
<li>Objectives &mdash; RTO, RPO</li>
+
<li>Objectives—RTO, RPO</li>
 
<li>Authority</li>
 
<li>Authority</li>
 
<li>Distribution</li>
 
<li>Distribution</li>

Latest revision as of 01:24, 23 December 2017


1 Introduction

Disaster preparedness and disaster recovery (DR) support business-continuity planning and include planning for enterprise information technology (EIT) resiliency, as well as recovery from adversity, so that affected critical business services are restored to a satisfactory working state within an acceptable timeframe after an event.

DR can be defined as "in computer system operations, the return to normal operation after a hardware or software failure." [1] It is also defined as the "activities and programs designed to return the organization to an acceptable condition. And the ability to respond to an interruption in services by implementing a disaster recovery plan to restore an organization's critical business functions." [2]

This chapter defines these processes and deliverables, and who should be responsible for planning, creating the documents, and communicating if a disaster occurs. The following are some examples for context:

  • Examples of disasters
    • Natural disaster affecting datacenters or EIT service operations (flood, fire, earthquake, wind)
    • Security breach resulting in a disaster (destruction of data, admin password changes, virus/malware installation, sabotage)
    • Usage error (accidental deletion, unplug/turn off system resulting in corruption)
    • Utility failure affecting datacenters (loss of power even after UPS)
    • Vendor failure (cloud provider security failure, oil spill)
    • Staffing issue (employment dispute/walkout, epidemic)
  • Examples of unpreparedness
    • Requiring use of computers or printers when power is out
    • Requiring use of Internet when power or connectivity is out
    • Single point of knowledge/control for administration access
    • Lack of offsite backup storage
    • Lack of working restoration from backups
    • Lack of failover datacenters in separate locations
    • Undocumented or out-of-date documentation for system interfaces
    • Requiring use of phones that are out of power
    • Lack of designated leaders for restoration efforts (people who are in charge of restoring service and know that they are in charge)
    • In general, no cohesive, comprehensive EIT service restoration plan

2 Goals and Principles

EIT organizations are responsible for the following goals:

  • To document and plan for appropriate backup and recovery processes for all systems, and priority of systems for restoration.
  • To create and deploy an EIT disaster recovery plan.
  • To ensure that the business has business-continuity processes in place in case of a disaster.

The fundamental principles of disaster recovery depend on the business functions within the enterprise, and how critical each is to the health of the business. There are several methods for determining criticality of functions:

  • Hierarchy of need as stated in SLAs, which is that the most critical business functions should be restored first, or in the first phase of disaster recovery.
  • Keep the lights on (KTLO) or keep the business running (KTBR), which are not the same thing.
  • All non-critical services are in the final phase of recovery.
  • Industry-specific priorities: for example, all systems delivering lifesaving functions are the highest priority for recovery efforts, whereas administration systems wait for the second or third wave of recovery.

However, a fundamental recovery principle is that all systems to be recovered should be attended to within the specifications for recovery time objectives (RTOs) and recovery point objectives (RPOs) laid out by the business in the DR plan.

3 Context Diagram

07 Disaster Preparedness CD.png
Figure 1. Context Diagram for Disaster Preparedness and Recovery

3.1 Gather Inputs

The following inputs are necessary for this process to initiate or continue:

The obvious business driver is to reduce risk for the business, by providing both mitigation strategies and contingency plans. High-risk projects or operational inefficiencies can lead to lost business, which ultimately causes lost income for the business—this can be the high price of risk.

Another business driver for formal DR processes may be to meet regulatory (e.g., SOX) or sustainability objectives. Part of the information gathering includes conducting workshops or interviews to document the drivers to ensure that deliverables meet these requirements.

Another related information-gathering effort is to define and document the technical drivers for DR, including aging technology and lack of application-support capabilities.

4 Description of Activities

4.1 Business Impact Analysis

4.1.1 Define Critical Business Services

The first activity is to define services critical to operations. Critical services are those that, if missing, would mean that the enterprise could no longer meet commitments and deliver business products or services. Use business impact analysis, and get input from the business, such as the risk management group, business continuity management, audit departments, and executives. Use business process diagrams to assist with analysis.

The following list is a suggested structure for determining the service categories and corresponding criticality of the organization's services (for definitions of the categories, refer to [3]):

  • Mission critical
  • Business critical
  • Business operational
  • Administrative services [3]

Examples of typical critical services within an enterprise are safety processes, safety documentation management, communication policies and processes, and financial data and processes.

4.1.2 Map Critical Business Services to EIT Services

This function is often referred to as building an EIT service catalog, which is an important input to disaster recovery planning. A service catalog is "a database or structured document with information about all live EIT services, including those available for deployment…The service catalog includes information about deliverables, prices, contact points, ordering, and request processes." [4] There are templates to assist with this mapping. [5] See the Operations and Support chapter for more information on service catalogs.
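The mapping from business services to EIT services can be sketched as a small data structure. The following is a minimal illustration, not a template from any of the cited sources: the service names, the `BusinessService` class, and the rule that an EIT service inherits the highest criticality of any business service that depends on it are all assumptions made for the example. The category names follow the list in section 4.1.1.

```python
from dataclasses import dataclass, field

# Criticality categories from section 4.1.1, most critical first.
CRITICALITY_ORDER = [
    "mission critical",
    "business critical",
    "business operational",
    "administrative",
]


@dataclass
class BusinessService:
    """A business service and the EIT services it relies on (hypothetical model)."""
    name: str
    criticality: str
    eit_services: list = field(default_factory=list)


def eit_service_criticality(business_services):
    """Return {eit_service: criticality}.

    Assumption for this sketch: an EIT service takes the criticality of the
    most critical business service that depends on it.
    """
    result = {}
    for bs in business_services:
        rank = CRITICALITY_ORDER.index(bs.criticality)
        for eit in bs.eit_services:
            current = result.get(eit)
            if current is None or rank < CRITICALITY_ORDER.index(current):
                result[eit] = bs.criticality
    return result


# Hypothetical catalog entries, for illustration only.
catalog = [
    BusinessService("Order fulfilment", "mission critical", ["erp", "warehouse-db"]),
    BusinessService("Expense reporting", "administrative", ["erp", "hr-portal"]),
]

criticality_by_eit_service = eit_service_criticality(catalog)
```

In this example the shared `erp` service is treated as mission critical because order fulfilment depends on it, even though expense reporting alone would not justify that rating.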

4.1.3 Define Relevant Disaster Scenarios and Responsible Parties

Clearly define the criteria for declaring a disaster: who declares it, when, and how. Mature organizations assign in advance who is in charge during a disaster, so that there is a clear leader who can decide which processes and procedures to implement and who knows to follow the communication plan. Without a plan, people make invalid assumptions about who is in charge: either no one takes responsibility or multiple parties compete to be in charge, and neither outcome helps resolve the disaster and recover service.

4.1.4 Define Successive Waves for Extending Recovery Across the Business

Due to the complex nature of EIT systems within the enterprise today, it is unrealistic to provide recovery for all services in the initial recovery phase. There are different levels of recovery for different tiers of business services, and a corresponding, agreed-to timeframe for recovery of each service within the enterprise. These waves of recovery begin with the most critical services, and move through to the least critical in an acceptable timeframe based on a risk-mitigation process. For example, level one (i.e., Tier 1) recovery may take place within 72 hours of a disaster and would include services such as product production, shipping, and customer-service applications. Note: A non-critical service may be recovered in the first pass of recovery based solely on a critical service having it as a dependency.
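The note about dependencies can be illustrated with a short sketch. It assumes each service has an agreed tier number (1 is the first wave) and a list of the services it depends on; the function name `assign_recovery_waves` and the sample services are hypothetical. A dependency is promoted to the earliest wave of any service that needs it, directly or indirectly.

```python
def assign_recovery_waves(tiers, depends_on):
    """Return {service: wave}, promoting dependencies where needed.

    tiers:      {service: agreed tier, where 1 is recovered first}
    depends_on: {service: [services it needs in order to run]}
    Every service named in depends_on must also appear in tiers.
    """
    waves = dict(tiers)
    changed = True
    # Repeat until stable so that indirect dependencies are promoted too.
    # Wave numbers only ever decrease, so the loop always terminates.
    while changed:
        changed = False
        for service, deps in depends_on.items():
            for dep in deps:
                if waves[dep] > waves[service]:
                    waves[dep] = waves[service]
                    changed = True
    return waves


# Hypothetical services and tiers, for illustration only.
tiers = {"order-entry": 1, "shipping": 1, "auth": 2, "dns": 3, "intranet-wiki": 3}
depends_on = {"order-entry": ["auth"], "auth": ["dns"]}

waves = assign_recovery_waves(tiers, depends_on)
```

Here `auth` and `dns` move into the first wave because `order-entry` cannot run without them, while `intranet-wiki` stays in the third wave.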

Critical systems management is a useful process in the identification and documentation of critical systems. [6] Also, it ensures that proper application lifecycle management is occurring for these EIT services. [7]

Use risk-assessment techniques to analyze how disaster scenarios could adversely affect the business. One such process would be to tier possible risks into levels such as:

  • Affecting the entire enterprise
  • Affecting only certain business units
  • Affecting a single component (either a technology component or a business unit)
  • Affecting a single business function (such as processing credit card transactions)

4.2 Recovery Objectives and DR Plan

4.2.1 Determine Recovery Objectives and Develop Plan

In cooperation with the business, define the recovery point objective (RPO) and recovery time objective (RTO).

RPO is the point in time to which all integrated systems are recovered, taking into account backup schedules, sync points, and data-transfer points to ensure data quality and integrity.

RTO is how long it will take to return an EIT service to active duty. This varies depending on the criticality of the service as well as how integrated the service is with other services.
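As a rough illustration, the two objectives can be checked against what actually happened in an incident or drill. This sketch assumes a common working interpretation: the RPO is expressed as the maximum tolerable gap between the last good backup (or sync point) and the disaster, and the RTO as the maximum tolerable time from the disaster to restored service. The function names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_good_backup, disaster_time, service_restored):
    """Return (data_loss_window, downtime) as timedeltas."""
    data_loss_window = disaster_time - last_good_backup
    downtime = service_restored - disaster_time
    return data_loss_window, downtime


def meets_objectives(data_loss_window, downtime, rpo, rto):
    """Compare achieved values with the objectives agreed with the business."""
    return {
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical timeline, for illustration only.
loss, down = recovery_metrics(
    last_good_backup=datetime(2024, 1, 10, 2, 0),
    disaster_time=datetime(2024, 1, 10, 9, 30),
    service_restored=datetime(2024, 1, 10, 15, 0),
)
outcome = meets_objectives(loss, down, rpo=timedelta(hours=4), rto=timedelta(hours=8))
```

In this timeline the service is back within the 8-hour RTO, but 7.5 hours of data changes fall outside the 4-hour RPO, which would suggest that a more frequent backup or replication schedule is needed for that service.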

RecoveryTimeline.jpg
Figure 2. Recovery Timeline

Configuration management is a process that helps document the business impact of a service, as well as documenting the backup and recovery requirements. Also, it provides an inventory of the applications and supporting infrastructure needed in the restoration processes.

Organization and Culture

The risk tolerance and depth of capabilities within the organization have a large impact on the organization's disaster preparedness level. In other words, the business's disaster tolerance is "the time gap the business can accept the non-availability of EIT facilities." [2] The lower the tolerance, the more extensive and costly the DR practices and techniques that must be deployed.

Also, the business product deliveries determine the requirements of the planning effort and metrics.

4.2.2 Develop Communications Plan

An effective communication plan is an essential component of the successful implementation and adoption of the DR processes. The communication plan should include:

  • How to deliver communications when standard communication systems are unavailable (such as email or phone systems)
  • Who to contact in a disaster situation, including specific lists for specific situations or systems affected
  • What information each communication should and shouldn't include

Contact information lists should include the following stakeholders:

  • External partners (service providers and suppliers)
  • Police/fire/municipal departments
  • EIT management and staff
  • Business management and product owners

The DR communication plan should describe the process to provide business updates to business-continuity plans after the recovery has been completed.

A process for disaster declaration needs to be included in the DR plan and be well communicated to the team. In this section, all contact information and approval authority should be spelled out (i.e., who has the authority to declare a disaster within the company).

4.2.3 Develop Backup and Archive Strategies and Schedule

  • The EIT team responsible for DR is either responsible for backup and recovery or works closely with the team who is. Archiving and incremental backups need to be scheduled for the varying needs of the systems being supported. Backup standards and recovery strategies should be defined to ensure the business requirements are met.
  • Backup and storage technology has a large role to play in the recoverability of applications and systems. Current backup utilities provide incremental-forever backup processes, which can help reduce the cost of storage used for holding backups. In addition, architecture features such as high-availability options and failover redundancies can both reduce the risk of service loss and provide mitigation strategies for unstable or unreliable systems.
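To show how full and incremental backups combine at restore time, the sketch below selects the backups needed to restore a system to a target point in time: the latest full backup at or before the target, followed by every later incremental up to the target. The `Backup` record and the `restore_chain` function are hypothetical and are not modeled on any particular backup product.

```python
from datetime import datetime
from typing import NamedTuple


class Backup(NamedTuple):
    taken_at: datetime
    kind: str  # "full" or "incremental"


def restore_chain(backups, target):
    """Return the backups to apply, in order, to restore to `target`."""
    # Only backups taken at or before the target time are usable.
    candidates = sorted(b for b in backups if b.taken_at <= target)
    fulls = [b for b in candidates if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup at or before the target time")
    base = fulls[-1]
    incrementals = [
        b for b in candidates
        if b.kind == "incremental" and b.taken_at > base.taken_at
    ]
    return [base] + incrementals


# Hypothetical schedule: one full backup followed by daily incrementals.
backups = [
    Backup(datetime(2024, 1, 7), "full"),
    Backup(datetime(2024, 1, 8), "incremental"),
    Backup(datetime(2024, 1, 9), "incremental"),
    Backup(datetime(2024, 1, 10), "incremental"),
]
chain = restore_chain(backups, target=datetime(2024, 1, 9, 12, 0))
```

The longer the chain of incrementals, the more steps a restore involves, which is one reason restore timing should be measured in DR tests rather than assumed.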

4.2.4 Develop and Document DR Plan

A disaster recovery plan (DRP) is "a set of human, physical, technical, and procedural resources to recover, within a defined time and cost, an activity interrupted by an emergency or disaster." [2]

The DR plan document needs to include all the information required to recover all critical systems that a business needs to operate. EIT must work with the business to develop and document a DR plan. See the template at the end of this chapter for recommended sections of a DR plan.

Data collection techniques are critical to the development of a meaningful DR plan that meets the needs of the business.

4.2.5 Interface with Business Continuity

The EIT team must communicate their processes to the business, and make consistent updates to the business-continuity plan (BCP). As new business components or services are added, the business assigns a criticality level, which then needs to be translated into EIT services that are assigned internally to a tier to determine the disaster recovery requirements. The relationship between business continuity and EIT disaster recovery is symbiotic and is critical to the success of both functions within the enterprise.

4.3 Implement and Test DR plan (Drill or Simulation)

The first step in implementing a DR plan is to allocate resources and assign responsibilities. The DR team needs to be assigned early in the process to ensure accountability and an understanding of roles at the time of a disaster. Many different roles are needed to define and execute a successful DR plan. The DR test is an opportunity to cross-train roles, which mitigates the risk of key roles not being available if a disaster occurs. It is likely that no one from the business DR team will be available for the recovery of the systems, so documentation, testing, and assigning a strategic partner are important to the recovery of business services.

4.3.1 Roles and Responsibilities

Input supplier roles are roles and teams that supply the inputs to the process:

  • Enterprise risk-management team
  • BCP manager
  • EIT managers
  • Enterprise architecture team
  • Solution management team

Key roles are the responsible individuals or teams that perform the process:

  • DR team leads
  • Test team
  • Recovery center manager
  • System specialists (multiple)
  • Business management team
    • Facilities manager
    • Service manager

User roles expect and receive the deliverables:

  • Operations management team
    • Backup process manager
  • Test manager
  • Business management team

Stakeholder roles are informed or consulted on the process execution:

  • Enterprise risk management team
  • Operations management team
    • Business continuity manager
  • Business management team
    • Contract manager

4.3.2 Document Recovery Strategies

As mentioned above, there are many strategies to recover the services that the business needs to function. There is a different solution for every service. The most important element is to choose a strategy, then document and communicate it.

  • Use a third-party hot recovery site. The hot site should be in a geographically separate location to ensure that a natural disaster does not take out both the primary production location and the backup site location. The required distance varies depending on geographic and infrastructure dependencies (such as power, water, and network commonalities).
  • Real-time mirroring is a technique used to replicate data to a geographically separate location to ensure that data is available if a restore process is needed.
  • Manual, non-standard, or ad hoc (on-demand or unscheduled) procedures are the responsibility of the business units and help ensure business continuity while EIT is rebuilding system services. Recommend to business management that manual processes either be automated or be tested on a regular basis. Document the methods used to mitigate problems caused by aging technology, such as keeping parts inventories and redundant or cold-standby equipment.
  • Offsite data archiving ensures that backups are available if a disaster makes the primary site unavailable. Offsite services are available through many service providers. Due diligence by the DR team is important to ensure that the offsite facilities can guarantee secure and proper handling of backup data, which is an important enterprise asset.
  • Action plans and recovery processes differ depending on what type of disaster has occurred. A single-component failure results in a standalone recovery of the failing component (such as an application, server, or appliance). An enterprise-wide disaster results in a disaster declaration event with a full DR plan being executed with the full DR team being mobilized.
  • Identify and document potential disaster scenarios that have a high probability. For example, intrusion or denial of service attacks could have adverse effects on a technology company, whereas adverse environment conditions create higher risks to a construction company.

4.3.3 Define a Schedule for Service Continuity Testing

For the success of the recovery plan, it is critical to define a schedule for disaster recovery testing. One method often used is to simulate a disaster to test system recovery. Another strongly recommended practice is to have production support perform test refreshes on a regular (e.g., monthly) basis. This not only ensures that backups are usable, but also that processes are well documented and functional, ensuring data quality and integration integrity.

4.3.4 Implement and Test DR Plan (Drill or Simulation)

  • Implementation can take many forms. A hot-site contract is an agreement with a third-party vendor to provide the facilities and infrastructure needed to restore agreed-to services in the timeframe specified. There are many variants of this type of contract depending on the dollar value of the contract and the expected availability of internal staff at the time of a disaster. If the hot sites are geographically distant from the enterprise offices, it is likely the contract includes staff to perform the recovery as well.
  • Due to the size and complexity of many enterprises, in-house DR facilities are often the norm, meaning these are secondary facilities used as recovery centers for primary facilities, if needed.
  • Useful metrics from testing include the timing of the actual recovery procedures, a measure of the capabilities of the DR team, third party, or secondary facilities, and the level of maturity of both staff knowledge and process accuracy.

4.4 DR Plan—Change Management

Regular verification and updates to backup processes are necessary to ensure that accurate and usable backups are delivered. This change-management process needs to provide updates to the documentation of the backup and recovery processes. For example:

  • DR testing cycle changes as services change or risk tolerances change.
  • DR test results should lead to process improvements, and lessons learned should be added to the documentation.
  • Updates and changes to the business-continuity plan (BCP) go hand in hand with the changes to systems and services.

Mature organizations build continual improvement evaluation and activities into all processes.

4.4.1 Update DR Plan Based on DR Test Results and Validation

Validation metrics are measurements that quantify the success of processes, based on the requirements and goals of the business. The following measures can be used to determine the success of a DR test or simple restore procedure execution:

  • Recovery point objectives met
  • Recovery time objectives met
  • Testing result measurements (for example, timing of restore, accuracy of data, and integration points)
  • Verification of backup usability
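The measures above can be recorded per service after each DR test and used to decide where the DR plan needs to be revised. The following sketch is one possible way to do that; the `DrillResult` fields, the use of a restored-record count as a proxy for data accuracy, and the sample figures are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DrillResult:
    """Outcome of a DR test for one service (hypothetical record layout)."""
    service: str
    rpo: timedelta
    rto: timedelta
    data_loss: timedelta      # gap between last usable backup and the test event
    downtime: timedelta       # measured time to restore the service
    backup_usable: bool       # did the backup restore without errors?
    records_expected: int
    records_restored: int


def evaluate(result):
    """Return the validation measures for one service."""
    accuracy = (
        result.records_restored / result.records_expected
        if result.records_expected
        else 0.0
    )
    return {
        "rpo_met": result.data_loss <= result.rpo,
        "rto_met": result.downtime <= result.rto,
        "backup_usable": result.backup_usable,
        "data_accuracy": accuracy,
    }


def services_needing_follow_up(results, min_accuracy=1.0):
    """List services whose test missed any measure, for the plan update."""
    follow_up = []
    for r in results:
        m = evaluate(r)
        passed = (
            m["rpo_met"]
            and m["rto_met"]
            and m["backup_usable"]
            and m["data_accuracy"] >= min_accuracy
        )
        if not passed:
            follow_up.append(r.service)
    return follow_up


# Hypothetical drill results, for illustration only.
results = [
    DrillResult("billing", timedelta(hours=4), timedelta(hours=8),
                timedelta(hours=1), timedelta(hours=6), True, 1000, 1000),
    DrillResult("crm", timedelta(hours=4), timedelta(hours=8),
                timedelta(hours=2), timedelta(hours=10), True, 500, 500),
]
to_review = services_needing_follow_up(results)
```

In this example only `crm` is flagged, because its measured restore time exceeded the agreed RTO even though its backup was usable and complete.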

5 Summary

Like most processes, DR processes are a closed loop of plan > build > test > review with action. Continuous improvement and maturity of these processes are obtained through the regular execution of DR tests, measuring results, and then revising the DR plan as necessary. Stakeholder involvement with setting requirements is critical to the success of DR processes.

6 Key Maturity Frameworks

Capability maturity for EIT refers to its ability to reliably perform. Maturity is measured by an organization's readiness and capability expressed through its people, processes, data, technologies, and the consistent measurement practices that are in place. See Appendix F for additional information about maturity frameworks.

Many specialized frameworks have been developed since the original Capability Maturity Model (CMM) that was developed by the Software Engineering Institute in the late 1980s. This section describes how some of those apply to the activities described in this chapter.

6.1 IT-Capability Maturity Framework (IT-CMF)

The IT-CMF was developed by the Innovation Value Institute in Ireland. This framework helps organizations to measure, develop, and monitor their EIT capability maturity progression. It consists of 35 EIT management capabilities that are organized into four macro capabilities:

  • Managing EIT like a business
  • Managing the EIT budget
  • Managing the EIT capability
  • Managing EIT for business value

The three most relevant critical capabilities are technical infrastructure management (TIM), information security management (ISM), and enterprise information management (EIM).

6.1.1 Technical Infrastructure Management Maturity

The following statements provide a high-level overview of the technical infrastructure management (TIM) capability at successive levels of maturity.

Level 1: Management of the EIT infrastructure is reactive or ad hoc.
Level 2: Documented policies are emerging relating to the management of a limited number of infrastructure components. Predominantly manual procedures are used for EIT infrastructure management. Visibility of capacity and utilization across infrastructure components is emerging.
Level 3: Management of infrastructure components is increasingly supported by standardized tool sets that are partly integrated, resulting in decreased execution times and improving infrastructure utilization.
Level 4: Policies related to EIT infrastructure management are implemented automatically, promoting execution agility and achievement of infrastructure utilization targets.
Level 5: The EIT infrastructure is continually reviewed so that it remains modular, agile, lean, and sustainable.

6.1.2 Information Security Management Maturity

The following statements provide a high-level overview of the information security management (ISM) capability at successive levels of maturity.

Level 1: The approach to information security tends to be localized. Incidents are typically not responded to in a timely manner.
Level 2: Defined security approaches, policies, and controls are emerging, primarily focused on complying with regulations.
Level 3: Standardized security approaches, policies, and controls are in place across the EIT function, dealing with access rights, business continuity, budgets, toolsets, incident response management, audits, non-compliance, and so on.
Level 4: Comprehensive security approaches, policies, and controls are in place and are fully integrated across the organization.
Level 5: Security approaches, policies, and controls are regularly reviewed to maintain a proactive approach to preventing security breaches.

6.1.3 Enterprise Information Management Maturity

The following statements provide a high-level overview of the enterprise information management (EIM) capability at successive levels of maturity.

Level 1: Management has limited awareness of information management opportunities.
Level 2: Basic and discrete information management approaches are in place, typically by function or line of business.
Level 3: Standardized information management policies, standards, and controls are in place across the EIT function, enabling formal oversight of all aspects of information management.
Level 4: Comprehensive information management policies, standards, and controls are in place across the organization. Business intelligence and analysis are recognized as key to organizational success.
Level 5: Information management policies, standards, and controls are continually reviewed based on agreed risk tolerance factors. Their scope effectively extends to key business ecosystem partners.

7 Key Competence Frameworks

While many large companies have defined their own sets of skills for purposes of talent management (to recruit, retain, and further develop the highest quality staff members that they can find, afford and hire), the advancement of EIT professionalism will require common definitions of EIT skills that can be used not just across enterprises, but also across countries. We have selected three major sources of skill definitions. While none of them is used universally, they provide a good cross-section of options.

Creating mappings between these frameworks and our chapters is challenging, because they come from different perspectives and have different goals. There is rarely a 100 percent correspondence between the frameworks and our chapters, and, despite careful consideration, some subjectivity was used to create the mappings. Please take that into consideration as you review them.

7.1 Skills Framework for the Information Age

The Skills Framework for the Information Age (SFIA) has defined nearly 100 skills. SFIA describes seven levels of competency that can be applied to each skill. However, not all skills cover all seven levels. Some reach only partially up the seven-step ladder. Others are based on mastering foundational skills, and start at the fourth or fifth level of competency. SFIA is used in nearly 200 countries, from Britain to South Africa, South America, to the Pacific Rim, to the United States. (http://www.sfia-online.org)

SFIA skills have not yet been defined for this chapter.

7.2 European Competency Framework

The European Union's European e-Competence Framework (e-CF) has 40 competences and is used by a large number of companies, qualification providers, and others in public and private sectors across the EU. It uses five levels of competence proficiency (e-1 to e-5). No competence is subject to all five levels.

The e-CF is published and legally owned by CEN, the European Committee for Standardization, and its National Member Bodies (www.cen.eu). Its creation and maintenance have been co-financed and politically supported by the European Commission, in particular, DG (Directorate General) Enterprise and Industry, with contributions from the EU ICT multi-stakeholder community, to support competitiveness, innovation, and job creation in European industry. The Commission works on a number of initiatives to boost ICT skills in the workforce. Versions 1.0 to 3.0 were published as CEN Workshop Agreements (CWA). The e-CF 3.0 CWA 16234-1 was published as an official European Norm (EN), EN 16234-1. For complete information, see http://www.ecompetences.eu.

e-CF Dimension 2: E.3. Risk Management (MANAGE). Implements the management of risk across information systems through the application of the enterprise-defined risk management policy and procedure. Assesses risk to the organization's business, including web, cloud, and mobile resources. Documents potential risk and containment plans.
e-CF Dimension 3: Level 2-4

7.3 i Competency Dictionary

The Information Technology Promotion Agency (IPA) of Japan has developed the i Competency Dictionary (iCD) and translated it into English, and describes it at https://www.ipa.go.jp/english/humandev/icd.html. The iCD is an extensive skills and tasks database, used in Japan and southeast Asian countries. It establishes a taxonomy of tasks and the skills required to perform the tasks. The IPA is also responsible for the Information Technology Engineers Examination (ITEE), which has grown into one of the largest scale national examinations in Japan, with approximately 600,000 applicants each year.

The iCD consists of a Task Dictionary and a Skill Dictionary. Skills for a specific task are identified via a "Task x Skill" table. (See Appendix A for the task layer and skill layer structures.) EITBOK activities in each chapter require several tasks in the Task Dictionary.

The table below shows a sample task from iCD Task Dictionary Layer 2 (with Layer 1 in parentheses) that corresponds to activities in this chapter. It also shows the Layer 2 (Skill Classification), Layer 3 (Skill Item), and Layer 4 (knowledge item from the IPA Body of Knowledge) prerequisite skills associated with the sample task, as identified by the Task x Skill Table of the iCD Skill Dictionary. The complete iCD Task Dictionary (Layer 1-4) and Skill Dictionary (Layer 1-4) can be obtained by returning the request form provided at http://www.ipa.go.jp/english/humandev/icd.html.

Task Dictionary
  Task Layer 1 (Task Layer 2): Formulation of business continuity plan (business continuity management)
Skill Dictionary
  Skill Classification: Business continuity planning (BCP)
  Skill Item: BCP formulation methods
  Associated Knowledge Items:
    • Risk analysis
    • Business continuity and identification of bottlenecks
    • Clarification of implementation standards
    • Recovery prioritization
    • Setting target recovery time

8 Key Roles

These roles are common to ITSM:

  • Financial Manager
  • Facilities Manager
  • EIT Service Continuity Manager
  • Risk Manager

Other roles include:

  • Disaster recovery team
  • Information Security Manager
  • Operations management team
  • Service Manager
  • System specialists
  • Test team

9 Standards

ANSI/ASIS SPC.1-2009. Organizational Resilience: Security, Preparedness and Continuity Management Systems—Requirements with Guidance for Use

ISO 22301:2012, Societal security—Business continuity management systems—Requirements

ISO/IEC 20000-1:2011, (IEEE Std 20000-1:2013) Information technology—Service management—Part 1: Service management system requirements

ISO/IEC 27031:2011, Information technology—Security techniques—Guidelines for information and communication technology readiness for business continuity

10 References

[1] Systems and Software Engineering Vocabulary. (2009). ISO/IEC 24765

[2] ISACA. (n.d.). http://www.isaca.org/Pages/Glossary.aspx

[3] ITIL Service Catalogue: How to produce a Service Catalogue; http://www.itilnews.com/ITIL_Service_Catalogue_How_to_produce_a_Service_Catalogue.html

[4] Introduction to the ITIL Service Lifecycle, Second Edition, Office of Government Commerce, 2010

[5] Dwight Kayto, Defining IT Services, Art of Change; http://www.artofchange.ca/images/documents/defining%20it%20services.pdf

[6] British Computer Society, BCS Delivery mission critical system, 2011; http://www.bcs.org/content/conWebDoc/43139
http://www.downloads.xdelta.co.uk/2011/2011_07_19-bcs-mission_critical-colin_butcher.pdf

[7] Realtech, Application Lifecycle Management, Diagram; http://www.realtech.com/wInternational/software/solutions/application-lifecycle-management/application-lifecycle-managementW3DnavanchorW262110100.php

11 Related and Informing Disciplines

  • Application lifecycle management
  • Business continuity management
  • Change management
  • Configuration management
  • Enterprise and business architecture
  • Risk management
  • Testing and validation

12 Disaster Recovery Plan Template

Here is an example template for a disaster recovery plan.

  1. Introduction
    • Scope of the plan
    • Objectives—RTO, RPO
    • Authority
    • Distribution
    • Disaster declaration process
    • Plan review
  2. Recovery
    • Recovery team
    • Recovery plan
    • Disaster preparation
    • Recovery tasks (short term and long term)
  3. Backup
    • Backup strategy for each critical system
  4. Contact information
    • Facilities information
    • Recovery team information
    • Other important business contacts