IBM Books

Planning Volume 2, Control Workstation and Software Environment

Difference between fault tolerance and high availability

Before deciding whether to use a high availability control workstation, read this section to understand the difference between high availability and fault tolerance.

Fault tolerance

The fault tolerant, or continuous availability, model relies on specialized hardware to detect a hardware fault and instantaneously switch to a redundant hardware component, whether the failed component is a processor, memory board, power supply, I/O subsystem, or storage subsystem.

Although this failover is apparently seamless and offers non-stop service, a high premium is paid in both hardware cost and performance because the redundant components do no processing.

More importantly, the fault tolerant model does not address software failures, which are by far the most common cause of down time.

High availability

The high availability or fault resiliency model views availability not as a series of replicated physical components, but rather as a set of system-wide, shared resources that cooperate to provide essential services.

High availability combines software with industry-standard hardware to minimize down time by quickly restoring services when a system, component, or application fails. Although restoration is not instantaneous, it is rapid, often taking less than a minute.

The distinguishing factor between the two models is that a fault tolerant environment has no service interruption, whereas a highly available environment has a minimal one. Many sites are willing to absorb a small amount of down time with high availability rather than pay the much higher cost of providing fault tolerance. Moreover, in most highly available configurations, the backup processors are available for use during normal operation.
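
The short service interruption in the high availability model can be pictured with a minimal sketch. It is illustrative only and does not describe how any particular product detects failures. The heartbeat mechanism, the function name simulate_failover, and the threshold of three missed heartbeats are all assumptions made for this example. The sketch shows why recovery is rapid but not instantaneous: the backup takes over only after the failure has been detected.

```python
def simulate_failover(heartbeats, threshold=3):
    """Model a backup node watching a primary node (hypothetical example).

    heartbeats: one boolean per time tick, True if the primary answered.
    threshold:  assumed number of consecutive missed heartbeats before
                the backup takes over the service.

    Returns (takeover_tick, interrupted_ticks). takeover_tick is None
    if the backup never had to take over.
    """
    missed = 0        # consecutive missed heartbeats
    interrupted = 0   # ticks during which the service was unavailable
    for tick, answered in enumerate(heartbeats):
        if answered:
            missed = 0
            continue
        missed += 1
        interrupted += 1
        if missed >= threshold:
            # Failure confirmed: the backup restores the service.
            return tick, interrupted
    return None, interrupted


# The primary stops answering at tick 2; the backup takes over at tick 4.
takeover_tick, interrupted_ticks = simulate_failover(
    [True, True, False, False, False, True]
)
```

In this sketch the interruption lasts exactly as long as the detection window. A fault tolerant system would correspond to an interruption of zero ticks, bought with redundant hardware that does no other work.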
