and if the two systems do not share common disks, then the standby provides a
functional backup but has no access to the databases managed by the primary.
The passive standby is generally not referred to as a cluster. The term cluster is
reserved for multiple interconnected computers that are all actively doing processing while maintaining the image of a single system to the outside world. The term
active secondary is often used in referring to this configuration. Three classifications
of clustering can be identified: separate servers, shared nothing, and shared disk.
In one approach to clustering, each computer is a separate server with its own
disks and there are no disks shared between systems (Figure 17.9a). This arrangement provides high performance as well as high availability. In this case, some type
of management or scheduling software is needed to assign incoming client requests
to servers so that the load is balanced and high utilization is achieved. It is desirable
to have a failover capability, which means that if a computer fails while executing an
application, another computer in the cluster can pick up and complete the application. For this to happen, data must constantly be copied among systems so that each
system has access to the current data of the other systems. The overhead of this data
exchange is the performance penalty paid for high availability.
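The scheduling step described above can be sketched as follows. This is a minimal illustration, not the mechanism of any particular cluster product: the `Server` class, the `assign_request` function, and the least-active-requests policy are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical record for one computer in the cluster.
    name: str
    active_requests: int = 0
    alive: bool = True


def assign_request(servers):
    """Send the next client request to the live server with the fewest active requests."""
    candidates = [s for s in servers if s.alive]
    if not candidates:
        raise RuntimeError("no live servers in the cluster")
    target = min(candidates, key=lambda s: s.active_requests)
    target.active_requests += 1
    return target


cluster = [Server("a"), Server("b"), Server("c")]
# Six requests spread evenly across three idle servers.
placements = [assign_request(cluster).name for _ in range(6)]
```

Marking a server as not alive removes it from the candidate list, so later requests are balanced across the survivors. The sketch does not model the data copying that failover also requires.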
To reduce the communications overhead, most clusters now consist of servers
connected to common disks (Figure 17.9b). In one variation on this approach, called
shared nothing, the common disks are partitioned into volumes, and each volume is
owned by a single computer. If that computer fails, the cluster must be reconfigured
so that some other computer has ownership of the volumes of the failed computer.
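The reconfiguration step can be pictured as an update to a volume ownership map. The sketch below is illustrative only: the function name `reassign_volumes`, the node and volume names, and the round-robin handout to survivors are assumptions, not part of the text.

```python
def reassign_volumes(ownership, failed, survivors):
    """Return a new volume-to-owner map in which the failed computer owns nothing."""
    if not survivors:
        raise ValueError("no surviving computers to take ownership")
    new_ownership = dict(ownership)
    # Volumes that belonged to the failed computer, in a stable order.
    orphaned = sorted(v for v, owner in ownership.items() if owner == failed)
    for i, volume in enumerate(orphaned):
        # Hand the orphaned volumes to the survivors in round-robin order.
        new_ownership[volume] = survivors[i % len(survivors)]
    return new_ownership


ownership = {"vol1": "node_a", "vol2": "node_a", "vol3": "node_b"}
after_failure = reassign_volumes(ownership, failed="node_a",
                                 survivors=["node_b", "node_c"])
```

Each volume still has exactly one owner after the reconfiguration, which is the defining property of the shared nothing arrangement.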
Table 17.2 Clustering Methods: Benefits and Limitations

| Clustering Method | Description | Benefits | Limitations |
|---|---|---|---|
| Passive Standby | A secondary server takes over in case of primary server failure. | Easy to implement. | High cost because the secondary server is unavailable for other processing tasks. |
| Active Secondary | The secondary server is also used for processing. | Reduced cost because secondary servers can be used for processing. | Increased complexity. |
| Separate Servers | Separate servers have their own disks. Data is continuously copied from primary to secondary. | High availability. | High network and server overhead due to copying. |
| Servers Connected to Disks | Servers are cabled to the same disks, but each server owns its disks. If one server fails, its disks are taken over by the other server. | Reduced network and server overhead due to elimination of copying. | Usually requires disk mirroring or RAID technology to compensate for risk of disk failure. |
| Servers Share Disks | Multiple servers simultaneously share access to disks. | Low network and server overhead. Reduced risk of downtime caused by disk failure. | Requires lock manager software. Usually used with disk mirroring or RAID technology. |

It is also possible to have multiple computers share the same disks at the same
time (called the shared disk approach), so that each computer has access to all of the
volumes on all of the disks. This approach requires the use of some type of locking
facility to ensure that data can only be accessed by one computer at a time.
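A locking facility of this kind can be sketched as a table that records which computer currently holds each volume. The `LockManager` class below is a toy, single-process illustration under that assumption. A real cluster lock manager must itself be distributed and must survive node failures, which this sketch does not attempt.

```python
class LockManager:
    """Toy lock manager: at most one computer holds the lock on a volume."""

    def __init__(self):
        self._holders = {}  # volume name -> name of the computer holding it

    def acquire(self, volume, computer):
        """Grant the lock if the volume is free or already held by this computer."""
        holder = self._holders.get(volume)
        if holder is None or holder == computer:
            self._holders[volume] = computer
            return True
        return False

    def release(self, volume, computer):
        """Release the lock only if this computer is the current holder."""
        if self._holders.get(volume) == computer:
            del self._holders[volume]
            return True
        return False


locks = LockManager()
first = locks.acquire("vol1", "node_a")    # granted: volume was free
second = locks.acquire("vol1", "node_b")   # refused while node_a holds it
locks.release("vol1", "node_a")
third = locks.acquire("vol1", "node_b")    # granted after the release
```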
Operating System Design Issues
Full exploitation of a cluster hardware configuration requires some enhancements
to a single-system operating system.
FAILURE MANAGEMENT How failures are managed by a cluster depends on the
clustering method used (Table 17.2). In general, two approaches can be taken to
dealing with failures: highly available clusters and fault-tolerant clusters. A highly
available cluster offers a high probability that all resources will be in service. If a failure
occurs, such as a system going down or a disk volume being lost, then the queries in progress
are lost. Any lost query, if retried, will be serviced by a different computer in the
cluster. However, the cluster operating system makes no guarantee about the state of
partially executed transactions. This would need to be handled at the application level.
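The retry behavior of a highly available cluster can be sketched from the client's side. Everything named here (`NodeDown`, `run_with_retry`, and the two stand-in node functions) is hypothetical and exists only to illustrate the idea that a lost query is resubmitted to a different computer.

```python
class NodeDown(Exception):
    """Raised by a hypothetical node that has failed while handling a query."""


def run_with_retry(query, nodes):
    """Try each node in turn; a query lost to a failure is resubmitted elsewhere."""
    last_error = None
    for node in nodes:
        try:
            return node(query)
        except NodeDown as err:
            last_error = err  # the query in progress was lost; try the next node
    raise RuntimeError("all nodes failed") from last_error


def failed_node(query):
    # Stand-in for a computer that goes down mid-query.
    raise NodeDown("node_a is down")


def healthy_node(query):
    # Stand-in for a computer that services the retried query.
    return f"node_b answered: {query}"


result = run_with_retry("SELECT 1", [failed_node, healthy_node])
```

The retry is only safe when repeating the query has no side effects. As the text notes, the state of a partially executed transaction is not guaranteed, so the application has to deal with that case itself.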
A fault-tolerant cluster ensures that all resources are always available. This
is achieved by the use of redundant shared disks and mechanisms for backing out
uncommitted transactions and committing completed transactions.
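One common way to back out uncommitted work and keep committed work is to replay a transaction log after a failure. The text does not say that a log is used, so the log format and the `recover` function below are assumptions made for illustration.

```python
def recover(log):
    """Rebuild state from a toy log: keep committed writes, back out the rest."""
    # A transaction counts as completed only if the log holds its commit record.
    committed = {txn for op, txn, *_ in log if op == "commit"}
    state = {}
    for op, txn, *rest in log:
        if op == "write" and txn in committed:
            key, value = rest
            state[key] = value  # write from a completed transaction is applied
        # Writes from transactions with no commit record are skipped (backed out).
    return state


log = [
    ("write", "t1", "x", 1),
    ("write", "t2", "y", 2),
    ("commit", "t1"),
    ("write", "t2", "x", 99),  # t2 never commits: the failure happened first
]
state = recover(log)
```

Only the write made by `t1` survives recovery. Both writes made by `t2` are discarded because no commit record for `t2` reached the log.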
The function of switching applications and data resources over from a failed
system to an alternative system in the cluster is referred to as failover. A related
function is the restoration of applications and data resources to the original system
once it has been fixed; this is referred to as failback. Failback can be automated, but
this is desirable only if the problem is truly fixed and unlikely to recur. If not, automatic failback can cause subsequently failed resources to bounce back and forth
between computers, resulting in performance and recovery problems.
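One way to guard against this bouncing is to delay automatic failback until the repaired system has stayed healthy for a while. The sketch below assumes a history of periodic health-check results. The function name `should_fail_back` and the threshold of three consecutive passes are illustrative choices, not values from the text.

```python
def should_fail_back(health_history, required_stable_checks=3):
    """Allow failback only after the repaired system passes N consecutive checks."""
    if len(health_history) < required_stable_checks:
        return False  # not enough evidence yet that the problem is fixed
    return all(health_history[-required_stable_checks:])


flapping = [True, False, True, True]   # recovered, failed again, recovering
stable = [False, True, True, True]     # three clean checks in a row
decisions = (should_fail_back(flapping), should_fail_back(stable))
```

With these inputs the flapping system is held back and the stable one is allowed to resume its original role.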
