Online Recovery in Cluster Databases.


W. Liang, B. Kemme
Abstract:

Cluster based replication solutions are an attractive mechanism to provide both high-availability and scalability for the database backend within the multi-tier information systems of service-oriented businesses. An important issue that has not yet received sufficient attention is how database replicas that have failed can be reintegrated into the system or how completely new replicas can be added in order to increase the capacity of the system. Ideally, recovery takes place online, i.e, while transaction processing continues at the replicas that are already running. In this paper we present a complete online recovery solution for database clusters. One important issue is to find an efficient way to transfer the data the joining replica needs. In this paper, we present two data transfer strategies. The first transfers the latest copy of each data item, the second transfers the updates a rejoining replica has missed during its downtime. A second challenge is to coordinate this transfer with ongoing transaction processing such that the joining node does not miss any updates. We present a coordination protocol that can be used with Postgres-R, a replication tool which uses a group communication system for replica control. We have implemented and compared our transfer solutions against a set of parameters, and present heuristics which allow an automatic selection of the optimal strategy for a given configuration.
Int. Conf. on Extending Database Technology (EDBT), March 2008.