Database Replication

Data replication is an attractive way to increase system throughput and provide fault tolerance. However, keeping the data copies consistent is a challenge. Furthermore, taking full advantage of the processing power of all replicas requires adaptive load-balancing schemes.

Conceptually, our work can be split into two branches:

Middleware-based replication

Middle-R is our middleware-based replication tool. Clients connect to Middle-R via a JDBC driver, and Middle-R forwards each request to one of several database replicas. Each database replica is an instance of a standard, non-replicated database system; we currently work with PostgreSQL. Middle-R can run as a single middleware instance or with one middleware instance per database instance. It provides efficient, fast and consistent database replication, together with fault tolerance, both in cluster configurations (all replicas within a LAN) and in WAN environments. Current projects related to Middle-R:
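To make the request path described above concrete, here is a minimal sketch of a middleware layer that forwards each request to one of several replicas. It is an illustration only, not Middle-R's actual code: the names Replica, Router, pick and forward are hypothetical, the replicas are in-memory stand-ins rather than PostgreSQL connections, and the "fewest requests in flight" rule is an assumed load-balancing policy.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Stand-in for one database replica (hypothetical; no real connection)."""
    name: str
    active: int = 0            # requests currently in flight on this replica
    alive: bool = True         # set to False to simulate a failed replica
    log: list = field(default_factory=list)

    def execute(self, sql: str) -> str:
        # A real replica would run the statement; here we only record it.
        self.log.append(sql)
        return f"{self.name}: {sql}"


class Router:
    """Forwards each request to the live replica with the fewest in-flight requests."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def pick(self) -> Replica:
        live = [r for r in self.replicas if r.alive]
        if not live:
            raise RuntimeError("no replica available")
        # min() returns the first minimum, so ties go to the earliest replica.
        return min(live, key=lambda r: r.active)

    def forward(self, sql: str) -> str:
        replica = self.pick()
        replica.active += 1
        try:
            return replica.execute(sql)
        finally:
            # Always release the slot, even if execute() raises.
            replica.active -= 1
```

Skipping replicas marked as failed is what lets the remaining replicas keep serving clients in this sketch; a real system would also have to keep the copies consistent, which this example does not attempt.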

Postgres-R

Postgres-R is an extension of the open-source relational database system PostgreSQL. Postgres-R provides efficient, fast and consistent database replication for cluster configurations. To address the performance and consistency challenges, we exploit the rich semantics of group communication systems. In particular, the approach exploits the total order delivery semantics of the multicast primitives to guarantee the isolation of transactions (all sites serialize conflicting transactions according to the total order in which the group communication system delivers messages), and the reliable delivery of messages despite failures to provide fault tolerance (the same messages are delivered to all available sites, making it easy for the surviving system to decide on the commit or abort of pending transactions). Our approach provides atomicity and the same isolation level as the underlying PostgreSQL system (snapshot isolation). The performance overhead is small: for update transactions, it adds a few milliseconds in order to propagate changes to all replicas. When new replicas are added to the system, the read load can be distributed across them, which leads to very good scalability. Postgres-R uses the Spread group communication system. The product uses software developed by Spread Concepts LLC for use in the Spread toolkit. For more information about Spread see http://www.spread.org. The current status and projects related to Postgres-R are as follows.
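To illustrate why total order delivery lets every site reach the same commit/abort decisions without further coordination, here is a minimal sketch. It is a simplified certification scheme written under stated assumptions, not Postgres-R's actual protocol: the names WriteSet, Site, deliver and replay are hypothetical, the group communication system is replaced by a plain Python list that every site reads in the same order, and a transaction aborts if any item it wrote was overwritten by a commit it had not seen at start (a first-committer-wins rule in the spirit of snapshot isolation).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteSet:
    txn_id: str
    snapshot: int        # number of commits the transaction had seen when it started
    keys: frozenset      # items the transaction wrote


class Site:
    """One replica applying write sets in the delivered order."""

    def __init__(self):
        self.commit_count = 0
        self.last_writer = {}    # key -> commit number of the latest committed write
        self.decisions = []      # (txn_id, "commit" | "abort") in delivery order

    def deliver(self, ws: WriteSet) -> bool:
        # Conflict: a key was written by a commit that the transaction did not see.
        conflict = any(self.last_writer.get(k, 0) > ws.snapshot for k in ws.keys)
        if conflict:
            self.decisions.append((ws.txn_id, "abort"))
            return False
        self.commit_count += 1
        for k in ws.keys:
            self.last_writer[k] = self.commit_count
        self.decisions.append((ws.txn_id, "commit"))
        return True


def replay(total_order, n_sites):
    """Deliver the same sequence to n_sites independent sites; return their decisions."""
    sites = [Site() for _ in range(n_sites)]
    for ws in total_order:
        for site in sites:
            site.deliver(ws)
    return [site.decisions for site in sites]


total_order = [
    WriteSet("T1", 0, frozenset({"x"})),
    WriteSet("T2", 0, frozenset({"x", "y"})),   # concurrent with T1, also writes x
    WriteSet("T3", 1, frozenset({"y"})),        # started after T1 committed
]
decisions = replay(total_order, n_sites=3)
```

Because each site's decision depends only on the delivered sequence and its own deterministic state, all three sites in this sketch commit T1 and T3 and abort T2. The same property is what allows surviving sites to agree on pending transactions after a failure, as long as they have received the same messages.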

Collaboration

Part of the work on both projects was performed in the context of the Adapt project (Middleware Technologies for Adaptive and Composable Distributed Components). Adapt was an RTD project funded by the Information Society Technologies Programme of the European Commission under FP5, and the Programme de soutien à la recherche (PSR) of the Ministère du Développement économique, de l'innovation et de l'exportation (MDEIE) du Québec, Canada.