Database Replication
Data replication is very attractive in
order to increase system throughput and provide fault-tolerance.
However, it is a challenge to keep data copies consistent. Furthermore,
in order to fully take advantage of the processing power of all
replicas, adaptive load-balancing schemes are needed.
Conceptually, our work can be split into two branches:
Middleware-based replication
Middle-R is our middleware based replication tool. Clients connect to
Middle-R via a JDBC driver and Middle-R forwards the requests to one of
several database replicas. Each database replica is an instance of a
non-replicated standard database system. We currently work with
PostgreSQL. Middle-R can have one middleware instance, or one middleware instance for each database instance. Middle-R provides efficient, fast and consistent database replication for both cluster configurations (all replicas are within a LAN) and in WAN environments. Our approach provides fault-tolerance.
Current projects related to Middle-R:
- Isolation levels: Our solutions allows for different levels of isolation of concurrent transactions. More recently, we have focused on snapshot isolation, as this is the level that is also provided by the underlying PostgreSQL system.
- Wide-Area Systems:
Most research has focused on local area networks since
communication is, in principle, fast in these networks. However, the
requirement is also high to provide transparent, efficient, and
consistent data replication in wide area networks. The usual communication technology used in clusters (e.g., group communication systems) does not work well in WAN settings. Our solution is nearly as
performant as current commercial solutions for wide-area replication
but at the same time provides a higher degree of flexibility and
consistency.
- Partial Replication: While full replication places copies of data items at all replicas, partial replication only assigns copies of an individual data item to some replicas. When there is a high update workload full replication has too much overhead to keep all copies consistent and the individual replicas have little resources left to execute read operations. In contrast, with partial replication, a replica only has to execute the updates for data items for which it has local copies, and thus, has more potential to execute read operations. We have analyzed the performance gains that can be achieved with partial replication. We also addressed many challenges associated with partial replication, such as a more complex concurrency control, the challenge of finding a replica with the data copies needed for a request, and finally with the necessity of distributed query execution.
- Relationship between middleware and database system: When implementing a replication solution outside the database system, the replication tool does not have access to important components within the database system, such as concurrency control. Thus, functionality has to be reimplemented in the middleware. Understanding database interfaces, and the possibility to expose some of the internals to the outside world might allow for better performance and simpler solutions at the middleware layer.
Related Papers:
- An Autonomic Approach for Replication of Internet-based Services. D. Serrano, M.Patiño-Martínez, R. Jiménez-Peris, B. Kemme. IEEE Symp. on Reliable Distributed Systems (SRDS), Oct. 2008.
- Exploiting Reflection to Enable Scalable and Performance Database Replication at the Middleware Level. J. Salas, R. Jiménez-Peris, M.Patiño-Martínez, B. Kemme. Chapter of the book Software Engineering and Fault Tolerance. P. Pelliccione, H. Muccini, N. Gueli, A. Romanovsky (eds). 2007. ISBN 978-981-270-503-7. World Scientific.
- Boosting Database Replication Scalability through Partial Replication and 1-Copy-Snapshot-Isolation. D. Serrano, M.Patiño-Martínez, R. Jiménez-Peris, B. Kemme. IEEE Pacific Rim Int. Symp. on Dependable Computing (PRDC), Dec. 2007.
- Enhancing Edge Computing with Database Replication. Y. Lin, B. Kemme, M.Patiño-Martínez, R. Jiménez-Peris. IEEE Symp. on Reliable Distributed Systems (SRDS), October 2007.
- A Recovery Protocol for Middleware Replicated Databases Providing GSI. J. E. Armendáriz-Iñigo, F. D. Muñoz-Escoí, J. R. Juárez, J. R. González de Mendívil, B. Kemme. IEEE Int. Conference on Availability, Reliability and Security (ARES), April 2007.
- Lightweight Reflection for
Middleware-based Database Replication. J. Salas, R.
Jiménez-Peris, M.
Patiño-Martínez, B. Kemme. IEEE
Int. Symp. on Reliable Distributed Systems (SRDS), Leeds, England, Oct.
2006
- Consistent Database Replication
at
the
Middleware Level. M. Patiño-Martínez, R.
Jiménez-Peris, B. Kemme, G. Alonso. ACM Transactions on Computer
Systems (TOCS). Volume 23, No. 4, 2005, pp 1-49.
- Consistent Data Replication:
Is
it
feasible in WANs? Y. Lin, B. Kemme, M.
Patiño-Martínez, R. Jiménez-Peris. Europar Conf.,
Lisbon (Portugal), August 2005.
- Middleware based Data
Replication
providing Snapshot Isolation. Y. Lin, B. Kemme, M.
Patiño-Martínez, R. Jiménez-Peris. ACM Int. Conf.
on Management of Data (SIGMOD), Baltimore, Maryland, June 2005.
- Adaptive Middleware for
Data
Replication. J. M. Milan-Franco, R. Jiménez-Peris,
M. Patiño-Martínez, B. Kemme. ACM/IFIP/USENIX Conference
on Middleware, Toronto, Canada, October 2004.
- Improving the Scalability of
Fault-Tolerant Database Clusters: Early Results. R.
Jiménez-Peris, M. Patiño-Martínez, B. Kemme, G.
Alonso.
Proc. of the IEEE 22nd Int. Conf. on Distributed Computing Systems
2002, ICDCS'02. Vienna, Austria. July 2002.
Postgres-R
Postgres-R is an extension of the open-source relational database
system PostgreSQL. Postgres-R
provides efficient, fast and consistent database replication for
cluster configuration. To address the performance and consistency
challenges we exploit the rich semantics of group communication
systems.
In particular, the approach exploits the total
order delivery semantics of the multicast primitives to guarantee
the
isolation of transactions (all sites serialize conflicting
transactions according to the total order in which the group
communication system delivers messages), and the reliable
delivery of
messages despite failures to provide fault-tolerance (the same
messages are delivered to all available sites making it is easy for
the surviving system to decide on the commit/abort of pending
transactions). Our approach provides atomicity and the same isolation
level in regard to concurrency control than the underlying PostgreSQL
system (snapshot isolation). Furthermore, its performance is excellent.
For update transactions, it adds an overhead of a few milliseconds in
order to propagate changes to all replicas. By adding new replicas to
the system, the read load can be distributed leading to excellent
scalability. Postgres-R uses the Spread group communication system. The
product uses software
developed by Spread Concepts LLC for use in the Spread toolkit. For
more
information about Spread see http://www.spread.or.g The current status
and project related to Postgres-R are as follows.
- Postgres-R for Snapshot Isolation: Our first prototype
of Postgres-R in 2000 was based PostgreSQL 6.4. PostgreSQL 6.4
uses 2-phase-locking as
concurrency control protocol. In contrast, the current versions
use a multiversion concurrency control mechanism similar
to the one on Oracle 8i providing the isolation level snapshot
isolation. Our newest version Postgres-R(SI) works correctly with this
concurrency control method providing exactly the same isolation level
as a centralized version of PostgreSQL.
- Recovery:
An essential aspect of cluster databases is the need to allow failed
nodes to recover and rejoin the system without interruption of the
ongoing transaction processing on the available nodes (denoted as
online recovery). In particular, before a joining site can execute
transactions, an up-to-date peer site has to provide the current state
of
the data to the joining site.
We have developed online-recovery solutions and integrated them into
our Postgres-R prototype. One solution transfers the entire database
state to the joining replica, the other only sends the changes that
were performed during the downtime of the rejoined site. With the means
of heuristics, the system automatically chooses the recovery strategy
that is expected to take less time depending on the database size and
the changes the joining site needs to install.
- Client transparency: So
far, a client has to know the address of a replica and must
directly connect to this replica. When the replica fails, it receives
the typical error message, and has to connect to another replica. We
are planning to generate an automatic failover component. For instance,
the JDBC driver could be extended with the following features. While
the client sees a generic database name, the JDBC driver can retrieve a
configuration file that provides the addresses of the individual
replicas behind the database. Then it can connect to any (allowing
possibly for some load-balancing features). When the client the driver
is connected to crashes, the driver can automatically connect to
another replica without the client being aware of it.
Related papers:
- Online Recovery in Cluster Databases. W. Liang, B. Kemme. Int. Conf. on Extending Database Technology (EDBT), March 2008.
- Postgres-R(SI): Combining Replica
Control with Concurrency
Control based on Snapshot Isolation. S. Wu, B. Kemme. IEEE Int.
Conference
on Data Engineering (ICDE), Tokyo, Japan, April 2005.
- Database Replication Based on
Group Communication: Implementation Issues. Future Directions in
Distributed Computing (FuDiCo), Research and Position Papers.
Lecture Notes in Computer Science 2584 Springer 2003.
- Using Optimistic Atomic Broadcast
in
Transaction Processing Systems. B. Kemme,
F. Pedone,
G. Alonso,
A. Schiper,
M. Wiesmann. IEEE Transactions on Knowledge and Data Engineering,
Volume 15, No. 4,
2003. pp. 1018-1032
- Online Reconfiguration in
Replicated
Databases Based on
Group Communication B. Kemme, A. Bartoli, O.
Babaouglu. Proc. of the IEEE International Conference on Dependable
Systems and Networks (DSN 2001), Goteborg, Sweden, June 2001.
- Don't
be lazy, be consistent: Postgres-R, a new way to implement Database
Replication.
B. Kemme, G. Alonso. Proc. of 26th International Conference
on Very Large Databases (VLDB), Cairo, Egypt, September 2000.
Collaboration
Part of the work of both projects has been performed in the context of the Adapt
project (Middleware Technologies for Adaptive and Composable
Distributed Components). Adapt was a RTD project funded by the
Information Sociaty Technologies Programme of the European Commision
under FP5, and the Programme de soutien à la recherche (PSR) of
the
Ministère du Développement économique, de
l'innovation et de
l'exportation (MDEIE) du Québec, Canada.