COMP-614 Distributed Data Management
Winter Term 2010
Names and Numbers:
- Lecture: Tuesdays/Thursdays 11:30-13:00, McConnell 103
- Instructor: Bettina Kemme. Office: McConnell 109N. E-mail: kemme at cs.mcgill.ca
- Office Hours Instructor: TBA
- TA: TBA
Overview
Distributed computing environments have become the standard IT
environment. Examples include workstation
clusters, mobile environments, the grid, publish-subscribe systems,
multi-tier systems, and peer-2-peer database
systems. Data management in such environments is challenging.
In this course we will cover issues such as data replication, data
caching, and distributed query execution. We will
also look at fault-tolerance issues. We will first look at these topics from
an abstract point of view, and then analyze them for specific
computing environments, in particular for cluster-based and
peer-2-peer environments. A common thread through most of
these topics is that data must be kept consistent.
Possible Topic List
1) Principles in Distributed Data Management
- Overview of communication mechanisms (point-to-point, RMI,
multicast, message queues, event-based, ...): 1-2 lectures
- Transactions: Concurrency control (1-3 lectures), Atomic Commit
Protocols (1-2 lectures)
- Architecture Principles (1-2 lectures)
- Replication Overview (1 lecture)
- Caching Overview (1 lecture)
- Distributed Query Execution: Principles (1 lecture)
2) Advanced Topics (for more information see Project Information)
- Cluster-based replication: consistency and query execution (2
lectures)
- Caching: consistency and dynamic caching (2 lectures)
- Multi-tier architectures: performance, fault-tolerance, load
distribution/load balancing (2-3 lectures)
- Peer-2-peer systems: search (file based and record based),
replication (2-3 lectures)
- Distributed stream processing (1-2 lectures)
- Massively multiplayer games: zoning mechanisms, architecture
alternatives (e.g. cluster-based, peer-2-peer) (2 lectures)
- Security
This course is useful for all who want to
- learn more about the essential problems associated with
data distribution
- learn the essential techniques and mechanisms to address these
problems
- deepen the knowledge in one or two sub-areas
- get to know the current research questions raised in distributed
data management
- learn advanced working techniques that are essential for doing
research and/or getting leadership positions in industry:
  - speak in front of an audience
  - give small summaries (written or oral) about lectures and papers
  - discuss with others
  - read a paper or report and analyze its strengths and weaknesses
  - carry out your own small research project
The course will mainly be given in the form of lectures. I will introduce each of the topics with one lecture and then present
advanced material. Each student has to give a lecture
about a specific sub-area that includes material from recent
research papers. Additionally, each student has to briefly present
his/her research project at the end of the term.
Topics will be posted soon.
Prerequisites: A course in database systems (e.g.,
COMP-421), and in addition a course in computer networks
(e.g., COMP-435/535) or distributed systems.
Marking Scheme: The marking scheme will be as follows:
- 10% class participation: you are evaluated on how actively you
participate in class discussions; mere attendance without saying
anything will result in a B.
- 30% summaries and critiques: 1 oral summary, 4 written summaries,
and 4 paper critiques:
  - Summary: a 1-page summary of a past lecture/talk. It
  should contain enough information that a classmate who
  missed that lecture/talk gets a good idea of what it
  was about. For one or two of these summaries you also
  have to give a five-minute presentation in class.
  - Critique of a paper: a 1-paragraph summary of the paper,
  followed by a list of its weak and strong points (weak points are
  more interesting). The papers to choose from are the papers other
  students present in their talks. The paper critique is expected
  to be submitted BEFORE the corresponding talk takes place.
  - Note that you can submit as many summaries/critiques
  as you want; the 5 best summaries and the 4 best critiques will
  make up your mark.
- 60% term project. The term project is an in-depth study of one
problem area. It consists of three deliverables:
  - 15%: a survey report of 3-4 research papers
  - 15%: your own research work (implementation, evaluation, or
  enhancement of algorithms)
  - 30%: a class presentation of part of the survey (1 hour), and a
  presentation of your own research work (15-20 minutes)
Literature: The literature will be mainly based on survey and research
papers. There are some books that might be interesting in the context
of this course. Most of them are available in the library.
- M.T. Özsu and P. Valduriez, Principles of Distributed Database
Systems, 2nd edition, Prentice-Hall, 1999.
- G. Coulouris, J. Dollimore, T. Kindberg, Distributed Systems,
3rd edition, Addison-Wesley, 2000.
- S. Mullender (ed.), Distributed Systems, 2nd edition,
Addison-Wesley, 1993.
- S. Abiteboul, P. Buneman, D. Suciu, Data on the Web,
Morgan Kaufmann, 2000.
- K. Dittrich, A. Geppert, Component Database Systems,
Morgan Kaufmann, 2001.
- P.A. Bernstein, E. Newcomer, Principles of Transaction
Processing, Morgan Kaufmann, 1997.
- M. Buretta, Data Replication, Wiley Computer
Publishing, 1997.
- G. Weikum, G. Vossen, Transactional Information Systems,
Morgan Kaufmann, 2002.
- More will be announced
A note on academic integrity
McGill University values academic integrity. Therefore all students must understand
the meaning and consequences of cheating, plagiarism and other academic offences
under the Code of Student Conduct and Disciplinary Procedures (see
http://www.mcgill.ca/integrity/ for more information).
French/English
In accord with McGill University's Charter of Students' Rights, students in this
course have the right to submit in English or in French any written work that is to
be graded.