COMP-421: Database Systems

Names and Numbers:


Outline

With the raise of e-commerce all of us use complex information systems on a daily basis -- by accessing the Minerva university system, by banking online, by buying a book online. Database management systems (DBMS) build the backbone software systems that store all mission-critical data of these information systems. The big DBMS vendors like Oracle, IBM (DB2) or Microsoft (SQL Server) are well known players on the IT market; Linux distributions (for instance Red Hat), include an open source database management system in their distribution package; even the Microsoft Office suite contains kind of a database management system. And recently, the NoSQL movement has shown that data management has become important in many new domains and markets. As a result, knowing at least the basic concepts behind these systems and how to use them is becoming an essential skill for IT specialists.

This course is intended to give students a solid background in relational DBMS. Furthermore is outlines non-relational solutions. If you are interested in becoming an effective user of a DBMS, or a DBMS professional, this is a good introduction to the topic. The course has three main parts: the first part explains how to use a relational DBMS, the second part looks at how a relational DBMS works internally . A third, shorter part, looks at other data models, especially for large-scale data management, and their implementation in cloud environments.

The first part will discuss the design of databases and how to write queries that extract the data you are looking for. Or in other words: how can you squeeze your "real world" data (the product information of a bookstore, the student and activity information of a students' association) into the relational data model of the database system and how can you retrieve the data later.

The second part of the course is system-oriented and discusses the internals of a DBMS. Using a DBMS is one issue, using it efficiently and correctly is a different story. DBMS are designed to work for very different application areas and present a general purpose software. In order to be able to tune a DBMS to your specific requirements you must know how your database application is executed by the DBMS.
This second part will cover two main areas: Transaction management is important to keep data consistent when updates occur. Query execution is the process that efficiently finds the small set of data the user wants to see out of a possibly huge database.

Throughout the course we will also talk about non-relational (NoSQL) data models and their implementation: semi-structured data that is better able to handle document-type data and key-value stores that are better able to store and query huge amount of data in a distributed fashion.

Objectives:

In this course you will


Preliminary Topic List
(each topic between 2 and 4 lectures):

Prerequisites: The official pre-requisites are COMP-206, COMP-251 and COMP-302. In general, you have to have taken a good set of COMP courses. Which ones exactly is not that relevant but you have to have a good general undstanding of data structures, algorithms, programming languages and basics of program development. Also, how programs are executed and run on modern computer systems. Knowledge of operating systems is of advantage.

Marking Scheme: There are 3 written assignments, three project deliveries, one map-reduce exercise, one midterm, and one final. The scheme will be

In assignments the students have to solve concrete problems (one assignment is on SQL, and students have to turn in runnable SQL statements). In some cases, late turn-in might not be possible (e.g., if I want to post the solution before the midterm/final). Otherwise, a late turn-in will result in a penalty of 10% per day (e.g. if an assignment has in total 100 points, a student achieves 80 points but turns in the assignment a day too late, then he/she will receive 80-10=70 points).

One or more assignments might be done in groups of several students. This still has to be determined depending on TA capacity.

The three programming deliveries build a project in which students have to develope and build a database application for a real-world domain. The students will design a schema, create a database using DB2, maintain, query and update the data, and develop application programs.

The students will work in teams of three on the programming project. All team members are supposed to work together and participate equally in the project development.

In the map-reduce exercise, students will do large-scale (or more ralistically, medium-scale) data analysis using a cluster of computers and a map-reduce implementation.

Both midterm and final will likely partially consist of multiple choise questions.

Course Book:  Alternative Literature

A note on academic integrity

McGill University values academic integrity. Therefore all students must understand the meaning and consequences of cheating, plagiarism and other academic offences under the Code of Student Conduct and Disciplinary Procedures (see http://www.mcgill.ca/integrity/ for more information).

French/English

In accord with McGill University's Charter of Students' Rights, students in this course have the right to submit in English or in French any written work that is to be graded.