COMP-551 Topics in Computer Science: Applied Machine Learning (4 credits)

Syllabus - Fall 2017


General Information

When/Where: Mondays 1-2:30pm (Leacock 26) and Wednesdays 8:30-10am (MC304)
Instructor:
Prof. Joelle Pineau, jpineau@cs.mcgill.ca
Office hours: Wednesdays 10-11am, MC106N
Associate instructor:
Dr. Herke van Hoof, herke.vanhoof@mail.mcgill.ca
Office hours: By appointment only, MC104N
Teaching assistants:
Philip Amortila, philip.amortila@mail.mcgill.ca
Office hours: Thursdays 10-11am, MC106
Christopher Glasz, christopher.glasz@mail.mcgill.ca
Office hours: Wednesdays 3-4pm, TR3104
Harsh Satija, harsh.satija@mail.mcgill.ca
Office hours: Mondays 10:30-11:30am, MC106
Koustuv Sinha, koustuv.sinha@mail.mcgill.ca
Office hours: Wednesdays 4-5pm, TR3090
Matthew Smith, matthew.smith5@mail.mcgill.ca
Office hours: Tuesdays 2-3pm, MC111
Sanjay Thakur, sanjay.thakur@mail.mcgill.ca
Office hours: Fridays 10:30-11:30am, MC105
Class web page:
http://www.cs.mcgill.ca/~jpineau/comp551

Course Description

The course will cover selected topics and new developments in data mining and machine learning, with a particular emphasis on good methods and practices for effective deployment of real systems. We will study commonly used algorithms and techniques, including clustering, neural networks, support vector machines, and decision trees. We will also discuss methods to address practical issues such as feature selection and dimensionality reduction, error estimation and empirical validation, algorithm design and parallelization, and handling of large datasets.

Course content (subject to minor changes):

  1. Linear regression. Linear classification.
  2. Performance evaluation, overfitting, cross-validation, bias-variance analysis, error estimation.
  3. Naive Bayes.
  4. Decision trees. Regression trees and ensemble methods.
  5. Cost-sensitive learning.
  6. Support vector machines.
  7. Artificial neural networks. Deep learning.
  8. Feature selection. Dimensionality reduction. Regularization.
  9. Online / streaming data.
  10. Data structures and Map-Reduce.
  11. Unsupervised learning and clustering. Semi-supervised learning.
  12. Applications.

Reference Materials

There is no required textbook. Lecture notes and references will be available from the course web page. The following texts can also be very useful:
  1. Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press. 2014. Available free online.
  2. Trevor Hastie, Robert Tibshirani and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition. Springer. 2009. Available free online.
  3. Christopher Bishop. Pattern Recognition and Machine Learning. Springer. 2007.
  4. Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning. The MIT Press. 2016. Available free online.
  5. Kevin Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press. 2012.
  6. David MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press. 2003.
  7. Richard Duda, Peter Hart and David Stork. Pattern Classification. 2nd Edition. Wiley & Sons. 2001.

Prerequisites / Antirequisites

Basic knowledge of a programming language is required. Basic knowledge of probability/statistics, calculus and linear algebra is also required. Example courses at McGill providing sufficient background in probability are MATH-323 or ECSE-305. Some AI background is recommended, as provided, for instance, by COMP-424 or ECSE-526, but it is not required. Note that while the course does not have strict prerequisites, it is a graduate-level course in computer science.

Students who took COMP-652 in Winter 2013 or before CANNOT take COMP-551. Starting in Fall 2013, COMP-551 and COMP-652 were designed to avoid significant overlap; you can take either or both.

The course is intended for hard-working, technically skilled, highly motivated students. Participants will be expected to display initiative, creativity, scientific rigour, critical thinking, and good communication skills.

Evaluation Criteria

The class grade will be based on the following components:

The weekly exercises will consist of quizzes (in class) or practical work (take-home) designed to develop basic understanding of the course material as we progress through the topics. These are designed to provide some practice for the midterm.

The midterm is designed to assess in-depth understanding of fundamental methods and algorithms. It will be scheduled toward the end of the semester (mid-November). There is no final exam.

The projects will require reading, writing, programming and experiments to gain hands-on experience with the application of recent machine learning methods, including concepts covered in the lectures and concepts drawn from the literature. Students will be responsible for characterizing the problem, developing methods of analysis, and presenting the results of their work. Some projects may be individual, but most will be done in groups (usually of 3 students).

We will use a peer-review system to evaluate the data analysis case studies. Each student will be asked to read and evaluate their colleagues' submissions. Emphasis will be placed on providing constructive feedback on the methodology and presentation.

Evaluation Policy

All course work should be submitted online (details to be given in class), by 11:59pm, on the assigned due date. Late work will be automatically subject to a 30% penalty, and can be submitted up to 1 week after the deadline.

No make-up quizzes or midterm will be given.

Some of the course work will be individual, while other components can be completed in groups. It is the responsibility of each student to understand the policy for each piece of work, and to ask questions of the instructor if this is not clear. It is also the responsibility of each student to carefully acknowledge all sources (papers, code, books, websites, individual communications) using an appropriate referencing style when submitting work.

We will use automated systems to detect possible cases of text or software plagiarism. Cases that warrant further investigation will be referred to the university disciplinary officers. Students who have concerns about how to properly use and acknowledge third-party software should consult the course instructor or TAs.

McGill University values academic integrity. Therefore all students must understand the meaning and consequences of cheating, plagiarism and other academic offences under the Code of Student Conduct and Disciplinary Procedures (see www.mcgill.ca/students/srr/honest/ for more information).

In accord with McGill University's Charter of Students' Rights, students in this course have the right to submit in English or in French any written work that is to be graded.

In the event of extraordinary circumstances beyond the University's control, the content and/or evaluation scheme in this course is subject to change.