Reinforcement Learning (COMP-767)
Winter 2018


General Information

Location: McConnell Engineering, room 13
Times: Tuesday and Thursday, 4:05-5:25pm
Instructors: Doina Precup and Pierre-Luc Bacon, School of Computer Science
Office: McConnell Engineering building, rooms 111N and 107
Phone: (514) 398-6443 (Doina)
Office hours: See course home page
Meetings at other times by appointment only!
Class web page:
IMPORTANT: This is where class notes, announcements and homeworks are posted!

Course Description

The goal of this class is to provide an introduction to reinforcement learning, a very active area of machine learning. Reinforcement learning is concerned with building programs that learn how to predict and act in a stochastic environment, based on past experience. Applications of reinforcement learning range from classical control problems, such as power plant optimization or dynamical system control, to game playing, inventory control, and many other fields. Notably, reinforcement learning has also produced very compelling models of animal and human learning. During this course, we will study theoretical properties and practical applications of reinforcement learning. We will follow the second edition of the classic textbook by Sutton & Barto (available online), and supplement it as needed with papers and other materials.


Basic knowledge of a programming language is required. Knowledge of probability/statistics, calculus and linear algebra is required. Example courses at McGill providing sufficient background in probability are MATH-323 and ECSE-305. A machine learning background, as provided for example by COMP-551 or COMP-652, is also required. If you have doubts regarding your background, please contact Doina to discuss it.

Reference Materials

Required textbook:
Additional textbooks:
Lecture notes and other relevant materials are linked from the lectures web page.

MyCourses will be used only for the bulletin board, discussion groups, and assignment submission and grading.

Class Requirements

The class grade will be based on the following components:

  1. Five assignments - 60%. Each week, by Friday evening, a set of papers and possible experimental demonstration topics will be posted. Students can choose a topic from this list and turn in an experimental notebook or a short report on it. Every student MUST sign up to do this 5 times during the term. Students can work individually or in teams of 2, and must post their material on a shared site within a week of the date on which their chosen assignment was posted. This material will be graded. If asked, students are expected to be able to answer questions about their assignment and/or present it briefly, in person, to one of the instructors or TAs.
  2. A midterm exam - 10%. The exam is tentatively scheduled for March 13. It is an in-class exam, and you are permitted one double-sided crib sheet.
  3. A final project - 30%. For the final project, students can work individually or in groups of up to 3 students, on a topic of their choice. Students must email the course instructors a brief description of their topic by March 15. Project presentations will be scheduled outside of class hours, and project reports will be due after the presentations.
  4. Participation in class discussions - up to 1% extra credit.
Minor changes to the evaluation scheme (if any) will be announced in class by Thursday, January 11 (pending in-class discussion and the estimated total enrollment).

McGill University values academic integrity. Therefore all students must understand the meaning and consequences of cheating, plagiarism and other academic offenses under the Code of Student Conduct and Disciplinary Procedures (see for more information).

In accord with McGill University's Charter of Students' Rights, students in this course have the right to submit in English or in French any written work that is to be graded.

In the event of extraordinary circumstances beyond the University's control, the content and/or evaluation scheme in this course is subject to change.