COMP-579 : Reinforcement Learning

Date	Topic	Materials
January 6	Introduction to reinforcement learning. Bandit algorithms	RL book, chapters 1,2. Intro slides Bandit slides
January 11	More on bandits and exploration	RL book, chapter 2. Bandit slides (Definitions, epsilon-greedy, optimistic initialization, UCB)
January 13	More on bandits	RL book, chapter 2. Bandit slides (Regret, bounds, gradient-based algorithms) Assignment 1
January 18	Wrap-up on bandits (other versions of the problem). Finite MDPs, value functions and policies	RL book, chapter 3 Slides (will still be slightly revised)
January 20	More on MDPs. Bellman equations, policy evaluation. Policy iteration. Value iteration	RL book, chapter 4. Slides
January 25	Policy evaluation using Monte-Carlo Methods and Temporal-Difference learning	RL book, Sec 5.1, 6.1, 6.2, 6.3 Slides
January 27	More on MC and TD, including n-step TD. Control using Monte Carlo and TD, including SARSA (and Q-learning if we have time)	RL book, Sec. 5.3, 5.4, 6.4, 6.5, 7.1 Assignment 1 due; Slides
February 1	Q-learning. Starting discussion of convergence results	RL book, chapter 7 Slides
February 3	Finish discussion of previous results. Planning and learning	RL book, chapter 8 Finishing slides from last time. Planning slides
February 8	Intro to RL with function approximation. Value-based methods	RL book Sec. 9.1-9.4 Slides
February 10	More on RL with function approximation. Eligibility traces. Control with function approximation	RL book chapter 9 Slides
February 15	Off-policy learning	RL book chapter 12 Slides
February 17	No lecture	RL book Chapter 10
February 22	No lecture
February 24	More on off-policy learning	RL book, chapter 12 Assignment 2
March 1	Study break
March 3	Study break
March 8	Policy gradient	RL book Chapter 13 Slides (with thanks to Hado van Hasselt)
March 10	More on policy gradient	Slides Assignment 2 due Assignment 3 to be posted
March 15	More on Deep RL: Model-based, temporal abstraction	Slides Project details to be posted
March 17	More on Deep RL
March 22	Distributional RL (Guest lecturer Marc Bellemare)	Distributional RL book Final project document
March 24	Distributional RL (Guest lecturer Marc Bellemare)
March 29	Distributional RL (Guest lecturer Marc Bellemare)
March 31	Distributional RL (Guest lecturer Marc Bellemare)
April 5	Special topics: Batch RL	Slides (with thanks to Emma Brunskill) Assignment 3
April 7	Special topics: Rewards and Tasks	Slides Part 1, Part 2
April 12	Wrap-up: Thoughts on RL for AI	Slides Project due (can be turned in without penalties until April 26)
