COMP-767 : Reinforcement Learning

Date	Topic	Materials
January 9	Introduction to reinforcement learning. Bandit algorithms	RL book, chapters 1 and 2. Intro slides. Bandit slides
January 11	More on bandits and exploration	RL book, chapter 2. Bandit slides. Assignment
January 16	More on bandits	RL book, chapter 2. Also see Csaba's book and blog
January 18	Finite MDPs, Bellman equations, policy evaluation	RL book, chapter 4. Handout
January 23	Control, optimality equations, value iteration, policy iteration	RL book, chapter 4. Assignment
January 25	Monte-Carlo Methods, Temporal-Difference Learning	RL book, chapters 5 and 6
January 30	More on TD learning, multi-step bootstrapping	RL book, chapters 6 and 7
February 1	More on multi-step bootstrapping	RL book, chapter 7
February 6	Planning and learning with tabular methods	RL book, chapter 8
February 8	More on planning and learning with tabular methods	RL book, chapter 8
February 13	SARSA, Q-Learning and model-free control	TBA
February 15	Temporal abstraction	Options paper. Assignment
February 20	On-policy control with function approximation	RL book, chapter 10
February 22	Off-policy learning with function approximation	RL book, chapter 11
February 27	More on off-policy learning. Eligibility traces	RL book, chapters 11 and 12
March 1	Eligibility traces	RL book, chapter 12. Assignment
March 6	Study break
March 8	Study break
March 13	LSTD, LSPI, Fitted-Q
March 15	In-class midterm exam
March 20	Policy gradient methods	RL book, chapter 13. Assignment
March 22	More on gradient-based methods	TBD
March 27	Frontiers: learning options using gradient-based methods	TBD
March 29	Frontiers: Meta-learning	TBD
April 3	Frontiers: Intrinsic motivation and reward origins	TBD
April 5	Frontiers: Generalized value functions	TBD
April 10	Frontiers: TBD	TBD
April 12	Wrap-up	TBD

Lecture Schedule

Further Reading

January 11