Date | Topic | Materials |
January 4 | Introduction to reinforcement learning. Bandit algorithms | RL book, chapter 1. Intro slides |
January 9 | Bandits: definition of multi-armed bandit, epsilon-greedy exploration, optimism, UCB | RL book, Sec. 2.1-2.7. Bandit slides |
January 11 | Bandits: regret definition and analysis for epsilon-greedy and UCB, gradient-based bandits | RL book, chapter 2. Assignment 1 posted |
January 16 | Wrap-up of bandits: gradient-based bandits, Thompson sampling | RL book, chapter 2 |
January 18 | Markov Decision Processes. Value functions. Bellman equations, policy evaluation. Policy iteration. Value iteration | RL book, chapter 3 |
January 23 | More on dynamic programming: policy iteration, value iteration, contractions. Policy evaluation using Monte-Carlo Methods and Temporal-Difference learning | RL book, Chapter 4, Sec 5.1, 6.1, 6.2, 6.3 |
January 25 | More on TD. Control using Monte Carlo and TD, including SARSA | RL book, Sec. 5.3, 5.4, 6.4, 6.5, 7.1 |
January 30 | Q-learning | RL book, chapter 7. Assignment 1 due; Assignment 2 posted |
February 1 | More on value-based RL, function approximation | RL book, chapter 8 |
February 6 | More on value-based RL with function approximation | RL book, Sec. 9.1-9.4 |
February 8 | More on deep RL | RL book, chapter 9 |
February 13 | Planning and model-based RL | RL book, chapter 8 |
February 15 | More on model-based RL and planning | RL book, chapter 8. Assignment 2 due; project information posted |
February 20 | Policy-gradient methods | RL book chapter 13 |
February 22 | More on policy gradient. Actor-critic | RL book, chapter 13. Assignment 3 posted |
February 27 | More on policy gradient: DDPG, TRPO | |
February 29 | Wrap-up of material from the RL book | |
March 5 | Study break | |
March 7 | Study break | |
March 12 | Hierarchical RL | |
March 14 | More on hierarchical RL | |
March 19 | Offline and batch RL | Assignment 3 due |
March 21 | More on offline and batch RL | |
March 26 | Where do rewards come from? Inverse RL | |
March 28 | Where do rewards come from? Learning from preferences and human feedback | |
April 2 | Meta-learning; Never-ending / continual RL | |
April 4 | More on never-ending and continual RL | |
April 9 | Wrap-up: Thoughts on RL for AI | Project due April 12 |
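As a taste of the opening lectures, here is a minimal sketch of epsilon-greedy action selection on a multi-armed bandit, using incremental sample-average value estimates; the function name and the arm means are invented for illustration, not taken from course materials.

```python
import random

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Gaussian multi-armed bandit.

    Keeps an incremental sample-average estimate q[a] of each arm's value
    and returns the estimates after `steps` pulls.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k  # estimated value of each arm
    n = [0] * k    # number of times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore: random arm
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit: greedy arm
        reward = rng.gauss(true_means[a], 1.0)     # noisy reward draw
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]             # incremental mean update
    return q

# With enough pulls, the greedy arm's estimate approaches its true mean.
estimates = epsilon_greedy_bandit([0.2, 0.5, 1.0], steps=5000)
```

With a small constant epsilon the agent keeps exploring forever, so every arm's estimate converges; the regret analysis covered in the January lectures quantifies the cost of that continued exploration.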