> COMP-579: Reinforcement Learning
Date Topic Materials
January 6 Introduction to reinforcement learning. Bandit algorithms RL book, chapter 1
Intro slides
January 8 Bandits: definition of multi-armed bandit, epsilon-greedy exploration, optimism, UCB. RL book, Sec. 2.1-2.7
Bandit slides
January 13 Bandits: regret definition and analysis for epsilon-greedy and UCB, gradient-based bandits
RL book, chapter 2
Assignment 1 posted
Bandit slides
January 15 Wrap up of bandits: Gradient-based bandits, Thompson sampling.
Sequential decision making
RL book, chapter 2
Bandit slides
January 20 Value functions and policies. Policy evaluation. Monte Carlo methods. Policy improvement. RL book, Sec. 3.1, 3.2, 5.1, 5.2, 5.3, 5.4
Slides
January 22 Markovian assumption. Bellman equations and dynamic programming. Policy iteration. Value iteration. RL book, Sec. 3.3-3.5, 4.1-4.8
Slides
January 27 Temporal-Difference learning RL book, Sec. 6.1, 6.2
Slides
Assignment 1 due. Assignment 2 posted
January 29 Learning Control using TD, including SARSA RL book, Sec. 5.6, 5.7, 6.3, 6.4, 7.1, 7.2, 7.3, 7.5
Slides
February 3 Q-learning RL book, Sec. 6.5-6.7, 9.1-9.3
Q-Learning slides
February 5 Value Function Approximation for TD methods, DQN, Eligibility Traces RL book, Sec. 10.2, 10.5, 16.5, 12.1, 12.2, 12.4, 12.5
Slides
David Silver's lecture on RL with function approximation
February 10 More on Eligibility Traces and TD(λ) RL book, chapter 12
Slides
Assignment 2 due. Assignment 3 posted
February 12 Policy gradient. REINFORCE Slides
February 17 Actor-critic methods. Deterministic policy gradient. RL book, Sec. 13.1-13.7
Slides
February 19 Policy gradient methods: DDPG and TRPO DPG paper, DDPG paper
Slides
February 24 Introduction to Large Language Models (LLMs) and Reinforcement Learning from Human Feedback (RLHF) Slides
Assignment 3 due. Project information posted
February 26 More on LLMs and RLHF Slides
March 3 Study break
March 5 Study break
March 10 Slack time and midterm review. Slides
Project proposal due
March 12 In-class midterm Example questions to be posted
March 17 Model-based RL RL book, chapter 8
Slides
March 19 Deep Model-based methods PlaNet Paper, Dreamer Paper, MuZero Paper
Slides
March 24 Offline and batch RL Slides
Offline RL tutorial (Levine, Kumar, Tucker and Fu, 2020)
March 26 Where do rewards come from? Inverse RL. Slides
Inverse RL survey
March 30 Hierarchical RL Slides
April 2 More on HRL. Continual RL Slides
April 7 Never-ending / continual RL Slides
Continual RL survey
April 9 RL applications Slides. If we have time: Reward is enough