| Date | Topic | Materials |
| --- | --- | --- |
| January 6 | Introduction to reinforcement learning. Bandit algorithms | RL book, chapter 1; Intro slides |
| January 8 | Bandits: definition of multi-armed bandit, epsilon-greedy exploration, optimism, UCB. | RL book, Sec. 2.1-2.7; Bandit slides |
| January 13 | Bandits: regret definition and analysis for epsilon-greedy and UCB, gradient-based bandits | RL book, chapter 2; Bandit slides. Assignment 1 posted |
| January 15 | Wrap-up of bandits: gradient-based bandits, Thompson sampling. Sequential decision making | RL book, chapter 2; Bandit slides |
| January 20 | Value functions and policies. Policy evaluation. Monte Carlo methods. Policy improvement. | RL book, Sec. 3.1, 3.2, 5.1, 5.2, 5.3, 5.4; Slides |
| January 22 | Markovian assumption. Bellman equations and dynamic programming. Policy iteration. Value iteration. | RL book, Sec. 3.3-3.5, 4.1-4.8; Slides |
| January 27 | Temporal-Difference learning | RL book, Sec. 6.1, 6.2; Slides. Assignment 1 due; Assignment 2 posted |
| January 29 | Learning control using TD, including SARSA | RL book, Sec. 5.6, 5.7, 6.3, 6.4, 7.1, 7.2, 7.3, 7.5; Slides |
| February 3 | Q-learning | RL book, Sec. 6.5-6.7, 9.1-9.3; Q-learning slides |
| February 5 | Value function approximation for TD methods, DQN, eligibility traces | RL book, Sec. 10.2, 10.5, 16.5, 12.1, 12.2, 12.4, 12.5; Slides; David Silver's lecture on RL with function approximation |
| February 10 | More on eligibility traces and TD(λ) | RL book, chapter 12; Slides. Assignment 2 due; Assignment 3 posted |
| February 12 | Policy gradient. REINFORCE | Slides |
| February 17 | Actor-critic methods. Deterministic policy gradient. | RL book, Sec. 13.1-13.7; Slides |
| February 19 | Policy gradient methods: DDPG and TRPO | DPG paper, DDPG paper; Slides |
| February 24 | Introduction to Large Language Models (LLMs) and Reinforcement Learning from Human Feedback (RLHF) | Slides. Assignment 3 due; Project information posted |
| February 26 | More on LLMs and RLHF | Slides |
| March 3 | Study break | |
| March 5 | Study break | |
| March 10 | Slack time and midterm review | Slides. Project proposal due |
| March 12 | In-class midterm | Example questions to be posted |
| March 17 | Model-based RL | RL book, chapter 8; Slides |
| March 19 | Deep model-based methods | PlaNet paper, Dreamer paper, MuZero paper; Slides |
| March 24 | Offline and batch RL | Slides; Offline RL tutorial (Levine, Kumar, Tucker and Fu, 2020) |
| March 26 | Where do rewards come from? Inverse RL. | Slides; Inverse RL survey |
| March 30 | Hierarchical RL | Slides |
| April 2 | More on HRL. Continual RL | Slides |
| April 7 | Never-ending / continual RL | Slides; Continual RL survey |
| April 9 | RL applications | Slides. If we have time: Reward is enough |