| Date | Topic | Materials |
|---|---|---|
| January 6 | Introduction to reinforcement learning. Bandit algorithms | RL book, Chapter 1. Slides |
| January 8 | Bandits: definition of multi-armed bandit, epsilon-greedy exploration, optimism, UCB | RL book, Sec. 2.1-2.7. Slides |
| January 13 | Bandits: regret definition and analysis for epsilon-greedy and UCB, gradient-based bandits | RL book, Sec. 2.8-2.10. Assignment 1 posted. Slides |
| January 15 | Wrap-up of bandits: Thompson sampling. Sequential decision making. Policy evaluation. Monte Carlo | Thompson sampling tutorial. RL book, Sec. 5.1. Slides |
| January 20 | More on Monte Carlo | RL book, Sec. 5.2-5.4. Slides |
| January 22 | Markovian assumption. Bellman equations and dynamic programming for policy evaluation | RL book, Sec. 3.1-3.5, 4.1, 4.5. Slides |
| January 27 | Temporal-difference learning | RL book, Sec. 6.1-6.2, 9.1-9.4, 9.7. Slides. Assignment 1 due. Assignment 2 posted |
| January 29 | Multi-step methods | RL book, Sec. 9.3-9.4, 12.1-12.3. Slides |
| February 3 | Theory of policy evaluation | RL book, Sec. 11.1, 11.4, 9.5.4. Csaba book, Appendix A. Slides. Animations: v_pi, Bellman, Contraction, TD(0), TD(lambda). Paper: Section 3 for TD(0) dynamics |
| February 5 | Value function approximation for TD methods, DQN, eligibility traces | RL book, Sec. 10.2, 10.5, 16.5, 12.1, 12.2, 12.4, 12.5. Slides. David Silver's lecture on RL with function approximation |
| February 10 | More on eligibility traces and TD(λ) | RL book, Chapter 12. Slides. Assignment 2 due. Assignment 3 posted |
| February 12 | Policy gradient. REINFORCE | Slides |
| February 17 | Actor-critic methods. Deterministic policy gradient | RL book, Sec. 13.1-13.7. Slides |
| February 19 | Policy gradient methods: DDPG and TRPO | DPG paper, DDPG paper. Slides |
| February 24 | Introduction to Large Language Models (LLMs) and Reinforcement Learning from Human Feedback (RLHF) | Slides. Assignment 3 due. Project information posted |
| February 26 | More on LLMs and RLHF | Slides |
| March 3 | Study break | |
| March 5 | Study break | |
| March 10 | Slack time and midterm review | Slides. Project proposal due |
| March 12 | In-class midterm | Example questions to be posted |
| March 17 | Model-based RL | RL book, Chapter 8. Slides |
| March 19 | Deep model-based methods | PlaNet paper, Dreamer paper, MuZero paper. Slides |
| March 24 | Offline and batch RL | Slides. Offline RL tutorial (Levine, Kumar, Tucker and Fu, 2020) |
| March 26 | Where do rewards come from? Inverse RL | Slides. Inverse RL survey |
| March 30 | Hierarchical RL | Slides |
| April 2 | More on HRL. Continual RL | Slides |
| April 7 | Never-ending / continual RL | Slides. Continual RL survey |
| April 9 | RL applications | Slides. If we have time: Reward is enough |
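The early lectures (January 8 and 13) cover epsilon-greedy exploration for multi-armed bandits. A minimal sketch of that algorithm is below, using the incremental sample-average value estimates from the RL book (Sec. 2.4); the arm means, epsilon, and step count are illustrative choices, not course code.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Epsilon-greedy on a k-armed Gaussian bandit (illustrative sketch).

    Returns the learned action-value estimates and per-arm pull counts.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # number of pulls per arm
    estimates = [0.0] * k   # incremental sample-average estimates Q(a)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(k)  # explore: uniform random arm
        else:
            # exploit: greedy arm under current estimates
            action = max(range(k), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[action], 1.0)  # noisy reward
        counts[action] += 1
        # incremental mean update: Q <- Q + (R - Q) / N  (RL book, Eq. 2.3)
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.1, 0.5, 0.9])
```

With a small epsilon the agent spends most pulls on the arm it currently believes is best, while the occasional random pull keeps the estimates of the other arms from going stale.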