Date | Topic | Materials
January 6 | Introduction to reinforcement learning. Bandit algorithms | RL book, chapters 1, 2. Intro slides. Bandit slides
January 11 | More on bandits and exploration | RL book, chapter 2. Bandit slides (definitions, epsilon-greedy, optimistic initialization, UCB)
January 13 | More on bandits | RL book, chapter 2. Bandit slides (regret, bounds, gradient-based algorithms). Assignment 1
January 18 | Wrap-up on bandits (other versions of the problem). Finite MDPs, value functions and policies | RL book, chapter 3. Slides (will still be slightly revised)
January 20 | More on MDPs. Bellman equations, policy evaluation. Policy iteration. Value iteration | RL book, chapter 4. Slides
January 25 | Policy evaluation using Monte Carlo methods and temporal-difference learning | RL book, Sec. 5.1, 6.1, 6.2, 6.3. Slides
January 27 | More on MC and TD, including n-step TD. Control using Monte Carlo and TD, including SARSA (and Q-learning if we have time) | RL book, Sec. 5.3, 5.4, 6.4, 6.5, 7.1. Assignment 1 due. Slides
February 1 | Q-learning. Starting discussion of convergence results | RL book, chapter 7. Slides
February 3 | Finish discussion of previous results. Planning and learning | RL book, chapter 8. Finishing slides from last time. Planning slides
February 8 | Intro to RL with function approximation. Value-based methods | RL book, Sec. 9.1-9.4. Slides
February 10 | More on RL with function approximation. Eligibility traces. Control with function approximation | RL book, chapter 9. Slides
February 15 | Off-policy learning | RL book, chapter 12. Slides
February 17 | No lecture | RL book, chapter 10
February 22 | No lecture |
February 24 | More on off-policy learning | RL book, chapter 12. Assignment 2
March 1 | Study break |
March 3 | Study break |
March 8 | Policy gradient | RL book, chapter 13. Slides (with thanks to Hado Van Hasselt)
March 10 | More on policy gradient | Slides. Assignment 2 due. Assignment 3 to be posted
March 15 | More on deep RL: model-based, temporal abstraction | Slides. Project details to be posted
March 17 | More on deep RL |
March 22 | Distributional RL (guest lecturer Marc Bellemare) | Distributional RL book. Final project document
March 24 | Distributional RL (guest lecturer Marc Bellemare) |
March 29 | Distributional RL (guest lecturer Marc Bellemare) |
March 31 | Distributional RL (guest lecturer Marc Bellemare) |
April 5 | Special topics: Batch RL | Slides (with thanks to Emma Brunskill). Assignment 3
April 7 | Special topics: Rewards and tasks | Slides: Part 1, Part 2
April 12 | Wrap-up: Thoughts on RL for AI | Slides. Project due (can be turned in without penalty until April 26)