Date | Topic | Materials |
--- | --- | --- |
January 9 | Introduction to reinforcement learning; bandit algorithms | RL book, chapters 1 and 2; Intro slides; Bandit slides |
January 11 | More on bandits and exploration | RL book, chapter 2; Bandit slides; Assignment |
January 16 | More on bandits | RL book, chapter 2; see also Csaba's book and blog |
January 18 | Finite MDPs, Bellman equations, policy evaluation | RL book, chapter 4; Handout |
January 23 | Control, optimality equations, value iteration, policy iteration | RL book, chapter 4; Assignment |
January 25 | Monte Carlo methods, temporal-difference learning | RL book, chapters 5 and 6 |
January 30 | More on TD learning; multi-step bootstrapping | RL book, chapters 6 and 7 |
February 1 | More on multi-step bootstrapping | RL book, chapter 7 |
February 6 | Planning and learning with tabular methods | RL book, chapter 8 |
February 8 | More on planning and learning with tabular methods | RL book, chapter 8 |
February 13 | SARSA, Q-learning, and model-free control | TBD |
February 15 | Temporal abstraction | Options paper; Assignment |
February 20 | On-policy control with function approximation | RL book, chapter 10 |
February 22 | Off-policy learning with function approximation | RL book, chapter 11 |
February 27 | More on off-policy learning; eligibility traces | RL book, chapters 11 and 12 |
March 1 | Eligibility traces | RL book, chapter 12; Assignment |
March 6 | Study break | |
March 8 | Study break | |
March 13 | LSTD, LSPI, Fitted-Q | |
March 15 | In-class midterm exam | |
March 20 | Policy gradient methods | RL book, chapter 13; Assignment |
March 22 | More on gradient-based methods | TBD |
March 27 | Frontiers: learning options using gradient-based methods | TBD |
March 29 | Frontiers: Meta-learning | TBD |
April 3 | Frontiers: Intrinsic motivation and reward origins | TBD |
April 5 | Frontiers: Generalized value functions | TBD |
April 10 | Frontiers: TBD | TBD |
April 12 | Wrap-up | TBD |