Date | Topic | Materials |
January 4 | Introduction to reinforcement learning. Bandit algorithms | RL book, chapter 1. Intro slides |
January 9 | Bandits: definition of multi-armed bandit, epsilon-greedy exploration, optimism, UCB | RL book, Sec. 2.1-2.7. Bandit slides |
January 11 | Bandits: regret definition and analysis for epsilon-greedy and UCB, gradient-based bandits | RL book, chapter 2. Assignment 1 posted |
January 16 | Wrap-up of bandits: gradient-based bandits, Thompson sampling | RL book, chapter 2 |
January 18 | Markov Decision Processes. Value functions. Bellman equations, policy evaluation. Policy iteration. Value iteration | RL book, chapter 3 |
January 23 | More on dynamic programming: policy iteration, value iteration, contractions. Policy evaluation using Monte-Carlo Methods and Temporal-Difference learning | RL book, Chapter 4, Sec 5.1, 6.1, 6.2, 6.3 |
January 25 | More on TD. Control using Monte Carlo and TD, including SARSA | RL book, Sec. 5.3, 5.4, 6.4, 6.5, 7.1 |
January 30 | Q-learning | RL book, chapter 7. Assignment 1 due; Assignment 2 posted |
February 1 | More on value-based RL, function approximation | RL book, chapter 8 |
February 6 | More on value-based RL with function approximation | RL book, Sec. 9.1-9.4 |
February 8 | More on deep RL | RL book, chapter 9 |
February 13 | Planning and model-based RL | RL book, chapter 8 |
February 15 | More on model-based RL and planning | RL book, chapter 8. Assignment 2 due; project information posted |
February 20 | Policy-gradient methods | RL book chapter 13 |
February 22 | More on policy gradient. Actor-critic | RL book, chapter 13. Assignment 3 posted |
February 27 | More on policy gradient: DDPG, TRPO | |
February 29 | Wrap-up of material from the RL book | |
March 5 | Study break | |
March 7 | Study break | |
March 12 | Hierarchical RL | |
March 14 | More on hierarchical RL | |
March 19 | Offline and batch RL | Assignment 3 due |
March 21 | More on offline and batch RL | |
March 26 | Where do rewards come from? Inverse RL | |
March 28 | Where do rewards come from? Learning from preferences and human feedback | |
April 2 | Meta-learning; Never-ending / continual RL | |
April 4 | More on never-ending and continual RL | |
April 9 | Wrap-up: Thoughts on RL for AI | Project due April 12 |
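As a taste of the opening lectures, here is a minimal sketch of epsilon-greedy action selection on a multi-armed bandit, using incremental sample-average value estimates; the function name and the arm means are invented for illustration, not taken from course materials.

```python
import random

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Gaussian multi-armed bandit.

    Keeps an incremental sample-average estimate q[a] of each arm's value
    and returns the estimates after `steps` pulls.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k  # estimated value of each arm
    n = [0] * k    # number of times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore: random arm
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit: greedy arm
        reward = rng.gauss(true_means[a], 1.0)     # noisy reward draw
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]             # incremental mean update
    return q

# With enough pulls, the greedy arm's estimate approaches its true mean.
estimates = epsilon_greedy_bandit([0.2, 0.5, 1.0], steps=5000)
```

With a small constant epsilon the agent keeps exploring forever, so every arm's estimate converges; the regret analysis covered in the January lectures quantifies the cost of that continued exploration.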