Date  Topic  Materials  
January 8  Introduction to reinforcement learning; bandit algorithms  RL book, chapters 1, 2; Intro slides; Bandit slides
January 10  More on bandits and exploration  RL book, chapter 2; Slides
January 15  More on bandits  RL book, chapter 2; David Silver's slides, up to slide 25
January 17  Finite MDPs, Bellman equations, policy evaluation  RL book, chapter 3; Slides; Assignment 1
January 22  Control, optimality equations, value iteration, policy iteration  RL book, chapter 4; Slides
January 24  Monte Carlo methods  RL book, chapter 5; Slides
January 29  Temporal-difference learning methods (including TD(0), Sarsa, Q-learning)  RL book, chapter 6; Slides; Assignment 1 due
January 31  Wrap-up of TD; multi-step bootstrapping  RL book, chapter 7; Slides
February 5  Planning and learning; convergence of TD-style methods  RL book, chapter 8; Slides; Slides on MCTS from Alan Fern
February 7  More on the theory of tabular TD  Notes to be posted
February 12  No class  Assignment 2 posted  
February 14  No class  David Silver's lecture on RL with function approximation  
February 19  On-policy reinforcement learning with function approximation: TD, Sarsa  RL book, chapters 9, 10; Slides on prediction and control; Assignment 2
February 21  More on on-policy learning with function approximation: average-reward case, eligibility traces  RL book, chapter 12; Slides
February 26  More on eligibility traces; off-policy learning with function approximation  RL book, chapter 12; Slides; Project ideas posted
February 28  More on value-based RL with function approximation  RL book, chapter 12; Slides
March 5  Study break  
March 7  Study break  
March 12  Midterm recap  Slides; Assignment 2 due
March 14  In-class midterm exam  Midterm from 2018
March 19  LSTD and LSPI; policy gradient methods  RL book, chapter 13; Boyan paper; Lagoudakis and Parr paper; Slides from David Silver on least-squares methods (part 3: batch RL); Slides on policy gradient; Assignment 3 posted
March 21  More on policy gradient-based methods  Slides; Project proposal due
March 26  More on policy gradient (Riashat)  Slides; Options paper
March 28  Frontiers: Temporal abstraction  Slides  
April 2  Frontiers: Finish temporal abstraction; inverse reinforcement learning  Pieter Abbeel's slides on inverse RL
April 4  Frontiers: Meta-learning  Slides (courtesy of Di Wu)
April 9  Frontiers: More on exploration  Slides (based on material from Sergey Levine); Pseudo-counts paper (Bellemare et al., 2016); Deep exploration via randomized value functions (Osband et al., 2017)
April 11  Final project poster session (in-class)  TBD