Feature Extraction in Sparse State Spaces
Very often in reinforcement learning, we work with state spaces that are sparse under some known or unknown basis. These include the commonly used tile-coded representations and almost any type of discretized space. We have shown that the effective linear dimensionality of these spaces is only logarithmic in the nominal dimensionality of the space, and grows linearly with the sparsity level (AAAI'12). We introduced an approximate policy evaluation algorithm that exploits this structure to learn linear value functions from very small sample sizes (EWRL'12). Extending this approach to online RL is an interesting direction for future work.
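To illustrate why tile-coded spaces are sparse, here is a toy one-dimensional tile coder (the parameter names and defaults are ours for illustration, not taken from the cited papers): each of the overlapping tilings activates exactly one feature, so only a handful of the many features are ever nonzero.

```python
import numpy as np

def tile_code(x, n_tilings=8, tiles_per_dim=10, low=0.0, high=1.0):
    """Encode a scalar state x in [low, high] as a binary feature vector.

    Each of the n_tilings offset tilings contributes exactly one active
    tile, so the vector has n_tilings ones out of
    n_tilings * (tiles_per_dim + 1) entries -- i.e., it is sparse.
    """
    n_tiles = tiles_per_dim + 1          # one extra tile absorbs the offset
    phi = np.zeros(n_tilings * n_tiles)
    scaled = (x - low) / (high - low) * tiles_per_dim
    for t in range(n_tilings):
        offset = t / n_tilings           # each tiling is shifted slightly
        idx = int(scaled + offset)
        phi[t * n_tiles + min(idx, n_tiles - 1)] = 1.0
    return phi

phi = tile_code(0.37)
print(int(phi.sum()), phi.size)  # 8 active features out of 88
```

A linear value function over such features touches only the few active weights per state, which is the structure the sample-complexity results above exploit.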
PAC-Bayesian Model Selection in Reinforcement Learning
PAC-Bayesian learning is an attempt to get the best of both the PAC and Bayesian approaches, combining the ability to incorporate an informative prior with PAC guarantees that hold under all sampling distributions. We introduced the idea of PAC-Bayesian model selection in reinforcement learning for finite state spaces, and provided PAC-Bayesian bounds on the approximation error in the value function when a prior distribution is available either on the space of possible models or on the space of value functions (NIPS'10). We later extended this analysis of the batch RL problem to continuous state spaces (UAI'11). Extending these results to online RL has proved challenging due to changes in the distribution of states.
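For context, the classical supervised PAC-Bayesian bound that this line of work adapts to RL has the following McAllester-style form (up to variants in the logarithmic term; the RL bounds in the cited papers differ in their details):

```latex
% For any prior P fixed before seeing the data and any \delta \in (0,1),
% with probability at least 1-\delta over an i.i.d. sample of size n,
% simultaneously for all posteriors Q:
L(Q) \;\le\; \hat{L}_n(Q) \;+\;
  \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{n}{\delta}}{2(n-1)}}
```

The KL term is what lets an informative prior tighten the guarantee, while the bound itself holds regardless of the sampling distribution.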
Non-Deterministic Policies in Markov Decision Processes
In recent years, attempts have been made to apply reinforcement learning methods to construct decision support systems for action selection in Markovian environments. Such systems often require confidence measures on action optimality and need to provide flexible suggestions. We introduced the concept of non-deterministic policies to capture the user's decision-making process (NIPS'08). We used this concept to offer users flexible choices among near-optimal solutions (JAIR'11), and to provide statistical guarantees for decisions under uncertainty. We proposed two algorithms that present flexible options to the user while ensuring that performance remains close to optimal. We also showed how to compute confidence measures over the value function of Markovian processes, and used them to find sets of actions that almost surely include the optimal action (Master's Thesis).
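A minimal sketch of the underlying idea, in the spirit of near-optimal action sets (the toy MDP and the per-state epsilon construction are ours for illustration; the exact algorithms in the cited work differ): instead of a single greedy action, keep every action whose value is close to the best one.

```python
import numpy as np

# Toy 3-state, 2-action MDP (transitions and rewards invented for
# illustration; this is not an MDP from the cited papers).
P = np.array([  # P[a, s, s'] -- transition probabilities
    [[0.9, 0.1, 0.0], [0.0, 0.8, 0.2], [0.1, 0.0, 0.9]],   # action 0
    [[0.2, 0.8, 0.0], [0.5, 0.0, 0.5], [0.0, 0.3, 0.7]],   # action 1
])
R = np.array([[0.0, 1.0, 0.5], [0.5, 0.0, 1.0]])  # R[a, s]
gamma, eps = 0.95, 0.05

# Standard value iteration to approximate Q*.
V = np.zeros(3)
for _ in range(2000):
    Q = R + gamma * P @ V          # Q[a, s]
    V = Q.max(axis=0)

# Non-deterministic policy: at each state, keep every action whose
# value is within eps of the best one.
policy = {s: [a for a in range(2) if Q[a, s] >= V[s] - eps]
          for s in range(3)}
print(policy)
```

The resulting per-state action sets always contain the greedy action, and any extra actions they admit cost at most eps in one-step value, which is the kind of flexibility-with-guarantees trade-off described above.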
Model Uncertainty in Partially Observable Markov Decision Processes
Studies of Partially Observable Markov Decision Processes often involve computing the value function of a specific policy, given a model of the transition and observation probabilities and the reward function. These models can be learned from labeled samples of on-policy trajectories. However, when such empirical models are imperfect, bias and variance terms are introduced into the value function estimate. We proposed a method to estimate the bias and variance of the value function estimate in terms of the statistics of the empirical transition and observation models (AAAI'08). Such error terms can be used to meaningfully compare the values of different policies.
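As a rough numerical illustration of why model uncertainty propagates into value estimates, here is a sampling-based sketch for a fully observable MDP (the counts, rewards, and Dirichlet-sampling approach are ours for illustration; the cited work treats the harder POMDP case and derives the bias and variance terms analytically):

```python
import numpy as np

rng = np.random.default_rng(0)

# Empirical transition counts for a 2-state chain under a fixed policy
# (counts invented for illustration).
counts = np.array([[8.0, 2.0], [3.0, 7.0]])   # counts[s, s']
r = np.array([1.0, 0.0])                      # reward per state
gamma = 0.9

def policy_value(P):
    # Solve the linear system V = r + gamma * P V exactly.
    return np.linalg.solve(np.eye(2) - gamma * P, r)

# Sample plausible transition models row-wise from a Dirichlet
# posterior and look at the spread of the resulting policy values.
values = []
for _ in range(1000):
    P = np.vstack([rng.dirichlet(counts[s] + 1.0) for s in range(2)])
    values.append(policy_value(P))
values = np.array(values)
print("mean:", values.mean(axis=0), "std:", values.std(axis=0))
```

The nonzero spread across sampled models is exactly the kind of value-function uncertainty that the bias and variance expressions above quantify, and that makes naive point-estimate comparisons between policies unreliable.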