Current research projects
Planning, Learning, and Decision-making
- Bayesian reinforcement learning: (with Stephane Ross, Brahim Chaib-draa, Robin Jaulmes and Doina Precup)
Most approximate POMDP solutions assume a known model, but in many domains it is hard to obtain an accurate one. We look at ways of formulating POMDP solutions that take model uncertainty into account and improve model accuracy through careful selection of queries (see [ECML'05] [ECML'05 workshop] [ICRA'07]).
We also propose a full Bayes-optimal formulation of the problem, for both discrete and continuous domains (see [NIPS'07] and [ICRA'08]).
Most recently, we have applied similar methods to Bayesian structure learning in MDPs (see [UAI'08]).
- Fast approximation algorithms for POMDPs: (with Stephane Ross, Brahim Chaib-draa)
The main goal of this project is to find tractable solutions for large discrete POMDP problems. This started with some of my earlier work on anytime approximate algorithms for solving discrete POMDPs (see [IJCAI'03] [NIPS'03] [ISRR'05] and [JAIR'06]).
More recently, we have been exploring the use of online methods for fast POMDP approximations (see [NIPS'07]).
A survey paper of online POMDP methods is now available (see [JAIR'08]).
- Learning and representation of systems with hidden state: (with Ricard Gavalda, Prakash Panangaden and Doina Precup)
We investigate a new framework for representing systems with hidden state, which is based on duality theory, and unifies earlier representations such as PSRs and POMDPs (see [AAAI'06]).
We also propose an algorithm to learn a provably good (and preferably compact) representation for such systems in polynomial time (see [ECML'06]).
- Randomized algorithms for planning in STRIPS domains: (with Dan Burfoot)
We investigated a randomized approach to planning in deterministic (STRIPS) domains. Our approach is inspired by Rapidly-exploring Random Trees (RRTs), a method originally proposed by S. LaValle and colleagues for path planning in continuous spaces (see [ICAPS'06] or the longer Tech Report version [TR]).
- SmartWheeler: Multi-modal intelligent wheelchair control: (with Amin Atrash, Robert Kaplow, Chris Prahacs, Julien Villemure, Robert West and Hiba Yamani)
This project looks at customizing a robotic wheelchair so that it can be operated by a person with severe mobility impairments. The goal is to optimize a flexible multi-modal interface that allows high-level control of the wheelchair in a manner that is safe and effective (see the project webpage or a poster on the project).
- Large-scale dialogue management: (with Amin Atrash)
The role of a dialogue manager is to pick appropriate actions and responses when interacting with a person. In this project, we aim to build a large-scale dialogue manager for a robot interface, and to apply machine learning techniques (e.g. reinforcement learning, parameter estimation) to optimize the performance of the dialogue manager (see [ACL'00] [AAAI'06 workshop]).
Adaptive treatment design
- Automated learning of adaptive treatment strategies for chronic depression: (with Mahdi Milani Fard, Peter Biernot, Erica Moodie, Susan Murphy and John Rush)
The STAR*D trials consist of large multi-stage randomized treatment sequences for people with clinical depression. The goal for computer scientists is to automatically learn optimal treatment sequences. This can be phrased as a reinforcement learning problem, where we use the data collected in the STAR*D trials to learn an (approximately) optimal policy (see [DAD'07]; see also the white paper on adaptive treatment strategies [MCATS'06]).
We have recently derived a method for computing the variance for POMDP policy evaluation, which is particularly useful for comparing different treatment strategies (see [AAAI'08]).
- Computational modelling and adaptive treatment of epilepsy: (with Robert Vincent, Arthur Guez, Keith Bush, Aaron Courville and Massimo Avoli)
The goal of this project is to investigate the use of reinforcement learning to optimize deep-brain stimulation strategies for the treatment of epilepsy. We began by investigating the problem of seizure detection from electrophysiological recordings (see [CanAI'07]). We are currently implementing a mathematical model of the brain neural network, which exhibits the synchronous patterns that are characteristic of epilepsy.
We have recently shown that reinforcement learning can be used to optimize treatment strategies using batch data collected from in-vitro rat hippocampal slices under fixed stimulation policies (see [IAAI'08]).
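As a rough illustration of what learning a treatment policy from batch data can look like, the sketch below runs tabular Q-iteration over a fixed set of logged transitions. It is not the method used in the papers cited above. The state and action labels, the toy transitions, and the function names `batch_q_iteration` and `greedy_policy` are all hypothetical and chosen only for this example.

```python
from collections import defaultdict

def batch_q_iteration(transitions, actions, gamma=0.9, sweeps=100):
    """Tabular Q-iteration on a fixed batch of logged transitions.

    transitions: list of (state, action, reward, next_state) tuples;
                 next_state is None when the episode ends.
    Returns a dict-like mapping (state, action) -> estimated value.
    (State/action pairs never seen in the batch keep the default value 0.)
    """
    q = defaultdict(float)
    for _ in range(sweeps):
        # Collect one-step bootstrapped targets for every observed (s, a).
        targets = defaultdict(list)
        for s, a, r, s2 in transitions:
            future = 0.0 if s2 is None else max(q[(s2, b)] for b in actions)
            targets[(s, a)].append(r + gamma * future)
        # Replace each estimate by the average of its targets.
        new_q = defaultdict(float)
        for key, values in targets.items():
            new_q[key] = sum(values) / len(values)
        q = new_q
    return q

def greedy_policy(q, states, actions):
    """Pick the highest-valued action in each state."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}

# Hypothetical toy batch: two patient states, two treatments.
actions = ["A", "B"]
states = ["severe", "mild"]
transitions = [
    ("severe", "A", 0.0, "mild"),    # treatment A improves a severe case
    ("severe", "B", 0.0, "severe"),  # treatment B leaves it unchanged
    ("mild", "A", 1.0, None),        # treatment A leads to remission (reward 1)
    ("mild", "B", 0.0, "mild"),      # treatment B leaves it unchanged
]

q = batch_q_iteration(transitions, actions)
policy = greedy_policy(q, states, actions)
print(policy)
```

No new data is gathered during learning: the policy is computed entirely from the logged transitions, which is the setting that arises when working from a completed trial or from recordings made under fixed stimulation policies.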
Past research activities:
My past research has focused on developing new planning approaches for the POMDP framework. More details are available on my old CMU web page.
I was also actively involved in the Nursebot project, which developed a nursing-assistant robot prototype that provided help and companionship to elderly individuals (see [RAS'03] [AAAI'02]).