Previous Research Projects

More Efficient Trace Matching through Data Sharing

(Winter 2008, Course project, supervised by Laurie Hendren and Eric Bodden)
    Tracematches are an extension to aspect-oriented programming that adds flexibility by making it possible to observe the history of a computation. Using tracematches, it is possible to execute advice after observing a certain sequence of events in the computation, while still preserving the conciseness and readability of the code [1]. Unfortunately, these new features come at a cost. In their current implementation, tracematches require that several generated classes be woven into the original code, incurring significant overhead in execution time and memory usage.
    When several different tracematches are woven into the same piece of code, these costs are magnified, since each tracematch operates independently of the others. Such tracematches are also likely to be similar to one another, so much of the computation they perform may overlap. If the tracematches could share data, rather than operating independently, both their runtime and memory costs could be reduced.
    The goal of this project was to develop an algorithm for data sharing between tracematches by merging the finite automata that underlie them. Some preliminary results are also presented to validate the approach.
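    As a rough illustration of what merging the underlying automata can look like (a generic sketch, not the project's actual algorithm or implementation), the snippet below combines two deterministic automata into a single product automaton, so one monitor can track both patterns over the same event stream. The tuple representation and the self-loop treatment of out-of-alphabet symbols are assumptions made for the example.

```python
from itertools import product

def product_automaton(a, b):
    """Each automaton is a tuple (states, alphabet, delta, start, accepting),
    where delta maps (state, symbol) -> state."""
    states_a, sigma_a, delta_a, start_a, acc_a = a
    states_b, sigma_b, delta_b, start_b, acc_b = b
    sigma = sigma_a | sigma_b
    delta = {}
    for (p, q) in product(states_a, states_b):
        for sym in sigma:
            # A symbol outside one automaton's alphabet leaves that component in place.
            delta[((p, q), sym)] = (delta_a.get((p, sym), p), delta_b.get((q, sym), q))
    states = set(product(states_a, states_b))
    accepting = {pq for pq in states if pq[0] in acc_a or pq[1] in acc_b}
    return states, sigma, delta, (start_a, start_b), accepting
```

    In the real setting each component's accepting states would trigger its own advice, and the interesting savings come from sharing partial-match state (such as variable bindings) rather than just the state machine, but the product view conveys why overlapping patterns need not be monitored twice.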
    Project report is here. Slides can be found here. Supporting code can be found here.

Equivalence Concepts in Extensive Games

(Fall 2007, Course project, supervised by Shie Mannor)
    Games and game theory are used to model an ever-growing range of concepts, from internet routing to economics to artificial intelligence. As the concepts modelled by these games become more diverse, an interesting question to ask is "are two games the same?". The question is interesting because its answer could connect two previously unrelated fields, which might then benefit from one another. In this work, we focus on finite two-player extensive games. We examine several different equivalence relations on extensive games, and investigate which ones preserve subgame-perfect equilibria.
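    Since subgame-perfect equilibria are the objects these equivalences may or may not preserve, a small backward-induction sketch may help fix ideas. The game tree and data layout below are made-up examples, not taken from the report.

```python
def backward_induction(node):
    """node is either a terminal payoff tuple (u0, u1), or a pair
    (player, children) where player is 0 or 1 and children maps actions to subtrees."""
    if all(isinstance(x, (int, float)) for x in node):
        return node, []                      # terminal node: payoffs, empty path
    player, children = node
    best_action, best_payoff, best_path = None, None, None
    for action, child in children.items():
        payoff, path = backward_induction(child)
        if best_payoff is None or payoff[player] > best_payoff[player]:
            best_action, best_payoff, best_path = action, payoff, path
    return best_payoff, [(player, best_action)] + best_path

# Made-up example: player 0 moves first, then player 1; payoffs are (u0, u1).
game = (0, {"L": (1, {"l": (3, 1), "r": (0, 0)}),
            "R": (1, {"l": (2, 2), "r": (1, 3)})})
print(backward_induction(game))  # -> ((3, 1), [(0, 'L'), (1, 'l')])
```

    The returned list is only the equilibrium path; a full subgame-perfect strategy profile records the chosen action at every node, and it is this kind of object that the equivalence relations are asked to preserve.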
    Project report is here. Slides can be found here.

Methods for Policy Transfer in Markov Decision Processes

(Summer 2007, Summer Research project, supervised by Joelle Pineau)
    Reinforcement learning agents are frequently required to learn and perform several tasks that are very similar in nature. It would therefore be very beneficial if an RL agent, like a human being, could apply the knowledge learned in one task to solving other similar (but not identical) tasks. Another desirable quality in an RL agent is the ability to combine knowledge from two or more previously learned tasks and apply it to the one at hand. An effective way to formulate these tasks is as Markov Decision Processes (MDPs). An MDP is considered solved when the agent learns an optimal or near-optimal policy describing how to behave in the MDP in order to maximize its reward. Using bisimulation metrics described in previous work, we present and evaluate several methods of policy transfer: given a target MDP and one or more known MDPs with corresponding policies, we look at several ways of applying the policies of the known MDPs to the target MDP, in order to best transfer the knowledge acquired in the known systems. Finally, we present some preliminary results on how these techniques can be used in practice.
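    The snippet below is a minimal sketch of this kind of transfer under assumed data structures (it is not the project's code): a policy on a known MDP is pulled back onto the target MDP through a given state mapping, and the induced policy is then evaluated exactly on the target.

```python
import numpy as np

def induced_policy(state_map, source_policy):
    """state_map: dict target state -> source state; source_policy: dict source state -> action."""
    return {s: source_policy[state_map[s]] for s in state_map}

def evaluate_policy(P, R, policy, gamma=0.95):
    """Exact policy evaluation on the target MDP.
    P[a] is an (n, n) transition matrix, R is an (n, num_actions) reward matrix,
    policy maps each of the n states to an action. Solves (I - gamma * P_pi) V = R_pi."""
    n = R.shape[0]
    P_pi = np.array([P[policy[s]][s] for s in range(n)])
    R_pi = np.array([R[s, policy[s]] for s in range(n)])
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)
```

    Comparing the value of the induced policy with the target MDP's optimal values gives one concrete measure of how much knowledge was actually transferred.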
    Slides can be found here.

Exploiting Symmetry in MDPs

(Winter 2007, Course project, supervised by Prakash Panangaden)
    Markov Decision Processes (MDPs) are a common way of encoding decision-making problems in reinforcement learning environments. They provide a standard formalism for multi-stage decision making, while at the same time being versatile enough to capture the stochastic nature of realistic situations. The objective in an MDP is to maximize some measure of performance over time. An MDP is considered solved when a policy (a way of behaving in each state) has been discovered which maximizes the expected return. However, finding this policy is a computationally intensive task, usually done with dynamic programming algorithms such as policy iteration or value iteration. Such algorithms are often too costly to be used on decision-making problems of realistic size. Thus, much focus has been put on methods for collapsing the size of the problem without losing any information. In this paper, we investigate ways to collapse MDPs by exploiting the symmetries in the state space. The ideas presented in this paper are mostly derived from Barto and Ravindran's work [Barto and Ravindran, 2001]. We examine several different ways of capturing symmetries in systems, and work through examples to demonstrate the strengths and weaknesses of each.
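    As a rough illustration (a generic sketch under assumed representations, not the methods from the paper), the snippet below collapses an MDP given a partition of its state space that a symmetry is assumed to induce; identifying such a partition is the hard part studied in the project.

```python
import numpy as np

def quotient_mdp(P, R, blocks):
    """P[a]: (n, n) transition matrix for action a, R: (n, num_actions) rewards,
    blocks: list of lists of states assumed to be behaviourally equivalent."""
    block_of = {s: i for i, b in enumerate(blocks) for s in b}
    reps = [b[0] for b in blocks]                 # one representative state per block
    m, num_actions = len(blocks), R.shape[1]
    Pq = [np.zeros((m, m)) for _ in range(num_actions)]
    Rq = np.zeros((m, num_actions))
    for a in range(num_actions):
        for i, s in enumerate(reps):
            Rq[i, a] = R[s, a]
            for s2, p in enumerate(P[a][s]):
                Pq[a][i, block_of[s2]] += p       # lump transition mass into blocks
    return Pq, Rq
```

    Value iteration or policy iteration can then be run on the smaller quotient MDP, and the resulting policy lifted back by assigning every state in a block the action chosen for its block.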
    Project report is here. Slides can be found here.

Knowledge Transfer in MDPs

(Summer 2006, NSERC USRA/CDMP project, supervised by Prakash Panangaden, Doina Precup, and Joelle Pineau)
    Markov Decision Processes (MDPs) are an effective way to formulate many problems in Machine Learning. However, learning the optimal policy for an MDP can be a time-consuming process, especially when nothing is known about the policy to begin with. An alternative approach is to find a similar MDP for which an optimal policy is known, and modify this policy as needed. We present a framework for measuring the quality of knowledge transfer when transferring policies from one MDP to another. Our formulation is based on MDP bisimulation metrics, which provide a stable quantitative notion of state similarity for MDPs. Given two MDPs and a state mapping from the first to the second, a policy defined on the latter naturally induces a policy on the former. We provide a bound on the value function of the induced policy, showing that if the two MDPs are behaviourally close in terms of bisimulation distance and the original policy is close to optimal, then the induced policy is guaranteed to be close to optimal as well. We also present some experiments in which simple MDPs are used to test the tightness of the bound provided by the bisimulation distance. In light of the results of these experiments, we suggest a new similarity measure.
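    A hedged sketch of the induced-policy step (not the paper's algorithm, and the bisimulation distances are assumed to be precomputed): each target state is mapped to its closest known state under the distance matrix, and the known policy is pulled back through that mapping.

```python
import numpy as np

def transfer_policy(d, source_policy):
    """d: (num_target_states, num_source_states) matrix of bisimulation-style distances,
    source_policy: array of actions indexed by source state."""
    mapping = d.argmin(axis=1)              # nearest known state for each target state
    return source_policy[mapping], mapping  # induced policy on the target, plus the mapping
```

    The bound described above can then be probed empirically by comparing the induced policy's value on the target MDP with the target's optimal value as the bisimulation distances grow.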
    For more details on this project check out the site I maintained while working on it, or the poster I presented.