# SOCS Undergraduate Summer Research Symposium 2008

The SOCS Undergraduate Summer Research Symposium 2008 is an opportunity for
undergraduate students working with SOCS professors during the summer of 2008
to showcase their research and compete for prizes, which will be decided by a
jury of SOCS professors. All undergraduate summer students may participate,
whether their research arrangement is formal (e.g., NSERC USRAs or COMP-396
students) or informal, and whether paid or unpaid.

### Dates

**Monday Aug 18** - Send titles and 1-paragraph plain-text abstracts to Doina Precup (dprecup at cs dot mcgill dot ca)

**Thursday Aug. 28** - Equipment test, MC103, 3:30pm

**Friday Aug. 29** - The day of the symposium, in MC103

### The Jury

Prof. Luc Devroye

Prof. Ted Perkins

Prof. Clark Verbrugge

### Schedule for Aug. 29

### Abstracts

**Evolutionary game theory** by **Artem Kaznatcheev**

Cooperation is essential for complex biological systems, especially human society, but its evolution is puzzling given the competitiveness of natural selection. This intricate relation between evolution and cooperation can be addressed through game theory. This talk will discuss recent efforts in evolutionary game theory and their far-reaching implications, from cancer to government. In particular, I will discuss recent agent-based simulations, introduce a novel mathematical analysis, and use both to show that a popular hypothesis for the emergence of cooperation - the green-beard effect - cannot create cooperation, only sustain it.

**Performance-driven facial animation** by **Emmanuel Piuze-Phaneuf**

Facial animation is typically produced using a character rig, which consists of a geometric model and a set of parametric deformations. While creating a character rig is a complex task, the subsequent problem of creating parameter trajectories that produce sincere, fluid, and subtle expressions is also difficult and time consuming. Motion capture can greatly simplify the task of animating the rig parameters, and has the benefit of directly capturing all the subtle movements of a real face. With motion capture, the problem is reduced to finding a mapping from the captured performance to an existing rig; however, this is not always straightforward due to mismatches between a subject's movements and the rig deformations. This project uses a real-time motion capture system to animate a collection of face rigs with the objective of improving each rig's mapping function interactively.

**A Learning Algorithm for Automata with Observations** by **Dorna Kashef Haghighi**

We consider the problem of learning the behavior of a Partially Observable Markov Decision Process (POMDP) with
deterministic actions and observations. This remains a challenging problem, especially if the observations only
partially identify the states. In this work we propose an algorithm which can be used to summarize large amounts
of observed data into an automaton that simulates the original POMDP. This is particularly useful for generating
and predicting future data, and can be used as part of a planning and decision-making system.
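
As a rough illustration of the idea (not the authors' algorithm), with deterministic actions and observations a batch of traces can be summarized into a transition table; conflicting entries signal that observations only partially identify the underlying states. A minimal Python sketch:

```python
def build_automaton(traces):
    """Summarize traces of (obs, action, next_obs) triples into a
    deterministic transition table keyed by (obs, action).
    Also returns the set of conflicting keys, i.e. keys where the same
    observation/action pair led to different next observations, which
    signals that observations alone do not identify the state."""
    table = {}
    conflicts = set()
    for trace in traces:
        for obs, action, nxt in trace:
            key = (obs, action)
            if key in table and table[key] != nxt:
                conflicts.add(key)
            table[key] = nxt
    return table, conflicts

traces = [
    [("light", "go", "dark"), ("dark", "go", "light")],
    [("light", "go", "dark"), ("dark", "stay", "dark")],
]
table, conflicts = build_automaton(traces)
```

When `conflicts` is non-empty, a learner would need to split states by history rather than by single observations.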

**Methods for exploring large graphs** by **Alexandre Frechette**

Exploring large graphs is important in many applications, such as web crawling, social network mining or robotics. I will describe
different methods that help manage an agent's memory during large graph exploration. Forgetting
some of the data acquired during exploration is crucial for preventing memory overload. I will present two ideas for removing useless
parts of the data. The first one is local forgetting, in which the agent must free some memory as it is acting in the world, in a
simple and efficient way. Graph cutting and randomization are useful for this purpose. The second is global forgetting. In this
case, processing can be done in a slower manner, and the goal is to minimize the remaining amount of memory. I will show how
a system of marking the nodes is helpful for this purpose.

**How to beat a multi-armed bandit** by **Volodymyr Kuleshov**

In order to study the tradeoff any learning agent has to make between exploiting its known options
and trying unknown but potentially better ones, scientists have come up with a mathematical model
called the multi-armed bandit. It consists of an n-armed slot machine whose arms have different
probabilities of giving back a reward. The player must find a balance between exploiting high-paying
arms and searching for better ones. I will explain different strategies for playing this game
and examine their strengths and weaknesses.
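
The exploration/exploitation balance the abstract describes can be illustrated with the classic epsilon-greedy strategy (one of many possible strategies; the arm probabilities and parameters below are illustrative):

```python
import random

def pull(arm_probs, arm):
    """Simulate one pull: reward 1 with the arm's probability, else 0."""
    return 1 if random.random() < arm_probs[arm] else 0

def epsilon_greedy(arm_probs, steps=10000, epsilon=0.1, seed=0):
    """Play the bandit: explore a random arm with probability epsilon,
    otherwise exploit the arm with the best empirical mean so far."""
    random.seed(seed)
    n = len(arm_probs)
    counts = [0] * n      # pulls per arm
    means = [0.0] * n     # empirical mean reward per arm
    total = 0
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(n)                     # explore
        else:
            arm = max(range(n), key=lambda a: means[a])   # exploit
        r = pull(arm_probs, arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]      # incremental mean
        total += r
    return total, counts

total, counts = epsilon_greedy([0.2, 0.5, 0.8])
```

After enough exploration the empirical means converge and the highest-paying arm receives the vast majority of pulls.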

**Haptic Snow: A graphic model for interactive material deformation in virtual reality** by **Benjamin Peck**

Ground material provides important information about an environment, such as location. In previous work, a floor-space system was
designed that provides the impression of walking on various terrains by rendering graphical, audio and haptic
stimuli synchronously with low latency. If this effort is successful, we might provide people with the sensation of walking on remote terrains, such as arctic snow, desert sand, or the lunar surface. This presentation will focus on the graphical model of deformable terrains, and the incorporation of motion capture and force sensor data in order to link the graphic model to the entire floor-space system. The graphic model must display the current state of the terrain and deform realistically when rigid objects such as feet interact with it in real time. We show that by utilizing programmable graphics processors to compute deformation calculations, this level of interaction is possible at very high frame rates.

**Termination in a Functional Language Equipped with Higher-Order Data** by **Maja Frydrychowicz**

We examine the termination properties of the functional
language Beluga. This language can be used to recursively analyze open
terms, a feature that allows the automation of programming language
metatheory. Our goal is to adapt a termination proof that relies on
the notion of "structurally recursive" terms. We extend the definition
of structural recursion to account for the contextual modalities that
allow Beluga to analyze higher-order data.

**An Implementation of the Solovay-Kitaev Algorithm** by **Dana Mendelson**

A quantum circuit is a sequence of unitary operations called gates that rotate qubits, the complex vectors that represent information in quantum computation. Quantum compiling addresses the problem of efficiently approximating these arbitrary unitaries with another sequence of gates. Realistically, such an implementation should begin with only a finite set of easily implementable gates. The Solovay-Kitaev theorem proves that this is possible, and an algorithm based on this theorem has been developed. This project describes an implementation of the Solovay-Kitaev algorithm and analyzes the performance of the implementation by simulating Grover's quantum search algorithm.

**Information locking in the black hole information loss paradox** by **Jan Florjanczyk**

Information entering a black hole can be lost if the black hole evaporates due to Hawking radiation. It is
possible, however, for the emitted radiation to contain the information. We estimate the rate at which
information will emerge from the black hole using a model constructed from plausible assumptions.
First, we find that the message is completely hidden (locked) when the size of the emitted radiation
is less than the size of the inbound message. We show that no operation performed by an observer
outside the black hole can successfully decrease the amount of uncertainty about the message. We also
find that the message is completely accessible when the size of the emitted radiation is greater than
that of the message. We show that the likelihood of wrongly identifying a message in the radiation is
small.

**Detecting Inefficient API Usage in Source Code** by **David Kawrykow**

Large software projects often rely on a number of third-party libraries
that are exported through Application Programming Interfaces (API). We
have observed instances where APIs were used in ways that are clearly not
efficient. Such inefficient API usage is symptomatic of library use by
novices or of code that was not adapted to updated libraries. We developed
a technique to automatically detect instances of inefficient API usage in
software projects. Our technique analyzes source code to find interactions
with an API that mimic code in the library itself. In addition to warning
developers of potentially inefficient API usage, our technique also
indicates how to improve how the API is used. Application of our approach
on the source code of ten popular open-source Java systems revealed almost
200 instances of inefficient API usage. Adapting the code to use the
recommended API produced a significant number of tangible improvements in
the quality of that code.
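
The study above targets Java systems; purely as an illustration of the pattern, here is a hypothetical Python example of client code that re-implements behaviour the API already provides, next to the call a detector would recommend instead:

```python
# Hypothetical example of inefficient API usage: the client mimics the
# library's own logic (a guarded lookup) instead of calling it directly.

def lookup_inefficient(table, key):
    # Performs an extra membership test that dict.get already handles.
    if key in table:
        return table[key]
    return None

def lookup_recommended(table, key):
    # The single API call that replaces the hand-rolled logic above.
    return table.get(key)

table = {"a": 1}
```

Both functions behave identically, which is exactly what lets a detector flag the first as redundant.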

**Understanding Bug Severity Assessment in Open Source Projects** by **David Marple**

Bug severity assessment is the task of rating the perceived severity of a bug. This task is typically performed by the user reporting the bug. Evidence shows bug severity assessment to be problematic for many projects: developers use inconsistent standards, and often provide incorrect severity assessments. The goal of this work is to understand the phenomenon of bug severity assessment, and in particular the factors influencing its outcome. We conducted a qualitative exploratory study of bug severity assessment in four large open source projects.

**Ranking Recommendations for Program Navigation** by **Tristan Ratchford**

Most software change tasks require developers to navigate the source
code to understand how a subset of it works. During such
investigation, developers can become stuck---unsure what to look at or
where to go next to increase their knowledge of the code relevant to
their task. In prior work, we proposed an algorithm to automatically
recommend program elements likely to be relevant to a task. This
algorithm was based solely on an analysis of structural dependencies in
the source code. However, keywords used by programmers also carry
information that can be useful to guide developers towards code of
interest. We conducted an empirical study to measure the usefulness of
textual similarity for recommending locations in the source code. We
compared three recommendation approaches: structural dependencies
only, textual similarity only, and a combination of both.
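
As a sketch of how the two signals might be combined (the scoring scheme below is illustrative, not the algorithm from the study):

```python
import math

def cosine(u, v):
    """Cosine similarity between two term-count dictionaries."""
    common = set(u) & set(v)
    dot = sum(u[t] * v[t] for t in common)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(candidates, struct_score, query_terms, weight=0.5):
    """Rank candidate program elements by a weighted combination of a
    structural-dependency score and textual similarity to the task's
    keywords. Weights and scores here are invented for illustration."""
    scored = [
        (weight * struct_score[c] + (1 - weight) * cosine(query_terms, terms), c)
        for c, terms in candidates.items()
    ]
    return [c for _, c in sorted(scored, reverse=True)]

candidates = {"Parser.parse": {"parse": 2, "token": 1}, "Log.write": {"log": 3}}
struct_score = {"Parser.parse": 0.9, "Log.write": 0.1}
ranked = rank(candidates, struct_score, {"parse": 1})
```

Setting `weight` to 1 or 0 recovers the structural-only and textual-only baselines that the study compares.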

**Design and development of an intermediate representation for
the Matlab language** by **Jesse Doherty**

Matlab is a high level programming language broadly used in the scientific
community. It includes many high level and dynamic features to make
programming simpler. However, these features can also be a detriment to
performance. The goal of the McLab project is to create an optimizing
compiler framework for the Matlab programming language. Such a framework
requires an intermediate representation (IR) of the program code on
which to perform optimizations. This IR must be expressive enough to
capture important features in Matlab, but must also be useful for static
analysis and optimizing transformations. This project is focused on the
design and development of the IR and is part of the design and development
of the McLab system. The IR is designed to be tree-based and
to closely match the structure of the Matlab language. This allows us
to capture more high level information, such as various looping constructs,
within the IR. We have also made certain information more explicit within
the IR, such as explicit declaration sites for variables.
This structure and added information has allowed us to start creating
analyses which will be used for optimizations.
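
A tree-based IR with explicit declaration sites might look like the following sketch; the node names and fields are invented for illustration and are not McLab's actual IR:

```python
from dataclasses import dataclass, field

@dataclass
class Decl:
    """Explicit declaration site for a variable."""
    name: str

@dataclass
class Assign:
    """Assignment of an expression (kept opaque here) to a variable."""
    target: Decl
    expr: str

@dataclass
class ForLoop:
    """A high-level looping construct preserved intact in the tree."""
    var: Decl
    lo: int
    hi: int
    body: list = field(default_factory=list)

def declared_vars(node, acc=None):
    """Collect every declaration site in the tree: a trivial example of
    the static analyses that the explicit structure enables."""
    if acc is None:
        acc = []
    if isinstance(node, Decl):
        acc.append(node.name)
    elif isinstance(node, ForLoop):
        declared_vars(node.var, acc)
        for child in node.body:
            declared_vars(child, acc)
    elif isinstance(node, Assign):
        declared_vars(node.target, acc)
    return acc

loop = ForLoop(Decl("i"), 1, 10, [Assign(Decl("x"), "i * 2")])
```

Keeping the loop as a single node, rather than lowering it to jumps, is what lets an analysis reason about looping constructs directly.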

**Optimal Time-Scales for Reinforcement Learning Behavior Strategies** by **Gheorghe Comanici**

Reinforcement learning methods are readily applied to robotic control, game playing and operations research problems, because these can be expressed as stochastic transition models with well-defined measures of behavior performance. However, other prior knowledge that cannot be captured implicitly in the model formulation is ignored. This includes knowledge that can improve the speed and reliability of existing learning processes, such as behavior strategies that are expected to perform well locally or have been determined using some other method. We first propose a mathematical model that relies on Semi-Markov Decision Processes to determine, online, the optimal time usage of prior strategies based on different performance measures. Next, we propose an algorithm that asynchronously learns optimal time usage and optimal control based on the proposed model.

**Mining protein-protein interaction networks for local functional enrichment** by **Mathieu Lavallee-Adam**

A human cell contains thousands of different proteins that, through their interactions, define its behaviour.
The network of interactions formed by these proteins can reveal much about the function of individual proteins
and protein complexes. A computational approach is required in order to extract functional information from this type of
large, dense, and complex network. We developed an algorithm analyzing the clustering, within a given network, of
proteins sharing the same Gene Ontology (GO) functional annotation. We first defined a distance and a similarity
measure of clustering. We then described a set of methods to assess the statistical significance of the observed
clustering scores: given a fixed undirected graph G, an integer k and a total similarity t, what is the probability that a random subset of k vertices of G has a total pairwise similarity greater than or equal to t? We describe two families of analytical methods to approximate the distribution of clustering scores. The first is based on a normal distribution hypothesis, where the mean and variance are estimated analytically from the graph structure. The second family of approaches is based on the convolution of a number of simpler, data-estimated distributions. Simulations show that our approaches yield relatively tight upper and lower bounds on the p-values of each type of GO functional annotation. This algorithm was applied to two large protein-protein interaction networks: one containing 2,559 proteins of the yeast Saccharomyces cerevisiae (Krogan et al., 2006) and one containing 1,154 proteins involved in mammalian transcription and RNA processing machineries (Jeronimo et al., 2007), obtained from our collaborator Benoit Coulombe at the IRCM (Institut de Recherches Cliniques de Montreal). We were able to recover known functional groups and to reveal promising functional annotations for specific sub-networks, such as microtubule motor activity related proteins, which will help direct future experimental validation.
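
The sampling question stated in the abstract can also be estimated empirically; this is a simple Monte Carlo sketch, not the analytical approximations the abstract describes:

```python
import random

def clustering_pvalue(sim, k, t, trials=20000, seed=0):
    """Estimate the probability that a random k-subset of the vertices
    has total pairwise similarity >= t, by Monte Carlo sampling.
    `sim` maps frozenset({u, v}) pairs to similarity scores (0 if absent)."""
    random.seed(seed)
    vertices = sorted({v for pair in sim for v in pair})
    hits = 0
    for _ in range(trials):
        subset = random.sample(vertices, k)
        total = sum(
            sim.get(frozenset((u, v)), 0)
            for i, u in enumerate(subset)
            for v in subset[i + 1:]
        )
        if total >= t:
            hits += 1
    return hits / trials

# Toy network: three proteins, every pair with similarity 1.
sim = {frozenset(p): 1 for p in [("a", "b"), ("b", "c"), ("a", "c")]}
```

Sampling gives an unbiased estimate, but for small p-values the analytical bounds described above are far more efficient than drawing enough trials.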

**Fourier Tags - Smoothly Degradable Fiducial Markers** by **Anqi Xu**

Fourier Tags are synthetic fiducial markers used to visually encode information and provide controllable positioning. These symbols, akin to bar codes, are designed to be efficiently and robustly detected in an image, and have the property that the information content degrades gracefully with resolution. Each tag's payload and signature information are encoded within the frequency and phase spectra of the respective symbol. This encoding scheme allows part of the payload to be extracted even when the tag is viewed from afar, or when the image is blurred. This talk introduces the first complete implementation of the Fourier Tag protocol as three components: the tag encoder, the image detector, and the tag decoder.
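
As a toy illustration of the encoding principle (1-D and pure Python, whereas the real tags encode in the 2-D frequency and phase spectra of an image):

```python
import math

def encode(bits, n=64):
    """Encode payload bits as the amplitudes of low-frequency sinusoids
    in a 1-D signal: a simplified, hypothetical stand-in for the tag."""
    return [
        sum(b * math.cos(2 * math.pi * (f + 1) * i / n)
            for f, b in enumerate(bits))
        for i in range(n)
    ]

def decode(signal, nbits):
    """Recover each bit from the magnitude of its frequency component,
    computed with a naive discrete Fourier transform."""
    n = len(signal)
    bits = []
    for f in range(nbits):
        re = sum(signal[i] * math.cos(2 * math.pi * (f + 1) * i / n)
                 for i in range(n))
        im = sum(signal[i] * math.sin(2 * math.pi * (f + 1) * i / n)
                 for i in range(n))
        mag = 2 * math.sqrt(re * re + im * im) / n
        bits.append(1 if mag > 0.5 else 0)
    return bits
```

Because the payload lives in low frequencies, it survives blurring and low resolution better than a spatial encoding would, which is the graceful degradation the tags are designed for.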

**GINI: A Toolkit for Constructing User-Level Micro Internets** by **Alexis Malozemoff**

Experimenting with computer networks in a safe and secure manner is
very difficult to achieve, due to the inherent insecurities in the
Internet and the large number of critical applications being run.
This causes problems for students in networking courses, where experimentation
and modification of core network code is essential to learning the ins and
outs of networking. The GINI Toolkit was designed to alleviate these issues
by providing a user-level toolkit to create large and arbitrary network
topologies, all running on a single computer, or distributed between
multiple computers. In this presentation we will discuss the inner workings
of the GINI Toolkit, and how it can be used as a tool for teaching
computer networking as well as experimenting with security tools.

**KubeGenerator: Variations on the Rubik and Sudokube Generation Strategies** by **Tim Geoghegan**

Since its introduction to the world in 1980, the Rubik's Cube has fascinated puzzle-lovers everywhere. More serious fans of these geometric puzzles have also played with a variety of related puzzles: larger cubes, different shapes and more complex solution requirements. The KubeGenerator project studies the Sudokube, a 4x4x4 cross between a Rubik's Cube and a Sudoku where each square is labeled with one of sixteen labels, which must occur uniquely on each face and on each column and row of the cube. Solving the Sudokube is a formidable challenge, but the KubeGenerator project is concerned with a less difficult but no less crucial problem: generating solvable Sudokubes. This talk will discuss the Sudokube, the rules for solving it, and the algorithms that have been implemented in KubeGenerator to create solvable cubes. We will also explore other questions of interest relating to the Sudokube, as well as approaches to solution methods.
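
A minimal sketch of the simplest validity constraint, assuming integer labels 0-15 and faces given as 4x4 grids (the row/column constraints that wrap around the cube depend on its geometry and are omitted here):

```python
def faces_valid(faces):
    """Check the simplest Sudokube constraint: each 4x4 face must carry
    all sixteen labels exactly once. The full puzzle also constrains
    rows and columns that continue across adjacent faces, which this
    sketch does not model."""
    labels = set(range(16))
    for face in faces:
        seen = [lab for row in face for lab in row]
        if len(seen) != 16 or set(seen) != labels:
            return False
    return True

# A trivially valid face: labels 0..15 laid out in order.
solved_face = [[r * 4 + c for c in range(4)] for r in range(4)]
```

A generator would run checks like this (plus the geometric row/column constraints) inside its search for solvable cubes.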

**Socially aware security and the end of theft** by **Andrew Frankel**

Many schemes exist that allow various devices to identify their owners or particular users. People are well-accustomed to remembering passwords (something-you-know) and to carrying house keys (something-you-have). These traditional factors all share the common need for the user to present their credentials in some sort of an authentication ritual. We propose a novel technique that can instead identify users ubiquitously and transparently by analyzing the device's social context, that is, the set of other people and devices in the immediate vicinity (someone-you-know). Such so-called socially aware security is particularly suited to detecting and resolving thefts. We discuss the feasibility and challenges of implementing such a security scheme using simulations conducted on real-world data.