SOCS Undergraduate Summer Research Symposium 2008

The SOCS Undergraduate Summer Research Symposium 2008 is an opportunity for undergraduate students working with SOCS professors during the summer of 2008 to showcase their research and compete for prizes, which will be decided by a jury of SOCS professors. All undergraduate summer students may participate, whether their research arrangement is formal (e.g., NSERC USRAs or COMP-396 students) or informal, and whether paid or unpaid.

Dates

Monday Aug. 18 - Send titles and 1-paragraph plain-text abstracts to Doina Precup (dprecup at cs dot mcgill dot ca)

Thursday Aug. 28 - Equipment test, MC103, 3:30pm

Friday Aug. 29 - The day of the symposium, in MC103

The Jury

  • Prof. Luc Devroye
  • Prof. Ted Perkins
  • Prof. Clark Verbrugge

Schedule for Aug. 29

Each talk entry lists the presenter, the title, and the supervisor in parentheses.

9:00  Coffee + treats
9:15  Jesse Doherty, "Design and development of an intermediate representation for the Matlab language" (Laurie Hendren)
      Maja Frydrychowicz, "Termination in a functional language equipped with higher-order data" (Brigitte Pientka)
      Tristan Ratchford, "Ranking recommendations for program navigation" (Martin Robillard)
      David Kawrykow, "Detecting inefficient API usage in source code" (Martin Robillard)
      David Marple, "Understanding bug severity assessment in open source projects" (Martin Robillard)
10:30 Break
10:45 Volodymyr Kuleshov, "How to beat a multi-armed bandit" (Doina Precup)
      Dorna Kashef Haghighi, "A learning algorithm for automata with observations" (Doina Precup)
      Artem Kaznatcheev, "Evolutionary game theory" (Thomas R. Schultz)
      Tim Geoghegan, "KubeGenerator: Variations on the Rubik's Cube and Sudokube generation strategies" (Claude Crepeau)
      Jan Florjanczyk, "Information locking in the black hole information loss paradox" (Patrick Hayden)
      Dana Mendelson, "An Implementation of the Solovay-Kitaev Algorithm" (Abubak Muhammad and Patrick Hayden)
12:15 Lunch (Asha, Indian restaurant, 3490 Ave. du Parc)
2:00  Emmanuel Piuze-Phaneuf, "Performance-Driven Facial Animation" (Paul Kry)
      Benjamin Peck, "Haptic Snow: A graphic model for interactive material deformation in virtual reality" (Paul Kry)
      Gheorghe Comanici, "Optimal time scales for reinforcement learning behavior strategies" (Doina Precup)
      Mathieu Lavallee-Adam, "Mining protein-protein interaction networks for local functional enrichment" (Mathieu Blanchette)
3:00  Break
3:15  Anqi Xu, "Fourier Tags - Smoothly Degradable Fiducial Markers" (Greg Dudek)
      Alexandre Frechette, "Methods for exploring large graphs" (Doina Precup)
      Alexis Malozemoff, "GINI: A user-level toolkit for constructing micro-internets" (Muthucumaru Maheswaran)
      Andrew Frankel, "Socially aware security and the end of theft" (Muthucumaru Maheswaran)
4:30  Reception and Awards

Abstracts

    Evolutionary game theory by Artem Kaznatcheev

    Cooperation is essential for complex biological systems, especially human society, but its evolution is puzzling because of the competitiveness of natural selection. This intricate relation between evolution and cooperation can be addressed through game theory. This talk will discuss recent efforts in evolutionary game theory and their far-reaching implications, from cancer to government. In particular, I will discuss recent agent-based simulations, introduce a novel mathematical analysis, and use both to show that a popular hypothesis for the emergence of cooperation, the green-beard effect, cannot create cooperation, only sustain it.
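As an illustration of the kind of agent-based simulation the abstract mentions, here is a minimal sketch of a tag-based ("green-beard") donation game. The population size, payoffs, and reproduction rule are invented for the demo and are not the models analyzed in the talk.

```python
import random

# Illustrative sketch (not the talk's model): agents carry a tag and either
# follow the green-beard rule (donate only to same-tag partners) or defect.
# Payoffs and parameters below are assumptions for the demo.

def simulate(pop_size=100, generations=50, benefit=3.0, cost=1.0, seed=0):
    rng = random.Random(seed)
    # Each agent: (tag, strategy).
    pop = [(rng.randrange(2), "green_beard" if rng.random() < 0.5 else "defect")
           for _ in range(pop_size)]
    for _ in range(generations):
        payoff = [0.0] * pop_size
        order = list(range(pop_size))
        rng.shuffle(order)
        for i, j in zip(order[::2], order[1::2]):  # random pairing
            for a, b in ((i, j), (j, i)):
                tag_a, strat_a = pop[a]
                if strat_a == "green_beard" and tag_a == pop[b][0]:
                    payoff[a] -= cost      # donor pays a cost...
                    payoff[b] += benefit   # ...recipient gains a benefit
        # Payoff-proportional reproduction (roulette wheel on shifted payoffs).
        lo = min(payoff)
        weights = [p - lo + 1e-6 for p in payoff]
        pop = rng.choices(pop, weights=weights, k=pop_size)
    return sum(1 for _, s in pop if s == "green_beard") / pop_size

frac = simulate()
print(f"green-beard fraction after 50 generations: {frac:.2f}")
```

Varying the initial green-beard fraction in this toy model is one way to see the "sustain but not create" distinction the talk formalizes.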

    Performance-driven facial animation by Emmanuel Piuze-Phaneuf

    Facial animation is typically produced using a character rig, which consists of a geometric model and a set of parametric deformations. While creating a character rig is a complex task, the subsequent problem of creating parameter trajectories that produce sincere, fluid, and subtle expressions is also difficult and time-consuming. Motion capture can greatly simplify this task of animating the rig parameters, and has the benefit of directly capturing all the subtle movements of a real face. With motion capture, the problem is reduced to finding a mapping from the captured performance to an existing rig; however, this is not always straightforward due to mismatches between a subject's movements and the rig deformations. This project uses a real-time motion capture system to animate a collection of face rigs with the objective of improving each rig's mapping function interactively.

    A Learning Algorithm for Automata with Observations by Dorna Kashef Haghighi

    We consider the problem of learning the behavior of a Partially Observable Markov Decision Process (POMDP) with deterministic actions and observations. This remains a challenging problem, especially if the observations only partially identify the states. In this work we propose an algorithm which can be used to summarize large amounts of observed data into an automaton that simulates the original POMDP. This is particularly useful for generating and predicting future data, and can be used as part of a planning and decision-making system.
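As a toy illustration of summarizing observed data into an automaton (not the algorithm from the talk), the sketch below compresses logged action/observation trajectories from a deterministic system into a transition table, under the simplifying assumption that the current observation identifies enough state to keep transitions consistent.

```python
# Toy sketch: build a deterministic automaton from logged trajectories of a
# deterministic POMDP. The assumption that a single observation identifies
# the state is an illustration-only simplification; the talk's algorithm
# handles the harder case where it does not.

def build_automaton(trajectories):
    """trajectories: lists of the form [obs0, act0, obs1, act1, obs2, ...]."""
    delta = {}  # (observation, action) -> next observation
    for tr in trajectories:
        for i in range(0, len(tr) - 2, 2):
            obs, act, nxt = tr[i], tr[i + 1], tr[i + 2]
            # setdefault inserts the transition, or returns the existing one
            # so we can detect an inconsistency.
            if delta.setdefault((obs, act), nxt) != nxt:
                raise ValueError("observations are not state-identifying")
    return delta

def predict(delta, obs, actions):
    """Replay a plan through the learned automaton to predict future data."""
    for act in actions:
        obs = delta[(obs, act)]
    return obs

logs = [["A", "go", "B", "go", "C"], ["B", "go", "C", "back", "B"]]
delta = build_automaton(logs)
print(predict(delta, "A", ["go", "go", "back"]))  # prints "B"
```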

    Methods for exploring large graphs by Alexandre Frechette

    Exploring large graphs is important in many applications, such as web crawling, social network mining or robotics. I will describe different methods that help the management of an agent's memory during large graph exploration. Forgetting some of the data acquired during exploration is crucial for preventing memory overload. I will present two ideas for removing useless parts of the data. The first one is local forgetting, in which the agent must free some memory as it is acting in the world, in a simple and efficient way. Graph cutting and randomization are useful for this purpose. The second is global forgetting. In this case, processing can be done in a slower manner, and the goal is to minimize the remaining amount of memory. I will show how a system of marking the nodes is helpful for this purpose.
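The local-forgetting idea can be sketched as a BFS-style explorer that randomly evicts entries from its visited set once a memory budget is exceeded, trading extra revisits for bounded memory. This is an illustration under invented assumptions (random eviction, a small step cap), not the exact scheme from the talk.

```python
import random

# Hedged sketch of "local forgetting": evicted nodes may be revisited,
# trading time for memory. Eviction policy and step cap are assumptions.

def explore(graph, start, memory_budget, max_steps=10_000, seed=0):
    rng = random.Random(seed)
    visited = {start}         # the bounded memory we manage
    frontier = [start]
    seen = {start}            # everything ever reached (for the demo only)
    steps = 0
    while frontier and steps < max_steps:
        node = frontier.pop(0)
        for nb in graph[node]:
            steps += 1
            seen.add(nb)
            if nb not in visited:
                visited.add(nb)
                frontier.append(nb)
                if len(visited) > memory_budget:
                    # Local forgetting: free memory on the fly by evicting
                    # a randomly chosen visited node (keeping the new one).
                    victim = rng.choice(sorted(visited - {nb}))
                    visited.discard(victim)
    return seen

graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(sorted(explore(graph, 0, memory_budget=2)))  # [0, 1, 2, 3]
```

Even with only two slots of "visited" memory, the explorer still reaches all four nodes of this small graph, at the cost of some repeated work.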

    How to beat a multi-armed bandit by Volodymyr Kuleshov

    In order to study the tradeoff any learning agent has to make between exploiting its known options and trying unknown but potentially better ones, scientists have come up with a mathematical model called the multi-armed bandit. It consists of an n-armed slot machine whose arms have different probabilities of giving back a reward. The player must find a balance between exploiting high-paying arms and searching for better ones. I will explain different strategies for playing this game and examine their strengths and weaknesses.
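One of the simplest strategies in this family is epsilon-greedy play, sketched below on a Bernoulli bandit. The arm probabilities and the value of epsilon are illustrative choices, not figures from the talk.

```python
import random

# Minimal epsilon-greedy sketch on an n-armed Bernoulli bandit.
# Arm probabilities and epsilon are invented for the demo.

def play(arm_probs, pulls=5000, epsilon=0.1, seed=1):
    rng = random.Random(seed)
    n = len(arm_probs)
    counts = [0] * n
    values = [0.0] * n   # running mean reward per arm
    total = 0.0
    for _ in range(pulls):
        if rng.random() < epsilon:                 # explore: random arm
            arm = rng.randrange(n)
        else:                                      # exploit: best estimate
            arm = max(range(n), key=lambda a: values[a])
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return total / pulls, counts

avg, counts = play([0.2, 0.5, 0.8])
print(f"average reward {avg:.2f}; pulls per arm {counts}")
```

The weakness the talk examines shows up directly: the fixed exploration rate keeps wasting a fraction of pulls on bad arms even after the best arm is identified.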

    Haptic Snow: A graphic model for interactive material deformation in virtual reality by Benjamin Peck

    Ground material provides important information about an environment, such as location. In previous work, a floor-space system was designed that provides the impression of walking on various terrains by rendering graphical, audio and haptic stimuli synchronously with low latency. If this effort is successful, we might provide people with the sensation of walking on remote terrains, such as arctic snow, desert sand, or the lunar surface. This presentation will focus on the graphical model of deformable terrains, and the incorporation of motion capture and force sensor data in order to link the graphic model to the entire floor-space system. The graphic model must display the current state of the terrain and deform realistically when rigid objects such as feet interact with it in real time. We show that by utilizing programmable graphics processors to compute deformation calculations, this level of interaction is possible at very high frame rates.
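The core deformation update can be illustrated on the CPU with a heightfield that is depressed wherever a circular "foot" presses into it; the real system performs this kind of per-cell update on the GPU. Grid size, footprint shape, and depths here are invented for the demo.

```python
# Illustrative CPU version of the deformation idea: a heightfield grid that
# is lowered within a circular footprint, with deeper compression near the
# footprint centre. All numbers are demo assumptions.

def stamp_foot(heights, cx, cy, radius, depth):
    """Lower every grid cell within `radius` of (cx, cy) by up to `depth`."""
    for y, row in enumerate(heights):
        for x in range(len(row)):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if d2 <= radius ** 2:
                # Linear falloff: full depth at the centre, zero at the rim.
                falloff = 1.0 - (d2 ** 0.5) / radius if radius else 1.0
                row[x] -= depth * falloff
    return heights

snow = [[1.0] * 8 for _ in range(8)]          # flat snow, 1.0 units deep
stamp_foot(snow, cx=3, cy=3, radius=2, depth=0.5)
print(round(snow[3][3], 2), round(snow[0][0], 2))  # 0.5 1.0
```

On a GPU this loop becomes a fragment/compute shader over the heightfield texture, which is what makes real-time rates achievable.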

    Termination in a Functional Language Equipped with Higher-Order Data by Maja Frydrychowicz

    We examine the termination properties of the functional language Beluga. This language can be used to recursively analyze open terms, a feature that allows the automation of programming language metatheory. Our goal is to adapt a termination proof that relies on the notion of "structurally recursive" terms. We extend the definition of structural recursion to account for the contextual modalities that allow Beluga to analyze higher-order data.

    An Implementation of the Solovay-Kitaev Algorithm by Dana Mendelson

    A quantum circuit is a sequence of unitary operations called gates that rotate qubits, the complex vectors that represent information in quantum computation. Quantum compiling addresses the problem of efficiently approximating arbitrary unitaries with a sequence of gates. Realistically, such an implementation should begin with only a finite set of easily implementable gates. The Solovay-Kitaev theorem proves that this is possible, and an algorithm based on this theorem has been developed. This project describes an implementation of the Solovay-Kitaev algorithm and analyzes the performance of the implementation by simulating Grover's quantum search algorithm.
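The Solovay-Kitaev construction bootstraps from a brute-force "base case": search short words over a finite gate set for the best approximation to a target unitary. The sketch below does only that base case for a single qubit with the standard H and T gates; the distance measure, tolerance, and target are illustrative choices, not details of the project's implementation.

```python
import cmath, itertools, math

# Brute-force base case of Solovay-Kitaev for one qubit over the gate set
# {H, T}. Distance measure and tolerance are demo assumptions.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def distance(u, v):
    # Phase-invariant distance: zero when u and v agree up to global phase.
    tr = sum(u[i][j].conjugate() * v[i][j] for i in range(2) for j in range(2))
    return math.sqrt(max(0.0, 1.0 - abs(tr) / 2.0))

H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]
I = [[1, 0], [0, 1]]
GATES = {"H": H, "T": T}

def best_word(target, max_len=5, tol=1e-6):
    best = ("", float("inf"))
    for n in range(max_len + 1):
        for word in itertools.product("HT", repeat=n):
            u = I
            for g in word:
                u = matmul(u, GATES[g])
            d = distance(u, target)
            if d < best[1] - tol:   # keep the first clearly better word
                best = ("".join(word), d)
    return best

word, err = best_word(matmul(T, H))   # target exactly representable as "TH"
print(word, round(err, 6))
```

The full algorithm then recursively improves such base-case approximations using group commutators, which this sketch omits.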

    Information locking in the black hole information loss paradox by Jan Florjanczyk

    Information entering a black hole can be lost if the black hole evaporates due to Hawking radiation. It is possible, however, for the emitted radiation to contain the information. We estimate the rate at which information will emerge from the black hole using a model constructed from plausible assumptions. First, we find that the message is completely hidden (locked) when the size of the emitted radiation is less than the size of the inbound message. We show that no operation performed by an observer outside the black hole can successfully decrease the amount of uncertainty about the message. We also find that the message is completely accessible when the size of the emitted radiation is greater than that of the message. We show that the likelihood of wrongly identifying a message in the radiation is small.

    Detecting Inefficient API Usage in Source Code by David Kawrykow

    Large software projects often rely on a number of third-party libraries that are exported through Application Programming Interfaces (APIs). We have observed instances where APIs were used in ways that are clearly not efficient. Such inefficient API usage is symptomatic of library use by novices, or of code that was not adapted to updated libraries. We developed a technique to automatically detect instances of inefficient API usage in software projects. Our technique analyzes source code to find interactions with an API that mimic code in the library itself. In addition to warning developers of potentially inefficient API usage, our technique also indicates how to improve how the API is used. Applying our approach to the source code of ten popular open-source Java systems revealed almost 200 instances of inefficient API usage. Adapting the code to use the recommended API produced a significant number of tangible improvements in the quality of that code.
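A rough analogue of the idea (not the authors' tool, which targets Java and matches usage against library code) is to scan source code for one well-known inefficient pattern and suggest the efficient API instead. The sketch below uses Python's `ast` module to flag `+=` accumulation inside loops, a common stand-in for quadratic string building; the pattern and message are demo assumptions.

```python
import ast

# Hedged analogue of the detection idea: flag one known inefficient pattern
# (string accumulation with += in a loop) and recommend the better API.
# This crude check also flags numeric +=; a real tool would use type info.

CODE = """
def render(items):
    out = ""
    for it in items:
        out += str(it)
    return out
"""

def find_string_concat_in_loops(source):
    hits = []
    tree = ast.parse(source)
    for loop in ast.walk(tree):
        if isinstance(loop, (ast.For, ast.While)):
            for node in ast.walk(loop):
                if (isinstance(node, ast.AugAssign)
                        and isinstance(node.op, ast.Add)
                        and isinstance(node.target, ast.Name)):
                    hits.append((node.target.id, node.lineno))
    return hits

for name, line in find_string_concat_in_loops(CODE):
    print(f"line {line}: '{name} += ...' in a loop; consider ''.join(...)")
```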

    Understanding Bug Severity Assessment in Open Source Projects by David Marple

    Bug severity assessment is the task of rating the perceived severity of a bug. This task is typically performed by the user reporting the bug. Evidence shows bug severity assessment to be problematic for many projects: developers use inconsistent standards, and often provide incorrect severity assessments. The goal of this work is to understand the phenomenon of bug severity assessment, and in particular the factors influencing its outcome. We conducted a qualitative exploratory study of bug severity assessment in four large open source projects.

    Ranking Recommendations for Program Navigation by Tristan Ratchford

    Most software change tasks require developers to navigate the source code to understand how a subset of it works. During such investigation, developers can become stuck, unsure what to look at or where to go next to increase their knowledge of the code relevant to their task. In prior work, we proposed an algorithm to automatically recommend program elements likely to be relevant to a task. This algorithm was based solely on an analysis of structural dependencies in the source code. However, keywords used by programmers also carry information that can be useful to guide developers towards code of interest. We conducted an empirical study to measure the usefulness of textual similarity for recommending locations in the source code. We compared three recommendation approaches: structural dependencies only, textual similarity only, and a combination of both.
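The textual-similarity signal can be sketched by splitting identifiers into terms and ranking candidate code elements by cosine similarity against the developer's current context. The tokenizer, element names, and snippets below are invented for illustration; they are not the study's actual data or measure.

```python
import math
import re
from collections import Counter

# Illustrative sketch: camelCase/snake_case identifiers are split into
# lowercase terms, and elements are ranked by cosine similarity of their
# term-frequency vectors. All names below are made up.

def terms(code):
    words = re.findall(r"[A-Za-z]+", code)
    out = []
    for w in words:
        # Split camelCase words into their component terms.
        out += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", w)
    return Counter(t.lower() for t in out)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

task = terms("saveFileDialog openFile")
candidates = {
    "FileSaver.saveFile": terms("FileSaver saveFile"),
    "MenuBar.render": terms("MenuBar render"),
}
ranked = sorted(candidates, key=lambda c: cosine(task, candidates[c]),
                reverse=True)
print(ranked[0])  # element most textually similar to the task context
```

Combining such a score with structural-dependency scores is the third approach the study compares.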

    Design and development of an intermediate representation for the Matlab language by Jesse Doherty

    Matlab is a high level programming language broadly used in the scientific community. It includes many high level and dynamic features to make programming simpler. However, these features can also be a detriment to performance. The goal of the McLab project is to create an optimizing compiler framework for the Matlab programming language. Such a framework requires an intermediate representation (IR) of the program code on which to perform optimizations. This IR must be expressive enough to capture important features in Matlab, but must also be useful for static analysis and optimizing transformations. This project is focused on the design and development of the IR, and is part of the design and development of the McLab system. The IR is designed to be tree-based, closely matching the structure of the Matlab language. This allows us to capture more high level information, such as various looping constructs, within the IR. We have also made certain information more explicit within the IR, such as explicit declaration sites for variables. This structure and added information has allowed us to start creating analyses which will be used for optimizations.
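A tree-based IR with explicit declaration sites can be sketched with a few node classes and a first trivial analysis that walks the tree. The node names and lowering below are invented for illustration and are not McLab's actual IR.

```python
from dataclasses import dataclass, field
from typing import List

# Toy sketch in the spirit described: loops are kept as structured tree
# nodes and variable declarations are explicit, so analyses can walk the
# tree directly. Node names are invented, not McLab's.

@dataclass
class VarDecl:
    name: str

@dataclass
class Assign:
    target: str
    expr: str            # expression kept opaque in this sketch

@dataclass
class ForLoop:
    var: str
    lo: str
    hi: str
    body: List[object] = field(default_factory=list)

def declared_vars(nodes):
    """A first trivial 'analysis': collect explicit declaration sites."""
    out = []
    for n in nodes:
        if isinstance(n, VarDecl):
            out.append(n.name)
        elif isinstance(n, ForLoop):
            out += declared_vars(n.body)
    return out

# for i = 1:10, x = i * 2; end   -- lowered with an explicit decl for x
prog = [VarDecl("x"), ForLoop("i", "1", "10", [Assign("x", "i * 2")])]
print(declared_vars(prog))  # ['x']
```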

    Optimal Time-Scales for Reinforcement Learning Behavior Strategies by Gheorghe Comanici

    Reinforcement learning methods are easily applied to robotic control, game playing and operations research problems because they can easily be expressed as stochastic transition models with well defined behavior performance measures. However, other prior knowledge that cannot be captured implicitly in the model formulation is ignored. This includes knowledge that can improve the speed and reliability of existing learning processes, like behavior strategies that are supposed to perform well locally or have been determined using some other method. We first propose a mathematical model that relies on Semi-Markov Decision Processes to determine on-line optimal time usage of prior strategies based on different performance measures. Next, we propose an algorithm that asynchronously learns optimal time-usage and optimal control based on the proposed model.

    Mining protein-protein interaction networks for local functional enrichment by Mathieu Lavallee-Adam

    A human cell contains thousands of different proteins that, through their interactions, define its behaviour. The network of interactions formed by these proteins can reveal much about the function of individual proteins and protein complexes. A computational approach is required in order to extract functional information from this type of large, dense, and complex network. We developed an algorithm analyzing the clustering, within a given network, of proteins sharing the same Gene Ontology (GO) functional annotation. We first defined a distance and a similarity measure of clustering. We then describe a set of methods to assess the statistical significance of the observed clustering scores: given a fixed undirected graph G, an integer k and a total similarity t, what is the probability that a random subset of k vertices of G has a total pairwise similarity greater than or equal to t? We describe two families of analytical methods to approximate the distribution of clustering scores. The first is based on a normal distribution hypothesis, with the mean and variance estimated analytically from the graph structure. The second is based on the convolution of a number of simpler, data-estimated distributions. Simulations show that our approaches yield relatively tight upper and lower bounds on the p-values of each type of GO functional annotation. The algorithm was applied to two large protein-protein interaction networks: one containing 2,559 proteins of the yeast Saccharomyces cerevisiae (Krogan et al., 2006), and one containing 1,154 proteins involved in mammalian transcription and RNA processing machineries (Jeronimo et al., 2007), obtained from our collaborator Benoit Coulombe at the IRCM (Institut de Recherches Cliniques de Montreal). We were able to recover known functional groups and to reveal promising functional annotations for specific sub-networks, such as microtubule motor activity related proteins, which will help direct future experimental validation.
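The probability question in the abstract can be checked empirically by Monte Carlo sampling, which is what the analytical bounds replace. In the sketch below, "similarity" is just adjacency (1 if two vertices share an edge, else 0) as a stand-in for the GO-based measure, and the graph is a tiny invented network.

```python
import itertools
import random

# Monte Carlo estimate of the quantity defined in the abstract:
# P(total pairwise similarity of a random k-subset of vertices >= t).
# Here similarity(u, v) = 1 iff u and v are adjacent (a demo stand-in).

def subset_score(adj, subset):
    return sum(1 for u, v in itertools.combinations(subset, 2) if v in adj[u])

def p_value(adj, k, t, samples=20_000, seed=0):
    rng = random.Random(seed)
    nodes = list(adj)
    hits = sum(subset_score(adj, rng.sample(nodes, k)) >= t
               for _ in range(samples))
    return hits / samples

# Tiny network: a 4-clique attached to a loosely connected path.
adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3, 5},
       5: {4, 6}, 6: {5, 7}, 7: {6, 8}, 8: {7}}
print(p_value(adj, k=4, t=6))  # chance a random 4-subset is as tight as the clique
```

Only the 4-clique reaches a score of 6, so the estimate should sit near 1/70, the exact probability over all 4-subsets of this 8-vertex graph.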

    Fourier Tags - Smoothly Degradable Fiducial Markers by Anqi Xu

    Fourier Tags are synthetic fiducial markers used to visually encode information and provide controllable positioning. These symbols, akin to bar codes, are designed to be efficiently and robustly detected in an image, and have the property that the information content degrades gracefully with resolution. Each tag's payload and signature information are encoded within the frequency and phase spectra of the respective symbol. This encoding scheme allows part of the payload to be extracted even when the tag is viewed from afar, or when the image is blurred. This talk introduces the first complete implementation of the Fourier Tag protocol as three components: the tag encoder, the image detector, and the tag decoder.
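The graceful-degradation property can be demonstrated in one dimension: store payload bits in the phases of low-frequency components, so that blurring (which removes high frequencies first) destroys the last bits before the first ones. Real Fourier Tags are 2D image patterns; the signal length, bit-to-frequency mapping, and low-pass model below are illustrative assumptions.

```python
import cmath
import math

# 1D sketch of the Fourier Tag encoding idea: bit k sets the phase (0 or pi)
# of frequency k+1, so lower-indexed bits live at coarser frequencies and
# survive blurring longer. Sizes and mapping are demo assumptions.

N = 64

def encode(bits):
    return [sum(math.cos(2 * math.pi * (k + 1) * t / N + math.pi * b)
                for k, b in enumerate(bits))
            for t in range(N)]

def decode(signal, nbits):
    out = []
    for k in range(nbits):
        # DFT coefficient at frequency k+1; its sign encodes the bit.
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * (k + 1) * t / N)
                    for t in range(N))
        out.append(0 if coeff.real > 0 else 1)
    return out

def blur(signal, keep):
    # Crude low-pass filter: keep only the `keep` lowest payload frequencies
    # (and their conjugate bins), simulating a distant or blurred view.
    spectrum = [sum(signal[t] * cmath.exp(-2j * math.pi * f * t / N)
                    for t in range(N)) for f in range(N)]
    for f in range(N):
        if not (1 <= f <= keep or N - keep <= f <= N - 1):
            spectrum[f] = 0
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * t / N)
                for f in range(N)).real / N for t in range(N)]

payload = [1, 0, 1, 1, 0, 0, 1, 0]
blurred = blur(encode(payload), keep=4)       # simulate viewing from afar
print(decode(blurred, 4), "==", payload[:4])  # coarse bits still readable
```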

    GINI: A Toolkit for Constructing User-Level Micro Internets by Alexis Malozemoff

    Experimenting with computer networks in a safe and secure manner is very difficult to achieve, due to the inherent insecurities in the Internet and the large number of critical applications being run. This causes problems for students in networking courses, where experimentation and modification of core network code is essential to learning the ins and outs of networking. The GINI Toolkit was designed to alleviate these issues by providing a user-level toolkit to create large and arbitrary network topologies, all running on a single computer, or distributed between multiple computers. In this presentation we will discuss the inner workings of the GINI Toolkit, and how it can be used as a tool for teaching computer networking and for experimenting with security tools.

    KubeGenerator: Variations on the Rubik's Cube and Sudokube Generation Strategies by Tim Geoghegan

    Since its introduction to the world in 1980, the Rubik's Cube has fascinated puzzle-lovers everywhere. More serious fans of these geometric puzzles have also played with a variety of related puzzles: larger cubes, different shapes and more complex solution requirements. The KubeGenerator project studies the Sudokube, a 4x4x4 cross between a Rubik's Cube and a Sudoku, where each square is labeled with one of sixteen labels, which must occur uniquely on each face and on each column and row of the cube. Solving the Sudokube is a formidable challenge, but the KubeGenerator project is concerned with a less difficult but no less crucial problem: generating solvable Sudokubes. This talk will discuss the Sudokube, the rules for solving it, and the algorithms that have been implemented in KubeGenerator to create solvable cubes. We will also explore other questions of interest relating to the Sudokube, as well as approaches to solution methods.
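One general strategy for generating solvable instances of permutation puzzles is to scramble a solved state with legal moves: any state reached this way is solvable by construction (undo the moves in reverse). The sketch below illustrates that idea on an abstract 6-sticker toy puzzle with invented moves; it is not a model of the actual Sudokube or of KubeGenerator's algorithms.

```python
import random

# Generic scramble-from-solved sketch: the "puzzle" is an abstract
# permutation puzzle with made-up moves, not a real Sudokube model.

def apply(perm, state):
    # A move is a permutation: new[i] = old[perm[i]].
    return [state[i] for i in perm]

def scramble(solved, moves, steps=30, seed=7):
    rng = random.Random(seed)
    state, history = list(solved), []
    for _ in range(steps):
        name, perm = rng.choice(sorted(moves.items()))
        state = apply(perm, state)
        history.append(name)
    # The result is solvable by construction: invert history in reverse.
    return state, history

# Two invented moves on a 6-sticker toy puzzle (each is a permutation).
moves = {"R": [1, 2, 0, 3, 4, 5], "U": [0, 1, 2, 4, 5, 3]}
solved = ["a", "b", "c", "d", "e", "f"]
state, history = scramble(solved, moves)
print(state, f"({len(history)} moves from solved)")
```

For the Sudokube the harder part, which this sketch sidesteps, is choosing a starting labeling that satisfies the uniqueness constraints in the first place.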

    Socially aware security and the end of theft by Andrew Frankel

    Many schemes exist that allow various devices to identify their owners or particular users. People are well-accustomed to remembering passwords (something-you-know) and to carrying house keys (something-you-have). These traditional factors all share the common need for the user to present his credentials in some sort of an authentication ritual. We propose a novel technique that can instead identify users ubiquitously and transparently by analyzing the device's social context, that is, the set of other people and devices in the vicinity (someone-you-know). Such so-called socially aware security is particularly suited to detecting and resolving thefts. We discuss the feasibility and challenges of implementing such a security scheme using simulations conducted on real-world data.
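The "someone-you-know" factor can be sketched as a familiarity score: compare the set of currently nearby devices against sets observed during normal use, and flag a possible theft when familiarity drops. The device IDs, Jaccard measure, and threshold below are invented for the demo, not details of the proposed scheme.

```python
# Sketch of social-context authentication: score how familiar the nearby
# device set is (max Jaccard similarity against historical sets) and flag
# low-familiarity situations. All names and the threshold are assumptions.

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def familiarity(nearby, history):
    return max(jaccard(nearby, past) for past in history)

history = [{"alice_phone", "office_ap", "bob_laptop"},
           {"alice_phone", "home_ap", "tv"}]

at_work = {"alice_phone", "office_ap", "printer"}
stolen = {"unknown_1", "unknown_2"}

THRESHOLD = 0.3  # assumed cut-off; a real system would learn this
for nearby in (at_work, stolen):
    score = familiarity(nearby, history)
    status = "ok" if score >= THRESHOLD else "possible theft"
    print(f"{sorted(nearby)} -> familiarity {score:.2f}: {status}")
```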