
Date (Winter 2007) Category Seminar Info 2007/04/30 Bioinformatics Place: Duff Medical Building, 3775 University Street, Main Amphitheatre 1 Time: 12:00 - 13:00 Speaker: Mark Gerstein Affiliation: Molecular Biophysics & Biochemistry, Computer Science, Yale University Area: Analysis and understanding of the human genome Title: Human Genome Annotation Abstract: A central problem for 21st century science will be the analysis and understanding of the human genome. My talk will be concerned with topics within this area, in particular annotating pseudogenes (protein fossils) in the genome. I will discuss a comprehensive pseudogene identification pipeline and storage database we have built. This has enabled us to identify >10K pseudogenes in the human and mouse genomes and analyze their distribution with respect to age, protein family, and chromosomal location. One interesting finding is the large number of ribosomal pseudogenes in the human genome, with 80 functional ribosomal proteins giving rise to ~2,000 ribosomal protein pseudogenes. I will try to inter-relate our studies on pseudogenes with those on tiling arrays, which enable one to comprehensively probe the activity of intergenic regions. At the end I will bring these together, trying to assess the transcriptional activity of pseudogenes. Throughout I will try to introduce some of the computational algorithms and approaches that are required for genome annotation and tiling arrays -- i.e. the construction of annotation pipelines, developing algorithms for optimal tiling, and refining approaches for scoring microarrays. 2007/04/30 Faculty Candidate Talk Place: MC437 Time: 9:30 - 10:30 Speaker: Christine Vogel Affiliation: Institute for Cellular and Molecular Biology, The University of Texas at Austin Area: aspects of domain duplication and combination Title: The Evolution of Complex Protein Repertoires - and beyond Abstract: Much of an organism's physiology is determined by the proteins encoded in its genes.
Proteins in turn are composed of smaller structural, functional and evolutionary units called domains. Duplication, combination and divergence of these domains have shaped the protein repertoire by forming protein families and multi-domain proteins. My talk discusses different aspects of domain duplication and combination from a genomic, structural and functional perspective; it describes the relationship between the two processes, and their likely contributions to the evolution of complex organisms. I will also briefly describe more recent work which addresses 'faster' processes that shape the protein repertoire, e.g., a mass-spectrometry-based method to determine absolute protein abundance and its applications. Biography of Speaker: Since 2005, Christine Vogel has been a Post-doctoral Fellow at the Institute for Cellular and Molecular Biology at the University of Texas at Austin. In 2004, she received a Ph.D. in Computational and Structural Biology from the University of Cambridge, MRC Laboratory of Molecular Biology, U.K. Her research interests lie in the understanding of eukaryotic complexity, with a focus on development and application of an integrative and quantitative definition of eukaryotic cell types, combining morphological and large-scale molecular data for various eukaryotic model organisms. 2007/04/17 General Place: MC437 Time: 9:30 - 10:30 Speaker: Jernej Barbič Affiliation: Carnegie Mellon University Area: Graphics Title: Real-time Deformable Objects: Graphics, Haptics, Sound Abstract: Real-time deformable objects are an exciting research area in computer graphics, with applications to computer games, the movie industry, CAD/CAM, and virtual medicine. Deformable objects are well-understood in solid mechanics; however, the standard simulation algorithms are too slow for interactive simulation with detailed geometry. How can we support real-time simulation on commodity workstations, while compromising physical correctness as little as possible?
First, I will present reduced-coordinate nonlinear deformable objects, a novel class of deformable objects obtained by applying statistical model reduction to finite element models. The idea is to replace the general degrees of freedom of a deformable object with a much smaller set of reduced degrees of freedom, thereby trading accuracy for speed. The reduced degrees of freedom incorporate geometric and material information, and are chosen automatically from the first principles of continuum mechanics. In addition, I will present a time-critical algorithm for collision detection, deformable object simulation and contact force computation between reduced-coordinate deformable (or rigid) objects with detailed geometry. The algorithm runs at haptic rates (1000 Hz), enabling applications in interactive path planning, virtual assembly and game haptics. Finally, I will present an algorithm for real-time sound synthesis where both the mechanical vibrations (deformations) that cause sound, and the sound propagation (wave equation) into the surrounding air for detailed geometries, are (approximately) simulated at audio rates (44,100 Hz). This captures effects such as object self-shadowing and diffraction of sound around corners, and further improves realism of real-time virtual environments. Biography of Speaker: 2007/04/16 General Place: MC437 Time: 9:30 - 10:30 Speaker: Vivek Kwatra Affiliation: University of North Carolina Area: Vision Title: Spatio-temporal Textural Modeling for Data-driven Synthesis and Visualization Abstract: With the increased accessibility of capture devices and techniques, rich amounts of real-world data are available to us in various forms including images, video, 3D models, motion capture, weather patterns, etc. On the other hand, one of the primary goals in computer graphics has been to synthesize this data from first principles either through simulation or user interaction.
My research goal has been to develop synthesis-friendly models from spatio-temporal data that not only exploit the richness of real data, but also afford the controllability of simulation and interaction. In this talk, I will focus on synthesis methods that treat visual as well as dynamic data as texture. Such textural modeling is especially conducive to controllable synthesis because of its locality in space and time. I will first demonstrate a technique that allows for spatio-temporal extension of image and video data and that, combined with intelligent user interfaces, can be used for computational photography applications such as smart image compositing and storyboarding. I will then describe a more flexible texture model that can be used to augment fluid simulation with appearance and shape textures to generate complex fluid effects such as ripples and foam in turbulent flow and patterns in the crust of flowing lava. I will also show how we can incorporate physical and geometric characteristics of the flow into the synthesis process in a temporally coherent manner. This technique, besides adding to the visual realism of the simulated fluid, also provides a handy visualization tool that lays bare significant features on the surface of the flowing fluid, which may not be apparent otherwise. Biography of Speaker: 2007/04/16 Bioinformatics Place: Duff Medical Building, 3775 University Street, Main Amphitheatre 1 Time: 13:30 - 14:30 Speaker: Dr. Daniel Figeys Affiliation: The Ottawa Institute of Systems Biology, BMI, University of Ottawa Area: Analyzing specific portions of the proteome (sub-proteome) Title: Probing sub-proteomes using affinity purification Abstract: The concentration range of proteins present in biological samples remains a serious challenge for proteomic technologies.
The proteome is defined as the ensemble of the proteins in a sample; however, the reality is that a good portion of the proteome remains invisible because of detection and processing limitations in proteomic technologies. We have developed technologies to analyze specific portions of the proteome (sub-proteome) that combine affinity purification, mass spectrometry, and bioinformatics. In this presentation, we will provide some examples of sub-proteome studies. First, we will report our results on mapping protein-protein interaction for over 330 human genes using immunopurification coupled to mass spectrometry and bioinformatics. Using this approach, 2235 human proteins were observed to participate in 6463 interactions with the bait proteins. A suite of bioinformatic approaches was used to assess the validity of the results. Second, we will discuss the mapping of protein polyubiquitination sites using affinity purification coupled to the proteome reactor and mass spectrometry and will discuss the potential applications of this approach. Biography of Speaker: 2007/04/12 Faculty Candidate Talk Place: MC437 Time: 9:30 - 10:30 Speaker: Martin Isenburg Affiliation: University of California, Berkeley Area: HCI Title: Streaming Geometry Processing Abstract: Modern technology enables the creation of digital 3D models that represent objects or processes for scientific or engineering applications with incredible detail. Operating on these large data sets is difficult because they cannot be completely loaded into the main memory of common desktop PCs. We have developed new streaming representations for geometric data sets that we use as input to new streaming algorithms that can process data sets without first loading them into memory. The key insight is to keep the data in streams and document, for example, when all triangles around a vertex or all points in a particular spatial region have arrived with "finalization tags".
These tags allow us to complete operations, output results, and de-allocate data structures. We have designed streaming simplification, compression, triangulation, and extraction algorithms that can operate on data sets much larger than the available memory. An added benefit of streaming is the ability to pipe several stream modules together and avoid storing temporary results to disk. I present two example processing pipelines: One extracts, simplifies, and compresses iso-surfaces from a volume grid. The other generates elevation contours from densely sampled terrain points via a temporarily constructed and then simplified triangulation. Biography of Speaker: Martin Isenburg is currently a postdoc at UC Berkeley. He received a Ph.D. in Computer Science from UNC Chapel Hill in 2004 and an M.Sc. in Computer Science from UBC Vancouver in 1999. He has published over 25 papers including three SIGGRAPH, five Visualization, and five journal publications. His dissertation is on compressing and streaming of polygon meshes, both in-core and out-of-core. Recently he has worked on streaming other processing tasks for large geometric data sets. His SIGGRAPH'06 paper on Streaming Computation of Delaunay Triangulations is his showcase example for an extremely scalable streaming algorithm. 2007/04/05 General Place: MC437 Time: 9:30 - 10:30 Speaker: Paul Kry Affiliation: Université René Descartes, Paris (France) Title: Measurement, Models, and Simulation for Computer Animation Abstract: Motion capture has become popular for computer animation because it directly captures the subtleties of human motion. However, reusing captured motion becomes difficult when contact is involved. In addition to motion, the compliance with which a character makes contact also reveals important aspects of the movement's purpose. In this talk, I will present a new technique called Interaction Capture and Synthesis, for capturing and reusing these contact phenomena.
Using hands and grasping as an example, we capture contact forces at the same time as motion, at a high rate, and use both to estimate a nominal reference trajectory and joint compliance. Unlike traditional methods, our method estimates joint compliance without the need for motorized perturbation devices. New interactions can then be synthesized by physically based simulation using a novel position-based linear complementarity problem formulation that includes friction, breaking contact, and the compliant coupling of contacts at different fingers. I will discuss validation and show examples of interaction synthesis. I will also briefly describe related problems in character deformation, contact sounds, and locomotion control. Biography of Speaker: Paul Kry is a postdoc with the EVASION group at INRIA Rhone Alpes and the LNRS at the Université René Descartes Paris. He received his PhD from the University of British Columbia, and conducted his research at both UBC and Rutgers while completing his thesis with Dinesh Pai. 2007/04/03 Math Place: MC103 Time: 16:00 - 17:00 Speaker: Jonathan Farley Affiliation: University of the West Indies Title: Distributive Lattices of Small Width: A problem from Stanley's Enumerative Combinatorics Abstract: In Richard P. Stanley's 1986 text, Enumerative Combinatorics, the following problem is posed: Fix a natural number $k$. Consider the posets $P$ of cardinality $n$ such that, for $0 < i < n$, $P$ has exactly $k$ order ideals (down-sets) of cardinality $i$. Let $f_k(n)$ be the number of such posets. What is the generating function $\sum f_k(n) x^n$? I will give a solution to this problem (joint work with Ryan Klippenstine). I will also relate this to a problem of Ivo Rosenberg (University of Montreal) from the 1981 Banff Conference on Ordered Sets.
Biography of Speaker: 2007/04/03 General Place: MC437 Time: 15:30 - 16:30 Speaker: Caitlin Kelleher Affiliation: School of Computer Science, Carnegie Mellon University Area: Human-Computer Interaction Title: Storytelling Alice: presenting programming as a means to the end of storytelling Abstract: The Higher Education Research Institute (HERI) estimates that the number of incoming college students intending to major in computer science has dropped by 70% since 2000, despite the fact that the projected need for computer scientists continues to grow. Increasing the numbers of female students who pursue computer science has the potential both to help fill projected computing jobs and improve the technology we create by diversifying the viewpoints that influence technology design. Numerous studies have found that girls begin to turn away from math and science related disciplines, including computer science, during middle school. By the end of eighth grade, twice as many boys as girls are interested in pursuing science, engineering, or technology based careers. In this talk, I will describe the development of Storytelling Alice, a programming environment that gives middle school girls a positive first experience with computer programming. Rather than presenting programming as an end in itself, Storytelling Alice presents programming as a means to the end of storytelling, a motivating activity for a broad spectrum of middle school girls. More than 250 girls participated in the formative user testing of Storytelling Alice. To determine girls' storytelling needs, I observed girls interacting with successive versions of Storytelling Alice and analyzed their storyboards and the programs they developed. 
To enable and encourage middle school girls to create the kinds of stories they envision, Storytelling Alice includes high-level animations that enable users to program social interaction between characters, a gallery of 3D objects designed to spark story ideas, and a story-based tutorial presented using Stencils, a novel tutorial interaction technique. To determine the impact of the storytelling focus on girls' interest in and success at learning to program, I conducted a study comparing the experiences of girls introduced to programming using Storytelling Alice with those of girls introduced to programming using a version of Alice without storytelling features (Generic Alice). Participants who used Storytelling Alice and Generic Alice were equally successful at learning basic programming concepts. However, I found that users of Storytelling Alice show more evidence of engagement with programming. Storytelling Alice users spent 42% more time programming and were more than three times as likely to sneak extra time to continue working on their programs (51% of Storytelling Alice users vs. 16% of Generic Alice users snuck extra time). Biography of Speaker: Caitlin Kelleher is currently a post-doctoral researcher in Computer Science and Human-Computer Interaction at Carnegie Mellon University. She received her bachelor's degree in Computer Science from Virginia Tech and her Ph.D. in Computer Science from Carnegie Mellon University with Professor Randy Pausch. Caitlin was a National Science Foundation Graduate Fellow. 2007/04/02 Bioinformatics Place: MC437 Time: 9:30 - 10:30 Speaker: Chen-Hsiang Yeang Affiliation: University of California, Santa Cruz Title: Computational methods for reconstructing Abstract: Recent progress in high throughput technologies has generated an enormous amount of data which allows us to study biology at the systems level.
Two research directions arise from computational systems biology: to reconstruct a biomolecular system by integrating multiple sources of data and to study the evolution of the components within the system. In this talk I will present research in each of those directions. In the first part, I will describe a computational model capturing the co-evolution between the components in a molecular system. It extends the continuous-time Markov process of sequence substitution by rewarding co-variation and penalizing single-site transitions in the rate matrix. The model accurately predicts the secondary interactions of tRNA and 16S rRNA molecules, and identifies the tertiary interactions which do not follow typical Watson-Crick or GU base pairing rules. We then apply the model to screen co-evolving amino acid pairs among all the protein domain families in the Pfam database. The inferred pairs are highly enriched with the domains which are physically or functionally coupled. Among the top 100 inferred family pairs, 82 occur in the same proteins or share the same functional annotations. By inspecting the 3D protein structures, we find many co-evolving positions are either close and exhibit compensatory substitution across species, or located at functionally important sites of the proteins. In the second part, I will describe a constraint-based modeling framework for inferring gene regulatory networks by data integration. The model treats each piece of evidence as a constraint over attributes in the system and builds a joint probabilistic graphical model from all constraints, and applies statistical inference algorithms to calculate attribute values. To demonstrate its use we apply this framework to four problems. First we infer the causal/functional order of genes in the regulatory network of colon cancer invasiveness using RNAi knock-down expression data.
Second we infer the causal order and combinatorial functions of the regulatory network for the surface roughness phenotype of Vibrio cholerae, using multiple knock-out expression data. Third we identify the active pathways and edge directions/signs in the physical interaction network of yeast that explain the knock-out expression data. Finally, we propose information theoretic criteria for suggesting new knock-out experiments that best disambiguate the existing models. Biography of Speaker: 2007/04/02 Math Place: Burnside 1205 Time: 16:30 - 17:30 Speaker: Janos Pach Affiliation: City College and Courant Institute, New York Title: Turan-type results on intersection graphs Abstract: We establish several geometric extensions of the Lipton-Tarjan separator theorem for planar graphs. For instance, we show that any collection $C$ of Jordan curves in the plane with a total of $m$ crossings has a partition into three parts $C=S\cup C_1\cup C_2$ such that $|S|=O(\sqrt{m}),$ $\max\{|C_1|,|C_2|\}\leq\frac{2}{3}|C|,$ and no element of $C_1$ has a point in common with any element of $C_2$. These results are used to obtain various properties of intersection patterns of geometric objects in the plane. In particular, we prove that if a graph $G$ can be obtained as the intersection graph of $n$ convex sets in the plane and it contains no complete bipartite graph $K_{k,k}$ as a subgraph, then the number of edges of $G$ cannot exceed $c_kn$, for a suitable constant $c_k$. Joint work with Jacob Fox. Biography of Speaker: 2007/04/02 Bioinformatics Place: Duff Medical Building, 3775 University Street, Main Amphitheatre 1 Time: 12:00 - 13:00 Speaker: Dr. Edwin Wang Affiliation: National Research Council, Biotechnology Research Institute Title: Cellular signaling networks: from regulation to information superhighways Abstract: During the last 50 years, cellular signaling data and information have been generated worldwide and accumulated in literature.
In recent years, high-throughput technologies have allowed the generation of a large amount of DNA sequence, microarray and protein data. We manually curated signaling events from the literature and combined them with other high-throughput data to construct a human signaling network. By integrative analysis of the network with other types of high-throughput biological data, we explored the regulation and signal propagation on the human signaling network. I will summarize the principles of gene regulation, protein phosphorylation and signal information superhighways of the network. Biography of Speaker: 2007/03/30 Software Engineering Place: McConnell 103 Time: 11:00 - 12:30 Speaker: Eric Bodden and Laur Affiliation: McGill University Area: Conference Report (RV 2007 and AOSD 2007) Title: Abstract: We will present an overview of the RV (Runtime Verification) workshop and AOSD 2007 conference. We will start with an overview of what is hot and then give a more detailed presentation of three of the papers. 2007/03/23 Software Engineering Place: McConnell 103 Time: 11:45 - 12:30 Speaker: Jan Rupar Affiliation: McGill University Title: Software Security Analysis on Stripped Binaries - An Exploration Abstract: This talk will give an overview of the major problems faced by security auditors today: how does one conduct the security evaluation of executable files without the support of documentation or source code? Current tools and methodologies will be presented, along with their challenges and limitations. This talk has the goal of stimulating a discussion on how we may apply current software engineering and dataflow analysis research in order to address these challenges. 2007/03/23 Software Engineering Place: McConnell 103 Time: 11:00 - 11:45 Speaker: Barthelemy Dagenais Affiliation: McGill University Title: Recommending adaptive changes in the face of evolving software.
Abstract: Software constantly evolves, and sometimes backward compatibility breaks, so developers must then adapt dependent programs. Unfortunately, adapting a client program to support a new version of the software it uses can be a tedious task and is a low-value activity, since no new features are introduced and the quality of the program does not necessarily increase. We present the ideas behind a recommendation system that 1) mines the version histories of the software to analyze evolution changes and 2) recommends ways (e.g. method calls) to reconcile the client program with the new version of the used software. During this talk, I will cover in more detail how we can mine software version histories, infer simple changes like refactoring and infer complex changes using partial program analysis. Since this is still a work in progress, comments on the approach will be more than welcome! 2007/03/21 General Place: MC437 Time: 9:30 - 10:30 Speaker: Steve Oudot Affiliation: Stanford University Area: Computational Geometry Title: Manifold Reconstruction Using Witness Complexes Abstract: The problem of reconstructing manifolds from scattered data finds applications in a number of areas of computer science, including image or signal processing, computer graphics, robotics, machine learning, or scientific visualization. It has been extensively studied by both the computational geometry community and others. Ideas from computational geometry have led to elegant solutions for the cases of sampled curves and surfaces, based on the use of the Delaunay triangulation D(S) of the input point set S. In these methods, an output simplicial complex is extracted from D(S) that is guaranteed to be topologically equivalent to the underlying manifold M, and geometrically close to M, provided that S satisfies some mild sampling conditions.
After introducing these Delaunay-based methods, I will show that their nice properties do not carry over to manifolds of arbitrary dimensions, since in this more general setting the input point set may well-sample several manifolds with different topological types. I will then introduce a novel approach to manifold reconstruction, which builds a one-parameter family of complexes approximating the input data at multiple scales. This approach is motivated by recent advances in the theory of topological persistence, showing that the "topology of a point cloud" depends on the scale at which this point cloud is considered. To generate the family of complexes, our algorithm uses a special construction, called the witness complex, which mimics the Delaunay construction using only elementary geometric tests. This makes the algorithm applicable in virtually any metric space. In Euclidean spaces, its efficiency can be guaranteed theoretically, both in terms of complexity and in terms of approximation. To support these claims, I will show some experimental results obtained on real-life data sets from computer graphics and medical imaging. To conclude the talk, I will give some prospects for future research, emphasizing the obstacles and challenges of the sampling and analysis of manifolds in arbitrary dimensions. Biography of Speaker: 2007/03/14 Algorithms Place: MC437 Time: 9:30 - 10:30 Speaker: Yufeng Wu Affiliation: Department of Computer Science, University of California, Davis Area: Recombination and Applications Title: Algorithms for Inferring Recombination and Association Mapping in Populations Abstract: With increasingly available population-scale genetic variation data, a current high priority research goal is to understand how genetic variations influence complex diseases (or more generally genetic traits).
Recombination is an important genetic process that plays a major role in the logic behind association mapping, a currently intensely studied method widely hoped to efficiently find genes (alleles) associated with complex genetic diseases. In this talk, I will present algorithmic and computational results on inferring historical recombination and constructing genealogical networks with recombination and applications to two biologically important problems: association mapping of complex diseases and detecting recombination hotspots. On association mapping, I will present a method that generates the most parsimonious genealogical networks uniformly and show how it can be applied in association mapping. I will introduce results on evaluating how well the inferred genealogy fits the given phenotypes (i.e. cases and controls) and locates genes associated with the disease. Our recent work on detecting recombination hotspots by inferring minimum recombination will also be briefly described. For both biological problems, I will demonstrate the effectiveness of these methods with experimental results on simulated or real data. Biography of Speaker: Yufeng Wu received his Master's degree in Computer Science from the University of Illinois at Urbana-Champaign in 1998 and his Bachelor's degree from Tsinghua University, China in 1994. From 1998 to 2003 he was a software engineer at a startup company in Illinois, USA. He currently works with Professor Dan Gusfield on algorithms in computational biology and bioinformatics. His current research is focused on computational problems arising in population genomics. 2007/03/12 General Place: MC437 Time: 9:30 - 10:30 Speaker: Sylvain Paris Affiliation: Massachusetts Institute of Technology (MIT) Area: Graphics - HCI Title: Exploiting the Richness of Digital Photographs and Videos Abstract: Cameras have become popular with the recent development of digital equipment.
Today, it is extremely simple to take high-resolution pictures and movies, thereby giving easy access to an abundance of information. However, the size and number of acquired images challenge most existing algorithms. In this context, I will present my work on low-level image processing and computational photography. I will first describe the bilateral filter, a nonlinear edge-preserving process which is becoming ubiquitous in computational photography. I will reformulate this filter in a higher-dimensional homogeneous space and show that, using signal-processing arguments, it is possible to compute very quickly an approximation visually similar to the exact result. In a second part, I will use this technique to analyze digital photographs and enhance them by automatically transferring the visual qualities of an artist's masterpiece. Finally, I will briefly present my on-going projects that extend these ideas to image segmentation and video processing. Biography of Speaker: Computer Science and Artificial Intelligence Laboratory (CSAIL) 2007/03/02 Software Engineering Place: McConnell 103 Time: 11:00 - 12:00 Speaker: Richard Halpert Affiliation: McGill University - Sable Group Title: Improving Lock Allocation with Supporting Analyses Abstract: Multi-threaded programming using locks and monitors in Java (or any other language) is difficult and error prone, even for experts. For this reason, we have developed an automatic lock allocator. This lock allocator could be used to enable a transactional version of the Java language without the need for new code in the JVM, or simply to warn programmers if their own lock allocation results in a program with data races. If a pair of lock regions have side effects to the same object, then they are said to "interfere". A rudimentary solution to determining which lock regions potentially interfere with each other is to inspect the results of a Points-To analysis.
Using more accurate Points-To analyses can remove some spurious edges on the interference graph. However, in order to remove the majority of spurious edges, more thread-specialized analyses were needed. 2007/02/16 Software Engineering Place: McConnell 103 Time: 11:00 - 12:00 Speaker: Dayong GU Affiliation: McGill University Title: Hardware-related optimizations in a JVM Abstract: Detecting repetitive "phases" in program execution is helpful for program understanding, runtime optimization, and for reducing simulation/profiling workload. The nature of the phases that may be found, however, depends on the kinds of programs, as well as how programs interact with the underlying hardware. We present a technique to detect program phases by monitoring microarchitecture-level hardware events. We also show a set of sample adaptive optimizations of our phase detection technique. 2007/02/02 Software Engineering Place: McConnell 103 Time: 11:00 - 12:30 Speaker: Ekwa Duala-Ekoko Affiliation: McGill University Title: Tracking Code Clones in Evolving Software Abstract: Code clones - source code regions with identical syntax and semantics - are generally considered harmful in software development, and the predominant approach is to try to eliminate them through refactoring. However, recent research has provided evidence that it may not always be practical, feasible, or cost-effective to eliminate certain clone groups. In this talk, I will propose a technique that enables developers to maintain clone groups of interest, and to monitor and support modification tasks that intersect with the documented clone model as a system evolves. Our technique relies on the concept of abstract clone region descriptors (CRD), which describe clone regions within methods in a robust way that is independent from the exact text of the clone region or its location in a file.
Next, I will present our definition of CRDs, and describe a complete clone tracking system capable of producing CRDs from the output of a clone detection tool, notifying developers of modifications to clone regions, and supporting the simultaneous editing of clone regions. Finally, I will discuss the results of two experiments and a case study conducted to assess the performance and usefulness of our approach. (This is joint work with Martin Robillard - Assistant Professor, McGill) 2007/01/15 Bioinformatics Place: Duff Medical Building, 3775 University Street, Main Amphitheatre 1 Time: 15:30 - 16:30 Speaker: Corey Yanofsky Affiliation: Biomedical Engineering, McGill University Area: proteomics experiments Title: Peptide retention time prediction from high-throughput proteomic data Abstract: In high-throughput proteomics experiments, complex mixtures of peptides usually undergo chromatographic separation prior to identification by mass spectrometry. Chromatographic separation was originally incorporated into the experimental design principally to limit the complexity of the peptide mixture entering the mass spectrometer at any one time. Over the course of years, large databases have been built to store information, including chromatographic retention time information, about peptides identified experimentally. This creates the possibility of using retention time data as an extra dimension to improve the identification of peptides based on mass spectra. However, there is considerable variability in the elution time for a particular peptide which must be corrected before the information can be used for peptide identification. At McGill University, many thousands of proteomic experimental runs have been run using a variety of chromatographic protocols. These runs have different chromatographic dead times, elution gradients, and experimental precisions, all of which interfere with a straightforward prediction of future retention times from previous data.
To overcome these complications, we have developed a Bayesian model to estimate the physical property of peptides underlying experimental retention times, explicitly modelling varying chromatographic dead time, elution gradient, and run-specific variance as separate effects. The model was fit using a data set of 113160 peptide identifications, comprising 6681 unique peptides in 3163 runs, thus providing estimates of the “true” retention time (relative to a chosen reference run) of the 6681 peptides. A cross-validation study was performed, showing that the predictive error of the model was typically within +/- 2 minutes. (This covers about 8% of the total elution time for a run with a 60 minute gradient and 10 minutes of dead time.) A second stage was necessary to predict the retention times of peptides which are not present in the data set, so the results of the first stage were used to fit a peptide-sequence-based retention time model. In a leave-one-out cross-validation study, the sequence-based model was able to predict retention times with an overall accuracy of +/- 5 minutes, which covers about 20% of the total elution time for a run with a 60 minute gradient and 10 minutes of dead time.