SOCS Faculty Candidate Talk Seminar Schedule

Date Category Seminar Info
2014/02/25 Faculty Candidate Talk Place: MC 103
Time: 9:30 - 10:30
Speaker: Finale Doshi-Velez
Affiliation: Harvard School of Engineering and Applied Sciences and Harvard Medical School
Area: Datamining and health informatics
Title: Prediction and Interpretation with Latent Variable Models

Latent variable models provide a powerful tool for summarizing data through a set of hidden variables. These models are generally trained to maximize prediction accuracy, and modern latent variable models now do an excellent job of finding compact summaries of the data with high predictive power. However, there are many situations in which good predictions alone are not sufficient. Whether the hidden variables have inherent value by providing insights about the data, or whether we wish to interface with domain experts on how to improve a system, understanding what the discovered hidden variables mean is an important first step. In this talk, I will discuss how the language of probabilistic modeling naturally and flexibly allows us to incorporate information about how humans organize knowledge in addition to finding predictive summaries of data. In particular, I will talk about how a new model, GraphSparse LDA, discovers interpretable latent structures without sacrificing (and sometimes improving upon!) prediction accuracy. The model incorporates knowledge about the relationships between observed dimensions into a probabilistic framework to find a small set of human-interpretable "concepts" that summarize the observed data. This approach allows us to recover interpretable descriptions of novel, clinically relevant autism subtypes from a medical dataset with thousands of dimensions.

Biography of Speaker:

Finale Doshi-Velez is a postdoctoral fellow appointed jointly at Harvard's School of Engineering and Applied Sciences and the Center for Biomedical Informatics. She received her MSc from the University of Cambridge in 2009 (as a Marshall Scholar) and her PhD from MIT in 2012. She was selected as one of IEEE's "AI 10 to Watch" in 2013. Her research interests include latent variable modeling, sequential decision-making, and clinical applications.

2014/02/20 Faculty Candidate Talk Place: MC 437
Time: 9:30 - 10:30
Speaker: Julian McAuley
Affiliation: Postdoc at Stanford University
Area: Datamining and health informatics
Title: Leveraging Data Across Time and Space to Build Predictive Models for Healthcare-Associated Infections

The proliferation of user-generated content on the web provides a wealth of opportunity to study humans through their online traces. I will discuss three aspects of my research, which aims to model and understand people's behavior online. First, I will develop rich models of opinions by combining structured data (such as ratings) with unstructured data (such as text). Second, I will describe how preferences and behavior evolve over time, in order to characterize the process by which people "acquire tastes" for products such as beer and wine. Finally, I will discuss how people organize their personal social networks into communities with common interests and interactions. These lines of research require models that are capable of handling high-dimensional, interdependent, and time-evolving data, in order to gain insights into how humans behave.

Biography of Speaker:

Julian McAuley is a postdoctoral scholar at Stanford University, where he works with Jure Leskovec on modeling the structure and dynamics of social networks. His current work is concerned with modeling opinions and behavior in online communities, especially with respect to their linguistic and temporal dimensions. Previously, Julian received his PhD from the ANU under Tiberio Caetano, with whom he worked on inference and learning in structured output spaces. His work has been featured in Time, Forbes, New Scientist, and Wired, and has received over 30,000 "likes" on Facebook.

2014/02/18 Faculty Candidate Talk Place: MC 103
Time: 9:30 - 10:30
Speaker: Jenna Wiens
Affiliation: MIT
Area: Machine Learning and Data Mining
Title: Leveraging Data Across Time and Space to Build Predictive Models for Healthcare-Associated Infections

The proliferation of electronic medical records holds out the promise of using machine learning and data mining to build models that will help healthcare providers improve patient outcomes. However, building useful models from these datasets presents many technical problems. The task is made challenging by the large number of factors, both intrinsic and extrinsic, influencing a patient’s risk of an adverse outcome, the inherent evolution of that risk over time, and the relative rarity of adverse outcomes. In this talk, I will describe the development and validation of hospital-specific models for predicting healthcare-associated infections (HAIs), one of the top-ten contributors to death in the US. I will show how, by adapting techniques from time-series classification, transfer learning, and multi-task learning, one can learn a more accurate model for patient risk stratification for the HAI Clostridium difficile (C. diff). Applied to a held-out validation set of 25,000 patient admissions, our model achieved an area under the receiver operating characteristic curve of 0.81 (95% CI 0.78-0.84). On average, we can identify high-risk patients five days in advance of a positive test result. The model has been successfully integrated into the health record system at a large hospital in the US, and is being used to produce daily risk estimates for each in-patient. Clinicians at the hospital are now considering ways in which that information can be used to reduce the incidence of HAIs.
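For readers unfamiliar with the metric quoted in the abstract, the area under the receiver operating characteristic curve can be read as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition only; the function name `auroc` and the toy labels and scores are illustrative assumptions and are not taken from the talk or its data.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up toy data: 1 = adverse outcome, 0 = no adverse outcome.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
print(auroc(labels, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The quadratic loop is chosen for clarity; a rank-based computation would be preferable on a validation set of the size mentioned above.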

Biography of Speaker:

Jenna Wiens is a Ph.D. candidate in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology (MIT). She holds an S.M. degree in EECS from MIT. She is interested in solving the technical challenges that arise when considering the practical application of machine learning in medicine. In addition to her work on predicting healthcare-associated infections, she has applied machine-learning methods to the automated interpretation of electrocardiograms and the extraction of strategically useful information from player tracking data in the NBA.

2014/02/14 Faculty Candidate Talk Place: MC 103
Time: 10:30 - 11:30
Speaker: Jackie Chi Kit Cheung
Affiliation: University of Toronto
Area: Distributional Semantics
Title: Towards Large-Scale Natural Language Inference with Distributional Semantics

Language understanding and semantic inference are crucial for solving complex natural language applications, from intelligent personal assistants to automatic summarization systems. However, current systems often require hand-coded information about the domain of interest, an approach that will not scale up to the large array of possible domains and topics in text collections today. In this talk, I demonstrate the potential of distributional semantics (DS), an approach to modeling meaning by using the contexts in which a word or phrase appears, to assist in acquiring domain knowledge and to support the desired inference. I present a method that integrates phrasal DS representations into a probabilistic model in order to learn about the important events and slots in a domain, resulting in state-of-the-art performance on template induction and multi-document summarization for systems that do not rely on hand-coded domain knowledge. I also propose to evaluate DS representations by their ability to support inference, the hallmark of any semantic formalism. These results demonstrate the utility of DS for current natural language applications, and provide a principled framework for measuring progress towards automated inference in any domain.
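The core idea of distributional semantics described in the abstract, representing a word by the contexts in which it appears, can be illustrated with a very small sketch. This is a generic count-based illustration and not the speaker's phrasal model; the helper names `context_vectors` and `cosine`, the window size, and the three-sentence corpus are all assumptions made for the example.

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window`
    positions of it, accumulated over all sentences."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Made-up toy corpus for illustration only.
corpus = [
    "the doctor treated the patient",
    "the nurse treated the patient",
    "the storm damaged the bridge",
]
vecs = context_vectors(corpus)
print(cosine(vecs["doctor"], vecs["nurse"]))
print(cosine(vecs["doctor"], vecs["storm"]))
```

In this toy corpus, "doctor" and "nurse" share identical contexts and so come out more similar to each other than "doctor" is to "storm". Raw counts let frequent function words such as "the" inflate every similarity, which is one reason practical systems reweight or filter the counts.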

Biography of Speaker:

Jackie CK Cheung is a PhD candidate at the University of Toronto. His research interests span several areas of natural language processing, including computational semantics, automatic summarization, and natural language generation. His work is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), as well as a Facebook Fellowship.

2014/02/11 Faculty Candidate Talk Place: MC 103
Time: 9:15 - 10:15
Speaker: Byron Wallace
Affiliation: Brown University
Area: Machine Learning in Evidence-Based Medicine
Title: Machine Learning in Evidence-Based Medicine: Taming the Clinical Data Deluge

An unprecedented volume of biomedical evidence is being published today. Indeed, PubMed (a search engine for biomedical literature) now indexes more than 600,000 publications describing human clinical trials and upwards of 22 million articles in total. This volume of literature imposes a substantial burden on practitioners of Evidence-Based Medicine (EBM), which now informs all levels of healthcare. Systematic reviews are the cornerstone of EBM. They address a well-formulated clinical question by synthesizing the entirety of the published relevant evidence. To realize this aim, researchers must painstakingly identify the few tens of relevant articles among the hundreds of thousands of published clinical trials. Further exacerbating the situation, the cost of overlooking relevant articles is high: it is imperative that all relevant evidence is included in a synthesis, else the validity of the review is compromised. As reviews have become more complex and the literature base has exploded in volume, the evidence identification step has consumed an increasingly unsustainable amount of time. It is not uncommon for researchers to read tens of thousands of abstracts for a single review. If we are to realistically realize the promise of EBM (i.e., inform patient care with the best available evidence), we must develop computational methods to optimize the systematic review process. To this end, I will present novel data mining and machine learning methods that look to semi-automate the process of relevant literature discovery for EBM. These methods address the thorny properties inherent to the systematic review scenario (and indeed, to many tasks in health informatics). Specifically, these include: class imbalance and asymmetric costs; expensive and highly skilled domain experts with limited time resources; and multiple annotators of varying skill and price. In this talk I will address these issues in turn. 
In particular, I will present new perspectives on class imbalance, novel methods for exploiting dual supervision (i.e., labels on both instances and features), and new active learning techniques that address issues inherent to real-world applications (e.g., exploiting multiple experts in tandem). I will present results that demonstrate that these methods can reduce by half the workload involved in identifying relevant literature for systematic reviews, without sacrificing comprehensiveness. Finally, I will conclude by highlighting emerging and future work on automating next steps in the systematic review pipeline, and methods for making sense of biomedical data more generally.

Biography of Speaker:

Byron Wallace is an assistant research professor in the Department of Health Services, Policy & Practice at Brown University; he is also affiliated with the Brown Laboratory for Linguistic Processing (BLLIP) in the department of Computer Science. His research is in machine learning/data mining and natural language processing, with an emphasis on applications in health. Before moving to Brown, he completed his PhD in Computer Science at Tufts under the supervision of Carla Brodley. He was selected as the runner-up for the 2013 ACM SIGKDD Doctoral Dissertation Award and he was awarded the Tufts Outstanding Graduate Researcher at the Doctoral Level award in 2012 for his thesis work.