

2013/04/09, McConnell 437, 10:00 - 11:00

Big Data, Mining Multimedia and Semi-supervised Learning
Christopher Pal, École Polytechnique de Montréal

Area: Data Mining


In this talk I’ll begin by presenting some recent work with my students on mining the text and images of Wikipedia biographies. I’ll present a technique to disambiguate faces and identities using richly structured probabilistic models. I’ll discuss the role of generative, discriminative and semi-supervised learning in both text mining and general data mining, along with some considerations regarding the importance of model complexity, the amount of data available, and how the two interact. I’ll then present some of our work on large-scale recognition in the wild, in which we scale up with more data by mining image search results and YouTube. I’ll discuss a semi-supervised learning technique based on a probabilistic sparse kernel method and the use of null categories to account for noisy labels. The technique allows us to boost performance on the standard Labeled Faces in the Wild evaluation using faces mined from YouTube, as well as to boost accuracy on the task of tagging faces of celebrities within YouTube videos. Finally, I’ll discuss the issues involved in scaling to truly big data and our experiences working in collaboration with Google on a corpus of over 1.2 million YouTube videos and their text annotations.

Biography of Speaker:

Christopher Pal is an assistant professor in the department of « génie informatique et génie logiciel » at the École Polytechnique de Montréal. Prior to arriving in Montréal, he was an assistant professor in the Department of Computer Science at the University of Rochester. He has been a research scientist with the University of Massachusetts Amherst at the Center for Intelligent Information Retrieval and the Information Extraction and Synthesis Lab. He has also been affiliated with Microsoft Research's Interactive Visual Media Group and their Machine Learning and Adaptive Systems group. He holds four patents, his work has over 1100 citations on Google Scholar, and he has received a Google research award. He earned his M.Math and Ph.D. from the University of Waterloo.