UAI 2009 Invited Speakers
Yoshua Bengio, Universite de Montreal
Learning Deep Architectures
Whereas theoretical work suggests that deep architectures might be more efficient at representing highly varying functions, training deep architectures was unsuccessful until the recent advent of algorithms based on unsupervised pre-training of each level of a hierarchically structured model. Several unsupervised criteria and procedures have been proposed for this purpose, starting with the Restricted Boltzmann Machine (RBM), which, when stacked, gives rise to Deep Belief Networks (DBNs). Although the partition function of RBMs is intractable, inference is tractable, and we review several successful and efficient learning algorithms that have been proposed. In addition to being impressive as generative models, DBNs have made an impact by being used to initialize deep supervised neural networks. To better understand why this unsupervised pre-training is so successful, we review several other unsupervised approaches for deep architectures, such as sparse coding, denoising auto-encoders, similarity-preserving transformations, and slow features. Finally, we attempt to understand the unsupervised pre-training effect through a large set of simulations exploring two apparently conflicting hypotheses: that unsupervised pre-training acts as a regularizer, or that it helps optimize a difficult non-convex criterion fraught with local minima.
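The contrast between an intractable partition function and tractable inference can be made concrete with a tiny binary RBM. The sketch below assumes the standard energy E(v, h) = -b·v - c·h - vᵀWh with binary visible and hidden units; all function names and the example parameters are hypothetical, chosen only for illustration. It computes p(h_j = 1 | v) in closed form and checks it against brute-force enumeration, while the partition function needs a sum over every joint configuration.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def energy(v, h, W, b, c):
    # Assumed binary-RBM energy: E(v, h) = -b.v - c.h - v^T W h
    e = -sum(bi * vi for bi, vi in zip(b, v))
    e -= sum(cj * hj for cj, hj in zip(c, h))
    e -= sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))
    return e


def partition_function(W, b, c):
    # Intractable in general: sums exp(-E) over all 2^(n_v + n_h) joint states
    n_v, n_h = len(b), len(c)
    return sum(math.exp(-energy(v, h, W, b, c))
               for v in itertools.product((0, 1), repeat=n_v)
               for h in itertools.product((0, 1), repeat=n_h))


def hidden_posterior(v, W, c):
    # Tractable inference: hidden units are conditionally independent given v,
    # so p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij), costing O(n_v * n_h).
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def hidden_posterior_brute_force(v, W, b, c):
    # Reference check: normalize exp(-E(v, h)) over all 2^n_h hidden vectors
    # (Z cancels in the conditional, so it is never needed here).
    configs = list(itertools.product((0, 1), repeat=len(c)))
    weights = [math.exp(-energy(v, h, W, b, c)) for h in configs]
    total = sum(weights)
    return [sum(w for h, w in zip(configs, weights) if h[j] == 1) / total
            for j in range(len(c))]


# Hypothetical toy RBM: 3 visible units, 2 hidden units
W = [[0.5, -1.0], [1.5, 0.25], [-0.75, 0.1]]
b = [0.1, -0.2, 0.3]
c = [0.0, 0.4]
v = (1, 0, 1)
print(hidden_posterior(v, W, c))
print(hidden_posterior_brute_force(v, W, b, c))
print(partition_function(W, b, c))
```

The two posterior routines agree, but only the closed form scales: its cost grows as n_v · n_h, whereas `partition_function` enumerates 2^(n_v + n_h) terms and is feasible only for toy models like this one.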
Yoshua Bengio (PhD'1991, McGill University) is a professor in the Department of Computer Science and Operations Research, Universite de Montreal, and Canada Research Chair in Statistical Learning Algorithms, as well as NSERC-CGI Chair, and Fellow of the Canadian Institute for Advanced Research. He was program co-chair for NIPS'2008 and is general co-chair for NIPS'2009. His main ambition is to understand how learning can give rise to intelligence. He was an early proponent of deep architectures and distributed representations as tools to bypass the curse of dimensionality and learn complex tasks. He has contributed to many areas of machine learning: neural networks, recurrent neural networks, graphical models, kernel machines, semi-supervised learning, unsupervised learning and manifold learning, pattern recognition, data mining, natural language processing, machine vision, and time-series models.
James Robins, Harvard School of Public Health
From Robust Tests of No Direct Effects in Medical Experiments to Powerful Nonparametric Causal Search from Data with No Apparent Structure
In a 'reinforcement learning' randomized experiment designed to find the optimal joint dosing strategy for two drugs, the probability that a subject is randomized to doses (D1, D2) of drugs 1 and 2 on day t depends on the subject's history of clinical response to earlier treatment. In 1999, I constructed a nonparametric test, based on the data from such an experiment, of the null hypothesis that drug 1 has no direct causal effect (that is, no effect not mediated through drug 2) on mortality or some other outcome. The test was a test of marginal independence of mortality and the dose history of drug 1 in a weighted distribution in which a subject's weight equals the inverse probability of having his actual observed drug 2 treatment history. I noted that the same inverse probability weighting approach could, in principle, be used to extend the FCI search algorithm of Spirtes, Glymour and Scheines to discover structure in distributions with nonindependence constraints (i.e., so-called Verma constraints) but no independencies.
Thomas Richardson and I have been (slowly) working together on this approach ever since. I will describe an example we constructed that demonstrates the remarkable (perhaps too remarkable!) ability of the proposed approach to discover causal structure. Finally, I will describe recent joint work by Thomas Richardson, Ilya Shpitser, Steffen Lauritzen, and myself to turn "in principle" into a theorem, if not yet into an algorithm.
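The weighting step behind the test of no direct effect can be sketched as follows. This is a minimal illustration, not the test constructed in 1999: it assumes the per-day randomization probabilities of the drug-2 doses each subject actually received are known by design, forms each subject's weight as the inverse of their product, and computes weighted mortality risks for each drug-1 dose history. Under the null, these weighted risks should not differ; a real test would also need a variance estimate or reference distribution. The field names (`drug1`, `p_drug2`, `died`) and the toy records are hypothetical.

```python
import math
from collections import defaultdict


def ipw_weight(p_observed_drug2):
    # Weight = 1 / P(observed drug-2 history), i.e. the inverse of the product
    # over days of the known probabilities of the drug-2 dose actually received.
    return 1.0 / math.prod(p_observed_drug2)


def weighted_mortality_by_drug1(subjects):
    # Weighted mortality risk for each observed drug-1 dose history.
    totals = defaultdict(float)
    deaths = defaultdict(float)
    for s in subjects:
        w = ipw_weight(s["p_drug2"])
        key = tuple(s["drug1"])
        totals[key] += w
        deaths[key] += w * s["died"]
    return {k: deaths[k] / totals[k] for k in totals}


# Hypothetical toy records over two days (invented values, not data)
subjects = [
    {"drug1": (0, 0), "p_drug2": [0.5, 0.5], "died": 1},
    {"drug1": (0, 0), "p_drug2": [0.8, 0.5], "died": 0},
    {"drug1": (1, 1), "p_drug2": [0.5, 0.5], "died": 0},
    {"drug1": (1, 1), "p_drug2": [0.25, 0.5], "died": 1},
]

risks = weighted_mortality_by_drug1(subjects)
print(risks)
print("weighted risk difference:", risks[(1, 1)] - risks[(0, 0)])
```

Weighting by the inverse probability of the observed drug-2 history mimics a population in which drug 2 is assigned independently of the past, so any remaining association between drug-1 history and mortality in the weighted data points to a direct effect of drug 1.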