Discovery Science 2014

Joint Invited Speaker for DS 2014 and ALT 2014

Zoubin Ghahramani

University of Cambridge, UK

zoubin@eng.cam.ac.uk

Building an Automated Statistician

We live in an era of abundant data, and there is an increasing need for methods to automate data analysis and statistics. I will describe the "Automated Statistician", a project which aims to automate the exploratory analysis and modelling of data. Our approach starts by defining a large space of related probabilistic models via a grammar over models, and then uses Bayesian marginal likelihood computations to search over this space for one or a few good models of the data. The aim is to find models that both predict well and are somewhat interpretable. Our initial work has focused on learning unknown nonparametric regression functions and on learning models of time series data, both using Gaussian processes. Once a good model has been found, the Automated Statistician generates a natural language summary of the analysis, producing a 10-15 page report with plots and tables describing the analysis. I will discuss challenges such as how to trade off predictive performance and interpretability, how to translate complex statistical concepts into natural language text that is understandable by a numerate non-statistician, and how to integrate model checking. This is joint work with James Lloyd and David Duvenaud (Cambridge) and Roger Grosse and Josh Tenenbaum (MIT).
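As a rough illustration of the search described in this abstract (not the project's actual code), the sketch below runs a greedy search over a small grammar of Gaussian process kernels and scores each candidate by its log marginal likelihood. The base kernels, the "+" and "*" composition rules, the fixed hyperparameters, the noise variance, the greedy strategy and the toy data are all assumptions made for the example; a real system would also optimise kernel hyperparameters.

```python
import math

# Base kernels with fixed, assumed hyperparameters (unit length-scale, period 1).
def se(x, y):
    return math.exp(-0.5 * (x - y) ** 2)

def lin(x, y):
    return x * y

def per(x, y):
    return math.exp(-2.0 * math.sin(math.pi * (x - y)) ** 2)

BASE = {"SE": se, "LIN": lin, "PER": per}

def expand(expr):
    """One grammar step: extend an expression with a base kernel via + or *."""
    return [(op, expr, name) for op in ("+", "*") for name in BASE]

def evaluate(expr, x, y):
    """Evaluate a kernel expression (a base name or an (op, left, right) tuple)."""
    if isinstance(expr, str):
        return BASE[expr](x, y)
    op, left, right = expr
    a, b = evaluate(left, x, y), evaluate(right, x, y)
    return a + b if op == "+" else a * b

def log_marginal_likelihood(expr, xs, ys, noise=0.1):
    """log N(ys | 0, K + noise * I), computed through a Cholesky factor."""
    n = len(xs)
    k = [[evaluate(expr, xs[i], xs[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = k[i][j] - sum(chol[i][m] * chol[j][m] for m in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    z = []  # forward substitution: solve chol @ z = ys
    for i in range(n):
        z.append((ys[i] - sum(chol[i][j] * z[j] for j in range(i))) / chol[i][i])
    return (-0.5 * sum(v * v for v in z)
            - sum(math.log(chol[i][i]) for i in range(n))
            - 0.5 * n * math.log(2.0 * math.pi))

def greedy_search(xs, ys, depth=3):
    """Start from the best base kernel; keep expanding while the score improves."""
    def score(expr):
        return log_marginal_likelihood(expr, xs, ys)
    best = max(BASE, key=score)
    for _ in range(depth - 1):
        candidate = max(expand(best), key=score)
        if score(candidate) <= score(best):
            break
        best = candidate
    return best, score(best)

# Toy time series: a periodic signal plus a linear trend (made-up data).
xs = [i / 4 for i in range(16)]
ys = [math.sin(2 * math.pi * x) + 0.5 * x for x in xs]
model, value = greedy_search(xs, ys)
print(model, round(value, 2))
```

The returned expression is a nested tuple such as ("+", "PER", "LIN"), which is the kind of structure a report generator could later translate into text, one component at a time.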

Bio

Zoubin Ghahramani is Professor of Information Engineering at the University of Cambridge, where he leads a group of about 30 researchers. He studied computer science and cognitive science at the University of Pennsylvania, obtained his PhD from MIT in 1995, and was a postdoctoral fellow at the University of Toronto. His academic career includes concurrent appointments as one of the founding members of the Gatsby Computational Neuroscience Unit in London, and as a faculty member of CMU's Machine Learning Department for over 10 years. His current research focuses on nonparametric Bayesian modelling and statistical machine learning. He has also worked on applications to bioinformatics, econometrics, and a variety of large-scale data modelling problems. He has published over 200 papers, receiving 25,000 citations (an h-index of 68). His work has been funded by grants and donations from EPSRC, DARPA, Microsoft, Google, Infosys, Facebook, Amazon, FX Concepts and a number of other industrial partners. In 2013, he received a $750,000 Google Award for research on building the Automatic Statistician. He serves on the advisory boards of Opera Solutions and Microsoft Research Cambridge, on the Steering Committee of the Cambridge Big Data Initiative, and in a number of leadership roles as programme and general chair of the leading international conferences in machine learning: AISTATS (2005), ICML (2007, 2011), and NIPS (2013, 2014). More information can be found at http://mlg.eng.cam.ac.uk.

Invited Speaker for DS 2014

Michel Dumontier

Stanford University, Stanford, USA

michel.dumontier@stanford.edu

Semantic approaches for biomedical knowledge discovery

With its focus on investigating the basis for the sustained existence of living systems, modern biology has always been a fertile, if challenging, domain for formal knowledge representation and automated reasoning. With thousands of databases and hundreds of ontologies now available, there is a salient opportunity to integrate these for discovery. In this talk, I will discuss our efforts to build a rich foundational network of ontology-annotated linked data, develop methods to intelligently retrieve content of interest, uncover significant biological associations, and pursue new avenues for drug discovery. As the portfolio of Semantic Web technologies continues to mature in terms of functionality, scalability, and an understanding of how to maximize their value, researchers will be strategically poised to pursue increasingly sophisticated knowledge discovery projects aimed at improving our overall understanding of human health and disease.

Bio

Dr. Michel Dumontier is an Associate Professor of Medicine (Biomedical Informatics) at Stanford University. His research aims to find new treatments for rare and complex diseases. His research interests lie in the publication, integration, and discovery of scientific knowledge. Dr. Dumontier serves as a co-chair for the World Wide Web Consortium Semantic Web in Health Care and Life Sciences Interest Group (W3C HCLSIG) and is the Scientific Director for Bio2RDF, a widely used open-source project to create and provide linked data for life sciences.

Invited Tutorial Speaker for DS 2014

Anuška Ferligoj

University of Ljubljana, Faculty of Social Sciences, Ljubljana, Slovenia

anuska.ferligoj@fdv.uni-lj.si

Social Network Analysis

Social network analysis has attracted considerable interest from the social and behavioral science community in recent decades. Much of this interest can be attributed to the focus of social network analysis on relationships among units and on the patterns of these relationships. Social network analysis is a rapidly expanding and changing field with a broad range of approaches, methods, models and substantive applications. In the talk, special attention will be given to:

1. General introduction to social network analysis:

What are social networks?
Data collection issues.
Basic network concepts: network representation; types of networks; size and density.
Walks and paths in networks: length and value of path; the shortest path, k-neighbours; acyclic networks.
Connectivity: weakly, strongly and bi-connected components; contraction; extraction.

2. Overview of tasks and corresponding methods:

Network/node properties: centrality (degree, closeness, betweenness); hubs and authorities.
Cohesion: triads, cliques, cores, islands.
Partitioning: blockmodeling (direct and indirect approaches; structural, regular equivalence; generalised blockmodeling); clustering and community detection.
Statistical models.

3. Software for social network analysis (UCINET, PAJEK)
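A few of the basic concepts listed in parts 1 and 2 (density, shortest paths, degree and closeness centrality) can be illustrated on a toy network. The sketch below is only an illustration in plain Python, not part of the tutorial material; the five-person network is made up, and the closeness formula assumes a connected, undirected network.

```python
from collections import deque

# Hypothetical undirected network stored as an adjacency list.
graph = {
    "Ana": {"Bor", "Cene"},
    "Bor": {"Ana", "Cene", "Dora"},
    "Cene": {"Ana", "Bor"},
    "Dora": {"Bor", "Eva"},
    "Eva": {"Dora"},
}

def density(g):
    """Share of possible ties that are present (undirected, no loops)."""
    n = len(g)
    ties = sum(len(neighbours) for neighbours in g.values()) / 2
    return ties / (n * (n - 1) / 2)

def shortest_path_lengths(g, source):
    """Breadth-first search: length of the shortest path to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree_centrality(g):
    """Degree of each node divided by the maximum possible degree, n - 1."""
    n = len(g)
    return {u: len(neighbours) / (n - 1) for u, neighbours in g.items()}

def closeness_centrality(g, u):
    """Number of other reachable nodes divided by the sum of distances to them."""
    dist = shortest_path_lengths(g, u)
    total = sum(d for v, d in dist.items() if v != u)
    return (len(dist) - 1) / total if total else 0.0

print(density(graph))
print(degree_centrality(graph)["Bor"])
print(closeness_centrality(graph, "Bor"))
```

In this example Bor has the highest degree and closeness, since every other person is at most two steps away.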

Bio

Anuška Ferligoj is a Slovenian mathematician whose work in network analysis is internationally recognized. Her interests include multivariate analysis (constrained and multicriteria clustering), social networks (measurement quality and blockmodeling), and survey methodology (reliability and validity of measurement). She is a fellow of the European Academy of Sociology. She is a professor of multivariate statistical methods at the University of Ljubljana, where she also heads the graduate program in Statistics. She has been editor of the journal Advances in Methodology and Statistics (Metodoloski zvezki) since 2004 and is a member of the editorial boards of the Journal of Mathematical Sociology, Journal of Classification, Social Networks, Statistics in Transition, Methodology, and Structure and Dynamics: eJournal of Anthropology and Related Sciences. She was a Fulbright scholar in 1990 and a visiting professor at the University of Pittsburgh. She was awarded the title of Ambassador of Science of the Republic of Slovenia in 1997.

Invited Speaker for ALT 2014

Luc Devroye

McGill University, Montreal, Canada

lucdevroye@gmail.com

Cellular Tree Classifiers

Suppose that binary classification is done by a tree method in which the leaves of a tree correspond to a partition of d-space. Within each cell of the partition, a majority vote is used. Suppose furthermore that this tree must be constructed recursively by implementing just two functions, so that the construction can be carried out in parallel by using "cells". First, given input data, a cell must decide whether it will become a leaf or an internal node in the tree. Second, if it decides on an internal node, it must decide how to partition the space linearly. Data are then split into two parts and sent downstream to two new independent cells. We discuss the design and properties of such classifiers. This is joint work with Gerard Biau.
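The two-function construction described in this abstract can be sketched as follows. This is only an illustration of the recursive "cell" structure: the stopping rule (small or pure cell) and the splitting rule (median cut along the widest coordinate) are placeholder assumptions, not the rules analysed in the talk, and the data set is made up.

```python
from collections import Counter

def should_stop(points, min_size=2):
    """Function 1 (assumed rule): become a leaf when the cell is small or pure."""
    labels = {label for _, label in points}
    return len(points) <= min_size or len(labels) <= 1

def choose_split(points):
    """Function 2 (assumed rule): cut the widest coordinate at its median."""
    def spread(j):
        values = [x[j] for x, _ in points]
        return max(values) - min(values)
    axis = max(range(len(points[0][0])), key=spread)
    values = sorted(x[axis] for x, _ in points)
    return axis, values[len(values) // 2]

def majority(points):
    """Majority vote over the labels that fell into this cell."""
    return Counter(label for _, label in points).most_common(1)[0][0]

def build_cell(points):
    """A cell only sees its own (non-empty) data, so siblings are independent."""
    if should_stop(points):
        return ("leaf", majority(points))
    axis, threshold = choose_split(points)
    left = [p for p in points if p[0][axis] < threshold]
    right = [p for p in points if p[0][axis] >= threshold]
    if not left or not right:  # degenerate cut: fall back to a leaf
        return ("leaf", majority(points))
    # The two recursive calls share no state and could run in parallel.
    return ("node", axis, threshold, build_cell(left), build_cell(right))

def classify(tree, x):
    """Follow the cuts down to a leaf and return its majority label."""
    while tree[0] == "node":
        _, axis, threshold, left, right = tree
        tree = left if x[axis] < threshold else right
    return tree[1]

# Made-up two-class data in the plane.
data = [((0.0, 0.0), 0), ((0.1, 0.2), 0), ((0.2, 0.1), 0),
        ((1.0, 1.0), 1), ((0.9, 0.8), 1), ((0.8, 0.9), 1)]
tree = build_cell(data)
print(tree)
print(classify(tree, (0.05, 0.05)), classify(tree, (0.95, 0.9)))
```

Because each call to build_cell depends only on the points passed to it, the two children could be handed to separate workers without any shared state.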

Bio

Luc P. Devroye is a Belgian computer scientist/mathematician and a James McGill Professor in the School of Computer Science of McGill University in Montreal, Canada. He studied at Katholieke Universiteit Leuven and subsequently at Osaka University, and in 1976 received his PhD from the University of Texas at Austin under the supervision of Terry Wagner. Devroye specializes in the probabilistic analysis of algorithms and random number generation, and enjoys typography. Since joining the McGill faculty in 1977 he has won numerous awards, including an E.W.R. Steacie Memorial Fellowship (1987), a Humboldt Research Award (2004), the Killam Prize (2005) and the Statistical Society of Canada gold medal (2008). He received honorary doctorates from the Université catholique de Louvain in 2002 and from Universiteit Antwerpen on March 29, 2012.

Invited Tutorial Speaker for ALT 2014

Eyke Hüllermeier

Department of Computer Science, University of Paderborn, Germany

eyke@upb.de

Online Preference Learning and Ranking

The primary goal of this tutorial is to survey the current state of preference learning, which has recently emerged as a new branch of machine learning. Starting with a systematic overview of different types of preference learning problems, methods to tackle these problems, and metrics for evaluating the performance of preference models induced from data, the presentation will focus on theoretical and algorithmic aspects of ranking problems. In particular, recent approaches to preference-based online learning with bandit algorithms will be covered in some depth.

Bio

Eyke Hüllermeier is a full professor in the Department of Computer Science at the University of Paderborn, Germany, where he heads the Intelligent Systems group. He studied mathematics and business computing, received his PhD in computer science from the University of Paderborn in 1997, and a Habilitation degree in 2002. Prior to returning to Paderborn in 2014, he spent two years as a post-doctoral researcher at the IRIT in Toulouse (France) and held professorships at the Universities of Dortmund, Magdeburg and Marburg. His research interests are centered around methods and theoretical foundations of intelligent systems, with a specific focus on machine learning and reasoning under uncertainty. He has published more than 200 articles on these topics in top-tier journals and major international conferences, and several of his contributions have been recognized with scientific awards. Professor Hüllermeier is Co-Editor-in-Chief of Fuzzy Sets and Systems, one of the leading journals in the field of Computational Intelligence, and serves on the editorial board of several other journals, including Machine Learning, the International Journal of Approximate Reasoning, and the IEEE Transactions on Fuzzy Systems. He is a coordinator of the EUSFLAT working group on Machine Learning and Data Mining and head of the IEEE CIS Task Force on Machine Learning. In recent years, he has been involved in the organization of several international conferences. Currently, he is PC Co-Chair of ECML/PKDD-2014, the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (Nancy, France).