For our next meetup, we have the nice folks at UCL hosting us. We'll have three shorter talks by UCL stats & ML researchers. See you there!
Title: Nodes from the Underground: Modelling Traffic Patterns and Shock Effects in the London Transportation Network
Speaker: Ricardo Silva (http://www.homepages.ucl.ac.uk/~ucgtrbd/), Statistical Science, UCL
Transportation networks play a major role in the economy and social structure of large cities. In such complex systems, local unplanned disruptions happen due to a variety of unforeseen events, such as power and mechanical failures. The effect of disruptions on passenger behaviour is of fundamental interest. From the passenger's point of view, disruptions translate into delays, station closures or line closures. In particular, it is important to predict changes in how passengers leave the system at different locations within the affected regions. This is a first and essential step towards assessing which components of the system are prone to overcrowding, and how they are jointly affected. For instance, dynamic changes to the bus system might be implemented to compensate for increases in the number of passengers exiting at several points along a route.
In this talk, we discuss how we can model disruptions in the London fast train network, namely the system comprising the Underground, Overground and DLR. We measure effects by passengers' exit rates, modelling them via a combination of parametric and nonparametric models. Crucially, due to the relative sparseness of disruption events, we show how to leverage the data obtained under the "natural regime" of the system, where no disruptions take place, as features that are predictive of the anomalous behaviour. We demonstrate the effectiveness of our approach on data collected from smart card users of the London Tube during 70 days spread across 2011 and 2012. The relationship between natural-regime features and behaviour under disruption is remarkably robust across different shocks, and easily expressible as a linear model. Our framework is, however, very general, and provides a recipe for similar analyses in any large urban transport network.
Title: Efficient and Effective Learning to Rank
Speaker: Emine Yilmaz (http://research.microsoft.com/en-us/people/eminey/), Computer Science, UCL
Most current information retrieval systems are machine learning algorithms designed to optimize evaluation metrics that measure user satisfaction, a process referred to as learning to rank. Two important problems arise during the learning to rank process: (1) which evaluation metric should be used as the objective in optimization, and (2) how to reduce the large number of judgments needed to create the training data for learning to rank. In the first half of this talk, I will focus on the effect of the evaluation metrics used as objectives in learning to rank. I will first show that, contrary to common belief, the target metric used in optimization is not necessarily the metric that evaluates user satisfaction. I will then describe an information-theoretic framework that can be used to analyze the informativeness of evaluation metrics, and show that more informative metrics should be used as objectives during learning to rank, independent of the measure that best captures user satisfaction. In the second half of the talk, I will focus on techniques that can be used to reduce the number of judgments needed for training retrieval systems. I will describe a method based on sampling and statistical inference that can be used to devise learning to rank algorithms that can be trained with significantly smaller training datasets.
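For readers new to the topic, here is a minimal sketch of one widely used ranking evaluation metric, NDCG (normalized discounted cumulative gain). The abstract does not say which metrics the talk covers, so the choice of NDCG, the function names and the example relevance labels are all assumptions made purely for illustration.

```python
import math

def dcg(relevances, k=None):
    """Discounted cumulative gain of graded relevance labels, given in ranked order."""
    top = relevances[:k] if k is not None else relevances
    # Rank 0 is the top result; gains are discounted by log2(rank + 2).
    return sum((2 ** rel - 1) / math.log2(rank + 2) for rank, rel in enumerate(top))

def ndcg(relevances, k=None):
    """DCG divided by the DCG of the ideal (best possible) ordering of the same labels."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical relevance judgments (0 = not relevant, 3 = highly relevant).
perfect_ranking = [3, 2, 1, 0]   # best document ranked first
poor_ranking = [0, 2, 1, 3]      # best document ranked last

print(ndcg(perfect_ranking))
print(ndcg(poor_ranking))
```

The perfect ordering scores exactly 1, and any ordering that pushes relevant documents down the list scores lower. Each label fed to such a metric is a human relevance judgment, which is why reducing the number of judgments matters.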
Title: The Shogun Machine Learning Toolbox
Speaker: Heiko Strathmann (http://herrstrathmann.de/Shogun), Gatsby Unit, UCL
We present the Shogun Machine Learning Toolbox (http://www.shogun-toolbox.org/), a unified framework for efficient machine learning (ML) with extensive bindings that let it work together with other software, programming languages, and operating systems. The library was initially made public in 2004 and has remained under heavy development ever since. Apart from the resulting mature core framework, Shogun offers state-of-the-art features. Development is driven by a vibrant community and has seen a steady increase in momentum over the past years, fostered by 29 Google Summer of Code (GSoC) projects since 2011. We believe that providing a neutral and modular ground for ML algorithms, interfaces to most programming languages, operating systems, and file formats, and integration with other open-source projects (such as ipython-notebook or mloss.org) is what makes Shogun a valuable contribution to the open-source community, and beyond. In this talk, we give an introduction to the core functionality of Shogun. This includes illustrations of solving basic ML tasks and some of the more advanced features, such as last year's GSoC projects. Finally, we will outline the current development model of Shogun, some community aspects, and pointers to future directions.