• Züri ML #34: Deep Learning for Poker

    ETH Zurich, Main building (HG Building), F3

    DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker
    Viliam Lisy, Czech Technical University in Prague, Czech Republic

    Abstract: Artificial intelligence has seen several breakthroughs in recent years, with games often serving as milestones. A common feature of these games is that players have perfect information. Poker is the quintessential game of imperfect information, and a longstanding challenge problem in artificial intelligence. We introduce DeepStack, an algorithm for imperfect-information settings. It combines recursive reasoning to handle information asymmetry, decomposition to focus computation on the relevant decision, and a form of intuition that is automatically learned from self-play using deep learning. In a study involving 44,000 hands of poker, DeepStack defeated professional poker players in heads-up no-limit Texas hold'em with statistical significance. The approach is theoretically sound and is shown to produce strategies that are more difficult to exploit than those of prior approaches.

    Paper: https://arxiv.org/pdf/1701.01724.pdf

    This event is in coordination with the Mad Scientist Festival Zurich, which takes place the following day at Zentrum Karl der Grosse, check it out! http://www.karldergrosse.ch/veranstaltung/mad-scientist-festival/
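
    The self-play learning the abstract mentions builds on regret-minimization ideas; as a heavily simplified illustration (this is regret matching on a toy 2x2 zero-sum game, not DeepStack itself, and the game and constants are chosen only for the sketch):

```python
import numpy as np

def rm_strategy(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    pos = np.maximum(regrets, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(len(regrets), 1.0 / len(regrets))

# Matching pennies: row player's payoffs; the unique equilibrium is (0.5, 0.5).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])

r_row, r_col = np.array([1.0, 0.0]), np.zeros(2)  # asymmetric start, so play cycles
avg_row = np.zeros(2)
for _ in range(20000):
    s_row, s_col = rm_strategy(r_row), rm_strategy(r_col)
    avg_row += s_row
    u_row = A @ s_col                 # expected utility of each row action
    u_col = -(A.T @ s_row)            # zero-sum: column player's utilities
    r_row += u_row - s_row @ u_row    # accumulate instantaneous regrets
    r_col += u_col - s_col @ u_col
avg_row /= avg_row.sum()
print(avg_row)  # average strategy converges toward [0.5 0.5]
```

    In zero-sum games the average strategies of two regret-matching players converge to a Nash equilibrium; DeepStack's contribution is making this kind of reasoning tractable in a game as large as no-limit hold'em via decomposition and a learned value function.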

  • Züri ML #33: Directions in Convolutional Neural Networks Research

    ETH Zurich, Main building (HG Building), E5

    Do Deep Neural Networks Suffer from Crowding?
    Anna Volokitin, ETH Zürich

    Abstract: Crowding is a visual effect suffered by humans, in which an object that can be recognised in isolation can no longer be recognised when other objects are placed close to it. In this work, we study the effect of crowding in artificial Deep Neural Networks (DNNs) for object recognition. We investigate whether this effect is also present in both Convolutional Neural Networks and a multi-scale extension called eccentricity-dependent networks. Eccentricity-dependent networks have recently been proposed for modeling the feedforward path of the primate visual cortex, and have scale invariance built into the architecture. We show that the eccentricity-dependent network trained on objects in isolation can recognize objects in clutter under certain conditions, whereas the standard convolutional networks cannot.

    CNNs in Video Analysis - An Overview, Biased to Fast Methods
    Michael Gygli, ETH Zürich

    Abstract: Automatic video analysis has become increasingly popular in the recent past. This talk will focus on new developments in using Convolutional Neural Networks (CNNs) and present recent advances in creating fast automatic video analysis algorithms that can be used in production systems. The talk consists of three parts. First, I will discuss C3D, a spatio-temporal neural network that is widely used for video analysis tasks such as action recognition. In comparison to competing approaches, C3D operates directly on raw pixel inputs, allowing it to run at close to 400 FPS on a modern GPU. Second, I will present my recent method for shot boundary detection with fully convolutional CNNs. Its model architecture is similar to C3D, but more compact and fully convolutional in time. Thanks to these changes, the shot detection runs at more than 230x real-time speed (5,800 FPS), so it can analyze a full-length movie in less than half a minute. Finally, the presentation closes with our approach to automatically finding highlights in videos. Our system first detects shots, which are then scored by a combination of C3D, audio features and a feed-forward neural network (FNN).

    References:
    Learning Spatiotemporal Features with 3D Convolutional Networks (C3D): https://arxiv.org/abs/1412.0767
    Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks: https://arxiv.org/abs/1705.08214
    Video2GIF: Automatic Generation of Animated GIFs from Video: https://arxiv.org/abs/1605.04850
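
    For a sense of what shot boundary detection computes, here is the classical frame-differencing baseline that learned detectors like Gygli's improve on (a naive heuristic sketch, not the CNN from the talk; the threshold is an arbitrary choice):

```python
import numpy as np

def shot_boundaries(frames, threshold=0.5):
    """Naive shot-boundary baseline: flag positions where the mean
    absolute pixel difference to the previous frame is large."""
    diffs = np.mean(np.abs(np.diff(frames.astype(float), axis=0)), axis=(1, 2))
    diffs = diffs / (diffs.max() + 1e-8)       # normalise scores to [0, 1]
    return np.where(diffs > threshold)[0] + 1  # index of first frame of new shot

# toy clip: 10 grey frames with a hard cut at frame 5
clip = np.zeros((10, 4, 4))
clip[5:] = 1.0
print(shot_boundaries(clip))  # -> [5]
```

    A CNN-based detector replaces the hand-set threshold and pixel difference with features learned from labelled cuts, which is what makes it robust to motion and gradual transitions.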

  • Züri ML #32: Frontiers of Recurrent Neural Networks

    ETH Zurich, Main building (HG Building), E3

    Multi-dimensional LSTM Networks for Image Analysis
    Wonmin Byeon, ETH Zürich

    Abstract: Long Short-Term Memory (LSTM) recurrent neural networks were initially introduced for one-dimensional sequence learning tasks such as handwriting and speech recognition. Their extension, Multi-dimensional LSTM (MD-LSTM) networks, accesses more than one dimension and allows long-range contexts of multi-dimensional data such as images and videos to be learned. The networks learn directly from raw pixel values and take the complex spatial dependencies of each pixel into account. In this talk, I will present two different MD-LSTM models for 2D and 3D data: 2D-LSTM and Pyramid-LSTM. 2D-LSTM networks are applied directly to scene images and show improvement over other state-of-the-art methods. Pyramid-LSTM, a variant of MD-LSTM, rearranges the original architecture, resulting in easier parallelization on GPUs and fewer computations overall. At the end, I will show pixel-wise segmentation of 3D biomedical volumetric images as an application of Pyramid-LSTM.

    Reference: https://arxiv.org/abs/1506.07452

    Making RNNs Deep per Time-Step - Recurrent Highway Networks
    Julian Zilly, ETH Zürich

    Abstract: Training Recurrent Neural Networks (RNNs) has historically been a challenging task due to the vanishing and exploding gradient problem. Long Short-Term Memory (LSTM) networks provide a solution to this challenge. However, LSTMs have commonly only been "deep" across many time-steps and do not make use of multiple neural network layers per time-step. To extend LSTMs to greater depth per time-step, I will discuss Recurrent Highway Networks (RHNs), which use highway layers to create recurrent networks with multiple neural network layers per time-step. I will then highlight the state-of-the-art performance of RHNs on language modeling and the trade-offs involved in using them. Finally, future directions for deep RNNs are outlined.

    Reference: https://arxiv.org/abs/1607.03474
    GitHub: https://github.com/julian121266/RecurrentHighwayNetworks
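
    The key idea — several highway layers inside a single recurrent time-step — can be sketched in numpy as follows (a minimal sketch with the coupled carry gate c = 1 - t from the paper; the dimensions and random weights here are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rhn_step(x, s, Wh, Wt, Rh, Rt, bh, bt, depth=3):
    """One time-step of a Recurrent Highway Network: the input x enters
    only the first highway layer; deeper layers refine the state s via
    transform gates t and the coupled carry gate (1 - t)."""
    for layer in range(depth):
        inp_h = Wh @ x if layer == 0 else 0.0
        inp_t = Wt @ x if layer == 0 else 0.0
        h = np.tanh(inp_h + Rh[layer] @ s + bh[layer])      # candidate state
        t = sigmoid(inp_t + Rt[layer] @ s + bt[layer])      # transform gate
        s = h * t + s * (1.0 - t)                           # highway update
    return s

rng = np.random.default_rng(0)
n, d = 4, 3                                   # state size, recurrence depth
s = np.zeros(n)
x = rng.standard_normal(2)
Wh, Wt = rng.standard_normal((n, 2)), rng.standard_normal((n, 2))
Rh = rng.standard_normal((d, n, n)) * 0.1
Rt = rng.standard_normal((d, n, n)) * 0.1
bh, bt = np.zeros((d, n)), np.zeros((d, n))
s = rhn_step(x, s, Wh, Wt, Rh, Rt, bh, bt, depth=d)
print(s.shape)  # (4,)
```

    With depth=1 this reduces to a simple gated RNN cell; increasing the depth adds non-linear processing per time-step without adding more time-steps.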

  • Züri ML #31: Machine Learning at Google Zurich

    Sorry, this event is fully booked!

    Agenda
    • 18:00 - Entrance
    • 18:35 - Welcome
    • 18:40 - Emmanuel Mogenet (confirmed): "ML Research @ Google ZH"
    • 19:30 - Katja Filippova (confirmed): "Sentence compression by all means"
    • 20:00 - Drinks & Discussion
    • 21:00 - End

    Emmanuel Mogenet on machine intelligence at Google and its research organization in Europe

    Bio: Emmanuel Mogenet is an Engineering Director who currently leads the Zurich-based Google European Research Lab. Prior to his current role, Emmanuel led a team of 200+ engineers focused on improving various aspects of Google's search engine. Before joining Google to work on research and search problems, Emmanuel spent most of his career solving 3D computer graphics and image processing problems for the film special effects industry. In particular, until 2006 Emmanuel worked at Apple Computer in California, where he was part of the advanced image processing group. Emmanuel was born in 1967 in a small town in the south-east of France. He earned his Master's degree in Computer Science and Artificial Intelligence in 1990 from the School of Mines of St-Etienne. During the course of his career, Emmanuel has lived and worked in Paris, Singapore, Tokyo, Los Angeles, and finally Zürich.

    Katja Filippova on Sentence Compression by All Means: From Pruning Syntactic Trees to Generating Zero-One Sequences

    Abstract: Text summarization has been a popular topic in the NLP research community because of the promise it holds for real-world applications and also because of its complexity: there are numerous scenarios in which one would benefit from a technology that expresses the gist in a few words, but doing so seems to require a great deal of linguistic and world knowledge. In this talk I will give an overview of how sentence-level summarization, a.k.a. sentence compression, has been approached by our team in the past years, and will describe an evolution from a syntax-based optimization algorithm to a syntax-free deep neural network.

    Bio: Katja Filippova is a research scientist at Google. She holds a Ph.D. from the Technical University of Darmstadt (2009) and an MA from the University of Tübingen (2005). During her Ph.D. she was supported by the Klaus Tschira Foundation and was affiliated with EML Research in Heidelberg (now the Heidelberg Institute for Theoretical Studies). She has worked on applying statistical methods to text understanding and generation.
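
    "Generating zero-one sequences" in the talk title refers to deletion-based compression: each input token gets a binary keep/drop label. The mechanics are simple to show (in the actual system a learned model predicts the mask; the hand-written mask below is purely illustrative):

```python
def compress(tokens, keep_mask):
    """Sentence compression as a zero-one sequence: token i is kept
    iff keep_mask[i] == 1. In a real system the mask would come from
    a trained sequence model; here it is written by hand."""
    return " ".join(t for t, k in zip(tokens, keep_mask) if k)

sentence = "The quick brown fox easily jumped over the very lazy dog".split()
mask     = [1,   0,     0,     1,   0,      1,     1,    1,   0,   0,    1]
print(compress(sentence, mask))  # -> "The fox jumped over the dog"
```

    The syntax-based predecessor the talk contrasts this with instead pruned edges of a dependency tree, which guarantees grammaticality but requires a parser; the zero-one formulation needs no syntactic analysis at all.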

  • Züri ML #30: Large Scale Data in Practice

    ETH Zurich, Main building (HG Building), E3

    Machine Learning at Scale: Working at the Interface between System and Algorithm
    Celestine Dünner, IBM

    Abstract: In machine learning, the design of the hardware system on the one hand and of the learning algorithm on the other is often conducted independently. As a consequence, in practice, available resources may not be efficiently utilized by the algorithm, which can drastically degrade its performance. In this talk I will focus on the challenge of designing algorithms that run efficiently on large-scale machine learning systems. I will present results of a recent performance study of Spark and MPI, with the goal of illustrating that the optimal parameters of a distributed learning algorithm depend strongly on the characteristics of the system as well as the software framework it is implemented on.

    Reference: https://arxiv.org/pdf/1612.01437v1.pdf

    Applying Machine Learning on Healthcare Data
    Diego Saldana Miranda, Novartis

    Abstract: The diversity of data sources that need to be analyzed in healthcare and outcomes research keeps growing. However, analyzing such diverse data sources is a non-trivial task, requiring new approaches to data preparation, modelling, and reporting of results. The size of some of these datasets also means that specialized infrastructure, able to cope with such data sources, is required. In this presentation, we will introduce the audience to the diversity of datasets increasingly found in healthcare. We will cover topics such as randomized clinical trials, health insurance claims, and smart device data: how their usage differs from one another, which machine learning methods can be applied today, and what the perspectives are for the future.

  • Züri ML #29: Oxbridge Machine Learning Mafia. :-)

    ETH Zurich, Main building (HG Building), E3

    Data Summarization with Kernel Methods
    Jovana Mitrovic (https://uk.linkedin.com/in/jovana-mitrovic-3b62823a), PhD student, University of Oxford

    Abstract: With the ever-increasing size and complexity of modern datasets, the need for efficient data summarization methods is constantly growing. While there exist many heuristics for this task, there are only a few principled general-purpose approaches for the selection or construction of summary statistics. In addition, many of the available heuristics are domain- and problem-specific, which severely restricts their generality. In this talk, we discuss the use of kernel methods for data summarization. In particular, we present a novel kernel-based framework that automatically constructs problem-specific data summaries that enable more accurate parameter inference. This approach is particularly well suited for highly structured datasets, e.g. time series and geospatial data, as it efficiently encodes the underlying dependence structure. We showcase the computational and statistical efficiency of our method on complex population dynamics models from ecology.

    Reference: http://jmlr.org/proceedings/papers/v48/mitrovic16.html

    Fully Character-Level Neural Machine Translation without Explicit Segmentation
    Jason Lee (https://ch.linkedin.com/in/jasonleeinf), PhD student, ETH Zurich

    Abstract: Most existing machine translation systems operate at the level of words, relying on explicit segmentation to extract source tokens. We introduce a neural machine translation (NMT) model that maps a source character sequence to a target character sequence without any segmentation. Our character-to-character model outperforms a recently proposed baseline with a subword-level encoder on WMT'15 DE-EN and CS-EN, and gives comparable performance on FI-EN and RU-EN. We then demonstrate that it is possible to share a single character-level encoder across multiple languages by training a model on a many-to-one translation task.

    Reference: https://arxiv.org/abs/1610.03017
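
    The representational premise of the character-to-character model is simply that every character, spaces included, is an input token, so no segmenter is needed. A toy encoder-input builder makes this concrete (an illustrative sketch, not the paper's code; a real system fixes the character vocabulary in advance rather than building it on the fly):

```python
def to_char_ids(text, vocab=None):
    """Character-level input: each source character becomes one integer id,
    with no word segmentation at all."""
    vocab = vocab if vocab is not None else {}
    ids = []
    for ch in text:
        if ch not in vocab:
            vocab[ch] = len(vocab)  # assign the next free id
        ids.append(vocab[ch])
    return ids, vocab

ids, vocab = to_char_ids("guten tag")
print(ids)  # -> [0, 1, 2, 3, 4, 5, 2, 6, 0]
```

    Because the vocabulary is tiny (characters, not words), the same encoder can be shared across source languages, which is what enables the many-to-one multilingual setup described above.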

  • Züri ML #28: Learning Sound: All things Audio.

    ETH Zurich, CAB Building, Room G11

    • Audio Signal Quality Diagnostics with Image Analysis
    Vasily Tolkachev, ZHAW

    Abstract: In this technical talk I present a number of interesting findings from a project with our industrial partner. The goal was to build a solid discriminative model able to distinguish between working and broken sound emitters based on the sound files they produce. We approached the problem with various image analysis tools, applying different classifiers to spectrograms of these files. A technique called t-SNE, which led to the key findings of the project, will be introduced. Despite facing a number of data artefacts, such as erroneous labels and class imbalance, we achieved sufficiently good performance with a Random Forest after a number of important transformations. In conclusion, a comparison to variational autoencoders will be presented.

    • Audio-Based Bird Species Identification Using Deep Learning Techniques
    Elias Sprengel, ETHZ

    Abstract: Accurate bird species identification is essential for biodiversity conservation and acts as an important tool for understanding the impact of cities and commercial areas on ecosystems. Many attempts have therefore been made to identify bird species automatically. These attempts usually rely on audio recordings, because images of birds are harder to obtain. They work well when the number of bird species is low and the recordings contain little background noise, but they quickly deteriorate in any real-world scenario. In this talk, we present a new audio classification approach based on recent advances in deep learning. With novel pre-processing and data augmentation methods, we train a neural network on the biggest publicly available dataset. This dataset contains crowd-sourced recordings of 999 bird species, providing an excellent way to evaluate our system in a more realistic scenario. Our convolutional neural network surpasses the previous state of the art and won the international BirdCLEF 2016 Recognition Challenge.
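
    Both talks classify spectrograms, i.e. time-frequency images of audio. A minimal short-time Fourier transform spectrogram (a generic sketch of the common preprocessing step, not either speaker's actual pipeline; window and hop sizes are arbitrary choices) looks like this:

```python
import numpy as np

def spectrogram(signal, win=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform:
    Hann-windowed frames, FFT per frame, magnitudes as a 2-D image."""
    frames = [signal[i:i + win] * np.hanning(win)
              for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.asarray(frames), axis=1)).T  # (freq, time)

sr = 8000
t = np.arange(sr) / sr                 # one second of audio
tone = np.sin(2 * np.pi * 440 * t)     # a 440 Hz test tone
S = spectrogram(tone)
print(S.shape)  # (129, 61)
```

    The resulting (frequency x time) matrix is what gets fed to image classifiers, embedded with t-SNE for visualization, or passed through a CNN.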

  • Züri ML #27: Zurich meets USA: Clarifai and tag.bio

    ETH Zurich, Main building (HG Building), E3

    • Matt Zeiler (https://twitter.com/mattzeiler), CEO, Clarifai (https://www.clarifai.com/) (via Skype)
    • Jesse Paquette (https://www.linkedin.com/in/jessepaquette), CSO, tag.bio (http://tag.bio/) (live)

    • Clarifai
    Matthew Zeiler is an artificial intelligence expert with a Ph.D. in machine learning from NYU. His groundbreaking research in visual recognition, conducted alongside renowned machine learning pioneers Geoff Hinton and Yann LeCun, has propelled the image recognition industry from theory to real-world practice. As the founder of Clarifai, Matt is applying his award-winning research to create the best visual recognition solutions for businesses and developers and to power the next generation of intelligent apps. Reach him @MattZeiler.

    • tag.bio
    Making an idea machine: modular architecture for a scalable exploratory data analysis platform in genomics, sports and beyond
    Exploratory data analysis has been a core facilitator of discovery in genomics research over the last 20 years. The critical advancement in the field has been to put data analysis tools in the hands of domain experts: non-programmers and non-statisticians who have the capacity to hypothesize and discover in their field. Jesse Paquette and tag.bio have identified key aspects of the discovery process in genomics and used them to develop a modular, scalable exploratory data analysis platform that can be configured for a wide variety of data domains. The case studies presented will focus on applications of the tag.bio platform for discovery in sports.

  • Züri ML #26: Fashion and Finance

    ETH Zurich, CAB Building, Room G11

    Confirmed schedule:
    • Deep Learning with an Eye for Fashion - Matthias Dantone and Lukas Bossard, Fashwell (http://tech.fashwell.com/)
    • Machine Learning for Quantitative Trading - Martin Froehler, Quantiacs (http://www.quantiacs.com/)

    Abstracts:

    • Deep Learning with an Eye for Fashion
    Fashwell is a deep learning startup from Zurich, Switzerland, with a focus on fashion. Our vision is to bring fashion content and commerce together and ultimately to make every fashion image on the web shoppable with the help of our algorithms. In this talk, we give an overview of how to bootstrap the data for your machine learning startup and introduce the algorithms that fuel the recognition of fashion products in consumer images.

    • Machine Learning for Quantitative Trading
    Quantitative trading is the methodical way of trading, and a $300b industry; hedge funds employing quantitative trading are considered the elite of the industry. Today, with more and better data, machine learning methods have become increasingly popular among quantitative traders. This talk will address the challenges of applying machine learning methods to financial data and how to avoid the most common pitfalls.

  • Züri ML #25: Image Analysis: De-Blurring and City Structures

    ETH Zurich, CAB Building, Room G11

    Confirmed schedule:
    • The State of Affairs of Blind Deconvolution - Paolo Favaro (http://www.cvg.unibe.ch/paolo%20favaro.html), University of Bern
    • Learning City Structures from Online Maps - Pascal Kaiser (https://www.linkedin.com/in/pascal-kaiser-b9858050), ETH Zürich

    Abstracts:

    • The State of Affairs of Blind Deconvolution
    In the past decade a renewed major effort has been devoted to the blind deconvolution problem, with particular attention to the case of motion blur. Many current approaches are essentially built on iterative alternating energy minimization: at each step either the sharp image or the blur function is reconstructed. Much of the success of these algorithms has been attributed to the use of sparse gradient priors. However, recent work has shown analytically that this class of algorithms should be ineffective, as it should trivially return the original input as the sharp image. In contrast, one can observe experimentally that these alternating minimization algorithms converge to the desired solution. We will show both analysis and experiments that resolve this paradox. We also introduce our own adaptation of this algorithm and show that, in spite of its extreme simplicity, it is very robust and achieves performance on par with the state of the art.

    Short bio: Paolo Favaro is a full professor at the University of Bern, Switzerland, where he heads the Computer Vision Group. He received the Laurea degree (B.Sc. + M.Sc.) from Università di Padova, Italy, in 1999, and the M.Sc. and Ph.D. degrees in electrical engineering from Washington University in St. Louis in 2002 and 2003, respectively. He was a postdoctoral researcher in the computer science department of the University of California, Los Angeles, and subsequently at Cambridge University, UK. Between 2004 and 2006 he worked on medical imaging at Siemens Corporate Research, Princeton, USA. From 2006 to 2011 he was Lecturer and then Reader at Heriot-Watt University and Honorary Fellow at the University of Edinburgh, UK. His research fields include computer vision, computational photography, machine learning, and signal and image processing.

    • Learning City Structures from Online Maps
    The automated generation of maps from satellite images has long been a challenging task. Nowadays, huge amounts of very-high-resolution satellite images are publicly available from online map providers such as Google Maps. This freely available map data can be used to train deep neural networks to effectively recognize buildings and streets in satellite images at a per-pixel level. In this talk we will first look at fully convolutional networks (FCNs), a form of multi-layer neural network able to output object label predictions for each pixel of a given input image. Then we will discuss FCNs trained on publicly available map data and see how well these models predict buildings and streets.
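
    The per-pixel output of an FCN is just an argmax over a stack of per-class score maps. This toy sketch shows only the output format, not a trained model; the two-class "background/building" labelling is a hypothetical example:

```python
import numpy as np

def per_pixel_labels(logits):
    """FCN-style dense prediction: logits has shape (classes, H, W);
    the predicted label map is the per-pixel argmax over classes."""
    return np.argmax(logits, axis=0)

# toy 2-class score volume: class 1 ("building") wins in the top-left corner
logits = np.zeros((2, 4, 4))
logits[1, :2, :2] = 5.0
labels = per_pixel_labels(logits)
print(labels)
```

    In a real network the score volume comes from convolutional layers followed by upsampling back to the input resolution, so that one forward pass labels every pixel of the satellite image at once.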
