- UGent Data Science Seminar: Prof. Krzysztof Dembczynski
UGent Data Science Seminar Speaker: Prof. Krzysztof Dembczynski (Poznań University of Technology) http://www.cs.put.poznan.pl/kdembczynski/ Title: Label tree algorithms for extreme classification Abstract: Extreme classification (XC) is a multi-class or multi-label problem with an extremely large output space, consisting of up to millions of labels. Examples of real-life problems of this scale can be found in image and document tagging, ranking and recommender systems, or web advertising. In this talk we will first discuss applications and challenges faced in XC. In the second part we will discuss a family of algorithms based on label trees, which includes hierarchical softmax (HSM) and probabilistic label trees (PLTs). The former is a well-known approach for reducing the time complexity of multi-class classification used, for example, in word2vec and fastText. The latter is a no-regret generalization of HSM to multi-label classification. The PLT model has recently been used in extremeText, which extends fastText to deal with multi-label data, and in Parabel, currently one of the best XC algorithms. Bio: I am an assistant professor at Poznań University of Technology (Poland), in the laboratory of Intelligent Decision Support Systems headed by Prof. Roman Słowiński. My research interests span the fields of machine learning and decision support. In particular, I have worked on decision rule models, boosting and preference learning. Currently, my main research activity concerns multi-label classification and structured output prediction. After the seminar, a sandwich lunch will be provided for registered participants.
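As a rough illustration of the hierarchical softmax idea mentioned in the abstract above, the following minimal sketch computes a label probability as a product of binary decisions along a root-to-leaf path, so that scoring one label touches only `depth` nodes instead of all labels. It is not taken from the talk: the complete binary tree, the heap-style node numbering, the helper names `label_path` and `hsm_probability`, and the fixed `node_scores` are all assumptions made for illustration. In a real model each node score would depend on the input features.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def label_path(label, depth):
    """Return (node, go_right) pairs from the root to the leaf of `label`.

    Assumes a complete binary tree with 2**depth leaves and heap-style
    numbering of internal nodes (root = 1, children of n are 2n and 2n+1).
    """
    path = []
    node = 1
    for level in reversed(range(depth)):
        go_right = (label >> level) & 1  # next bit of the label, high to low
        path.append((node, go_right))
        node = 2 * node + go_right
    return path


def hsm_probability(label, node_scores, depth):
    """P(label) as the product of the binary decisions along its path."""
    prob = 1.0
    for node, go_right in label_path(label, depth):
        p_right = sigmoid(node_scores[node])
        prob *= p_right if go_right else 1.0 - p_right
    return prob


depth = 3  # 2**3 = 8 labels, 7 internal nodes
# Hypothetical fixed scores for internal nodes 1..7 (illustration only).
node_scores = {n: 0.3 * n - 1.0 for n in range(1, 2 ** depth)}

probs = [hsm_probability(k, node_scores, depth) for k in range(2 ** depth)]
total = sum(probs)  # the leaf probabilities form a proper distribution
```

Because the two branch probabilities at every internal node sum to one, the leaf probabilities sum to one as well, without ever normalizing over all labels.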
- UGent Data Science Seminar: dr. Thomas Hamelryck
UGent Data Science Seminar Speaker: dr. Thomas Hamelryck (University of Copenhagen, https://www1.bio.ku.dk/binf/) Title: Probabilistic Programming for protein structure analysis, prediction and design Abstract: Probabilistic Programming is an emerging technology in machine learning, after Deep Learning and Big Data Analytics. The main idea is:
1. Use a computer language augmented with statistical operators
2. ...to formulate a suitable probabilistic model for a given data set and
3. ...to perform automated statistical inference by executing the program.
This approach has been made possible by recent breakthroughs in automated inference (including advanced sampling methods and variational inference) and numerical computing (including software such as PyTorch and TensorFlow). I will outline what probabilistic programming is all about and discuss how it can impact protein structure prediction, analysis and design. Specifically, I will present a Bayesian model of protein superposition implemented in the deep probabilistic programming language Pyro. This model could potentially serve as a suitable likelihood function for Bayesian structure prediction using deep probabilistic programming.
- UGent Data Science Seminar: dr. Arpit Mittal
UGent Data Science Seminar Speaker: Dr. Arpit Mittal (Amazon Research Cambridge) (https://www.linkedin.com/in/arpit-mittal-71a789b/?originalSubdomain=uk) Title: Large-scale Fact Extraction and Verification Abstract: With billions of individual pages on the web providing information on almost every conceivable topic, we should have the ability to collect facts that answer almost every conceivable question. However, only a small fraction of this information is contained in structured sources (Wikidata, Freebase, etc.) – we are therefore limited by our ability to transform free-form text to structured knowledge. There is, however, another problem that has become the focus of a lot of recent research and media coverage: false information coming from unreliable sources. In this talk I will discuss our recent work on fact extraction and verification. I will also present the new publicly available FEVER dataset we built for this task. Finally, I will discuss top entries from the FEVER challenge and its new variant, FEVER 2, based on the ‘Build it, break it, fix it’ paradigm. Bio: Dr. Arpit Mittal is a Senior Machine Learning Scientist at Amazon Research Cambridge. He is currently working on projects involving knowledge extraction, information retrieval and question answering. Before joining Amazon, Arpit worked on augmented reality (AR) and made fundamental contributions to an industrial AR SDK: Vuforia. He received his PhD from the University of Oxford in Computer Vision and Machine Learning. Within Amazon, Arpit manages the research internship program for their Cambridge UK office. Coffee and tea will be served after the seminar.
- UGent Data Science Seminar: Prof. Padhraic Smyth
UGent Data Science Seminar Speaker: Prof. Padhraic Smyth (UC Irvine) Chancellor’s Professor Department of Computer Science and Department of Statistics University of California, Irvine, USA (https://www.ics.uci.edu/~smyth/) Title: Deep Learning and Statistics: Connections Abstract: Deep learning techniques have received widespread attention in recent years for their impressive performance across a range of prediction problems in areas such as computer vision, machine translation, and speech processing. Much of this work has occurred outside of statistics - yet, several of the key ideas underlying deep learning have statistical foundations. In this talk we will explore several connections between the two fields, discussing how some of the key concepts in deep learning are related to traditional ideas in statistical modeling and estimation, covering topics such as model representation, regularization and generalization, and model calibration. We will also highlight key differences, both technical and cultural, in how data analysis problems are approached by researchers in the two fields. Bio: Padhraic Smyth is a Chancellor's Professor in the Department of Computer Science at UC Irvine, with joint appointments in the Department of Statistics and in the Department of Education. His research interests include machine learning, artificial intelligence, pattern recognition, and applied statistics and he has published over 180 papers on these topics. He is an ACM Fellow (2013), an AAAI Fellow (2010), and a recipient of the ACM SIGKDD Innovation Award (2009). He is co-author of the text Modeling the Internet and the Web: Probabilistic Methods and Algorithms (with Pierre Baldi and Paolo Frasconi in 2003) and Principles of Data Mining, MIT Press (with David Hand and Heikki Mannila in 2001), and he served as program chair of the ACM SIGKDD 2011 conference and the UAI 2013 conference. 
He was the founding director of the UCI Center for Machine Learning and Intelligent Systems from 2007 to 2014 and founding director from 2014 to 2018 of the UCI Data Science Initiative. Padhraic has served in editorial and advisory positions for journals such as the Journal of Machine Learning Research, the Journal of the American Statistical Association, and the IEEE Transactions on Knowledge and Data Engineering. While at UC Irvine he has received research funding from agencies such as NSF, NIH, IARPA, NASA, NIST, and DOE, and from companies such as Google, eBay, Adobe, IBM, Microsoft, SAP, Xerox, and Experian. In addition to his academic research he is also active in industry consulting, working with companies such as eBay, Toshiba, Samsung, Oracle, Nokia, and AT&T, as well as serving as scientific advisor to local startups in Orange County. He also served as an academic advisor to Netflix for the Netflix prize competition from 2006 to 2009. Padhraic received a first class honors degree in Electronic Engineering from National University of Ireland (Galway) in 1984, and the MSEE and PhD degrees (in 1985 and 1988 respectively) in Electrical Engineering from the California Institute of Technology. From 1988 to 1996 he was a Technical Group Leader at the Jet Propulsion Laboratory, Pasadena, and has been on the faculty at UC Irvine since 1996. A sandwich lunch will be served after the seminar for registered participants.
- UGent Data Science Seminar: Prof. Maarten de Rijke
UGent Data Science Seminar Speaker: Prof. Maarten de Rijke (University of Amsterdam) (https://staff.fnwi.uva.nl/m.derijke/) Based on joint work with Harrie Oosterhuis Title: Differentiable Unbiased Online Learning to Rank Abstract: Online Learning to Rank (OLTR) methods optimize rankers based on user interactions. State-of-the-art OLTR methods are built specifically for linear models. Their approaches do not extend well to non-linear models such as neural networks. We introduce an entirely novel approach to OLTR that constructs a weighted differentiable pairwise loss after each interaction: Pairwise Differentiable Gradient Descent (PDGD). PDGD breaks away from the traditional approach that relies on interleaving or multileaving and extensive sampling of models to estimate gradients. Instead, its gradient is based on inferring preferences between document pairs from user clicks and can optimize any differentiable model. We prove that the gradient of PDGD is unbiased w.r.t. user document pair preferences. Our experiments on the largest publicly available Learning to Rank (LTR) datasets show considerable and significant improvements under all levels of interaction noise. PDGD outperforms existing OLTR methods in terms of both learning speed and final convergence. Furthermore, unlike previous OLTR methods, PDGD also allows for non-linear models to be optimized effectively. Our results show that using a neural network leads to even better performance at convergence than a linear model. In summary, PDGD is an efficient and unbiased OLTR approach that provides a better user experience than previously possible. Bio (adapted from Wikipedia): Maarten de Rijke studied philosophy (MSc 1989) and mathematics (MSc 1990) and wrote a PhD thesis, defended in 1993, on extended modal logics, under the supervision of Johan van Benthem. He worked as a postdoc at the Centrum Wiskunde & Informatica, before becoming a Warwick Research Fellow at the University of Warwick. 
He joined the University of Amsterdam in 1998, and was appointed professor of Information Processing and Internet at the Informatics Institute of the University of Amsterdam in 2004. He leads the Information and Language Processing group at the University of Amsterdam, the Intelligent Systems Lab Amsterdam and the Center for Creation, Content and Technology. During the first ten years of his scientific career Maarten de Rijke worked on formal and applied aspects of modal logic. At the start of the 21st century, his research focus shifted to information retrieval. He has since worked on XML retrieval, question answering, expert finding and social media analysis. De Rijke was elected a member of the Royal Netherlands Academy of Arts and Sciences in 2017. He was awarded the Tony Kent Strix award in 2017. His work is supported by grants from the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO), public-private partnerships, and the European Commission (under the Sixth and Seventh Framework programmes).
- UGent Data Science Seminar: Prof. Sören Auer
IMPORTANT NOTE: a sandwich lunch will be served after 13:00 in the foyer of the 12th floor of the iGent tower -- i.e. BEFORE and not AFTER the seminar. --- Speaker: Prof Sören Auer (Leibniz Information Centre for Science and Technology and University Library) Director TIB, Head of research group Data Science and Digital Libraries https://www.tib.eu/en/research-development/data-science-digital-libraries/staff/soeren-auer/ Title: Towards Knowledge Graph based Representation, Augmentation and Exploration of Scholarly Communication Abstract: Despite an improved digital access to scientific publications in the last decades, the fundamental principles of scholarly communication remain unchanged and continue to be largely document-based. The document-oriented workflows in science have reached the limits of adequacy as highlighted by recent discussions on the increasing proliferation of scientific literature, the deficiency of peer-review and the reproducibility crisis. We need to represent, analyse, augment and exploit scholarly communication in a knowledge-based way by expressing and linking scientific contributions and related artefacts through semantically rich, interlinked knowledge graphs. This should be based on deep semantic representation of scientific contributions, their manual, crowd-sourced and automatic augmentation and finally the intuitive exploration and interaction employing question answering on the resulting scientific knowledge base. We need to synergistically combine automated extraction and augmentation techniques, with large-scale collaboration to reach an unprecedented level of knowledge graph breadth and depth. As a result, knowledge-based information flows can facilitate completely new ways of search and exploration. 
The efficiency and effectiveness of scholarly communication will significantly increase, since ambiguities are reduced, reproducibility is facilitated, redundancy is avoided, provenance and contributions can be better traced and the interconnections of research contributions are made more explicit and transparent. In this talk we will present first steps in this direction in the context of our Open Research Knowledge Graph initiative and the ScienceGRAPH project. Bio: Following stations at the universities of Dresden, Ekaterinburg, Leipzig, Pennsylvania, Bonn and the Fraunhofer Society, Prof. Auer was appointed Professor of Data Science and Digital Libraries at Leibniz Universität Hannover and Director of the TIB in 2017. Prof. Auer has made important contributions to semantic technologies, knowledge engineering and information systems. He is the author (resp. co-author) of over 100 peer-reviewed scientific publications. He has received several awards, including an ERC Consolidator Grant from the European Research Council, a SWSA ten-year award, the ESWC 7-year Best Paper Award, and the OpenCourseware Innovation Award. He has led several large collaborative research projects, such as the EU H2020 flagship project BigDataEurope. He is co-founder of high potential research and community projects such as the Wikipedia semantification project DBpedia, the OpenCourseWare authoring platform SlideWiki.org and the innovative technology start-up eccenca.com. Prof. Auer was founding director of the Big Data Value Association, led the semantic data representation in the Industrial/International Data Space, is an expert for industry, the European Commission and W3C, and is a member of the advisory board of the Open Knowledge Foundation. Before the seminar, at 12 noon, a sandwich lunch will be served for registered participants, in the foyer on the 12th floor of the iGent tower.
- UGent Data Science Seminar: Dr. Sander Dieleman
UGent Data Science Seminar Speaker: Dr. Sander Dieleman (Google DeepMind) (http://benanne.github.io/about/) Title: Generating music in the raw audio domain Abstract: Realistic music generation is a challenging task. When machine learning is used to build generative models of music, typically high-level representations such as scores, piano rolls or MIDI sequences are used that abstract away the idiosyncrasies of a particular performance. But these nuances are very important for our perception of musicality and realism, so we embark on modelling music in the raw audio domain. I will discuss some of the advantages and disadvantages of this approach, and the challenges it entails. Bio: Sander Dieleman is a Research Scientist at DeepMind in London, UK, where he has worked on the development of AlphaGo and WaveNet. He was previously a PhD student at Ghent University, where he conducted research on feature learning and deep learning techniques for learning hierarchical representations of musical audio signals. During his PhD he also developed the Theano-based deep learning library Lasagne and won solo and team gold medals respectively in Kaggle's "Galaxy Zoo" competition and the first National Data Science Bowl. In the summer of 2014, he interned at Spotify in New York, where he worked on implementing audio-based music recommendation using deep learning on an industrial scale. After the seminar a sandwich lunch will be served. If you wish to take part in this, registration is required.
- UGent Data Science Seminar: Prof. Pieter Abbeel
Speaker: Prof. Pieter Abbeel (UC Berkeley) http://people.eecs.berkeley.edu/~pabbeel/ Title: Deep Learning to learn Abstract: Reinforcement learning and imitation learning have seen success in many domains, including autonomous helicopter flight, Atari, simulated locomotion, Go, and robotic manipulation. However, the sample complexity of these methods remains very high. In contrast, humans can pick up new skills far more quickly. To do so, humans might rely on a better learning algorithm or on a better prior (potentially learned from past experience), and likely on both. In this talk I will describe some recent work on meta-learning for action, where agents learn the imitation/reinforcement learning algorithms and learn the prior. This has enabled acquiring new skills from just a single demonstration or just a few trials. While designed for imitation and RL, our work is more generally applicable and has also advanced the state of the art in standard few-shot classification benchmarks such as Omniglot and mini-ImageNet. Bio: Pieter Abbeel is Professor and Director of the Robot Learning Lab at UC Berkeley [2008- ], Co-Founder of covariant.ai [2017- ], Co-Founder of Gradescope [2014- ], Advisor to OpenAI, Founding Faculty Partner AI@TheHouse, and Advisor to many AI/Robotics start-ups. He works in machine learning and robotics. In particular, his research focuses on making robots learn from people (apprenticeship learning), making robots learn through their own trial and error (reinforcement learning), and speeding up skill acquisition through learning-to-learn (meta-learning). His robots have learned advanced helicopter aerobatics, knot-tying, basic assembly, organizing laundry, locomotion, and vision-based robotic manipulation. He has won numerous awards, including best paper awards at ICML, NIPS and ICRA, early career awards from NSF, DARPA, ONR, AFOSR, Sloan, TR35, IEEE, and the Presidential Early Career Award for Scientists and Engineers (PECASE). 
Pieter's work is frequently featured in the popular press, including New York Times, BBC, Bloomberg, Wall Street Journal, Wired, Forbes, Tech Review, NPR. After the seminar a sandwich lunch will be served. If you wish to take part in this, registration is required. This seminar is supported by imec, and by the WOG "Guiding networked societies, linking data science and modelling".
- 7th UGent Data Science Seminar: Prof. Panayiotis Tsaparas
7th UGent Data Science Seminar Speaker: Prof. Panayiotis Tsaparas (University of Ioannina) (http://www.cs.uoi.gr/~tsap/) Title: Maximizing and moderating opinions in social networks Abstract: The process of opinion formation through synthesis and contrast of different viewpoints has been the subject of many studies in economics and social sciences. Today, online social networks and social media have become the primary forum for people to create relationships, express opinions, and engage in discussions and debates. This has enabled the systematic analysis of opinion dynamics at a global scale, and has raised new research challenges. In our work, we adopt a well-established model for social opinion dynamics, and we study the following problems: (1) The Campaign problem, where the goal is to identify a set of target individuals whose positive opinion will maximize the overall positive opinion in the social network. (2) The Moderate problem, where the goal is to identify a set of target individuals whose moderate opinion will minimize the polarization in the network. In the course of our work, we uncover an interesting connection between the opinion formation process and random walks with absorbing nodes, and we propose a novel metric for measuring polarization in social networks. Bio: Panayiotis Tsaparas completed his undergraduate studies at the Computer Science Department of the University of Crete, Greece, in 1995. He continued his graduate studies at the University of Toronto, where he received his M.Sc. and Ph.D. degrees under the supervision of Allan Borodin. After graduation, he worked as a post-doctoral fellow at the University of Rome “La Sapienza”, as a researcher at the University of Helsinki, and most recently as a researcher at Microsoft Research. In 2011 he joined the Department of Computer Science and Engineering of the University of Ioannina, where he is now an Associate Professor. 
His research interests include Social Network Analysis, Algorithmic Data Mining, Web Mining and Information Retrieval. From 13:00 a sandwich lunch will be served for registered participants to continue discussions.
- 6th UGent Data Science Seminar: Prof. Michel Dumontier
6th UGent Data Science Seminar Speaker: Prof. Michel Dumontier (Maastricht University) (http://dumontierlab.com/) Title: Are we FAIR yet? Abstract: The FAIR Principles propose key characteristics that all digital resources (e.g. datasets, repositories, web services) should possess to be Findable, Accessible, Interoperable, and Reusable by people and machines. The Principles act as a guide to what researchers should expect from contemporary digital resources and, in turn, to the requirements placed on them when publishing their own scholarly products. As interest in, and support for, the Principles has spread, the diversity of interpretations has also broadened, with some resources claiming to already “be FAIR”. This talk will elaborate on what FAIR is, why we need it, what it entails, and how we should evaluate FAIRness. I will describe new social and technological infrastructure to support the creation and evaluation of FAIR resources, and how FAIR fits into institutional, national and international efforts. Finally, I will discuss the merits of the FAIR principles (and what we ask of people) in the context of strengthening data-driven scientific inquiry. Bio: Dr. Michel Dumontier is a Distinguished Professor of Data Science at Maastricht University. His research focuses on the development of computational methods for scalable integration and reproducible analysis of FAIR (Findable, Accessible, Interoperable and Reusable) data. His group combines semantic web technologies with effective indexing, machine learning and network analysis for drug discovery and personalized medicine. Previously at Stanford University, Dr. Dumontier now leads a new inter-faculty Institute for Data Science at Maastricht University with a focus on accelerating scientific discovery, improving health and well-being, and strengthening communities. 
He is a Principal Investigator for the NCATS Biomedical Data Translator, a co-Investigator for the NIH Data Commons, and a co-Investigator for the NIH BD2K Center for Expanded Data Annotation and Retrieval (CEDAR). He is a founding member of the FAIR (Findable, Accessible, Interoperable, Re-usable) initiative, a member of the Dutch Techcenter for Life Sciences, and the scientific director for Bio2RDF, an open source project to generate Linked Data for the Life Sciences. He is the editor-in-chief for the IOS Press journal Data Science and an associate editor for the IOS Press journal Semantic Web. After the seminar, a sandwich lunch will be provided for registered participants.