Two more talks are coming up, so mark your agendas. The usual recipe: one industry speaker, one academic speaker, and drinks afterwards.
Bouke Huurnink (Netherlands Institute for Sound and Vision (http://www.beeldengeluid.nl/)) - Integrating Automatic Annotations in Audiovisual Search
The Netherlands Institute for Sound and Vision stores 70% of all Dutch national audio-visual heritage. New television and radio broadcasts are ingested daily, and the archive grows at a rapid pace. Until the end of 2014, text metadata was manually added to items in the archive for search and access. Now, with increasing costs and a limited budget, in-house manual annotation is no longer an option. Ensuring the content of the archive remains accessible is an ongoing challenge.
In this talk I will discuss the introduction of automatic annotation into the Institute’s archive. As part of a coping strategy for dealing with the lack of manual annotation, the Institute has invested in both automatic speaker labelling on the basis of audio signals, and automatic term extraction on the basis of subtitles. These methods can be set to have high precision, but are still naturally noisy. This makes it a challenge to integrate their results into a production environment, where users are accustomed to extremely accurate human-created text. I will outline the implementation of the methods with the help of start-ups, their integration into the search engine and search interface, and the organisational structures used to achieve this.
Bouke Huurnink manages the Development Department at the Netherlands Institute for Sound and Vision. In this role he is responsible for incorporating innovative methods and applications in the Institute’s production landscape. Bouke holds a PhD in Computer Science from the University of Amsterdam. His work focuses on user-centered product innovation, especially in the areas of information access and retrieval.
Ivan Titov (Institute for Logic, Language and Computation (https://www.illc.uva.nl/)) - Learning Shallow Semantics with Little or No Supervision
Inducing meaning representations from text is one of the key objectives of natural language processing. Most existing statistical semantic analyzers rely on large human-annotated datasets, which are expensive to create and exist only for a very limited number of languages. Even then, they are not very robust, cover only a small proportion of semantic constructions appearing in the labeled data, and are domain-dependent. We investigate approaches which do not use any labeled data but induce shallow semantic representations (i.e. semantic roles and frames) from unannotated texts. Unlike semantically annotated data, unannotated texts are plentiful and available for many languages and many domains, which makes our approach particularly promising. I will contrast the generative framework (incl. our non-parametric Bayesian model) and a new approach called reconstruction-error minimization (REM) for semantics. Unlike the more traditional generative framework, REM lets us effectively train expressive feature-rich models in an unsupervised way. Moreover, it allows us to specialize our representations to be useful for (basic forms of) semantic inference. We show that REM achieves state-of-the-art results on the unsupervised semantic role labeling task (across languages without any language-specific tuning) and significantly outperforms generative counterparts on the unsupervised relation discovery task.
Joint work with Ehsan Khoddam, Alex Klementiev and Diego Marcheggiani.
Ivan Titov joined the faculty of the University of Amsterdam in April 2013. Before that he was at Saarland University as a junior faculty member and head of a research group (2009 - 2013), following a postdoc at the University of Illinois at Urbana-Champaign. He received his Ph.D. in Computer Science from the University of Geneva in 2008 and his master's degree in Applied Mathematics from the St. Petersburg State Polytechnic University (Russia) in 2003.
His research interests are in statistical natural language processing (models of syntax, semantics and sentiment) and machine learning (structured prediction methods, latent variable models, Bayesian methods). He recently received the prestigious VIDI (2015) and ERC Starting (2015) grants. His research is also supported by a Google Focused award [masked]) and, more recently, by funding from Yandex, SAP and Amazon.