Pre-HUG agenda (no registration required)
15.30 - 16.30 : An Open Repository of Web Data - Lisa Green
HUG agenda (registration required)
17.00 - 18.00 : Socializing, drinks, pizza
18.00 - 18.15 : Welcome
18.15 - 18.45 : What you can do with user comments - Carsten Eickhoff, TUD
19.00 - 19.30 : Social media based apps - Manos Tsagkias, UvA
19.30 - 20.30 : Socializing
Building apps from big data
Web crawls, Twitter streams, blogs, click logs, ontologies - big data is omnipresent, some of it still kept under lock, but an increasing amount publicly accessible. We have the technology to deal with it, structured or unstructured, in streams or in large batches: Hadoop, NoSQL, Storm, S4, and other fun tools. But once you have your tools and you get your data - what can you do?
This meetup will give examples of tools that have been and are still being built around large datasets at universities in The Netherlands. Manos (UvA) and Carsten (TUD) are data scientists who will tell you about tools they've built in the course of their research.
Special guest: Lisa Green, director of Common Crawl
Lisa Green is the Director of the Common Crawl Foundation, where she oversees the foundation’s mission of building, maintaining and openly disseminating a comprehensive crawl of the web. Common Crawl’s 130TB corpus of over 8 billion web pages enables innovation in education, research, and business. Prior to Common Crawl, she was the Chief of Staff at Creative Commons. Lisa holds a PhD in physical chemistry from the University of California, Berkeley, lives in San Francisco, and is passionate about open systems and big data.
Lisa is in The Netherlands to discuss Common Crawl data with Sara and researchers in The Netherlands. She will also give a colloquium talk on open data on Oct 4th, at 3.30pm, and join us for the HUG.
Talk 1: What you can do with user comments - Carsten Eickhoff (researcher at TUD)
Carsten Eickhoff is a PhD student at Delft University of Technology. His research interests include information retrieval, statistical NLP and crowdsourcing. In particular, his thesis work has been focused on user-centric retrieval scenarios.
In his talk, Carsten will address the usefulness of user comments for retrieval tasks such as (1) automatically identifying child-friendly videos on the popular content sharing platform YouTube, (2) mitigating performance losses for sparsely annotated data, or (3) enhancing blog post retrieval.
Talk 2: Social media based apps - Manos Tsagkias (@samanos, researcher at UvA)
Manos Tsagkias has just submitted his PhD thesis on mining social media. His research focuses on tracking content in social media and predicting behavior using signals from different information channels. In this talk, Manos will demonstrate the power of social media using three use cases: forecasting the mood of the blogosphere, predicting the number of comments on news articles, and predicting IMDb movie ratings using signals from Twitter and YouTube.