Past Meetup

NL-Hadoop User Group - Data-science: how to get value from your data

Hosted by Netherlands Hadoop User Group

49 people went

Science Park Amsterdam

Science Park 105 · Amsterdam

Detailed agenda:

17:15 - 17:50 - Come in, socialize, grab pizza and beer(s)
17:50 - 18:00 - Welcome (Evert Lammerts, SARA)
18:00 - 18:40 - Data Science & sensor data (Joaquin Vanschoren, LIACS)
18:45 - 19:25 - Large-scale Data Processing for Information Retrieval (Edgar Meij, UvA ILPS)
19:30 - 20:30 - More socializing, eating, and drinking

Data-science: how to get value from your data

Collecting data and setting up a platform to process it are necessary (and fun!) steps towards generating knowledge and value. But with data and dev(op)s you are only halfway – when you have your data and infrastructure in place, how do you generate value?

This meetup will focus on the knowledge needed to turn data into value. We will hear two stories: one on large amounts of unstructured, semi-structured, and structured textual data and statistical methods for information retrieval systems, the other on analytics for sensor data.

Talk 1: Data Science & sensor data

The ever-increasing deployment of large sensor networks calls for novel algorithms for time series analysis at terabyte or petabyte scale. MapReduce/Hadoop systems can scale to such volumes of data, but processing sensor data in the MapReduce programming model requires new approaches. In this work, we show how MapReduce can be used to perform highly scalable sensor data analysis efficiently.
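As a rough illustration of the programming model mentioned above (not taken from the talk itself), the sketch below computes per-sensor hourly averages in a MapReduce style using plain Python. The record layout `(sensor_id, timestamp_seconds, value)` and the function names are assumptions made for this example; a real Hadoop job would distribute the map, shuffle, and reduce phases across a cluster.

```python
from collections import defaultdict


def map_reading(record):
    """Map step: turn one reading into a ((sensor_id, hour), value) pair.

    The (sensor_id, timestamp_seconds, value) layout is hypothetical.
    """
    sensor_id, timestamp_seconds, value = record
    hour_bucket = timestamp_seconds // 3600  # group readings by whole hour
    return (sensor_id, hour_bucket), value


def reduce_mean(values):
    """Reduce step: average all values that share one key."""
    return sum(values) / len(values)


def run_mapreduce(records):
    """Simulate map -> shuffle -> reduce in a single process."""
    groups = defaultdict(list)
    for record in records:
        key, value = map_reading(record)
        groups[key].append(value)  # shuffle: collect values per key
    return {key: reduce_mean(values) for key, values in groups.items()}


# Toy input: three readings from sensor "s1", one from sensor "s2".
readings = [
    ("s1", 0, 1.0),
    ("s1", 1800, 3.0),
    ("s1", 3600, 5.0),
    ("s2", 10, 2.0),
]
hourly_means = run_mapreduce(readings)
```

Because each key is reduced independently, the work can be split across many machines, which is what lets this pattern scale to large sensor archives.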

Dr. Joaquin Vanschoren works on the InfraWatch project at the Leiden Institute of Advanced Computer Science (LIACS).

Talk 2: Large-scale Data Processing for Information Retrieval

Modern web search engines are making increasing use of signals other than mere textual statistics. While documents used to be matched to keyword queries based on term counting alone, modern information retrieval systems incorporate and learn from a large number of features pertaining to the query, user, documents, entities, sessions, etc. In particular, a document ranking generated by a web search engine involves combining signals from rich representations of users (including their location, browser, device, profile, history, etc.), semantics (ranging from simple spell-checking to recognizing entities), popularity, social networking, and more. All of these features need to be computed at an increasingly large scale and call for Big Data storage and analytics methods. In this talk I will give some examples of current IR research being done at the University of Amsterdam, leaning heavily on MapReduce and related programming paradigms.

Dr. Edgar Meij works as a postdoc at the Information and Language Processing Systems group of the University of Amsterdam.