Munich DataGeeks - June 2015 Edition

97 people went

Hubert Burda Media

Arabellastr. 17, München

How to find us

At the entrance, ask for the Bootcamp location. It's on the first floor.

Details

Format:

- 2 presentations (each ca. 30-40 min incl. discussion)
- Of course, time for networking + food + drinks before, in between, and especially after the presentations
- Talks are held in English

The Line-Up (detailed description of the talks below):

Bernhard Slominski - Theme planets

Konark Modi - Designing NRT (near-real-time) stream processing systems: Using Storm

---

Speaker: Bernhard Slominski

Title: Theme planets - Visualising trend and future research

Abstract: The idea of theme planets is to collect information from various data sources, group this information into themes, and show the correlations between the themes, similar to the gravitational pull between planets. We want to visualize these correlations with 3D gaming engines and display them with the planetarium technology of Zeiss.

Bio: Bernhard is the CEO and founder of the Munich-based company factfish.
factfish is a portal for all kinds of statistics, such as population, economy, and environment. So far it is the biggest country database in the world.
Bernhard has more than 15 years of experience in web development and data analysis, has worked for various startup companies, and was in the management of one of the largest online shops in Germany.

---

Speaker: Konark Modi

Title: Designing NRT (near-real-time) stream processing systems: Using Storm

Abstract: The essence of near-real-time stream processing is to compute huge volumes of data as they are received. This talk will focus on creating a pipeline for collecting huge volumes of data and processing them in near-real time using Storm.

Storm is a high-volume, continuous, reliable stream processing system developed at BackType and open-sourced by Twitter. Storm is widely used across organizations and has a variety of use cases, such as:

* Real-time analytics
* Distributed RPC
* ETL, etc.

During the course of 40 minutes, using the example of real-time Wikipedia edits, we will try to understand:

* Basic concepts of stream processing.
* A high-level understanding of the components involved in Storm.
* Writing a producer in Python that pushes the real-time Wikipedia edit feed into a queue (see the producer sketch after this list).
* Writing Storm topologies in Python to consume the feed and compute real-time metrics (see the bolt sketch below) such as:
  * number of articles edited,
  * per-category counts of edited articles,
  * distinct people editing the articles,
  * geolocation counters, etc.
* Technological challenges around near-real-time stream processing systems:
  * achieving low latency compared to batch processing,
  * state management in workers to maintain aggregated counts, e.g. edit counts for the same article category,
  * handling failures and crashes,
  * deployment strategies.
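
To make the producer step concrete, here is a minimal sketch (not from the talk) that polls the MediaWiki RecentChanges API and pushes each edit event onto a Redis list acting as the queue. The queue name `wiki-edits` and the helper `fetch_recent_changes` are illustrative choices; any queue a Storm spout can read from (Kafka, Kestrel, RabbitMQ) would work just as well.

```python
import json
import time

import redis     # assumes a local Redis instance serves as the queue
import requests

API = "https://en.wikipedia.org/w/api.php"
QUEUE = "wiki-edits"  # illustrative queue name


def fetch_recent_changes(rccontinue=None):
    """Fetch one batch of recent edit events from the MediaWiki API."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|user|timestamp",
        "rctype": "edit",
        "rclimit": 50,
        "format": "json",
    }
    if rccontinue:
        params["rccontinue"] = rccontinue
    return requests.get(API, params=params).json()


def main():
    r = redis.Redis()
    cont = None
    while True:
        data = fetch_recent_changes(cont)
        for change in data["query"]["recentchanges"]:
            # one JSON document per edit, for the Storm spout to pick up
            r.rpush(QUEUE, json.dumps(change))
        cont = data.get("continue", {}).get("rccontinue")
        time.sleep(2)  # poll politely


if __name__ == "__main__":
    main()
```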
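
On the consuming side, a bolt can keep the per-category counts. This sketch uses the `storm.py` multilang module that ships with Apache Storm; the tuple layout (a single category field emitted by an upstream spout) is an assumption for illustration.

```python
# count_bolt.py -- executed by Storm via the multilang protocol;
# the `storm` module ships with Apache Storm (multilang/resources/storm.py).
import storm


class EditCountBolt(storm.BasicBolt):
    """Maintains a running count of edits per article category."""

    def initialize(self, conf, context):
        # in-memory worker state -- the state-management challenge above
        self.counts = {}

    def process(self, tup):
        category = tup.values[0]  # assumes the spout emits (category,)
        self.counts[category] = self.counts.get(category, 0) + 1
        storm.emit([category, self.counts[category]])


EditCountBolt().run()
```

Note that the in-memory dict is lost if the worker crashes, which is exactly why the talk lists state management and failure handling as challenges; a layer like Trident or an external store addresses this.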

Bio: Konark loves art and keyboards - whether they make music or provide an interesting data point for what people call Big Data. Konark has worked with one of the largest OTAs in India, and coming into the world of data engineering from a dev-ops background has been an incredible journey for him. He likes contributing to the community in whatever way he can - be it organizing conferences for like-minded people or tackling social causes through technology.