1. Talk: Parallel processing for natural texts with Apache Spark
Processing massive amounts of text written by humans is a non-trivial task due to the computational complexity of the underlying algorithms.
We present our first insights from using Spark to tackle this task with different parallelization approaches.
Since many observations cannot be reproduced across the boundaries of linguistic units, we have to employ basic NLP techniques to extract the necessary features from texts.
Finally, we show how these features can help to build a text classification system.
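As a flavor of the kind of basic NLP feature extraction the abstract mentions, here is a minimal, hypothetical sketch: tokenizing a text into lowercase word tokens and counting term frequencies, which could then feed a text classifier. The function name and regex are illustrative and not taken from the talk itself.

```python
from collections import Counter
import re

def extract_features(text):
    """Tokenize a text into lowercase word tokens and count term frequencies.

    Hypothetical sketch of basic NLP feature extraction; the talk's actual
    pipeline (and its Spark parallelization) is not shown here.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

features = extract_features("Spark makes parallel text processing practical. Spark scales.")
```

In a Spark setting, a function like this could be mapped over a distributed collection of documents, with the per-document counts aggregated afterwards.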
Andrei Beliankou works as a Data Engineer at Comsysto Reply GmbH. He mainly focuses on data pipelines and automation for massive sensor data processing.
2. Talk: Skewed data - the silent killer of parallelism in Spark
We often tune our Spark/Hadoop environment and our Spark code for performance, but sometimes forget the (changing) structure of the data we are processing. Imbalance in datasets (such as skewed join keys) can lead to massive performance issues. In this talk I will show how to diagnose such issues and walk through some solution strategies.
Dieter Kling has been a Data Engineer at Comsysto Reply for the last three years, working on big data topics such as building data lakes and implementing Spark data pipelines.