Apache Spark + AI Munich - GeoIp-Location + Scaling Deep Learning #Mar2020

Name: Apache Spark + AI Munich - GeoIp-Location + Scaling Deep Learning #Mar2020
Start: 2020-03-25T18:30:00+01:00
End: 2020-03-25T21:30:00+01:00
Location: Alexander Thamm GmbH

Hosted By

Henning K.

Details

In this meetup, we are very happy to have Marcin Wojtyczka from Alexander Thamm and Michael Shtelma from Databricks talking about GeoIP location and about distributed image preprocessing, training and hyperparameter search with Apache Spark.

This event will be hosted by Alexander Thamm GmbH and features two great speakers from the community:

Marcin Wojtyczka (Alexander Thamm GmbH)
Michael Shtelma (Databricks)

Talk 1: GeoIp-Location with Spark for Thermomix IoT Devices

Talk 2: Scaling Deep Learning: Distributed Image Preprocessing, Training and Hyperparameter Search with Apache Spark

TALKS:

Title:
GeoIp-Location with Spark for Thermomix IoT Devices

Abstract:
GeoIP location, the process of determining the physical location of an IP address, makes it possible to see where users come from and to provide better personalized services. Non-personal geolocation data is used in a variety of use cases at Vorwerk to improve the Thermomix IoT device and its services, such as personalized recipe recommendations and usage analysis. In this talk I will outline the data processing pipeline architecture and dive into the design and implementation of GeoIP enrichment with Spark.
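To make the enrichment idea concrete, here is a minimal sketch of the range lookup that GeoIP enrichment typically relies on. It is not the speaker's implementation: the table `GEO_BLOCKS`, its country codes, and the helper names `build_index` and `lookup` are hypothetical, and the IP blocks are reserved documentation ranges. It also assumes the blocks do not overlap.

```python
import bisect
import ipaddress

# Hypothetical GeoIP table: (CIDR block, country code).
# The blocks are reserved documentation ranges; the country codes are invented.
GEO_BLOCKS = [
    ("192.0.2.0/24", "DE"),
    ("198.51.100.0/24", "FR"),
    ("203.0.113.0/24", "PL"),
]


def build_index(blocks):
    """Convert CIDR blocks to sorted (first_ip, last_ip, country) integer ranges."""
    rows = []
    for cidr, country in blocks:
        net = ipaddress.ip_network(cidr)
        rows.append((int(net.network_address), int(net.broadcast_address), country))
    rows.sort()
    starts = [row[0] for row in rows]
    return starts, rows


def lookup(ip, index):
    """Return the country code for an IP address, or None if no block matches."""
    starts, rows = index
    value = int(ipaddress.ip_address(ip))
    # Find the last range whose start is <= the address, then check its end.
    pos = bisect.bisect_right(starts, value) - 1
    if pos >= 0 and rows[pos][0] <= value <= rows[pos][1]:
        return rows[pos][2]
    return None


INDEX = build_index(GEO_BLOCKS)
print(lookup("198.51.100.7", INDEX))
print(lookup("8.8.8.8", INDEX))
```

In a Spark job, one plausible design is to ship a small index like this to the executors and apply the lookup per row; whether the talk takes that route is not stated in the abstract.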

Bio:
Marcin is a Lead Data Architect at Alexander Thamm GmbH. He specializes in distributed data architecture and cloud computing and is a big fan of handling things pragmatically. He has been advising customers worldwide on designing and building data platforms to improve products you might have used and hopefully enjoyed. As a passionate sea-going sailor, he loves to travel the world on sailing boats.

Title:
Scaling Deep Learning: Distributed Image Preprocessing, Training and Hyperparameter Search with Apache Spark

Abstract:
Computer vision usually requires an enormous number of images. Before training, all these images must be preprocessed (converted, resized, etc.). Once the dataset is ready for training, data scientists usually have to try a number of possible model architectures and hyperparameter combinations. This talk focuses on ways of automating this process and parallelizing all stages so that data scientists can iterate faster.

Apache Spark can be used as a backbone for distributing all these tasks. Using an image dataset and pandas_udf, we can distribute image preprocessing and augment images as needed. Horovod and Petastorm allow us to access training data written by Spark and distribute training across multiple GPUs and nodes. HyperOpt helps us automatically train a number of different neural network architectures and hyperparameter combinations in parallel on Spark, and MLflow helps us track the parameter combinations HyperOpt has tried.

This talk includes a live demonstration of a full training pipeline, including image preprocessing, distributed training and hyperparameter tuning.
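As a rough illustration of the parallel hyperparameter search described in the abstract, the sketch below runs a plain random search with a local thread pool. It is a stand-in, not the talk's pipeline: it uses neither HyperOpt, Spark nor MLflow, and `toy_loss`, `sample_params` and `random_search` are hypothetical names. The toy loss function replaces real model training, and the returned `trials` list plays the role of the tracked parameter combinations.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def toy_loss(params):
    """Stand-in for training a model and returning its validation loss."""
    lr, layers = params["lr"], params["layers"]
    return (lr - 0.01) ** 2 * 1e4 + abs(layers - 3)


def sample_params(rng):
    """Draw one hyperparameter combination (log-uniform learning rate)."""
    return {"lr": 10 ** rng.uniform(-4, -1), "layers": rng.randint(1, 6)}


def random_search(n_trials=20, workers=4, seed=0):
    """Evaluate n_trials random combinations in parallel and return the best."""
    rng = random.Random(seed)
    candidates = [sample_params(rng) for _ in range(n_trials)]
    # Each candidate is evaluated independently, so trials can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        losses = list(pool.map(toy_loss, candidates))
    # Keep every tried combination with its loss, like an experiment log.
    trials = [{"params": p, "loss": l} for p, l in zip(candidates, losses)]
    best = min(trials, key=lambda t: t["loss"])
    return best, trials


best, trials = random_search()
print(best)
```

Because the trials are independent, the same structure can be spread over a cluster instead of local threads, which is the role the abstract assigns to Spark.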

Bio:
Michael Shtelma is a Databricks Solutions Architect and ex-Teradata Data Engineer passionate about all data-related topics, especially data engineering and data science in the cloud. He loves to code in Scala and Python. Currently, Michael is working at Databricks in Frankfurt, Germany.

HOUSEKEEPING:

• 2 talks (each ca. 40 min incl. discussion)

• Networking, food & drinks

• Language: English

• There will be photos taken

• A list of registered users will be provided to the host

• Please bring your ID
