Spark on AWS - Best practices & lessons learned

Name: Spark on AWS - Best practices & lessons learned
Start: 2018-05-23T19:00:00+02:00
End: 2018-05-23T21:00:00+02:00
Location: Austin Fraser GmbH

Hosted by AI Performance Engineering Meetup (Munich)

Details

A fresh new talk about best practices & lessons learned using Spark on AWS.
Main topics are:

Spark & S3
AWS Data Pipeline
Zeppelin: Setup & workarounds
Connecting AWS SageMaker & Spark
Metadata management: AWS Glue

We cover some general considerations for structuring your data when writing to and reading from S3. We also show how to use AWS Data Pipeline to work around some of the current limitations caused by Hadoop's S3 library and S3's eventual consistency, and we look at the improvements in the recent release of the Hadoop library.
We present our approach to using Zeppelin in a multi-user environment and how we bootstrap and stabilize it.
With AWS SageMaker, an interesting new service focused on machine learning started this year. We will show how to connect its Jupyter notebooks to Spark on EMR and discuss how it differs from Zeppelin.
Finally, we will take a quick look at another new service, AWS Glue, and explain why you should use it.

Bio:
Lars Haferkamp works as a Data Engineer at comSysto Reply. For the past three years he has worked in teams focused on analyzing massive amounts of sensor data with Spark on AWS and building platforms for data scientists.
