Modular Spark: "Spark + AI Summit" & "DataEngConf" preview with Albert Franzi


Details
It looks like autumn is coming with lots of conferences! And from the Barcelona Spark Meetup we're so happy to see that some of our community members are speaking at two of the leading European summits in the field: London's "Spark + AI Summit" (SAIS from now on) & Barcelona's "DataEngConf":
- Albert Franzi will be speaking at both SAIS & DataEngConf
- Ricardo Fanjul will be speaking at both SAIS & DataEngConf
- Iker Martinez de Apellaniz will be speaking at DataEngConf
- Liana Napalkova will be speaking at DataEngConf
We've got two pieces of good news.
The first is that if you want to join them at the conferences, we've received some pretty neat discount codes especially for you:
- Register for DataEngConf with the exclusive 30% discount code "SparkBCN30": https://www.dataengconf.com/no-bullshit
- Register for SAIS with the special 20% discount code "SAISEurope18": https://databricks.com/sparkaisummit/europe
The second is that Albert has agreed to give a preview of the talk he will present at SAIS, where he will explain his approach to making Spark code more maintainable. Don't miss it! https://databricks.com/session/modular-apache-spark-transform-your-code-in-pieces
See you next Thursday, the 20th of September, at 19:00. As with the previous event, we're hosting it at the Trovit Search offices. Thanks for the venue & the networking, guys!
Title:
MODULAR APACHE SPARK: TRANSFORM YOUR CODE IN PIECES
Abstract:
Divide and you will conquer Apache Spark. It's quite common to find a papyrus script where people initialize Spark, read paths, execute all the logic, and write the result. We have even found scripts where all the Spark transformations are done in a single method with tons of lines. That makes the code difficult to test, to maintain, and to read. In short, it means bad code.

We built a set of tools and libraries that allows developers to build their pipelines by joining all the Pieces. These Pieces are composed of Readers, Writers, Transformers, Aliases, etc. Moreover, it comes with enriched SparkSuites using the Spark-testing-base from Holden Karau. Recently, we started using junit4git in our tests, allowing us to execute only the Spark tests that matter by skipping tests that are not affected by the latest code changes. This translates into faster builds and fewer coffees.

By allowing developers to define each Piece on its own, we can test small pieces before putting the full set of them together. It also allows code to be reused in multiple pipelines, speeding up their development while improving the quality of the code. The power of the "transform" method combined with currying creates a powerful tool for fragmenting all the Spark logic.

This talk is oriented to developers who have already been introduced to the world of Spark and who want to discover how developing iteration by iteration, in small steps, helps to produce great code with less effort.
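To make the "transform + currying" idea in the abstract concrete, here is a minimal sketch. The `Dataset` below is a toy stand-in so the snippet runs without a Spark session; the real `org.apache.spark.sql.Dataset` exposes the same `transform` method. The transformer names (`filterAbove`, `multiplyBy`) are hypothetical illustrations, not functions from Albert's library.

```scala
// Toy stand-in for org.apache.spark.sql.Dataset; the real class has
// the same transform(t: Dataset[T] => Dataset[U]) method.
case class Dataset[T](rows: List[T]) {
  def transform[U](t: Dataset[T] => Dataset[U]): Dataset[U] = t(this)
}

object Transformers {
  // Curried transformers: configuration first, data last, so a
  // partially applied call yields a Dataset => Dataset "Piece".
  def filterAbove(threshold: Int)(ds: Dataset[Int]): Dataset[Int] =
    Dataset(ds.rows.filter(_ > threshold))

  def multiplyBy(factor: Int)(ds: Dataset[Int]): Dataset[Int] =
    Dataset(ds.rows.map(_ * factor))
}

object Pipeline {
  import Transformers._

  // Each Piece is testable on its own; the pipeline is just chaining.
  def run(input: Dataset[Int]): Dataset[Int] =
    input
      .transform(filterAbove(2))
      .transform(multiplyBy(10))
}
```

Because each transformer takes its configuration first and the data last, a partially applied call like `filterAbove(2)` becomes a small, reusable, individually testable Piece that chains cleanly through `transform` — e.g. `Pipeline.run(Dataset(List(1, 5, 10)))` yields `Dataset(List(50, 100))`.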
Bio:
Albert Franzi is a Software Engineer who fell so in love with data that he ended up as a Data Engineer at the Schibsted Media Group. He believes in a world where distributed organizations can work together to build common, reusable tools that empower their users and projects. Albert cares deeply about unified data and models, as well as data quality and enrichment. He also has a secret plan to conquer the world with data, insights, and penguins.
