Toronto Apache Spark #3


Details
If you can't make it to the event, you can still participate by following my Periscope account, @mehrdad_pazooki.
I will broadcast the event.
An interactive and engaging event driven by the community: a live Q&A session with Databricks, 3 short community-driven talks, and the launch of our Monthly 5 bullet points.
NOTE: Space is limited to 75 people. Please make sure that your RSVP status reflects your actual plans.
Agenda:
6:30PM to 6:45PM - Networking (Refreshments provided)
6:45PM to 6:55PM - Agenda, Organization updates
7:00PM to 7:30PM - Live Q&A session with Databricks by Denny Lee
7:35PM to 8:15PM - 3 Short Community Driven Talks
8:15PM to 8:30PM - Monthly 5 bullet points
8:30PM to 9:00PM - Networking (Refreshments provided)
Live Q&A Session with Databricks: (7:00PM to 7:30PM)
As promised at the last event, we are going to have a live Q&A session with Databricks about Apache Spark.
Please submit your questions using the following link:
Live Q&A with Databricks Submission Form (http://goo.gl/forms/Z1MkjWqWCs)
3 Short Community Driven Talks: (7:35PM to 8:15PM)
You can submit your talk using the following Google Forms link:
http://goo.gl/forms/wRFRPhz0Ks
Submitted Talks:
--------
Speaker: Sean McIntyre (https://ca.linkedin.com/in/seancmcintyre)
Title: Continuous Integration for Spark Apps
Description:
We've been writing open-source Spark libraries at Uncharted for several years, and CI has always been a difficult thing to achieve with the Spark runtime. This talk focuses on the architecture we've implemented for running tests within a Spark runtime environment on Travis CI, as well as for measuring code coverage with Coveralls.
++++
Speaker: Younes Abouelnagah (https://ca.linkedin.com/in/younosnaga)
Title: A year of using Spark at Flipp
Description:
What is it like to use Spark in production? After a year of using Spark at Flipp, we would like to share our experience with the community. We have migrated from Hadoop and Pig to Spark, and we use it from both Scala and Python. We currently have many batch processing jobs for ETL as well as for building MLlib models, and one Spark Streaming job.
++++
Speaker: Oliver Meyn (https://ca.linkedin.com/in/olivermeyn)
Title: Duplicate Detection and Linking with Spark (Discussion)
Description:
At GBIF (http://www.gbif.org) we have 600 million "occurrence" records - roughly tuples of species, date, location, and collector name. Because of aggregation and mistakes, we think there are many duplicates in our dataset. I will describe the problem in more detail and spend the bulk of the "talk" asking the audience for suggestions on solving the problem with Spark. The hope is that we'll have a community discussion that all will benefit from.
---------
Monthly 5 bullet points of Toronto Apache Spark: (8:15PM to 8:30PM)
Every month we will present five essential developments in the Spark ecosystem or related fields that should be of interest to any user or student of Spark.
----------------------------------------------------------------
Sponsors:
Shopify is sharing its office space with us for the event.
