Toronto Apache Spark #7


Details
An interactive and engaging event driven by the community. We will have three short Community Driven Talks.
Agenda:
6:30PM to 7:00PM - Networking (refreshments provided)
7:00PM to 8:30PM - 2 Short Community Driven Talks
8:30PM to 8:45PM - Break
8:45PM to 9:45PM - 1 Short Community Driven Talk
You can submit your community talk proposal using the following Google Forms link:
--------------------------------------------
Submitted Community Talks:
Speaker: Sean McIntyre (https://www.linkedin.com/in/seancmcintyre)
Title: Towards Sustainable Spark Development
Description:
Code written for Apache Spark often straddles the line between script and software: an assemblage of functional logic that naturally grows more complex and moves into production use without well-defined strategies for maintenance and extension. Improving and sustaining productivity on this platform requires implementing Spark scripts in a more modular, reusable and composable fashion. This talk will focus on strategies developed by myself and others at Uncharted Software that have been critical to our ability to reduce boilerplate code, facilitate reuse between teams, and move agilely crafted ETL and ML logic into production while keeping it open to safe iteration and improvement. Time will be left for discussion, questions and a live demonstration in a Databricks notebook.
-----------
Speaker: Nick Evans (https://www.linkedin.com/in/nicolasevans)
Title: Realtime Risk Management Using Kafka, Python, and Spark Streaming
Description:
At Shopify, we underwrite credit card transactions, which exposes us to the risk of losing money. We need to respond to risky events as they happen, and a traditional ETL pipeline just isn't fast enough. Spark Streaming is an incredibly powerful realtime data processing framework built on Apache Spark. It allows you to process realtime streams, such as those from Apache Kafka, using Python with incredible simplicity.
-----------
Speaker: Jacek Laskowski (https://www.gitbook.com/@jaceklaskowski)
Title: The Niceties of DataFrame and Pipeline APIs
Description:
This is a live coding session showing the niceties of the DataFrame and Pipeline APIs in the latest and greatest Spark 2.0.0-SNAPSHOT (Pipelines are used to showcase how DataFrames helped make MLlib so nice to use).
--------------------------------------------
Sponsors:
Shopify is sharing their office space with us for the event