Productionalizing Spark Streaming and Kafka Applications


Details
"If nothing else, data is probably even more front and center in 2018, in both business and personal conversations."
--- "Great Power, Great Responsibility: The 2018 Big Data & AI Landscape" - Matt Turck
********************************************************************************
Schedule:
6:00 PM to 6:30 PM - Registration, Networking, and Pizza
6:30 PM to 7:15 PM - Session #1 Speaker - Joon Kim, Cloudera - Overview of Spark and Hadoop
7:15 PM to 7:30 PM - Break (ideally 10 minutes, but scheduled for 15 in case the speaker needs more time to wrap up or Q&A runs long)
7:30 PM to 8:15 PM - Session #2 Speaker - Robert Sanders - Productionalizing Spark Streaming and Kafka Applications
8:15 PM to 9:00 PM - Networking Time
Session 1:
Spark has established itself as the most active open source project in the world by providing an intuitive and flexible API for large-scale distributed data processing, available in popular programming languages. In this session, we will cover the fundamentals of Spark and give a brief overview of the Spark sub-projects and ecosystem (Spark SQL, Spark Streaming, and Spark ML) and their use cases.
Speaker:
Joon Kim
Systems Engineer, Cloudera.
Joon Kim has a Computer Engineering degree from the University of Waterloo. He has been working in the Big Data industry for the last 6+ years.
Joon has been a systems engineer at Cloudera for 3.5 years, working with customers in various industries to consult on and architect Big Data solutions that use the open source Apache Hadoop ecosystem and Spark to meet business requirements and implement new business use cases.
*********************************************************************************
Session 2:
Spark Streaming has quickly established itself as one of the more popular streaming engines running on the Hadoop ecosystem. Not only does it integrate with many types of message brokers and stream sources, but it also lets you use other major Spark modules, such as Spark SQL and MLlib, alongside it. This allows businesses and developers to make use of data in ways they couldn’t hope to in the past.
However, while building a Spark Streaming pipeline, it’s not sufficient to only know how to express your business logic. Operationalizing these pipelines and running the application with high uptime and continuous monitoring pose many operational challenges. Fortunately, Spark Streaming makes all that easy as well. In this talk, we’ll go over some of the main steps you’ll need to take to get your Spark Streaming application ready for production, specifically in conjunction with Kafka. This includes steps to gracefully shut down your application, steps to perform upgrades, monitoring, various useful Spark configurations, and more.
Speaker:
Robert Sanders
Engineering Manager, Big Data Practice, Clairvoyant
Robert Sanders is a Big Data Manager, Engineer, and Architect at Clairvoyant. He primarily works with clients to build out Big Data solutions on the Hadoop ecosystem. Robert has a deep background in enterprise systems, having worked on full-stack implementations before focusing on data management platforms.
