2 Talks on Streaming - Apache Apex & Spark Streaming


Details
*Note: to expedite check-in at Galvanize, please register here (https://www.eventbrite.com/e/sf-data-science-meetup-enteprise-grade-streaming-on-hadoop-in-under-2ms-tickets-21569585210)
Talk 1: Intro to Streaming (with Spark Streaming)
As real-time and near-real-time analysis of big data has become an important industry use case, it is increasingly critical for managers and practitioners to understand streaming technologies. This talk explains the basics of streaming analysis frameworks, using Spark Streaming as an example. The discussion will cover defining and integrating batch and streaming data, the various time horizons of streaming data, the details of the Spark framework, and code examples.
Prerequisites:
Previous knowledge of streaming applications is NOT required. Prior knowledge of the Spark core API is helpful, but also NOT required.
Meet the Speaker:
Aaron Merlob is an instructor for Galvanize's 12-week Data Engineering Immersive. (http://www.galvanize.com/courses/data-engineering/)
Talk 2: Enterprise-Grade Streaming on Hadoop in Under 2ms
More than ever, there’s a fundamental need for dynamic and flexible machine learning powered by huge amounts of data in a real-time distributed environment. To tackle this challenge, the Capital One Vault 8 team launched a thorough investigation of available technologies, focusing on the business goals and requirements. Ilya Ganelin, Capital One Data Engineer, will present the team's findings, taking an in-depth look at how the requirements ultimately led to selecting Apache Apex as the underlying platform, and of course, sharing some fantastic results!
About Apache Apex:
Apache Apex is an open-source stream processing platform incubating at the Apache Software Foundation. It was designed to run natively on YARN and HDFS within the Hadoop ecosystem. What sets Apex apart from other stream processing platforms? Apex was architected for scalability, low-latency processing, high availability, and operability. Apache Apex platform architect Pramod Immaneni will walk through its pipeline processing architecture for real-time and batch processing.
Meet Your Speakers:
Ilya Ganelin (Capital One Data Engineer) is a roboticist turned data engineer. After a few years at the University of Michigan building self-discovering robots and another few years at Boeing working on embedded DSP software for cell phones and radios, he landed in the world of big data at the Capital One Data Innovation Lab.
Pramod Immaneni (DataTorrent Architect) is a senior architect at DataTorrent Inc., where he works on the Apex platform and specializes in big data applications. Prior to DataTorrent, he was a founder of technology startups. He was CTO of Leaf Networks, a company he co-founded that was acquired by Netgear Inc. He built products in the core networking space and holds patents in peer-to-peer VPNs. Before that, he helped start a company where he architected a dynamic content customization engine for mobile devices.