BDAM 3/29: ECA for IoT Rules Engine, Spark Processing & Big Data App Performance


Details
Shoutout to Cask (http://cask.co/) for kindly sponsoring and hosting this meetup!
Cask will also be giving away a BB-8 App-Enabled Droid (http://www.sphero.com/starwars). Enter the raffle on the day of the event for a chance to win!
AGENDA
6:00 - 6:30 - Socialize over food and beer(s)
6:30 - 8:00 - Talks
TALKS
Talk #1: Building an ECA Rules Engine for IoT with CDAP, by Bhooshan Mogal, Cask
Talk #2: Demonstrating the Benefits of Hyper-Acceleration for both Batch and Streaming Spark Processing, by Roop Ganguly, BigStream
Talk #3: Improving Application and Cluster Performance for Big Data Stacks, by Kunal Agarwal, Unravel Data
ABSTRACTS
Talk #1: Building an ECA Rules Engine for IoT with CDAP, by Bhooshan Mogal, Cask
Event-condition-action (ECA) rules, in which events trigger actions under specific conditions, are the basis of many IoT use cases. In this talk, Bhooshan will explain the fundamentals of an Apache Hadoop-based ECA framework. The framework allows continuous ingestion of any kind of data, e.g., from device sensors. It contains a dynamic, distributable rules engine that can apply rules to incoming data in real time. Users can then take pluggable actions based on the results of these rules. Bhooshan will demonstrate Cask's solution, which leverages Spark Streaming as the real-time engine and offers REST APIs for easily building custom applications. His demo will also show how to send events into the system, create rules easily and code-free, execute rules, and send notifications.
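For readers new to the pattern, here is a minimal sketch of the event-condition-action idea in plain Python. It is not Cask's implementation and does not use CDAP or Spark Streaming; the names Rule and apply_rules, the event fields, and the threshold of 80 are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical event shape: a plain dictionary, e.g. one sensor reading.
Event = Dict[str, Any]


@dataclass
class Rule:
    """One ECA rule: when an event arrives, check a condition, then act."""
    name: str
    condition: Callable[[Event], bool]
    action: Callable[[Event], str]


def apply_rules(rules: List[Rule], events: List[Event]) -> List[str]:
    """Run every rule against every event and collect the action results."""
    results = []
    for event in events:
        for rule in rules:
            if rule.condition(event):
                results.append(rule.action(event))
    return results


# Illustrative rule: notify when a temperature reading exceeds 80.
rules = [
    Rule(
        name="overheat",
        condition=lambda e: e.get("type") == "temperature" and e.get("value", 0) > 80,
        action=lambda e: f"notify: {e['device']} overheating at {e['value']}",
    ),
]

# Illustrative events, standing in for a stream of device data.
events = [
    {"device": "sensor-1", "type": "temperature", "value": 72},
    {"device": "sensor-2", "type": "temperature", "value": 95},
]

notifications = apply_rules(rules, events)
print(notifications)
```

In this sketch only the second reading satisfies the condition, so only one notification is produced. A production engine would read events from a continuous stream and let users define rules without code, as the abstract describes.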
Talk #2: Demonstrating the Benefits of Hyper-Acceleration for both Batch and Streaming Spark Processing, by Roop Ganguly, BigStream
Today, full-stack developers, BI/analytics teams and IT ops are discovering that a growing number of Apache Spark workloads require maximum compute capacity and performance. At this year’s Spark Summit, Spark creator Matei Zaharia cited a “compute bottleneck” for Spark applications and said that acceleration strategies in hardware (FPGAs/GPUs) and software must become “first class resources” in any Spark project. In this presentation, we will present two case studies that demonstrate viable acceleration strategies for Spark applications.
The first demonstrates performance increases for the well-known TPC-DS decision support benchmark using software-based acceleration on Amazon EMR. Software acceleration is able to provide a 2-4x speedup for these benchmarks, without a single line of code change. The second case presents a configurable FPGA-based acceleration strategy applied to an online adtech ETL application. This approach generates speedups of nearly 7x, while simultaneously solving an unbounded delay issue for the unaccelerated code.
Talk #3: Improving Application and Cluster Performance for Big Data Stacks, by Kunal Agarwal, Unravel Data
Long gone are the days of the piecemeal approach to Big Data… at least we’d like them to be. Logs are hard to interpret, and they don’t provide a comprehensive record of what happens on the Big Data stack. Each application on the stack may create a hundred tasks, and hence a hundred data streams. Logs are also historical by nature and difficult to use to pinpoint problems in real time, while infrastructure monitoring can provide low-level visibility. For example, monitoring tools can give you a CPU utilization graph or a network I/O graph, but won’t tell you a lot about applications. What’s missing is an intelligent 360-degree “view” into the entire Big Data stack. In this session, we will explore new ways to monitor and optimize your Big Data app performance, resource utilization and data management in the age of DataOps. The session will include use cases from companies that made improvements using these techniques, including Box and Autodesk.
SPEAKER BIOS
• Bhooshan Mogal is Product Manager at Cask, where he is working on making data application development fun and simple. Before Cask, he worked on a unified storage abstraction for Hadoop at Pivotal and personalization systems at Yahoo.
• Bishwa Roop Ganguly is a solutions architect at Bigstream. He has a PhD in Electrical Engineering from MIT, and an MS and BS in Computer Science from the University of Illinois and the University of California at Berkeley, respectively. He has published extensively in the fields of parallel processing and computer networks. He also has 5 years of experience as a Data Scientist using Hadoop, Spark and SQL.
• Kunal Agarwal is the CEO and Co-Founder of Unravel Data. He also founded Yuuze.com in 2010, a pioneer in personalized shopping recommendations. Prior to this, Kunal was a business development manager for Oracle products. Kunal holds a Bachelor's in Computer Engineering from Valparaiso University, Indiana and an M.B.A. from The Fuqua School of Business, Duke University, NC.
ARRIVAL AND PARKING
Cask HQ is a few minutes' walk from the California Avenue Caltrain Station.
Also, Cask HQ has its own parking lot, but it will certainly not accommodate all guests. Please use parking lots available nearby:
https://a248.e.akamai.net/secure.meetupstatic.com/photos/event/5/b/2/f/600_438983343.jpeg
