Getting Hot and Cold with Spark and Big Data


Details
Are you working on IoT or web-scale clickstream processing solutions? Got megabytes, tens of megabytes, or even hundreds of megabytes of small data coming at you? Per second? If you answered yes to any of these, awesome: this session is for you.
We will introduce the Lambda Architecture for Big Data and walk through a cloud-based reference architecture that answers these questions: How should you ingest all that data? What can you do with the data in near real time once you have it (the hot path)? And how should you go about keeping it for larger-scale and future analysis (the cold path)?
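To make the hot/cold split concrete, here is a minimal plain-Python sketch of the core Lambda Architecture idea: the cold path periodically recomputes a "batch view" over the full history, the hot path maintains a "speed view" over events the last batch run has not yet absorbed, and queries merge the two. This is not Spark code and not the presenters' reference architecture; the event shape `(device_id, reading)` and the function names are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Iterable

# Hypothetical event shape for illustration: (device_id, reading).
Event = tuple[str, float]


def batch_view(history: Iterable[Event]) -> Counter:
    """Cold path: recompute event counts per device over the full stored history."""
    return Counter(device for device, _ in history)


def speed_view(recent: Iterable[Event]) -> Counter:
    """Hot path: the same count, but only over recent events not yet in the batch view."""
    return Counter(device for device, _ in recent)


def query(batch: Counter, speed: Counter, device: str) -> int:
    """Serving layer: merge the precomputed batch view with the real-time view.

    A Counter returns 0 for unseen keys, so devices missing from either view are handled.
    """
    return batch[device] + speed[device]


if __name__ == "__main__":
    history = [("sensor-a", 1.0), ("sensor-b", 2.0), ("sensor-a", 3.0)]
    recent = [("sensor-a", 4.0)]
    print(query(batch_view(history), speed_view(recent), "sensor-a"))  # 3
```

The design point is that both paths apply the same computation to different slices of the data, so the merged answer stays correct while the slow batch recomputation catches up.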
While you are on your way to building the next Twitter or solving the world’s energy crisis with a massively successful IoT platform, understanding how to approach storage with data lakes, multi-consumer queues, block storage, and HDFS is table stakes.
Having all the data is one thing, but being able to apply the right computation at the right latency is another challenge altogether. This is where Apache Spark shines. We’ll show how you can leverage Spark in three different latency contexts:
· The hot path, where you process data in micro-batches to perform streaming analytics and need results in seconds
· The cool path, where you run interactive SQL queries to explore and shape your data and want results in minutes
· The cold path, where you combine imperative parallel processing programs with SQL queries to churn through massive data sets and can wait hours for results
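The hot path bullet above relies on micro-batching: the incoming stream is chopped into small batches by a fixed time interval, and each batch is processed as a unit. The plain-Python sketch below shows only that idea; it does not use Spark's streaming API, and the `(timestamp_seconds, value)` event shape and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def micro_batches(events: Iterable[tuple[float, float]], interval: float) -> dict[int, list[float]]:
    """Group (timestamp_seconds, value) events into fixed-length micro-batches.

    Batch k holds the events with k * interval <= timestamp < (k + 1) * interval.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    batches: dict[int, list[float]] = defaultdict(list)
    for ts, value in events:
        batches[int(ts // interval)].append(value)
    return dict(batches)


def per_batch_average(batches: dict[int, list[float]]) -> dict[int, float]:
    """A simple streaming analytic computed once per micro-batch: the mean value."""
    return {k: sum(vs) / len(vs) for k, vs in sorted(batches.items())}


if __name__ == "__main__":
    events = [(0.5, 10.0), (1.2, 20.0), (2.1, 30.0), (3.9, 50.0)]
    print(per_batch_average(micro_batches(events, interval=2.0)))  # {0: 15.0, 1: 40.0}
```

Shrinking the interval lowers latency but means more, smaller batches to schedule; that trade-off is why micro-batch results arrive in seconds rather than milliseconds.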
Because Spark offers a framework for distributed processing (Spark Core), augments it with SQL-based querying (Spark SQL) and micro-batch stream processing (Spark Streaming), and provides integrated machine learning (Spark ML and Spark MLlib), Spark sits at the heart of your Lambda Architecture.
With all of these great components, understanding the solution and how the parts fit together is paramount. Choose poorly, and your solution will cost too much, burden your developers, or ultimately collapse under the volume of data. Choose wisely, and you are well on your way to data processing nirvana. Choose wisely and attend this session.
Presenters:
Peter Chen (Director of Data Science, Algebraix Data)
Zoiner Tejada (Chief Technology Officer, Algebraix Data)
Date & Time:
Tuesday August 16th, 2016
6:00pm - 8:00pm
Pizza, Beverages, and Conversation 6:00-6:30pm
Presentations 6:30-7:30pm
Questions and Conversation 7:30-8:00pm
Location:
Mintz, Levin, Cohn, Ferris, Glovsky and Popeo, P.C.
3580 Carmel Mountain Road, Suite 300 | San Diego, CA 92130 (https://www.google.com/maps/place/Mintz+Levin+Cohn+Ferris/@32.9203569,-117.2281407,16z/data=!4m8!1m2!2m1!1s3580+Carmel+Mountain+Road,+Suite+300+%7C+San+Diego,+CA+92130!3m4!1s0x80dc0645a7732129:0xd3cc3dba18cd57c8!8m2!3d32.9194628!4d-117.2343922)
