Hadoop disrupted decades of data management practices and technologies by introducing an open source, massively parallel processing framework. The Hadoop community and the component ecosystem it has developed have been an unqualified success.
The widely anticipated Apache Spark project is the newest addition to that ecosystem.
"The Spark buzz keeps increasing; almost everybody I talk with expects Spark to win big, probably across several use cases."
-- Monash Research 3/17/14
"Spark is on the rise, to an even greater degree than I thought last month"
-- Monash Research 4/30/14
The Spark software stack includes:
Spark - the core data-processing engine
Shark - interface for interactive querying
Spark Streaming - for streaming data analysis
MLlib - for machine learning
GraphX - for graph analysis
Spark is quickly establishing itself as a leading environment for doing fast, iterative in-memory and streaming analysis.
This talk will give an introduction to the Spark stack, explain how Spark achieves lightning-fast results, and show how it complements your existing Apache Hadoop investment.
We're pleased to welcome back our good friend Keys Botzum for this talk. Keys is Senior Principal Technologist with MapR Technologies, where he wears many hats. His primary responsibility is interacting with customers in the field, but he also teaches classes, contributes to documentation, and works with engineering teams. He has over 15 years of experience in large-scale distributed system design. Previously, he was a Senior Technical Staff Member with IBM and a respected author of many articles on the WebSphere Application Server as well as a book.
6:00 - Food, socializing, networking...
6:30 - Presentation
8:00 - More networking at a location TBD