How do you rapidly derive complex insights on top of really big data sets in Cassandra? This session draws upon Evan's experience building a distributed, interactive, columnar query engine on top of Cassandra and Spark. We will start by surveying the existing query landscape of Cassandra and discuss ways to integrate Cassandra and Spark. We will dive into the design and architecture of a fast, column-oriented query engine for Spark, and explain why columnar stores are so advantageous for OLAP workloads. We will also present a schema for Parquet-like storage of analytical datasets on Cassandra. Find out why Cassandra and Spark are the perfect match for enabling fast, scalable, complex querying and storage of big analytical data.
About the Speaker:
Evan loves to design, build, and improve bleeding-edge distributed data and backend systems using the latest in open source technologies. He has led the design and implementation of multiple big data platforms based on Storm, Spark, Kafka, Cassandra, and Scala/Akka, including a columnar real-time distributed query engine. He is an active contributor to the Apache Spark project and co-creator of the open-source Spark Job Server. He is a big believer in GitHub, open source, and meetups, and has given talks at various conferences including the Spark Summit and Cassandra Summit. He has Bachelor's and Master's degrees in Electrical Engineering from Stanford University.
Use the 4th Ave entrance, go up to the 3rd floor, and take the elevator to 16.
6:15 to 6:45 : Networking / food & drinks
6:45 to 7:30 : Main session
7:30 to 8:00 : Open Q&A & Wrap up