Deep Dive: Spark SQL+DataFrames Cassandra Connector Directly from DataStax


Details
****************************************************
NOTE: THIS EVENT WILL BE RECORDED AND LIVE STREAMED.
https://livestream.com/sparktc/datastax092115
****************************************************
This meetup is scheduled for the night before the start of the Cassandra Summit.
(New) Location: DataStax's Santa Clara office, just 1.5 miles from the old location at the Santa Clara Convention Center.
Given their prices on food and booze, the Convention Center does not appear to be affected by the economic crisis in China like the rest of the world... so thanks, DataStax, for saving us a ton of USD!!
Overview
We've asked the DataStax authors of the spark-cassandra-connector (https://github.com/datastax/spark-cassandra-connector) to join us to give a deep dive into the details of this implementation of the Spark SQL Data Sources API.
Many DataStax engineers will be in town, so please come prepared with questions, concerns, and general mockery.
Highlights of the spark-cassandra-connector
- Token-ring aware data locality when co-located with Spark Worker nodes
- Pushdown filter support for optimal performance and participation in the advanced Spark SQL Catalyst Query Optimizer
- Spark 1.4, Spark 1.5 DataFrame support
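For anyone new to the idea of filter pushdown, here is a toy sketch in plain Python of what it buys you. It is not the connector's code and does not use Spark or Cassandra. The names `EqualTo`, `ToyTableSource`, `unhandled_filters`, `scan`, and `query` are all hypothetical, made up for illustration. The assumption is that the source can evaluate equality filters on some columns by itself (think of a partition key) and leaves the remaining filters to the query engine.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class EqualTo:
    """Hypothetical equality filter: column == value."""
    column: str
    value: Any


class ToyTableSource:
    """Stand-in for a remote table that can filter on some columns itself."""

    def __init__(self, rows: Iterable[Dict[str, Any]], pushable_columns: Iterable[str]):
        self.rows = list(rows)
        self.pushable_columns = set(pushable_columns)
        self.rows_shipped = 0  # counts rows that leave the "remote" source

    def unhandled_filters(self, filters: List[EqualTo]) -> List[EqualTo]:
        # Filters the source cannot evaluate; the engine must apply these.
        return [f for f in filters if f.column not in self.pushable_columns]

    def scan(self, required_columns: List[str], filters: List[EqualTo]) -> Iterator[Dict[str, Any]]:
        # Apply only the pushable filters at the source, then prune columns.
        pushed = [f for f in filters if f.column in self.pushable_columns]
        for row in self.rows:
            if all(row[f.column] == f.value for f in pushed):
                self.rows_shipped += 1
                yield {c: row[c] for c in required_columns}


def query(source: ToyTableSource, select: List[str], filters: List[EqualTo]) -> List[Dict[str, Any]]:
    """Toy engine: push what it can, then apply the residual filters itself."""
    residual = source.unhandled_filters(filters)
    # The engine still needs the columns referenced by residual filters.
    needed = list(dict.fromkeys(list(select) + [f.column for f in residual]))
    out = []
    for row in source.scan(needed, filters):
        if all(row[f.column] == f.value for f in residual):
            out.append({c: row[c] for c in select})
    return out


# Made-up sample data, for illustration only.
rows = [
    {"station": "A", "year": 2014, "temp": 10},
    {"station": "A", "year": 2015, "temp": 12},
    {"station": "B", "year": 2015, "temp": 20},
    {"station": "B", "year": 2014, "temp": 18},
]
source = ToyTableSource(rows, pushable_columns={"station"})
result = query(source, ["temp"], [EqualTo("station", "A"), EqualTo("year", 2015)])
```

In this toy run only the rows for station "A" ever leave the source, and the engine applies the `year` filter to those. How the real connector decides which predicates it can push down to Cassandra is exactly the kind of question to bring to the talk.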
Rough Agenda
7-7:15pm: Introductions and Announcements
7:15-7:45pm: Spark SQL Data Sources API Overview (Fregly)
7:45-8:30pm: Details of the spark-cassandra-connector Data Sources API implementation (Russell, Ryan, and others)
Related Links
- Overview of the Data Sources API
https://www.youtube.com/watch?v=uxuLRiNoDio
- The spark-cassandra-connector (https://github.com/datastax/spark-cassandra-connector) is an implementation of the Spark SQL DataSources API similar to the following:
https://github.com/databricks/spark-csv
https://github.com/databricks/spark-avro
- Examples of the spark-cassandra-connector in action:
https://github.com/killrweather/killrweather
https://github.com/fluxcapacitor/pipeline
- Spark SQL Data Sources API
http://blog.madhukaraphatak.com/anatomy-of-spark-dataframe-api/
