[SF]Deep Dive: Spark SQL+DataFrames+Data Sources API+Parquet+Cassandra Connector

![[SF]Deep Dive: Spark SQL+DataFrames+Data Sources API+Parquet+Cassandra Connector](https://secure.meetupstatic.com/photos/event/6/0/c/b/highres_527184779.jpeg?w=750)
Overview
Come join us for a deep dive into the details of the spark-cassandra-connector (https://github.com/datastax/spark-cassandra-connector).
This implementation of the Spark SQL Data Sources API is one of the most advanced and performance-tunable connectors available.
Highlights of the spark-cassandra-connector
- Token-ring-aware data locality, so reads are co-located with Spark Worker nodes
- Filter pushdown support for better performance and participation in the Spark SQL Catalyst query optimizations
- DataFrame support for Spark 1.4 and Spark 1.5
- Enables a single Cassandra data store to serve both your transactional and analytics needs (there are pros and cons to this)
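
To give a feel for what filter pushdown means before the talk, here is a minimal sketch in plain Python. It loosely mirrors the shape of Spark's `PrunedFilteredScan.buildScan(requiredColumns, filters)` contract, where the engine hands the source the columns it needs and the predicates it would like applied. This is a toy model and not the connector's code: `ToyTableSource`, the `matches` helpers, and the sample rows are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Sequence

Row = Dict[str, Any]


@dataclass(frozen=True)
class EqualTo:
    """Toy stand-in for an equality predicate handed to the source."""
    attribute: str
    value: Any

    def matches(self, row: Row) -> bool:
        return row[self.attribute] == self.value


@dataclass(frozen=True)
class GreaterThan:
    """Toy stand-in for a greater-than predicate handed to the source."""
    attribute: str
    value: Any

    def matches(self, row: Row) -> bool:
        return row[self.attribute] > self.value


class ToyTableSource:
    """Hypothetical in-memory table that applies filters and projection itself."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def build_scan(self, required_columns: Sequence[str], filters: Sequence[Any]) -> Iterator[Row]:
        # Filtering and column pruning happen inside the source, so only
        # matching rows and requested columns are handed back to the caller.
        for row in self._rows:
            if all(f.matches(row) for f in filters):
                yield {column: row[column] for column in required_columns}


# Made-up sample data for the illustration.
source = ToyTableSource([
    {"station": "SFO", "day": 1, "temp": 14},
    {"station": "SFO", "day": 2, "temp": 18},
    {"station": "JFK", "day": 1, "temp": 21},
])

result = list(source.build_scan(
    ["station", "temp"],
    [EqualTo("station", "SFO"), GreaterThan("temp", 15)],
))
print(result)
```

In a real deployment the filtering would be done by the data store rather than in a Python loop; the point of the sketch is only that the source, not the query engine, discards non-matching rows and unused columns.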
Rough Agenda
7-7:15pm: Introductions and Announcements
7:15-7:30pm: Highlights from Strata NYC
7:30-8:00pm: Spark SQL Data Sources API Overview
8:00-8:30pm: Details of the spark-cassandra-connector Data Sources API implementation
Related Links
- Overview of the Data Sources API
https://www.youtube.com/watch?v=uxuLRiNoDio
- The spark-cassandra-connector (https://github.com/datastax/spark-cassandra-connector) is an implementation of the Spark SQL DataSources API similar to the following:
https://github.com/databricks/spark-csv
https://github.com/databricks/spark-avro
- Examples of the spark-cassandra-connector in action:
https://github.com/killrweather/killrweather
https://github.com/fluxcapacitor/pipeline
- Spark SQL Data Sources API
http://blog.madhukaraphatak.com/anatomy-of-spark-dataframe-api/
http://www.river-of-bytes.com/2014/12/filtering-and-projection-in-spark-sql.html
https://github.com/spirom/LearningSpark
http://www.river-of-bytes.com/2014/12/external-data-sources-in-spark-120.html
