Deep Dive: Spark SQL+DataFrames Cassandra Connector Directly from DataStax

Hosted by Chris Fregly

Details

****************************************************

NOTE: THIS EVENT WILL BE RECORDED AND LIVE STREAMED.

https://livestream.com/sparktc/datastax092115

****************************************************

Scheduled the night before the start of the Cassandra Summit.

(New) Location: DataStax's Santa Clara office, just 1.5 mi from the old location at the Santa Clara Convention Center.

Given their prices on food and booze, the Convention Center does not appear to be affected by the economic crisis in China like the rest of the world... so thanks, DataStax, for saving us a ton of USD!!

Overview

We've asked the DataStax authors of the spark-cassandra-connector (https://github.com/datastax/spark-cassandra-connector) to join us for a deep dive into the details of this implementation of the Spark SQL Data Sources API.

Many DataStax engineers will be in town, so please come prepared with questions, concerns, and general mockery.

Highlights of the spark-cassandra-connector

  1. Token-ring aware data locality when co-located with Spark Worker nodes

  2. Pushdown filter support for optimal performance and participation in the advanced Spark SQL Catalyst Query Optimizer

  3. DataFrame support for Spark 1.4 and Spark 1.5

Rough Agenda

7-7:15pm: Introductions and Announcements

7:15-7:45pm: Spark SQL Data Sources API Overview (Fregly)

7:45-8:30pm: Details of the spark-cassandra-connector Data Sources API implementation (Russell, Ryan, and others)

Related Links

  1. Overview of the Data Sources API

https://www.youtube.com/watch?v=uxuLRiNoDio

  2. The spark-cassandra-connector (https://github.com/datastax/spark-cassandra-connector) is an implementation of the Spark SQL DataSources API similar to the following:

https://github.com/databricks/spark-csv

https://github.com/databricks/spark-avro

  3. Examples of the spark-cassandra-connector in action:

https://github.com/killrweather/killrweather

https://github.com/fluxcapacitor/pipeline

  4. Spark SQL Data Sources API

http://blog.madhukaraphatak.com/anatomy-of-spark-dataframe-api/

AI Performance Engineering Meetup (San Francisco, Global)