[SF]Deep Dive: Spark SQL+DataFrames+Data Sources API+Parquet+Cassandra Connector

Hosted by Chris Fregly

Details

Overview

Come join us for a deep dive into the details of the spark-cassandra-connector (https://github.com/datastax/spark-cassandra-connector).

This implementation of the Spark SQL Data Sources API is one of the most advanced and performance-tunable connectors available.

Highlights of the spark-cassandra-connector

  1. Token-ring aware data locality for co-location with Spark Worker nodes

  2. Pushdown filter support for optimal performance and participation in the advanced Spark SQL Catalyst Query Optimizations

  3. DataFrame support for Spark 1.4 and Spark 1.5

  4. Enables a single Cassandra data store to serve both your transactional and analytics needs (there are pros and cons to this)
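
To make the pushdown highlight concrete, here is a toy Python sketch of the idea. It is not the connector's code and not Spark's API: `ToyPartitionedSource`, `EqualTo`, and `query` are hypothetical names, loosely modeled on the shape of Spark's `PrunedFilteredScan.buildScan(requiredColumns, filters)`. The assumption is a store that can only serve equality filters on its partition key cheaply, and hands every other filter back to the query engine.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class EqualTo:
    """Hypothetical equality filter handed from the query engine to a source."""
    column: str
    value: Any


class ToyPartitionedSource:
    """Toy store with rows grouped by partition key (an illustration only)."""

    def __init__(self, partition_key: str, rows: Sequence[Row]) -> None:
        self.partition_key = partition_key
        self.partitions: Dict[Any, List[Row]] = {}
        for row in rows:
            self.partitions.setdefault(row[partition_key], []).append(row)
        self.rows_read = 0  # how many rows the source actually touched

    def build_scan(
        self, required_columns: Sequence[str], filters: Sequence[EqualTo]
    ) -> Tuple[List[Row], List[EqualTo]]:
        """Return projected rows plus the filters this source did NOT handle."""
        pushed = [f for f in filters if f.column == self.partition_key]
        unhandled = [f for f in filters if f.column != self.partition_key]
        if pushed:
            wanted = {f.value for f in pushed}
            # Conflicting equality filters on the key can match nothing.
            keys = [pushed[0].value] if len(wanted) == 1 else []
        else:
            keys = list(self.partitions)  # no pushdown possible: full scan
        out: List[Row] = []
        for key in keys:
            for row in self.partitions.get(key, []):
                self.rows_read += 1
                out.append({c: row[c] for c in required_columns})
        return out, unhandled


def query(
    source: ToyPartitionedSource, columns: Sequence[str], filters: Sequence[EqualTo]
) -> List[Row]:
    """Toy query engine: push what it can, apply leftover filters itself."""
    # Request filter columns too, so leftover filters can still be evaluated.
    needed = list(dict.fromkeys(list(columns) + [f.column for f in filters]))
    rows, unhandled = source.build_scan(needed, filters)
    kept = [r for r in rows if all(r[f.column] == f.value for f in unhandled)]
    return [{c: r[c] for c in columns} for r in kept]


rows = [
    {"station": "SFO", "day": 1, "temp": 15},
    {"station": "SFO", "day": 2, "temp": 17},
    {"station": "NYC", "day": 1, "temp": 9},
    {"station": "NYC", "day": 2, "temp": 11},
]
source = ToyPartitionedSource("station", rows)
result = query(source, ["day", "temp"], [EqualTo("station", "SFO"), EqualTo("day", 2)])
print(result, source.rows_read)
```

In this sketch the `station` filter is pushed into the source, so only the two `SFO` rows are read, while the `day` filter is applied afterwards by the toy engine. A real connector decides which predicates it can push down based on the table's key structure, which is one of the details the talk covers.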

Rough Agenda

7:00-7:15pm: Introductions and Announcements

7:15-7:30pm: Highlights from Strata NYC

7:30-8:00pm: Spark SQL Data Sources API Overview

8:00-8:30pm: Details of the spark-cassandra-connector Data Sources API implementation

Related Links

  1. Overview of the Data Sources API

https://www.youtube.com/watch?v=uxuLRiNoDio

  2. The spark-cassandra-connector (https://github.com/datastax/spark-cassandra-connector) is an implementation of the Spark SQL Data Sources API, similar to the following:

https://github.com/databricks/spark-csv

https://github.com/databricks/spark-avro

  3. Examples of the spark-cassandra-connector in action:

https://github.com/killrweather/killrweather

https://github.com/fluxcapacitor/pipeline

  4. Spark SQL Data Sources API

http://blog.madhukaraphatak.com/anatomy-of-spark-dataframe-api/

http://www.river-of-bytes.com/2014/12/filtering-and-projection-in-spark-sql.html

https://github.com/spirom/LearningSpark

http://www.river-of-bytes.com/2014/12/external-data-sources-in-spark-120.html

AI Performance Engineering Meetup (San Francisco, Global)