Advanced Cassandra: Intro to PySpark

Hosted By
Lina T.

Details

For this meetup we will be joined by Jon Haddad, Technical Evangelist for Apache Cassandra at DataStax. Jon will go over Cassandra + PySpark.

What You'll Learn At This Meetup:

If you're already using Cassandra, you're already aware of its strengths: high availability and linear scalability. The downside to this power is less query flexibility. For an OLTP system with an SLA this is an acceptable tradeoff, but for a data scientist it’s extremely limiting.

Enter Apache Spark. Apache Spark complements an existing Cassandra cluster by providing a means of executing arbitrary queries, filters, sorting, and aggregation. It’s possible to use functional constructs like map, filter, and reduce, as well as SQL and DataFrames.
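
To give a feel for the functional constructs mentioned above, here is a minimal sketch in plain Python, using only the standard library rather than Spark itself. The rows, field names, and the `total_plays` helper are hypothetical and chosen purely for illustration. In PySpark, roughly the same chain would be written with an RDD's `filter`, `map`, and `reduce` methods, with the work distributed across the cluster instead of run in one process.

```python
from functools import reduce

# Hypothetical rows, shaped like records read from a Cassandra table.
rows = [
    {"user": "alice", "country": "US", "plays": 12},
    {"user": "bob", "country": "DE", "plays": 7},
    {"user": "carol", "country": "US", "plays": 30},
]


def total_plays(rows, country):
    """Sum the `plays` field for one country using filter, map, and reduce."""
    # filter: keep only the rows for the requested country
    matching = filter(lambda r: r["country"] == country, rows)
    # map: project each row down to the single value we care about
    counts = map(lambda r: r["plays"], matching)
    # reduce: fold the values into one total (0 if nothing matched)
    return reduce(lambda a, b: a + b, counts, 0)


print(total_plays(rows, "US"))  # prints 42
```

This kind of ad hoc filter-and-aggregate over a non-key column is the sort of query the description says Cassandra alone makes hard.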

In this presentation I’ll show you how to process Cassandra data in bulk or through a Kafka stream using Python. Then we’ll visualize our data using IPython notebooks, leveraging pandas and matplotlib.

This is an advanced talk. We will assume existing knowledge of Cassandra and CQL.

About Jon Haddad:

Jon has 10 years of experience in both development and operations, working at startups in Southern California. For the last 2 years he's been a committer to cqlengine, the Python object mapper for Cassandra. He's now a Technical Evangelist at DataStax, continuing to focus on advancing Cassandra in the Python community.

  • Food and drink will be served. Hope to see you all there!

*A big thank you to Hulu for hosting.

Los Angeles Cassandra Users
Hulu
2500 Broadway St. Suite 200 · Santa Monica, CA