Advanced Cassandra: Intro to PySpark


Details
For this meetup we will be joined by Jon Haddad, Technical Evangelist for Apache Cassandra at DataStax. Jon will go over Cassandra + PySpark.
What You'll Learn At This Meetup:
If you're already using Cassandra, you already know its strengths: high availability and linear scalability. The downside to this power is less query flexibility. For an OLTP system with an SLA this is an acceptable tradeoff, but for a data scientist it’s extremely limiting.
Enter Apache Spark. Apache Spark complements an existing Cassandra cluster by providing a means of executing arbitrary queries, filters, sorts, and aggregations. It’s possible to use functional constructs like map, filter, and reduce, as well as SQL and DataFrames.
In this presentation I’ll show you how to process Cassandra data in bulk or through a Kafka stream using Python. Then we’ll visualize our data using IPython notebooks, leveraging Pandas and matplotlib.
This is an advanced talk. We will assume existing knowledge of Cassandra and CQL.
About Jon Haddad:
Jon has 10 years of experience in both development and operations, working at startups in Southern California. For the last 2 years he's been a committer to cqlengine, the Python object mapper for Cassandra. He's now a Technical Evangelist at DataStax, continuing to focus on advancing Cassandra in the Python community.
- Food and drink will be served. Hope to see you all there!
*Big thank you to Hulu for hosting