In this talk, Dan Swain, a Data Engineer at Pandora, will give an overview of Apache Spark, an open-source, general-purpose framework for distributed cluster computing. One of Spark’s greatest strengths is that it provides a consistent programming model across many compute platforms and storage systems. The same application code runs whether you’re on Hadoop, Kubernetes, AWS, or your laptop; typically only the deployment configuration changes.
This talk will include examples in both Scala and Python (PySpark).
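
As a rough illustration of that portability (a sketch, not taken from the talk), the PySpark snippet below keeps the job logic in one function and isolates the only environment-specific piece, the master URL. The names `MASTER_URLS`, `build_session`, `count_words`, and `main` are hypothetical, and the Kubernetes URL is a placeholder. In practice the master is usually passed to `spark-submit --master` rather than hard-coded.

```python
# Assumed mapping from environment to Spark master URL (illustrative only).
MASTER_URLS = {
    "laptop": "local[*]",  # run in-process using all local cores
    "hadoop": "yarn",      # let YARN schedule the executors
    "kubernetes": "k8s://https://<api-server-host>:<port>",  # placeholder
}


def build_session(master):
    """Create a SparkSession for the given master URL."""
    # Imported lazily so the module loads even where PySpark is not installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(master)
        .appName("portability-sketch")
        .getOrCreate()
    )


def count_words(spark, lines):
    """Word count written once; it does not care where `spark` is running."""
    from pyspark.sql import functions as F

    df = spark.createDataFrame([(line,) for line in lines], ["line"])
    # Split each line on whitespace and emit one row per word.
    words = df.select(F.explode(F.split(F.col("line"), r"\s+")).alias("word"))
    rows = words.groupBy("word").count().collect()
    return {row["word"]: row["count"] for row in rows}


def main():
    # Swap "laptop" for another key to target a different platform;
    # count_words itself stays unchanged.
    spark = build_session(MASTER_URLS["laptop"])
    try:
        print(count_words(spark, ["spark on a laptop", "spark on a cluster"]))
    finally:
        spark.stop()
```

Calling `main()` on a machine with PySpark installed runs the job locally. Pointing `build_session` at a different master is the only change needed to move it elsewhere, assuming the cluster is reachable and configured.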