Hello, Spark! An introduction to Apache Spark using PySpark in the Cloud


Details
During the next event in our “Hello,” series of introductory meetups on open source software and Big Data topics, we’re going to introduce you to Apache Spark using PySpark, the interface for using Spark from the Python programming language. We won’t assume that you’re already steeped in the world of large-scale data analytics or that you know the difference between DataFrames and Datasets (if you can rattle off the pros and cons of DataFrames vs. Datasets, this session isn't for you).
This presentation will cover the basics of the problems Apache Spark was designed to solve and explain why and when to use Spark, given the range of problems you may encounter day to day in data engineering and data science work. It will also give you an overview of a few of the key abstractions that let Spark make efficient use of both computing hardware and a programmer's or data scientist's time. We’ll also touch on why running Spark on Kubernetes is a great way to maximize the utilization of cloud computing resources.
We’ll also show you a live demonstration of how easy it is to run a PySpark job in the public cloud using the Cloudera Data Science Workbench and Cloudera Data Engineering products.
Join Field Chief Technical Officer Carolyn Duby and Solutions Engineer Suyash Ramineni, both of Cloudera, and get acquainted with Apache Spark. We are looking forward to seeing you there!
This is still a tricky time for public gatherings, but Future of Data is committed to providing great tech content and facilitating discussions in the "Big Data" space. This event is being held by our group in Boston, Massachusetts; to do our part to fight the spread of COVID-19's Omicron variant, it will be an exclusively online event scheduled in Eastern Standard Time (the event time displayed on this page will reflect your equivalent local time). We thought it might be of interest to our wider membership, and you are welcome to sign up for it here.
