Spark 2.0 (PySpark) made easy - Hands-on Code
Details
Hello again!
We are announcing the next meetup on Wed, April 19th.
This time we are going to be diving into Spark 2.0, specifically:
- Loading data from Azure Storage and Azure Data Lake Store, and which works best
- DataFrame functions vs Spark SQL
- Data formats: CSV vs Parquet vs JSON vs ORC
- Persistence of Hive tables using Azure SQL as the metastore
- Example of an ETL process with a public dataset
You can bring your own laptop and use your Azure account to follow the example and get the most out of it.
See you there!