New Functions and Workflow Examples with Spark DataFrames

Hosted By
George Chow and Rich Iannone

Details

DataFrames, introduced in Spark 1.3, have become more powerful with every subsequent release. RDDs remain a very effective way of processing data in Spark, but DataFrames offer advantages such as improved readability and faster processing. DataFrame functions work well in pipelines and are easy to reason about because they carry more semantic information about each data transformation. The speed benefit arises because DataFrame operations are represented internally as logical plans; before execution, Catalyst optimizes these plans and transforms them into physical plans. It is likely that DataFrames will become the primary API for most users, so now is a great time to dive into processing data with DataFrames in Spark.

I (Rich Iannone) will provide an overview of many of the functions introduced in Spark versions 1.3 through 1.6. Because examples are a great way to gain understanding quickly, I'll provide plenty of them. These will be practical examples that apply readily to a variety of data transformation tasks. There will also be a live-coding session showing Spark DataFrames used in step-by-step analyses, which will give Meetup members an opportunity for Q&A and other participation. I'm really excited to show you all the great new additions to the DataFrame API, so I hope to see you all at the session!

Vancouver Apache Spark Meetup
"The Aquarium" at Plenty of Fish
25th Floor, 555 West Hastings (Harbour Centre) · Vancouver, BC