Toronto Apache Spark #20


Details
Agenda:
6:30PM to 7:00PM - Opening and networking
7:00PM to 8:30PM - RDD, DataFrame and Dataset – Comparing API & Performance Benchmarks by Eyal Edelman (http://www.linkedin.com/in/eedelman)
8:30PM to 9:00PM - Networking
Title: RDD, DataFrame and Dataset – Comparing API & Performance Benchmarks
Description:
Spark is continually advancing in capabilities. Originally, the only distributed collection offered by Spark was the RDD. Since then, DataFrame and Dataset have also been introduced.
In this presentation we will review the three Spark distributed collections, examine their API differences, and compare their performance in Spark versions 1.6 and 2.1.
Target audience: Data Scientists, Data Engineers, Data Analysts and Spark Developers
Level: Intermediate to Advanced
Speaker: Eyal Edelman (http://www.linkedin.com/in/eedelman) is the Big Data Practice Lead and senior consultant at SWI. He is a Big Data Architect, Spark expert and a Microsoft Certified Solution Expert in Business Intelligence. Eyal has extensive experience in optimizing both SQL and Big Data solutions. He holds a Bachelor’s degree in Computer Science, a Master’s in Business Administration (MBA) and a PMP Project Management certification. With over 25 years of experience in architecting and implementing data systems, Eyal’s extensive background allows him to be an effective liaison between business and technology and to deliver top-notch technical solutions that provide real business value.
Sponsor:
https://secure.meetupstatic.com/photos/event/b/5/2/d/600_460846381.jpeg
