Toronto Apache Spark #20

Hosted by Mehrdad Pazooki

Details

Agenda:

6:30PM to 7:00PM - Opening and networking

7:00PM to 8:30PM - RDD, DataFrame and Dataset – Comparing API & Performance Benchmarks by Eyal Edelman (http://www.linkedin.com/in/eedelman)

8:30PM to 9:00PM - Networking

Title: RDD, DataFrame and Dataset – Comparing API & Performance Benchmarks

Description:

Spark is continually advancing in capabilities. Originally, the only distributed collection Spark offered was the RDD. The DataFrame and Dataset APIs were introduced later.

In this presentation we will review the three Spark distributed collections, examine their API differences, and compare their performance in Spark versions 1.6 and 2.1.

Target audience: Data Scientists, Data Engineers, Data Analysts and Spark Developers

Level: Intermediate to Advanced

Speaker: Eyal Edelman (http://www.linkedin.com/in/eedelman) is the Big Data Practice Lead and a senior consultant at SWI. He is a Big Data Architect, a Spark expert and a Microsoft Certified Solutions Expert in Business Intelligence. Eyal has extensive experience in optimizing both SQL and Big Data solutions. He holds a Bachelor’s degree in Computer Science, a Master of Business Administration (MBA) and a Project Management Professional (PMP) certification. With over 25 years of experience in architecting and implementing data systems, Eyal is an effective liaison between business and technology and delivers top-notch technical solutions that provide real business value.

Sponsor:

https://secure.meetupstatic.com/photos/event/b/5/2/d/600_460846381.jpeg

PipelineAI Advanced Spark and TensorFlow Meetup (Toronto)
Ramada Plaza Toronto
300 Jarvis Street, Toronto, Ontario, M5B 2C5