What we're about

The main objective of this meetup is to help both professionals and beginners in Big Data understand the internals of Apache Spark, enabling them to debug and optimize their Spark jobs.

We will be organising webinars that take a deep dive into the distributed-systems concepts implemented in Spark, such as:

- Distributed memory management
- DAG Execution
- Remote Procedure Call
- Job scheduling
- Compression codecs
- Partitioning of data
- Resource allocation for Spark jobs

And many more ...

Upcoming events (1)

Spark's Cluster Manager


Hello everyone! We invite you to learn and share knowledge that will help the community build expertise in Apache Spark.

Speaker: Shad (Software Engineer - Big Data @Expedia)

Title: Spark's Cluster Manager

Abstract: Spark has almost taken over the world of Big Data, and this is a great time to learn how it got there. In this webinar, we will take a deep dive into Spark's resource manager, the component responsible for providing resources for job execution in a fault-tolerant manner. Spark ships with a built-in cluster manager and also supports external cluster managers such as YARN, Mesos and Kubernetes. This meetup will cover only Spark's standalone cluster/resource manager.

Agenda for the meetup:

- 11:00 — Overview of Spark's infrastructure
- 11:10 — Spark's runtime model: executors, jobs, stages & tasks
- 11:25 — Resource-scheduling algorithm
- 11:30 — Demo (covers the Spark standalone cluster manager)
- 11:45 — Questions and answers

Location: Online webinar. The link for the webinar will be shared here on 2nd August, 2019.

Cost: Free

Timing: 11:00 AM to 12:15 PM
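For a taste of what the demo covers, here is a minimal sketch of bringing up a standalone cluster and submitting a job to it. The hostname, resource flags, and example jar path are illustrative placeholders, not details from the event:

```shell
# Start a standalone master; its RPC endpoint defaults to port 7077
# and its web UI to port 8080 (scripts ship with the Spark distribution).
$SPARK_HOME/sbin/start-master.sh

# Start a worker and register it with the master
# (the script is named start-slave.sh in Spark 2.x, start-worker.sh in 3.x).
$SPARK_HOME/sbin/start-slave.sh spark://master-host:7077

# Submit an application to the standalone cluster manager.
$SPARK_HOME/bin/spark-submit \
  --master spark://master-host:7077 \
  --executor-memory 1g \
  --total-executor-cores 2 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 100
```

The `--master spark://...` URL is what selects the standalone cluster manager, as opposed to `yarn`, `mesos://...`, or `k8s://...` for the other resource managers mentioned in the abstract.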
