Big data meetup hosted by LinkedIn


Details
Date: Oct 12th, 2023
Time: 3pm - 5pm IST
Venue: This is a virtual event. Please join via [https://linkedin.zoom.us/s/93704902295](https://linkedin.zoom.us/s/93704902295)
Contact: Srikanth Sundarrajan (srsundarrajan@linkedin.com)
Agenda:
3:00 - Event opens / Check-in
3:10 - Data Ingestion @ LinkedIn (Bhupendra Kumar Jain, LinkedIn)
3:40 - Unlocking The Power Of Spark On Kubernetes With Yunikorn (Krishna Birla & Amogh Desai, Cloudera)
4:05 - Operating Kafka At Uber Scale (Abhijeet Kumar & Nikin Raagav, Uber)
4:30 - Offline Compute Infrastructure @ LinkedIn (Varun Saxena, LinkedIn)
5:00 - Event closes
Data Ingestion @ LinkedIn
This talk walks through the data ingestion pipelines that make a continuous stream of data available in the offline data lake at LinkedIn. It takes a deep dive into the current data ingestion ecosystem at LinkedIn, the challenges of operating at exabyte scale, and the work of ensuring data quality and compliance. It also covers our plan to revamp the data ingestion ecosystem to meet ever-growing data needs.
Speaker: Bhupendra Kumar Jain (Sr Staff Engineer, LinkedIn)
Unlocking the Power of Spark on Kubernetes with Yunikorn
Learn how Apache Yunikorn helped solve various challenges with running Spark on Kubernetes.
Speakers: Krishna Birla (Sr Software Engineer, Cloudera), Amogh Desai (Software Engineer, Cloudera)
Operating Kafka at Uber Scale
Uber operates one of the world's largest Kafka deployments, with numerous nodes serving various purposes. These Kafka clusters handle trillions of messages daily, resulting in substantial data ingestion. Kafka plays essential roles in inter-service messaging, database changelog transport and data lake management, storing critical business data such as billing and payment records.
Kafka is categorized as a tier-0 technology and is held to a 99.99% data durability service level objective (SLO). The availability of Kafka services depends directly on cluster node availability. Currently, the clusters run on aging, deteriorating nodes of an old SKU, leading to frequent disk failures and node replacements. Simultaneous node failures can create offline partitions, risking data loss. Additionally, these old SKUs are nearing the end of their operational life. To ensure uninterrupted business operations, it's vital to migrate topics and partitions to more reliable, higher-performance servers. In this talk, we will go over our recent experience of upgrading Kafka infrastructure at Uber.
Speakers: Abhijeet Kumar (Staff Engineer, Uber), Nikin Raagav (Sr Software Engineer, Uber)
Offline Compute Infrastructure @ LinkedIn
At LinkedIn, every day we're processing several petabytes of data in millions of containers and solving challenging problems of scale and efficiency. This talk is about offline compute infrastructure at LinkedIn and how we scale, run and operate one of the largest compute clusters in the world.
Speakers: Krishan Goyal (Staff Engineer, LinkedIn), Aditya Sharma (Staff Engineer, LinkedIn)
