DoK Town Hall: Running Batch Data Workloads in Kubernetes at Dish Network


Details
We are the Data on Kubernetes Community (DoKC), where end users go to share best practices for running data workloads on Kubernetes.
We're excited to host our monthly DoKC Town Hall virtual event in August 2024! The Town Hall brings the community together to meet each other, share end-user journey stories and DoK-related projects and technologies, and stay up to date on community events and ways to participate. Meetings are held on the third Thursday of each month at 10 AM PT.
AGENDA
[10:00 AM]
Welcome and Community Updates
Presented by Paul Au, Head of Community
[10:05 AM]
Running Batch Data Workloads in Kubernetes
Fast and reliable data processing is critical for data-driven business decision-making. The exponential growth of data volume presents significant challenges in terms of resource efficiency, cost-effectiveness, and fault tolerance. This talk covers an approach to optimizing batch data processing at scale on Amazon Elastic Kubernetes Service (EKS).
I will present and demo a high-level architecture that combines the following technologies, and show how together they optimize resource allocation, storage performance, and job scheduling:
- Apache Spark Operator for managing Spark applications
- Apache YuniKorn for scheduling all the pods of a Spark application simultaneously
- Karpenter for node autoscaling
- Argo Workflows for Spark job orchestration
[10:55 AM]
DoK Quiz
