Details

We are the Data on Kubernetes Community (DoKC), where end users go to share best practices for running data workloads on Kubernetes.

We're excited to host our monthly DoKC Town Hall virtual event in August 2024! The Town Hall brings the community together to meet each other, share end-user journey stories and DoK-related projects and technologies, and stay up to date on community events and ways to participate. Meetings are held on the third Thursday of each month at 10am PT.

AGENDA
[10:00 AM]
Welcome and Community Updates
Presented by Paul Au, Head of Community

[10:05 AM]
Running Batch Data Workloads in Kubernetes

Fast and reliable data processing is critical for data-driven business decision-making. The exponential growth of data volume presents significant challenges in terms of resource efficiency, cost-effectiveness, and fault tolerance. This talk covers an approach to optimizing batch data processing at scale on Amazon Elastic Kubernetes Service (EKS).

I will give a high-level overview and demo of an architecture that tightly integrates the following technologies, and show how they collectively optimize resource allocation, storage performance, and job scheduling:

  • Apache Spark Operator for managing Spark applications
  • Apache YuniKorn for scheduling all the pods of a Spark application simultaneously (gang scheduling)
  • Karpenter for node autoscaling
  • Argo Workflows for Spark job orchestration

[10:55 AM]
DoK Quiz

