India Open Source Data Infrastructure Meetup - March 2024


Details
- Are you interested in learning more about open-source data technologies? ✅
- Do you want to network with local and international tech professionals in a fun, relaxed environment? ✅
Then join us on March 2nd for inspiring conversations and exciting talks. Venue: Zeta Suite, Domlur, Bangalore.
Agenda:
- 11:00 - 11:30 Welcome: Networking & refreshments
- 11:30 - 11:40 Kickoff: Welcome from Aiven
- 11:40 - 12:00 Saving a Million Dollars with ClickHouse: Zomato’s Logging Migration Journey - Anmol Virmani & Palash Goel, Zomato
- 12:05 - 12:25 Choosing Argo Workflows Over Airflow in a Distributed Environment - Ekansh Gupta, Zeta Suite
- 12:30 - 14:00 Lunch & networking
---
Talk 1: Saving a Million Dollars with ClickHouse: Zomato’s Logging Migration Journey
In a bold move, Zomato migrated their logging setup from ELK to ClickHouse, with striking results. The new platform handles 150 million logs per minute and roughly 50 terabytes of uncompressed logs per day, while cutting costs by more than a million dollars per year.
Despite this scale, performance remains strong: ingestion lag is under 5 seconds, giving near real-time access to critical log data, and P99 query time is under 10 seconds, keeping access to insights swift and responsive.
Serving around 12,000 queries a day, the migration to ClickHouse has redefined Zomato's logging capabilities while unlocking substantial cost savings.
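To give a flavour of what a ClickHouse-backed logging pipeline can look like, here is a minimal sketch in Python. It is illustrative only: the table name, columns, and the use of the clickhouse-connect client are assumptions, not Zomato's actual schema or setup.

```python
# Minimal sketch of a ClickHouse log store (illustrative, not Zomato's schema).
# Assumes a reachable ClickHouse server and the clickhouse-connect library.
from datetime import datetime

import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123)

# A MergeTree table ordered by service + time and partitioned by day is a
# common starting point for high-volume log ingestion.
client.command("""
    CREATE TABLE IF NOT EXISTS app_logs (
        timestamp DateTime64(3),
        service   LowCardinality(String),
        level     LowCardinality(String),
        message   String
    )
    ENGINE = MergeTree
    PARTITION BY toDate(timestamp)
    ORDER BY (service, timestamp)
""")

# Real pipelines buffer and flush logs in large batches to keep ingestion lag low.
rows = [
    (datetime.utcnow(), "orders", "INFO", "order placed"),
    (datetime.utcnow(), "orders", "ERROR", "payment gateway timeout"),
]
client.insert("app_logs", rows, column_names=["timestamp", "service", "level", "message"])

# Typical query pattern: recent errors for one service.
result = client.query(
    "SELECT timestamp, message FROM app_logs "
    "WHERE service = 'orders' AND level = 'ERROR' "
    "ORDER BY timestamp DESC LIMIT 10"
)
print(result.result_rows)
```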
Talk 2: Choosing Argo Workflows Over Airflow in a Distributed Environment
As our microservice architecture keeps evolving in a distributed environment, workflow orchestration tools such as Argo Workflows and Apache Airflow have become increasingly important. Every workflow tool covers the basic necessities and features, but handling them at scale is a different problem altogether. Kubernetes makes it easy to run large data jobs in distributed environments, and Argo Workflows is the better way to run pipelines on Kubernetes. This session demonstrates how to orchestrate pipeline jobs with Argo Workflows, from the architecture to resource and workflow definitions. I'll also run some example jobs to show the distinct scaling advantages Argo Workflows and Kubernetes offer any data pipeline user, and discuss why Argo Workflows is the go-to choice for expanding teams distributed over multiple regions and zones, removing the dependency on a particular tech stack.
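As a small preview of what an Argo Workflows resource definition looks like, here is a sketch that builds a two-step DAG workflow manifest as a Python dict and renders it to YAML. The task names, container image, and two-step pipeline are illustrative assumptions, not the speaker's actual pipeline.

```python
# Sketch of an Argo Workflows pipeline definition built as a plain Python dict
# and rendered to YAML (illustrative only; names and images are assumptions).
import yaml  # PyYAML

workflow = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "data-pipeline-"},
    "spec": {
        "entrypoint": "main",
        "templates": [
            {
                # A DAG of two tasks: "transform" runs only after "extract" succeeds.
                "name": "main",
                "dag": {
                    "tasks": [
                        {"name": "extract", "template": "step",
                         "arguments": {"parameters": [{"name": "msg", "value": "extracting"}]}},
                        {"name": "transform", "dependencies": ["extract"], "template": "step",
                         "arguments": {"parameters": [{"name": "msg", "value": "transforming"}]}},
                    ]
                },
            },
            {
                # Each step is just a container; Kubernetes schedules and scales the pods.
                "name": "step",
                "inputs": {"parameters": [{"name": "msg"}]},
                "container": {
                    "image": "alpine:3.19",
                    "command": ["echo", "{{inputs.parameters.msg}}"],
                },
            },
        ],
    },
}

# Write the manifest; submit with `argo submit pipeline.yaml` or `kubectl create -f pipeline.yaml`.
with open("pipeline.yaml", "w") as f:
    yaml.safe_dump(workflow, f, sort_keys=False)
```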
