Traveloka: How We Run Cloud-Scale Apache Spark in Production Since 2017

Name: Traveloka: How We Run Cloud-Scale Apache Spark in Production Since 2017
Start: 2019-05-29T18:45:00+08:00
End: 2019-05-29T21:00:00+08:00
Location: Traveloka Services Pte. Ltd.

Hosted by Arseny C. and Omar

Ecosystem Community for Private Equity Backed Companies

Details

Weather so hot, lah! Prevailing inter-monsoon conditions! So-o-o-o... let's gather indoors at our host, one of Southeast Asia's top unicorn startups, Traveloka!

Two talks:

"How Traveloka Runs Cloud-Scale Apache Spark in Production Since 2017" - in this Level 301 knowledge transfer, Traveloka's Data Engineering and Data Science team will share how staff submit their cloud-scale Spark jobs today. The talk covers pros and cons and the integration of Apache Spark with CI/CD components, schedulers, Airflow, Key Management Systems (KMS), and templates. The journey starts with the early days of a self-managed, on-premise Spark cluster and walks through the adoption of AWS EMR, Qubole, Databricks, and Dataproc. It also covers how multiple back-end data sets helped transform Traveloka from a meta-search engine into a fully integrated online travel booking agency and one of the top Indonesian unicorn startups!
"Building Robust Production Data Pipelines with Databricks Delta" - (optional hands-on experience: prepare a laptop with a Chrome/Firefox browser and register on Databricks Community Edition). Following the open-source announcement of Delta Lake, this walk-through will provide insights into how Delta.io employs co-designed compute and storage and how it is compatible with Spark APIs. Delta Lake delivers high data reliability and query performance to support big data use cases, from batch and streaming ingest and fast interactive queries to machine learning. This tutorial will discuss the requirements of modern data pipelines, the challenges data engineers face when it comes to data reliability and performance, and how Delta can help. Code examples and notebooks will be shared throughout the presentation.

Speakers for Talk #1:
• Nisrina Luthfiyati joined the Traveloka Data Team as a Software Engineer in 2014. She has been (and still is) working on the various infrastructures, platforms, and libraries that make up Traveloka's data processing pipelines and storage.
• Andri Lauw joined Traveloka in 2018. He currently works on various data infrastructure and platforms at Traveloka and worked on similar things in his previous role. His main interests lie in distributed systems, and at present he deals extensively with an end-to-end general data development/processing platform.
• Didik Achmadi joined Traveloka in 2017. He currently manages a few teams working on a number of areas, ranging from cloud management and data infrastructure to data engineering work related to all business units within Traveloka. Previously, he worked on various engineering teams in the visual effects/animation, telco, and healthcare industries.

Speaker for Talk #2: Arseny Chernov joined Databricks in 2018 and is the APJ leader for Partner Solutions Architecture, based in Singapore. Acting as customers’ conduit to Databricks corporate headquarters, Arseny supports tactical and strategic initiatives in complex data and cloud environments with all available resources and knowledge, for business benefit and the best user experience.
