Apache Kylin: Achieve Exact COUNT DISTINCT with Sub-Second Latency at PB Scale


Details
Virtual Session: Please see zoom info for joining below
To kick things off, Chloe Kwon and Kevin Horecka will tell you, briefly but enthusiastically, about Data in the News! What models, discoveries, and tools were in the news during April and May? Come to the event and find out!
Then, for our main event, we'll hear from Kaige. Kaige is a senior solutions architect at Kyligence, where he works on building the next-generation big data analytics platform. Previously, he worked on the OpenStack and Bluemix team at IBM, focusing on cloud computing and virtualization technology. Kaige loves the open source community and is an active Apache Kylin committer.
With over 450 million customers, Didi (the world’s largest rideshare company) conducts complex user behavior analysis on huge datasets daily. Exact Count Distinct is one of Didi’s most critical metrics, but it is computationally heavy and notoriously slow. The difference between exact and approximate Count Distinct can cost Didi millions of dollars. In this talk, Kaige Liu of the Apache Kylin project will explain how Didi uses Apache Kylin to return exact Count Distinct results on billions of rows of data with sub-second latency to generate the most accurate picture of its business.
You will also learn about the latest developments in modern OLAP technologies. Kaige will share how Didi and Truck Alliance (a truck-hailing company that processes $100 billion worth of goods yearly) use Apache Kylin to power analytics platforms that let hundreds of analysts query petabyte-scale data with sub-second latency.
We welcome input from interested data science enthusiasts and professionals on these ideas, techniques, and their implementation, as well as any other questions or comments. Our goal is to make this event fun and informative!
Agenda:
7:00 - 7:15 - Data in the News
7:15 - 8:00 - Talk + Q&A
8:00 - 8:30 - Discussion!
The event is free, and everyone is welcome.
Zoom info:
Join from Browser: https://walmart.zoom.com/wc/join/3310261107
iPhone one-tap: (US Toll): +13462487799,3310261107# or +16699006833,3310261107#
Meeting URL: https://walmart.zoom.us/j/3310261107
Join by Telephone
To avoid Toll/Toll-free and Call-back audio bridge charges, please join this meeting using one of these two options:
• “Call Using Computer” option with your headset
• Video Conferencing devices in the meeting rooms
Dial:
+1 346 248 7799 (US Toll)
+1 669 900 6833 (US Toll)
+1 253 215 8782 (US Toll)
+1 301 715 8592 (US Toll)
+1 312 626 6799 (US Toll)
+1 646 558 8656 (US Toll)
888 475 4499 (US Toll Free)
877 853 5257 (US Toll Free)
Meeting ID: 331 026 1107
