Apache Kylin on Parquet: Introduction to the New Storage Engine

Hosted By
Siddharth A. and Chaitanya D.

Details

Apache Kylin is an open source distributed analytical data warehouse for big data. It was designed to provide OLAP (Online Analytical Processing) capability in the big data era.

By renovating the multi-dimensional cube and pre-calculation technology on Hadoop and Spark, Kylin is able to achieve near-constant query speed regardless of the ever-growing data volume. Reducing query latency from minutes to sub-second, Kylin brings online analytics back to big data.

Previously, Apache Kylin stored cubes in HBase, which has proven to be mature and stable. However, the characteristics and limitations of HBase give this solution some shortcomings. For example, query performance suffers in complex business scenarios. Cubes must be converted to HFiles and loaded into HBase, which takes extra effort and slows down build jobs. Keeping an HBase cluster stable is also a constant challenge.

Apache Kylin has proposed a new storage engine based on Parquet. In this session, Kaige will dive deep into how Apache Kylin designed and implemented this new storage engine and analyze its pros and cons. He will also share benchmark results comparing the HBase engine with the new Parquet engine.

SPEAKER: Kaige
SPEAKER BIO: Kaige is a Senior Solutions Architect at Kyligence where he works on building the next-generation big data analytics platform. Previously, he worked on the OpenStack and Bluemix team at IBM, focusing on cloud computing and virtualization technology. Kaige loves the open source community and is an active Apache Kylin committer.

Big Data Bellevue Meetup was created by Intelius and takes place in downtown Bellevue. Intelius provides the only centralized service for delivering comprehensive information about people, places, organizations, and their connection to each other. Our state-of-the-art big data technology platform is utilized across a wide range of industries to implement specific solutions.

On the third Wednesday of each month, we invite an industry leader in Big Data to give a presentation followed by a lively discussion on big data technology and its impact on the business world. Past speakers include researchers from the University of Washington, as well as senior members of various companies, such as Microsoft, Amazon, eBay, IBM, MapR and inome.

The online event link will be provided closer to the event date.

The password for the online event link is 249611.

Big Data Bellevue (BDB)