Kudu: New Apache Hadoop Storage for Fast Analytics on Fast Data with Todd Lipcon

Hosted By
Myung Ho Y.
Details

This Meetup is about Kudu, the newly announced storage system. The talk will be given by Todd Lipcon, the lead and main architect of the Kudu project. It takes place this coming December 8, starting at 7pm.

7pm ~ 7:30pm: Networking (or a talk, to be decided)
7:30pm ~ 8:20pm: Todd Lipcon's Kudu talk and Q&A
8:20pm ~ : Networking, Q&A, pizza/chicken/beer

Besides Kudu, Todd contributes to many open source projects and is also a PMC member/committer for Apache Thrift, HBase, and Hadoop Core. Please see Todd's LinkedIn profile:

https://www.linkedin.com/in/toddlipcon

Over the past several years, the Hadoop ecosystem has made great strides in its real-time access capabilities, narrowing the gap with traditional database technologies. With systems such as Impala and Apache Spark, analysts can now run complex queries or jobs over large datasets within a matter of seconds. With systems such as Apache HBase and Apache Phoenix, applications can achieve millisecond-scale random access to arbitrarily sized datasets.

Despite these advances, some important gaps remain that prevent many applications from transitioning to Hadoop-based architectures. Users are often caught between a rock and a hard place: columnar formats such as Apache Parquet offer extremely fast scan rates for analytics, but little to no ability for real-time modification or row-by-row indexed access. Online systems such as HBase offer very fast random access, but their scan rates are too slow for large-scale data warehousing workloads.

This talk will investigate the trade-offs between real-time transactional access and fast analytic performance from the perspective of storage engine internals. It will also describe Kudu, a new addition to the open source Hadoop ecosystem with out-of-the-box integration with Apache Spark, which fills the gap described above by offering both fast scans and fast random access through a single API.

Korea Big Data Think Tank
CNN thebiz Gangnam Education and Training Center, Room 301
48 Teheran-ro 1-gil, Gangnam-gu, Seoul (619-16 Yeoksam-dong) · Seoul