Introduction to Apache Tajo: Future of Data Warehouse


Details
Presented by Jihoon Son, a PMC member and committer for Apache Tajo
https://kr.linkedin.com/in/jihoonson
Abstract: Apache Tajo is a data warehouse system for Web-scale data. It provides virtual integration of a multitude of diverse data sources, making data integration, long regarded as an essential but heavyweight step in business intelligence, easy and rapid. In addition, it has a fault-tolerant distributed query engine for speeding up queries. With its “query federation” and “distributed processing” capabilities, Tajo can provide users with reliable and efficient analysis of Web-scale data spread across multiple sources. I will introduce Apache Tajo, including its overall architecture, current state, and challenges, and discuss the advantages Tajo can bring to users. In addition, I will give a demo of integrated data analysis with Tajo.
