Introduction to Apache Tajo: A Big Data Warehouse on Hadoop

Hosted By
Sara Asher

Details

Summary:

Apache Tajo is a SQL-on-Hadoop system designed as a data warehouse for web-scale data. It provides distributed, scalable batch and interactive query processing via ANSI SQL. It also provides virtual integration of a multitude of diverse data sources, making data integration easy and rapid; this step has long been regarded as essential but heavyweight in big data warehousing. My talk will introduce Apache Tajo, including its overall architecture, current state, and challenges, and discuss the advantages Tajo can bring to users. In addition, I will give a demo of federated query analysis on multiple data sources with Tajo.

Speaker Bio:
Hyunsik Choi, Ph.D., is a committer and PMC member of Apache Tajo. He is a director of research at Gruter, a big data company located in Palo Alto, and has contributed to Tajo's query plan optimizer and a vectorized query engine that exploits modern hardware. Recently, he has been interested in runtime query compilation techniques using LLVM and modern hardware features.

Agenda

6:00 pm -- 6:30 pm Networking

6:30 pm -- 6:35 pm Introduction and announcement

6:35 pm -- 7:45 pm Main talk and Q & A

7:45 pm -- 8:15 pm Networking and closing

8:30 pm Doors close

SF Big Analytics
Alpine Data Labs
1550 Bryant Street · San Francisco, CA