Special HUG event at LinkedIn

Join us at LinkedIn for a special Hadoop Users Group event showcasing two new Apache Incubator projects: Tajo and Samza. Tajo, a low-latency SQL query engine, and Samza, a distributed, reliable stream processing framework, are both built atop Apache Hadoop YARN.

Doors open at 6:00, with socializing until 6:30. Pizza and beverages will be provided.
Look for the signs to the Unite presentation hall in building 2025.

Schedule:
6:00 doors open
6:30 Apache Tajo: A Big Data Warehouse on Hadoop
7:20 Apache Samza: Reliable Stream Processing with YARN and Kafka


Overview of the talks:

Apache Tajo: A Big Data Warehouse on Hadoop
Tajo is designed for low latency, scalability, ad-hoc queries, and ETL on large-scale data sets. Tajo takes advantage of both advanced database techniques and MapReduce without sharing their shortcomings. It uses HDFS as its primary storage and its own distributed query execution engine instead of MapReduce.

This talk introduces the Tajo project and its internal architecture. Tajo supports ANSI SQL and user-defined functions. It has a cost-based join optimizer and an extensible query rewrite engine to find better plans. For distributed processing, it uses a DAG-based execution framework with various shuffle methods, such as range and hash. For scheduling, Tajo is designed to consider the disk volumes of each node to improve scan throughput on disks significantly. We will also introduce additional planning opportunities opened up by combining various physical operators and shuffle methods. Next, we will present Tajo's roadmap. Finally, as a case study, Jeong-shik will share some findings on Tajo's performance from an ongoing project with SKT, a telco in Korea.
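
For readers unfamiliar with the hash and range shuffle methods mentioned above, here is a minimal Python sketch of the general idea. It is not Tajo's implementation: the function names (hash_partition, range_partition, shuffle) and the sample rows are hypothetical and chosen only for illustration. A hash shuffle sends rows with equal keys to the same partition, while a range shuffle sends rows to partitions by sorted key ranges, which keeps the partitions globally ordered.

```python
import bisect
import zlib
from collections import defaultdict


def hash_partition(key, num_partitions):
    """Pick a partition by hashing the key; equal keys always land together."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def range_partition(key, boundaries):
    """Pick a partition from sorted range boundaries.

    boundaries [10, 20] yields three partitions:
    key < 10, 10 <= key < 20, and key >= 20.
    """
    return bisect.bisect_right(boundaries, key)


def shuffle(rows, key_of, partitioner):
    """Group rows by the partition that their key maps to."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partitioner(key_of(row))].append(row)
    return dict(partitions)


# Hypothetical (key, value) rows, used only for this illustration.
rows = [(5, "a"), (10, "b"), (15, "c"), (25, "d"), (5, "e")]

by_range = shuffle(rows, lambda r: r[0], lambda k: range_partition(k, [10, 20]))
by_hash = shuffle(rows, lambda r: r[0], lambda k: hash_partition(k, 4))

print(by_range)  # {0: [(5, 'a'), (5, 'e')], 1: [(10, 'b'), (15, 'c')], 2: [(25, 'd')]}
```

In general, a range shuffle suits sorting and ordered output, while a hash shuffle suits grouping and equi-joins where only co-location of equal keys matters.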

Apache Samza: Reliable Stream Processing with YARN and Kafka
LinkedIn has recently open sourced Samza, its stream processing framework built on top of Apache Hadoop and YARN. Samza provides the ability to process infinite streams of data. Samza has a simple API, provides state management for tasks, ensures fault tolerance, and is highly pluggable.


Speaker bios: 

Hyunsik Choi, Ph.D., is a committer and PPMC member of Apache Tajo. He is a director of research at Gruter, a big data company located in South Korea, and he has contributed to Tajo's query plan optimizer and to a vectorized query engine that uses modern hardware. Recently, he has been interested in runtime query compilation techniques using LLVM and modern hardware features.
http://kr.linkedin.com/in/hyunsikchoi

Jeong-shik Jang is VP at Gruter. He previously worked at Yahoo for five and a half years in Asia Search Engineering. Since he joined Gruter 3 years ago, he has been enjoying various and exciting roles as a project manager, a hands-on engineer, and a decision maker for management.

Chris Riccomini is a staff software engineer at LinkedIn where he has been the principal developer of Samza, contributed to the Hadoop ecosystem, worked on LI's RPC system and built the internal analytics/reporting tool.
http://www.linkedin.com/in/riccomini


  • Jakob H.

    Thanks everyone who came. It was a very successful evening. I very much appreciate Hyunsik and Jeong-Shik's introduction. If Samza piqued your interest, please reach out to me. LinkedIn is hiring and we're looking to expand the Samza team.

    1 · November 7, 2013

  • Hyunsik C.

    I've uploaded the slides to SlideShare. Thank you for your comments! http://www.slideshare.net/gruter/bay-area

    1 · November 6, 2013

  • Dan K.

    I would appreciate seeing the slidesets posted or e-mailed.

    November 6, 2013

  • Mahesh G.

    The presentation on Samza was very good.

    The presentations on Tajo were also good and informative. However, it was somewhat difficult to understand the first speaker. Overall, though, it was still good to learn about Tajo. I appreciate the presenters' efforts.

    2 · November 6, 2013

  • Martin S.

    Great content and atmosphere

    1 · November 6, 2013

  • A former member

    Both talks were impressive. I greatly appreciate the effort Hyunsik made, as a non-native English speaker, to speak in front of a native English-speaking audience and communicate the vision of the project. Bravo.

    5 · November 6, 2013

  • Var

    I don't understand the logic behind this Notify me feature. Why don't you accept people from the wait list by their position on it rather than notifying them? I have received 10 notices so far, and the spots were all gone by the time I tried to accept. So 10 disappointments so far. Most meetups fill spots by wait list ranking when someone cancels, and this includes your monthly meetup if I remember correctly.

    November 1, 2013

    • Var

      Another notification, and I couldn't get in even though I checked within 5 minutes of getting the email. :) Is the meetup owner going to check these messages and do something to fix this issue?

      November 4, 2013

    • Jakob H.

      Sorry about that. Just to be clear, this is a setting for the entire BAHUG group, not something we at LinkedIn have requested. I'd recommend pinging the Yahoo! organizers about this.

      November 5, 2013

  • Timothy W.

    I totally agree. I think you should allow those who are on the wait list to be added automatically. I feel the same as Var. Other meetups add people from the wait list according to the time they signed up for it.

    1 · November 2, 2013

  • fuad

    Hi all! I am looking for a Sr. Data Scientist ; for more info please email me @ [masked]

    October 17, 2013

Our Sponsors

  • Yahoo! Inc.

    Meeting space, pizza and drinks are sponsored by the Yahoo! Hadoop team.
