Exploring Enron Email Dataset with Kiji and Hive; Apache YARN and Apache Tez

Exploring Enron Email Dataset with Kiji and Hive

Lee Sheng, WibiData

Apache Hive is a data warehousing system for large volumes of data stored in Hadoop that provides SQL-based access for exploring datasets. KijiSchema provides evolvable schemas of primitive and compound types on top of HBase. Integrating the two offers the best of both worlds: ad hoc SQL-based querying over datasets that use evolvable schemas containing complex objects. This talk will present examples of queries that use this integration for exploratory analysis of the Enron email corpus. Delving into topics such as email responder pairs and sentiment analysis can expose many of the interesting points in the rise and fall of Enron.

Bio:

Lee is an engineer at WibiData who builds tools for developing Big Data applications. He holds a BS in Computer Science from Carnegie Mellon University. Previous stints include developing systems for making strategic buying decisions at Amazon.com as well as distributed simulation frameworks for the Department of Defense.


Apache YARN & Apache Tez

Tom McCuch, Technical Director, Hortonworks

Apache Hadoop has become synonymous with Big Data and powers large-scale data processing across some of the biggest companies in the world. Hadoop 2 is the next-generation release of Hadoop and marks a pivotal point in its maturity with YARN, the new Hadoop compute framework. YARN (Yet Another Resource Negotiator) is a complete re-architecture of the Hadoop compute stack with a clean separation between platform and application. This opens up Hadoop data processing to new applications that can be executed IN Hadoop instead of outside Hadoop, thus improving efficiency, performance, and data sharing while lowering operational costs. The Big Data ecosystem is already converging on YARN, with new applications like Apache Tez being written specifically for YARN. Apache Tez aims to provide high performance and efficiency out of the box, across the spectrum of low-latency queries and heavyweight batch processing. The talk will provide a brief overview of key Hadoop 2 innovations, focusing on YARN and Tez and covering architecture, motivating use cases, and the future roadmap. Finally, the impact of YARN on the Hadoop community will be demonstrated by running interactive queries with both Hive on Tez and Hive on MapReduce, and comparing their performance side by side on the same Hadoop 2 cluster.


Bio:

Tom McCuch drives the field architecture and engineering for Hortonworks in the Northeast region. Tom has over twenty-five years of experience in software engineering. At Hortonworks, Tom helps guide enterprise customers through their adoption of Apache Hadoop. He has deep experience across the Financial Services, Insurance, Life Sciences, Retail, and Telecommunications industries. Before coming to Hortonworks, Tom served in many different roles across Enterprise Architecture, Product Engineering, Professional Services, and Sales Engineering of mission-critical solutions based on Java and open source software.

Schedule

6:00-7:00 - Networking

7:00-7:15 - Announcements

7:15-8:00 - Lee Sheng on Kiji

8:00-8:15 - Break

8:15-9:00 - Tom McCuch on YARN and Tez

  • Mike

    Could Tom McCuch post his slides as well? Thanks

    1 · January 8, 2014

  • Bobby P.

    Great presentations.

    January 8, 2014

  • Mike

    Interesting frameworks, maybe a few too many questions taken by the presenter during the second half of the presentation which made us a little short on time. Overall, well done.

    January 8, 2014

  • Ralph H.

    Both good, informative presentations.

    January 8, 2014

  • A former member

    Thank you!

    January 7, 2014

  • A former member

    For those that may be interested: The slides for my Enron talk are here: http://www.slideshare.net/wibidata/exploring-the-enron-email-dataset-with-kiji-and-hive

    3 · January 7, 2014

  • Mercedes B.

    I am very excited to be exposed to this technology. Are there locations that are training folks in Hadoop and Apache Hive ??

    Will slides or handouts be made available ?

    January 7, 2014

  • Chandra K.

    Can administrator plz confirm - if this event is still on for today?

    January 7, 2014

    • Naresh K.

      Yes the event is still on

      1 · January 7, 2014

  • John M.

    Won't be able to make this one guys see you all in the next one.

    January 7, 2014

  • Sumeet

    Is this still being held considering the weather situation?

    January 6, 2014

  • will m.

    Why are all DC area meetups always on the same day at the same time? geeeeez

    January 6, 2014

  • Chandra K.

    I like to be current in bigdata technology

    January 2, 2014

  • Alex K.

    The CityGML + ARML Workshop will be recorded and downloadable as a podcast from www.rev-mac.com . I have seen the Enron email set explored with Python, R and Neo4J's Cypher. It will be fun to see it explored with a user-friendly SQL syntax!

    January 2, 2014

  • suraj

    is this session being recorded as well?

    January 1, 2014

  • Alex K.

    Please also check out my CityGML + ARML Workshop: http://www.meetup.com/CityGML-ARML-Working-Group/
    Thanks!
    --Alex

    January 1, 2014

  • SRINIVASA V.

    Want to see how to use Kiji framework

    December 21, 2013

  • A former member

    Hello!

    December 19, 2013

  • Ruhollah F.

    This should be a good talk!

    December 6, 2013

Our Sponsors

  • Tetra Concepts

    Thank you to Tetra Concepts for sponsoring this meetup.

  • BAE Systems

    Thank you to BAE Systems for sponsoring this meetup.

  • Booz Allen

    Thank you to Booz Allen for sponsoring this meetup.
