Past Meetup

Strata/Hadoop World NYC HUG

Hosted by New York Hadoop User Group

326 people went

With 2 Speakers:

Predictive Analytics with Hadoop
Presented by Robert Chu

"Essentially, all models are wrong, but some are useful." - George E. P. Box

Predictive modeling is an iterative process of defining a model and then evaluating its usefulness. This process can easily become drawn out and cumbersome when building models on big data sets. KijiExpress is designed to make predictive modeling easier by providing much-needed tooling and allowing users to define MapReduce steps as Scalding jobs. Using the Enron email data set as a use case, this talk will demonstrate how to define, train, and validate predictive models using KijiExpress.


Robert is a member of the engineering team at WibiData. He develops tools that enable data scientists and engineers to seamlessly develop and deploy real-time predictive models using machine learning and natural language processing. He graduated with a BS in Computer Engineering from the University of Washington.


Sentry - Fine-Grained Role-Based Authorization for Hive
Presented by Arvind Prabhakar

Until recently, the only way to secure data in Apache Hive was through HDFS file permissions and impersonation via HiveServer2. Such a solution is not granular enough to fulfill Enterprise Data Warehouse security requirements. Moreover, the use of files as a dataset abstraction breaks down in the face of higher-order logical constructs such as indexes and partitions. With the introduction of Apache Sentry (incubating), we now have an enterprise-grade role-based access control solution that is robust and extensible. Sentry also provides multi-tenant administration capabilities that allow various parts of an enterprise to safeguard their data in the Hive warehouse using role-based access control for authenticated users. This talk will introduce you to Sentry, its features, and its architecture, along with tips on how to secure your Hive data warehouse.


Arvind Prabhakar is an Engineering Manager and Technical Lead at Cloudera, as well as a Committer, PPMC Member, and Mentor for Apache Sentry; an ASF Member; a PMC Member and Chair for Apache Sqoop; and a PMC Member and Chair for Apache Flume.