Apache Spark(SQL) Fine Grained Security with Apache Ranger and SparkR updates

Name: Apache Spark(SQL) Fine Grained Security with Apache Ranger and SparkR updates
Start: 2017-02-06T17:30:00-05:00
End: 2017-02-06T20:30:00-05:00
Location: NY Microsoft Office

Hosted by Artem E.

Future of Data: New York

Details

Title: Apache Spark(SQL) Fine Grained Security with Apache Ranger and SparkR updates

Level: all levels

Agenda:

5:30 PM - 6:00 PM Food, drinks, mingling

6:00 PM - 6:15 PM Artem Ervits Announcements, call for presenters, future events

6:15 PM - 8:30 PM Vinay Shukla and Yanbo Liang, Hortonworks, Inc.

Fine Grained Security to SparkSQL

So far, SparkSQL has provided only coarse-grained security: a user or group is either allowed or denied access to a whole table. Finer control is often needed. The SparkSQL and Ranger integration allows access to be controlled down to the row or column level and adds advanced controls such as masking. This session walks through the integration and includes a demo of the feature.
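
To make the difference between table-level and fine-grained control concrete, here is a minimal sketch in plain Python. It is not Ranger's policy model or API and does not use Spark; the policy dictionary, the `apply_policy` helper, and the sample data are all hypothetical and exist only to illustrate row filtering, column restriction, and masking.

```python
# Illustrative only: a toy model of fine-grained access control.
# None of these names come from Apache Ranger or Spark.

def mask_last4(value):
    """Hide everything except the last four characters of a value."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


def apply_policy(rows, policy):
    """Return only the rows and columns the policy allows, with masks applied."""
    visible = []
    for row in rows:
        # Row-level control: skip rows the user may not see.
        if not policy["row_filter"](row):
            continue
        out = {}
        # Column-level control: copy only the permitted columns.
        for col in policy["columns"]:
            value = row[col]
            mask = policy["masks"].get(col)
            # Masking: transform sensitive values instead of hiding the column.
            out[col] = mask(value) if mask else value
        visible.append(out)
    return visible


# Hypothetical table contents.
CUSTOMERS = [
    {"name": "Ann", "region": "EAST", "ssn": "123-45-6789", "balance": 100},
    {"name": "Bob", "region": "WEST", "ssn": "987-65-4321", "balance": 250},
]

# Hypothetical policy: an analyst sees EAST rows only, no balance column,
# and a masked ssn.
ANALYST_EAST = {
    "columns": ["name", "region", "ssn"],
    "row_filter": lambda row: row["region"] == "EAST",
    "masks": {"ssn": mask_last4},
}

if __name__ == "__main__":
    for row in apply_policy(CUSTOMERS, ANALYST_EAST):
        print(row)
```

With coarse-grained security the only choice would be to expose all of `CUSTOMERS` or none of it; the sketch shows the three finer controls the abstract names working together on a single query result.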

Integrate SparkR with existing R packages to accelerate data science workflows

R is the de facto programming language for data science, with nearly 10,000 packages from the single-machine era. However, native R runs into numerous scalability challenges as datasets grow. SparkR provides many scalable statistical functions and distributed machine learning algorithms that help users overcome these scaling bottlenecks. Can we combine the scalability of SparkR with the functional diversity of existing R packages? The answer is yes. In this talk, we will summarize the efforts to integrate SparkR with existing R packages, including user-defined functions, applying functions in parallel, virtual environments for third-party R libraries, and faster conversion between Spark DataFrames and local R data frames. We will then demonstrate how to solve several typical data science tasks using these features. Finally, we will briefly introduce the community work in progress on SparkR for upcoming releases.
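
The "apply a function in parallel" idea can be sketched without Spark at all. The example below is plain Python, not SparkR, and every name in it (`group_apply`, `mean_delay`, the sample data) is hypothetical. It only shows the split-apply-combine shape: partition rows by a key, run an ordinary single-machine function on each group concurrently, then collect the results.

```python
# Illustrative only: split-apply-combine with a thread pool.
# This is a stand-in for the pattern, not SparkR's implementation.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_apply(rows, key, func, max_workers=4):
    """Group rows by `key`, apply `func` to each group in parallel,
    and return a dict mapping each key value to its result."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    keys = sorted(groups)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with keys.
        results = list(pool.map(func, (groups[k] for k in keys)))
    return dict(zip(keys, results))


def mean_delay(rows):
    """An ordinary single-machine function, unaware of any parallelism."""
    return sum(r["delay"] for r in rows) / len(rows)


# Hypothetical sample data.
FLIGHTS = [
    {"carrier": "AA", "delay": 10},
    {"carrier": "AA", "delay": 30},
    {"carrier": "UA", "delay": 5},
]

if __name__ == "__main__":
    print(group_apply(FLIGHTS, "carrier", mean_delay))
```

The point of the pattern is that `mean_delay` could be any existing function from a single-machine library; only the surrounding driver knows the work is distributed.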

Speakers:

Vinay Shukla is the director of product management for Spark, Zeppelin, and Agile analytics at Hortonworks. Previously, Vinay worked as a developer and security architect. He is a frequent conference speaker, with appearances at Hadoop Summit, Apache Big Data, JavaOne, and Oracle World. Vinay enjoys being on a yoga mat or on a hiking trail. You can follow him on his blog (http://www.vinayshukla.com/).

Yanbo Liang is an Apache Spark committer working on MLlib and SparkR at Hortonworks. His main interests center on machine learning, data science, and distributed systems. He is an active Apache Spark contributor (top 15) and delivered the implementation of some major MLlib algorithms. Prior to Hortonworks, he was a software engineer at Yahoo! and France Telecom, working on personalized recommendation and machine learning.

Sponsors

Cloudera

Pivotal

Microsoft

IBM
