Parallel scikit-learn on YARN and Real Secure Hadoop


Details
At last, another Hadoop Meetup through this group. This time, we're happy to announce a talk about using the YARN resource manager for your own applications, with another talk to be announced soon.
We thank Cupenya (http://www.cupenya.com/about.html) for kindly hosting us at Rockstart Spaces. Also, thanks again to GoDataDriven (http://www.godatadriven.com/careers.html) for taking care of pizza and drinks.
Agenda
18.00: Arrive, drink, eat
18.45: Presentations
Distributed Computing on YARN: an attempt at massive scikit-learn, Niels Zeilemaker, Big Data hacker @ GoDataDriven
In this talk, Niels will outline the process of using the YARN resource manager for your own distributed computing needs. The first part of the talk will introduce the YARN resource manager and explain how to write custom distributed applications with it and what caveats apply. In the second part, we'll take a close look at an experimental implementation that distributes the popular scikit-learn machine learning toolkit using YARN.
About Niels:
Niels is a Big Data Hacker at GoDataDriven. Niels works for a wide range of companies where he engineers features and builds models. Before joining GDD, Niels finished his PhD thesis at the Technical University of Delft. For four years he researched P2P systems, focusing primarily on privacy and cooperation and applying encryption and anonymization techniques in the P2P domain.
Hadoop Security from the Trenches, Bolke de Bruin, Head of Advanced Analytics Technology @ ING
Setting up a secure Hadoop cluster involves a magic combination of Kerberos, Sentry, Ranger, Knox, Atlas, LDAP and possibly PAM. Add encryption on the wire and at rest to the mix and you have, at the very least, an interesting configuration and installation task.
Nonetheless, the fact that there are a lot of knobs to turn doesn't excuse you from the responsibility of taking proper care of your customers' data. In this talk, we'll detail how the different security components in Hadoop interact and how easy it can actually be to set things up correctly once you understand the concepts and tools. We'll outline a successful secure Hadoop setup, with one of ING's production clusters as the leading example.
About Bolke:
Bolke is Head of Advanced Analytics Technology at ING Commercial Bank. Apart from running a team of data scientists and engineers, he also likes to keep his hands dirty with technology. This has resulted in code contributions to several projects in the Hadoop ecosystem, such as Apache Spark, Apache Ambari and Airflow.
20.30: Some more drinks, socialize
Later: Everybody out!
