Using Apache Spark for Machine Learning


Abstract:
Machine learning datasets have outgrown traditional setups, where one could simply offload data into a small data science environment. What's needed is a single environment where rapid prototyping, testing, and productizing are all possible on vast volumes of data. The Hadoop platform makes this possible, specifically Apache Spark and its machine learning library, MLlib.
In this presentation we’ll give an overview of Hadoop and its Machine Learning capabilities, illustrated with a few live demos (run in Apache Zeppelin notebooks) that cover different Machine Learning algorithms.
Agenda:
• Machine Learning in Hadoop Ecosystem
• Overview of Apache Spark MLlib
• Demo using Hortonworks Data Cloud (HDCloud)
Get started with the technology:
• HDCloud Video Overview (3 min) - http://hor.tn/hwx-aws-video
• HDCloud Setup Docs - http://hor.tn/HDCloud-docs
Speaker Bio: Robert Hryniewicz has over 10 years of experience working on Machine Learning, AI, Robotics, cloud products, and more. Currently he's a Data Scientist and Advocate at Hortonworks (a 100% open-source public Big Data company). Previously, Robert was a principal consultant at TiVo, CTO at a Singularity Labs company, and a Sr. Engineer at Cisco, NASA, Concurrent, and others. Robert has been developing with Apache Spark since 2014. As a consultant he developed several products, including a Graph Analytics platform as well as multiple Machine Learning and IoT prototypes.
Co-hosted with the Future of Data Meetup Group: https://www.meetup.com/futureofdata-siliconvalley/