
Hands-on Intro to Machine Learning with Apache Spark and Apache Zeppelin

Hosted by Future of Data: Silicon Valley

Details

Tentative Agenda

6:30 PM - 7:00 PM: Food, drinks, mingling

7:00 PM - 7:45 PM: Lecture

7:45 PM - 9:00 PM: Hands-on Lab

Apache Spark is a unified framework for big data analytics. Spark provides one integrated API that developers, data scientists, and analysts can use for diverse tasks, such as batch analytics, stream processing, and statistical modeling, that previously required separate processing engines. Spark supports a wide range of popular languages including Python, R, Scala, SQL, and Java. Spark can read from diverse data sources and scale to thousands of nodes.

There will be a short lecture on Spark’s Machine Learning (ML) module, covering both the older MLlib library and the newer Spark ML library for pipelining ML jobs. We will give an overview of several supervised and unsupervised algorithms (K-Means, Random Forest, etc.). The lecture will be followed by a demo and a hands-on lab in Apache Zeppelin. Zeppelin provides a notebook-style environment for data exploration, analytics, and more. It’s a modern Data Science studio.

There are two options for following along with the hands-on lab. You can use:

  • Hortonworks Sandbox on a Virtual Machine (VM). No data center, no cloud service, and no internet connection needed! Full control of the environment. http://hortonworks.com/products/sandbox/#install

  • Hortonworks Sandbox with HDP 2.4 on Microsoft Azure. It’s FREE for the first month, and there’s no need to download the VM!

http://hortonworks.com/hadoop-tutorial/deploying-hortonworks-sandbox-on-microsoft-azure/

Robert Hryniewicz has over 15 years of experience working on Machine Learning, AI, Robotics, cloud products and more. He’s been a principal consultant at TiVo, CTO at a Singularity Labs company, and Sr. Engineer at Cisco, NASA, Concurrent et al. Robert has been developing in Apache Spark since 2014. As a consultant he developed several products, including a Graph Analytics platform as well as multiple Machine Learning and IoT prototypes. Robert’s interests range from distributed systems to advanced analytics, deep learning, NLP, general AI, robotics, vertical farms, and blockchain-related technologies. He comes up with his best ideas when hiking in Yosemite and other NorCal parks.


Please check in at the lobby

Future of Data: Silicon Valley