Hands-On Intro to Big Data Analytics using Apache Spark and Apache Zeppelin


Details
This is a 100-200 level talk on Spark and Zeppelin. It is not meant to be a deep dive into either technology.
6:00 PM - 6:30 PM: Food, drinks, mingling
6:30 PM - 6:45 PM: Ketur Shah and Artem Ervits - Announcements, call for presenters, future events
6:45 PM: Alex Zeltov - Hands-On Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
This workshop will provide an introduction to Big Data Analytics using Apache Spark and Apache Zeppelin.
There will be a short lecture that includes an introduction to Spark and the Spark components.
Spark is a unified framework for big data analytics. It provides one integrated API that developers, data scientists, and analysts can use for diverse tasks, such as batch analytics, stream processing, and statistical modeling, that would previously have required separate processing engines. Spark supports a wide range of popular languages, including Python, R, Scala, SQL, and Java. Spark can read from diverse data sources and scale to thousands of nodes.
The lecture will be followed by a demo. There will also be a short lecture on Hadoop and how Spark and Hadoop interact and complement each other. You will learn how to move data into HDFS using Spark APIs, create a Hive table, explore the data with Spark and SQL, transform the data, and then issue some SQL queries. We will be using Scala and/or PySpark for the labs.
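For anyone who wants a preview, the lab steps described above might look roughly like the following PySpark sketch. It is not the presenter's actual lab material. It assumes Spark 2.0 or later with a Hive-enabled `SparkSession` (older Spark 1.x releases use `HiveContext` instead), and the sample rows, the HDFS path `/tmp/demo/visits`, the table name `visits_demo`, and the helper names are all hypothetical.

```python
# Sketch only: sample rows, HDFS path, table name and helper names are hypothetical.
SAMPLE_ROWS = [
    ("2016-01-04", "NYC", 3),
    ("2016-01-04", "PHL", 5),
    ("2016-01-05", "NYC", 7),
]
COLUMNS = ["event_date", "city", "visits"]


def totals_query(table):
    """Build the SQL text that sums visits per city for the given table."""
    return (
        "SELECT city, SUM(visits) AS total_visits "
        "FROM {0} GROUP BY city ORDER BY city".format(table)
    )


def expected_totals(rows):
    """Plain-Python reference for the per-city totals, handy as a sanity check."""
    totals = {}
    for _event_date, city, visits in rows:
        totals[city] = totals.get(city, 0) + visits
    return totals


def run_lab(spark, hdfs_path="/tmp/demo/visits", table="visits_demo"):
    """Walk through the lab steps with an existing, Hive-enabled SparkSession."""
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import functions as F

    # 1. Move data into HDFS: build a DataFrame and write it out as Parquet.
    df = spark.createDataFrame(SAMPLE_ROWS, COLUMNS)
    df.write.mode("overwrite").parquet(hdfs_path)

    # 2. Create a Hive table from the stored files.
    stored = spark.read.parquet(hdfs_path)
    stored.write.mode("overwrite").saveAsTable(table)

    # 3. Explore the data.
    stored.printSchema()
    stored.show()

    # 4. Transform with the DataFrame API.
    totals_df = stored.groupBy("city").agg(F.sum("visits").alias("total_visits"))

    # 5. Issue the equivalent SQL query against the Hive table.
    totals_sql = spark.sql(totals_query(table))
    return totals_df, totals_sql
```

In a Zeppelin `%pyspark` paragraph on Spark 2.x, a session is normally already available as `spark`, so calling `run_lab(spark)` and then `.show()` on either returned DataFrame should be enough to try it. Comparing the output with `expected_totals(SAMPLE_ROWS)` is a quick way to confirm that the DataFrame and SQL routes agree.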
Users have two options for following along with the demo labs:
- Hortonworks Sandbox on a VM. No data center, no cloud service and no internet connection needed! Full control of the environment. http://hortonworks.com/products/hortonworks-sandbox/#install
- HDP 2.4 on Azure with Hortonworks Sandbox. Try Hortonworks Sandbox on Windows Azure. It’s free for the first month, and there’s no need to download the VM! http://hortonworks.com/blog/hortonworks-sandbox-with-hdp-2-3-is-now-available-on-microsoft-azure-gallery/
Alex Zeltov is a Solutions Engineer / Software Engineer / Programmer Analyst / Data Scientist with over 15 years of industry experience in Information Technology, most recently in Big Data and Predictive Analytics. He specializes in designing, developing and implementing complex software solutions and is experienced in all areas of the software development life cycle. He currently works as a Solutions Engineer at Hortonworks, where he is responsible for creating high-value Hadoop solutions for customers. He has created and delivered in-depth technical demonstrations, custom big data workshops, and POCs for new and existing accounts. He holds an M.S. in Software Engineering from Penn State, as well as an undergraduate degree in Computer Science from Temple University. You can catch Alex on Twitter @azeltov.
Following the Spark session, Ketur Shah from Microsoft will present Microsoft's data analytics solutions on Azure.
Brief: Ketur Shah brings over two decades of capital markets and banking experience and is a Solutions Architect at Microsoft in the Financial Services domain. He works on the core Azure Platform, architecting and designing solutions in Big Data, Stream Analytics, Predictive Analytics, Data Visualization, Machine Learning, Blockchain and several other technologies.
Please provide your full name for Microsoft security.
