Data Science at Scale with HAWQ and MADlib and Hadoop


Details
Running machine learning and advanced analytics on larger data sets is one of the most reliable ways to improve accuracy in data science.
Hadoop is a great way to combine and store extremely large data sets.
What you also need is a fast way to run these analytics, one that scales as large as your data sets grow.
In this Meetup you'll learn about Apache HAWQ (http://hawq.incubator.apache.org/), the elastic, parallel processing query engine that operates on all your data directly within Hadoop. You'll also learn about Apache MADlib (http://madlib.incubator.apache.org/), the big data machine learning library that provides popular data science algorithms capable of leveraging the parallel processing capabilities of HAWQ.
Using Apache Zeppelin as the notebook, you'll see how you can perform data science investigations on your data in Hadoop by invoking MADlib functions in Python, R, and directly with SQL.
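To give a rough idea of what invoking a MADlib function from Python can look like, here is a minimal sketch that builds the SQL for MADlib's linregr_train function. The table name houses, the output table houses_model, and the column names are hypothetical, and the helper linregr_train_sql is made up for illustration; it is not part of MADlib or HAWQ. The sketch only builds the statement. Running it would require a database connection to a HAWQ cluster with MADlib installed, which is assumed here and not shown.

```python
def linregr_train_sql(source_table, out_table, dependent, independents):
    """Build a SQL statement that calls MADlib's linregr_train.

    Hypothetical helper for illustration only. It does no quoting or
    escaping, so it should only be given trusted identifiers.
    """
    # MADlib takes the independent variables as a SQL array expression;
    # the leading 1 adds an intercept term to the model.
    features = "ARRAY[1, " + ", ".join(independents) + "]"
    return (
        "SELECT madlib.linregr_train("
        f"'{source_table}', '{out_table}', '{dependent}', '{features}');"
    )


# Hypothetical table and column names.
sql = linregr_train_sql("houses", "houses_model", "price", ["tax", "bath", "size"])
print(sql)
```

The same statement could be pasted into a SQL paragraph of a notebook, or sent from Python through whatever database driver your environment provides.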
