Data Science at Scale with HAWQ and MADlib and Hadoop

Hosted By
Future of D.
Details

Running machine learning and advanced analytics on larger data sets is one of the most dependable ways to get higher accuracy in data science.

Hadoop is a great way to combine and store extremely large data sets.

What you need is a faster way to run these analytics, one that scales as far as your data sets grow.

In this Meetup, you'll learn about Apache HAWQ (http://hawq.incubator.apache.org/), the elastic, parallel-processing query engine that operates on all your data directly within Hadoop. You'll also learn about Apache MADlib (http://madlib.incubator.apache.org/), the big data machine learning library whose popular data science algorithms take advantage of HAWQ's parallel processing.

Using Apache Zeppelin as the notebook, you'll see how you can perform data science investigations on your data in Hadoop by invoking MADlib functions in Python, R, and directly with SQL.
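
To give a rough idea of what invoking a MADlib function from Python can look like, here is a minimal sketch rather than the material shown at the Meetup. It builds a SQL call to MADlib's linregr_train and can run it over any DB-API 2.0 connection. The table name houses, the output table houses_model, the column names, and both helper functions are hypothetical and chosen only for illustration. Connecting to HAWQ with a PostgreSQL driver such as psycopg2 is likewise an assumption.

```python
def linregr_train_sql(source_table, model_table, dependent, independents):
    """Build a SQL statement that calls MADlib's linregr_train.

    All table and column names are interpolated directly, so they must
    come from trusted input (hypothetical helper, for illustration only).
    """
    # A leading 1 in the feature array adds an intercept term.
    features = ", ".join(["1"] + list(independents))
    return (
        "SELECT madlib.linregr_train("
        f"'{source_table}', '{model_table}', '{dependent}', 'ARRAY[{features}]');"
    )


def train(conn, sql):
    """Execute the statement on a DB-API 2.0 connection (assumed, e.g. psycopg2)."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        conn.commit()
    finally:
        cur.close()


# Hypothetical example: predict price from three columns of a 'houses' table.
sql = linregr_train_sql("houses", "houses_model", "price", ["tax", "bath", "size"])
print(sql)
```

The printed statement can also be pasted as-is into a SQL paragraph of a notebook, which is the "directly with SQL" route; calling train(conn, sql) with an open connection would be the Python route.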

Future of Data: San Francisco