Special thanks to the folks at WhitePages.com for hosting this talk. Dr. Steve Hanks, Principal Data Scientist at WhitePages, will be introducing Gwen.
When I found out that Gwen Shapira (Blog / LinkedIn) was coming to town, I asked her if she would take the evening to speak to the Seattle data community. She agreed, and offered to share some of her recent work with R and Hadoop.
This is a joint meetup with our friends at the Seattle useR Group.
Modern data applications often require analyzing multi-terabyte data sets. R is one of the most popular languages for data processing, best known for its large library of advanced statistical tools. However, using R to analyze multi-terabyte data sets presents challenges: How do we avoid transmitting all the data over the network? How do we scale statistical algorithms? What are the options for integrating R with Hadoop clusters?
This presentation is geared towards R beginners with some knowledge of Hadoop and MapReduce concepts. Attendees will learn important R concepts, effective data-wrangling tools, and how to scale R algorithms to large data sets using RHadoop. We will discuss RHadoop in depth and share deployment, scalability, and troubleshooting lessons that we have learned the hard way.
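To give a flavor of what the talk covers, here is a minimal sketch of scaling an aggregation with RHadoop's rmr2 package. This is an assumption-laden illustration, not material from the talk: it assumes rmr2 is installed and uses its local backend (which runs MapReduce jobs in-process, so no cluster is needed to experiment) to sum values grouped by key.

```r
# Sketch only: assumes the rmr2 package from the RHadoop project is installed.
library(rmr2)

# Use the local backend so the example runs without a Hadoop cluster;
# on a real cluster this option would be omitted or set to "hadoop".
rmr.options(backend = "local")

# Write a small vector into (local) DFS storage.
ints <- to.dfs(1:100)

# MapReduce job: key each value by its last digit, then sum per key.
result <- mapreduce(
  input  = ints,
  map    = function(k, v) keyval(v %% 10, v),
  reduce = function(k, vv) keyval(k, sum(vv)))

# Pull the key/value results back into the R session.
from.dfs(result)
```

The same map and reduce functions run unchanged against a real cluster once the backend points at Hadoop, which is the core appeal of the RHadoop approach discussed in the talk.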
Gwen Shapira is a Solutions Architect at Cloudera. She has 15 years of experience working with customers to design scalable data architectures, including roles as a data warehouse DBA, an ETL developer, and a senior consultant. She specializes in migrating data warehouses to Hadoop, integrating Hadoop with relational databases, building scalable data processing pipelines, and scaling complex data analysis algorithms.