
Scalable Analytics with R and Hadoop

Special thanks to the folks at for hosting this talk.  Dr. Steve Hanks, Principal Data Scientist at WhitePages, will be introducing Gwen.

When I found out that Gwen Shapira (Blog / LinkedIn) was coming to town, I asked her if she would take the evening to speak to the Seattle data community. She agreed, and offered to share some of her recent interest in R and Hadoop.

This is a joint meetup with our friends at the Seattle useR Group.


Modern data applications often require analyzing multi-terabyte data sets. R is one of the most popular languages for data processing, and it is best known for its large library of advanced statistical tools. However, using R to analyze multi-terabyte data sets presents challenges: How do we avoid transmitting all the data over the network? How do we scale statistical algorithms? What are the options for integrating R with Hadoop clusters?
This presentation is geared towards R beginners with some knowledge of Hadoop and Map-Reduce concepts. Attendees will learn important R concepts, effective data wrangling tools and how to scale R algorithms for large data sets using RHadoop. We will discuss RHadoop in depth and share deployment, scalability and troubleshooting lessons that we have learned the hard way.

Speaker Bio:

Gwen Shapira is a Solutions Architect at Cloudera. She has 15 years of experience working with customers to design scalable data architectures, having worked as a data warehouse DBA, an ETL developer, and a senior consultant. She specializes in migrating data warehouses to Hadoop, integrating Hadoop with relational databases, building scalable data processing pipelines, and scaling complex data analysis algorithms.


  • K C.

    Great talk -- Did the slides or code get posted somewhere?

    1 · June 21, 2014

  • Silvia V.

    Great presentation, nice adaptation to the audience's level of expertise on-the-fly, informative.

    June 17, 2014

  • Phillip B.

    Here is the link to the Revo64 AMIs on AWS:

    Revolution Analytics is still offering the free trial. Revo64 is free during the trial period but AWS costs still apply.

    Be mindful of costs and stop the instance when not using it.

    2 · June 16, 2014

  • Nick M.

    An inexpensive option is to park South of James St. where on-street parking is free starting at 6:00. WhitePages's offices are in the Rainier tower, which has parking with an entrance on Union between 4th & 5th.

    June 14, 2014

  • K C.

    Hi - Where should we park? Is there a recommended parking spot?

    June 14, 2014
