Analyzing Twitter: An End-to-End Data Pipeline

Hadoop has become the established tool for dealing with big data, and one of the largest public data sets available comes from Twitter. Using several tools from the Hadoop ecosystem, Twitter data can be efficiently processed and analyzed. Join us in May as two big data experts present their work on building complete systems to handle Twitter data.

 

6:30 PM -- Networking & Wraps from Santoni's

6:55 PM -- Greetings

7:00 PM -- Analyzing Twitter Data with Hadoop - Joey Echeverria

8:00 PM -- Working With Mahout - Sean Busbey

8:30 PM -- Time for Drinks at Little Havana


Logistics

Google Maps does a good job of pinpointing the exact building location based on the address above.

Parking is not an issue. You can park on the street nearby or you can take up any Visitor spot that you can find. Also, feel free to park in the Ad.com/AOL employee parking lot but not in spaces marked Under Armour or Reserved.

 

Speakers

Joey Echeverria is a Principal Solutions Architect at Cloudera, where he works directly with customers to deploy production Hadoop clusters and solve a diverse range of business and technical problems. Joey joined Cloudera from the NSA, where he worked on data mining, network security, and clustered data processing using Hadoop. Prior to working full time for the NSA, Joey attended Carnegie Mellon University, where he earned an M.S. and a B.S. in Electrical and Computer Engineering.


Sean Busbey is a Solutions Architect at Cloudera, where he works with customers to architect, implement and optimize Big Data solutions for a diverse range of use cases for CPG, Interactive Entertainment, Advertising Analytics, and Federal clusters. Sean previously worked as a Software Engineer on a Big Data team at the NSA. Prior to working full time for the NSA, Sean attended the University of Illinois at Urbana-Champaign, where he earned a B.S. in Computer Science.

 

Topics

Analyzing Twitter Data with Hadoop - Social media has gained immense popularity with marketing teams, and Twitter is an effective tool for a company to get people excited about its products. Twitter makes it easy to engage users and communicate directly with them, and in turn, users can provide word-of-mouth marketing for companies by discussing the products. Given limited resources, and knowing that they may not be able to talk directly to everyone they want to target, marketing departments can be more efficient by being selective about whom they reach out to. In this talk, Joey will describe how you can use Apache Flume, Apache HDFS, Apache Oozie, and Apache Hive to design an end-to-end data pipeline for analyzing Twitter data.


Working With Mahout - Once the end-to-end pipeline is established, what insights can be gained? Sean will continue the Twitter analysis by describing how machine learning and data mining algorithms can be applied to the data.

 

Company

Cloudera is the leader in Apache Hadoop-based software and services and offers a powerful new data platform that enables enterprises and organizations to look at all their data (structured as well as unstructured) and ask bigger questions for unprecedented insight at the speed of thought. With some of the top minds in Big Data, including Doug Cutting, who invented Hadoop, Cloudera enhances the storage and processing technologies originally developed by the world’s biggest Web companies. Today, Cloudera is the market leader in Hadoop with tens of thousands of nodes under management, as well as the top contributor of code to the Hadoop ecosystem. Markets include financial services, government, telecommunications, media, web, advertising, retail, energy, bioinformatics, pharma/healthcare, university research, oil and gas, gaming and more.

Parking

All of the employee lots are up for grabs after 5pm. This means the two huge gravel/sand-looking lots are available: one off of Hull Street and the other off of Key Hwy in front of Domino Sugar. There are also about 75 designated visitor spots all around the buildings. The alley between our building and the one in front of ours holds at least 25 of them; most people don’t realize that they can turn down there. There is also street parking on Hull. Lastly, any AOL spot is up for grabs. The UA reserved spots and Zipcar-only spots are not for public use.

 

Drinks

Little Havana

1325 Key Highway

Baltimore, MD 21230




  • Jason B.

    Videos are trickling onto YouTube: http://www.youtube.com/user/Dat...

    May 18, 2013

  • Jason B.

    In case you missed the meetup: http://datacommunitydc.org/blog...

    May 13, 2013

  • Jason B.

    2 · May 11, 2013

  • Lewis B.

    Two good presentations. Thanks Jason et al. for lining this up.

    May 10, 2013

  • Jason B.

    Thank you again to all our sponsors, and thank you to all who attended. We will do our best to get the slides up and hopefully video soon.

    May 10, 2013

  • Matt R.

    I liked learning about the clustering methods used to classify Twitter users.

    May 9, 2013

  • dave b.

    bit longer than it needed to be, but that is hard to predict ....

    May 9, 2013

  • Adam S.

    Looking forward to this! I've always had a special place for Twitter in my heart... even though I don't really use it that much :/

    1 · May 8, 2013

  • Dawn T.

    Interested.

    May 6, 2013

  • Jason B.

    If you have any interesting articles to share, please join us at https://www.facebook.com/groups/...

    May 5, 2013

  • Jackie K.

    I wanted to make it, but I won't be able to swing it time wise. Can you post your slides afterwards?

    May 1, 2013

    • Jason B.

      We will post the slides as long as the speakers are able to share them.

      1 · May 1, 2013

  • Jackie K.

    Is there recommended parking?

    April 25, 2013

    • Sean M.

      There is a ton of street parking available. Also, parking in any un-reserved Under Armour space should be ok.

      April 25, 2013

  • Lewis B.

    Anyone here happen to be using or have used Blender with Python for visualization?

    April 18, 2013

Our Sponsors

  • Founding Sponsors

    _________________

  • Varen Technologies

    An Intelligence and Defense Services Provider

  • Six3 Systems

    Architects, builds and supports enterprise software apps. and systems

  • Sponsors

    _________________

  • Intridea

    Helping businesses develop strategic solutions and launch new ideas

  • Applied Technology Group

    Provides Agile IT Solutions to the IC and Government Agencies

  • Booz Allen Hamilton

    Provides clients with engineering and operational consulting services

  • Cloudera

    The Platform for Big Data and the Leading Solution for Hadoop

  • PROTEUS Technologies

    Leading-edge engineering solutions provider for the IC and Industry

  • Teradata

    Global Leader in Data Warehousing and Big Data Analytic Technologies

  • Living Social

    We are hiring!

  • Novetta

    Big Data Analytics, Cyber Security, and more to the IC and Commercial

  • Partners & Resources

    _________________

  • Data Community DC

    Supporting the analytics community with events and more

  • O'Reilly Media

    Spreads the knowledge of innovators through its books and more.

  • Manning Publications

    Publish computer books for professionals

  • Mont. County Lab for Civic Improvement

    Helping communicate ideas and lessons learned.
