
Apache Spark - Making Sense of Big Data Faster and Easier

Dubbed the leading successor to Hadoop MapReduce, Apache Spark is a cluster computing system that makes data analytics fast: both fast to run and fast to write. Programs written in Spark can often outperform those in MapReduce by 100X while being 10X shorter and easier to understand. Spark also provides efficient support for streaming, query execution, machine learning, and graph computation through rich high-level libraries. Last but not least, the project has one of the most active open source communities in Big Data: 150+ developers from 30+ organizations have contributed code to it. In this talk, we will introduce the project, survey the high-level libraries (including streaming, SQL, and machine learning), and explain how Spark can help you make better decisions faster and more easily.


Reynold Xin is a committer on Apache Spark and a co-founder of Databricks. He has been instrumental in the development of many high-level frameworks on Spark, including SQL and graph computation. Prior to Databricks, he was pursuing a PhD at the UC Berkeley AMPLab.

Patrick Wendell is a committer on Apache Spark and a co-founder of Databricks. Before Databricks, he was pursuing a PhD at the UC Berkeley AMPLab, where he worked on scalable, low-latency scheduling for data processing frameworks. He has also contributed to several Hadoop projects, including Apache Flume and Apache Avro.

Xiangrui Meng leads the development of the machine learning library on Spark at Databricks. Prior to Databricks, he was the primary developer of a Hadoop MapReduce-based machine learning framework at LinkedIn. He holds a doctorate in Computational and Mathematical Engineering from Stanford, where he conducted research on large-scale machine learning.


[masked]pm: Registration and Networking (with food & beverages)

6:[masked]pm: Introduction 

7:00-7:45pm: Presentations

7:[masked]pm: Q&A session

We'll be raffling off two free passes to HBaseCon 2014 (May 5, San Francisco), so bring your business cards. Look for the Cloudera table!


  • John T.

    Great event. Good presentation and excellent Q&A.

    1 · April 10, 2014

  • Justin K.

    Thanks everyone! Remember, HBaseCon 2014 is on May 5 in SF (

    2 · April 10, 2014

  • A former member

    Fantastic meetup. Thank you, Perkins Coie and The Hive!

    2 · April 9, 2014

  • Kshitij K.

    Anybody returning to the East Bay afterwards? I need a ride! Thanks in advance.

    April 9, 2014

  • Pashu P.

    Hi Data Bees! Start the conversation online NOW on Twitter using the hashtag #hivedata, tell your friends you are attending, ask questions, post pictures, meet each other, we want to hear from you!

    April 9, 2014

  • David L.

    Will tomorrow's session be recorded for people who can't attend?

    1 · April 9, 2014

    • Pashu P.

      Yes, David, we will record the event. Stay tuned to this page; we will post the link after the event. Thanks!

      April 9, 2014

  • A former member

    Can you please enable a webcast option for remote attendees like me?

    2 · April 3, 2014

    • JF H.

      Are there earlier sessions recorded?

      April 7, 2014

    • Pashu P.

      Yes, there are, and the quality should be better starting next week!

      April 7, 2014

  • Jim L

    In advance of this meeting, I found this interesting article on Spark:

    1 · March 19, 2014

Our Sponsors

  • Visa

    Global payments technology company.

  • Pentaho

    Reporting, analysis, dashboard, data mining and workflow capabilities.

  • Aerospike

    Flash-optimized in-memory open source NoSQL database.

  • Teradata

    Teradata is a provider of enterprise analytic technologies and services.

  • Cloudera

    Enterprise Platform for Big Data. Leading Solution for Apache Hadoop.

  • Foghorn

    Multi-tier deployment platform for data-driven IoT applications.

  • Perkins Coie

    International firm specializing in business law & litigation.

  • O'Reilly Strata

    Big data technology and strategy conference.
