
Big Data Week - Paco Nathan keynote + Panel

Workflows for Understanding Big Data & Data Science 
Paco Nathan

Talk Details:

In this talk we'll present a newly emerging view of Big Data & Data Science centered around workflows.  A workflow is a holistic view of data analysis which includes the software, people and processes required to generate data insights.  We will propose a simple "scorecard" for assessing workflow technology options in the context of best-of-breed features, specific business use cases, infrastructure capabilities and data analysis goals.  Additionally, we'll discuss the enormous need for human talent retooling that is occurring in the industry today -- what's driving it, how it will evolve and why it's so important.  We'll tie these themes together with plenty of real-world examples and "war stories."

On the technology side, we will focus on Apache Spark and other open source (OSS) frameworks that build upon the notion of workflow.  These platforms take organizations beyond Hadoop and allow them to shift focus from Data Center Computing -- which focuses on utilization, elasticity, latency and operating costs associated with Big Data -- to emphasize applications and automation.  Many leading firms like Google, Twitter, Airbnb, Hubspot and eBay are already developing real-time Big Data applications in this fashion.

On the talent side, we will make the case that today's business leadership is poorly prepared to contend with enormous data rates, scalability and the core mathematics required to deploy high-ROI Big Data applications.  For example: how and when can we leverage graph queries, sparse matrices, convex optimization, Bayesian statistics and other advanced topics?  We'll present material from a new O'Reilly book called “Just Enough Math,” which introduces advanced math for Big Data Science to business people in the context of concrete business use cases.  These examples contain plenty of illustrations and historical background, and include brief code examples in Python that are easily understood.

About the Speaker

Our keynote speaker Paco Nathan is a “player/coach” who has led innovative data teams building large-scale apps for 10+ years. He is an expert in distributed systems, machine learning, and enterprise data workflows. Paco is an O'Reilly author and engineering consultant, and an advisor for several firms including The Data Guild. Paco received his BS Math Sci and MS Comp Sci degrees from Stanford University, and has 25+ years of technology industry experience ranging from Bell Labs to early-stage start-ups.

Event Details:

• We will kick off our 2nd annual Big Data Week event with a luncheon followed by a keynote address and Q&A.

• After the keynote we'll have a short break and a panel discussion with data science experts.

• Parking is available in the GCATT / GTRI parking garage.

Confirmed panel speakers include Michael Schmidt, CEO of Nutonian; Don Brown, former Director of Field Engineering at WibiData; Jonathan Lacefield, Solution Architect at DataStax; and other great experts to be announced soon.


11:30  Lunch is served

12:15  Welcome & announcements

12:30  Keynote with Paco Nathan

1:30  Panel discussion

3:00  Big Data Week Hackathon @ Hypepotamus

RSVP for hackathon here:


Our events are only made possible by the generous support of our sponsors.  Sponsorship provides great visibility for your organization, and allows you to directly reach our growing 1400+ membership.

If you would like to sponsor this event, or one in the future, please contact Travis Turney. 

About Michael Schmidt

Michael Schmidt's research focuses on "Machine Science" - a direction in artificial intelligence research to accelerate data-driven discovery. Over the past 6 years, he has worked on algorithms and techniques to automate knowledge discovery from data. In particular, he has published extensively on identifying mathematical relationships (such as laws of physics) in experimental data, and algorithms in evolutionary computation.

About Don Brown

Don is COO and co-founder of Scaling Data, a big data startup founded by three former Cloudera executives. He previously served as the Director of Field Engineering at WibiData, leading the Support, Training, Pre-Sales and Services teams.  Prior to WibiData, he was Director of Architectural Services at Cloudera, leading the global post-sales team.  In this role, Don worked as an advisor for many Fortune 100 companies, assisting in both strategic and tactical aspects of their Big Data deployments.  Before assuming the leadership position at Cloudera, Don worked as a Principal Solution Architect on dozens of the earliest and largest Hadoop implementations in the world.

About Jonathan Lacefield

Jonathan is a technical Architect focused on delivering data-driven systems at scale.  With certifications in both Hadoop and Cassandra, Jonathan works with large and complex clients, helping them design, create, deploy, and support Big Data solutions across several different industries.  Jonathan has been working for DataStax, the commercial provider of Apache Cassandra, focusing on integrating new and emerging technologies with the Apache Cassandra product suite.


  • Jennifer K.

    That's supposed to be (votes - 1) to the 0.8 power over (age_hrs + 2) to the 1.8 power, with the whole thing multiplied by applicable penalties, if any (determined by the webmaster, or by separate match algorithms).

    May 29, 2014

  • Jennifer K.

    Don't know if this is what you want, but here's the basic "gravity" algorithm allegedly used by one busy website:

    Articles are scored based on their upvote score, the time since the article was submitted, and various penalties using the following formula:

    score = ((votes - 1)^0.8 / (age_hrs + 2)^1.8) * penalties

    Because the time has a larger exponent than the votes, an article's score will eventually drop to zero, so nothing stays on the front page too long. This exponent is known as gravity.

    1 · May 29, 2014
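
For anyone who wants to try the "gravity" formula quoted above, here is a minimal Python sketch. It is not the actual code of any website. The names `gravity_score` and `rank_items`, the dictionary keys, and the choice to clamp `votes - 1` at zero (so a fractional power is never taken of a negative number) are assumptions made for illustration. Only the exponents 0.8 and 1.8 and the `+2` offset come from the comment.

```python
def gravity_score(votes, age_hrs, penalties=1.0,
                  vote_exponent=0.8, gravity=1.8):
    """Return ((votes - 1)^0.8 / (age_hrs + 2)^1.8) * penalties."""
    # Assumption: clamp at zero so items with 0 votes score 0.0 instead of
    # producing a complex number from a negative base.
    base = max(votes - 1, 0)
    return (base ** vote_exponent) / ((age_hrs + 2) ** gravity) * penalties


def rank_items(items):
    """Sort items (dicts with hypothetical 'votes' and 'age_hrs' keys), best first."""
    return sorted(
        items,
        key=lambda item: gravity_score(item["votes"], item["age_hrs"]),
        reverse=True,
    )


if __name__ == "__main__":
    articles = [
        {"id": "old-but-popular", "votes": 200, "age_hrs": 72},
        {"id": "fresh", "votes": 15, "age_hrs": 1},
    ]
    for article in rank_items(articles):
        print(article["id"], gravity_score(article["votes"], article["age_hrs"]))
```

Because the age term carries the larger exponent, an item with a fixed vote count scores lower and lower as `age_hrs` grows, which is the behavior the comment describes.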

  • Eric L.

    Hello Data Scientists, Can you help me develop a scoring algorithm for rank ordered items?

    1 · May 29, 2014

  • Monosij D.

    Great presentation, appreciate the insight.

    May 14, 2014

  • Jacob

    Thanks again to Meetup for allowing BAH to sponsor this talk. Thanks also to Mr Nathan and the rest of the panelists for a great discussion.

    If you are interested in starting the Data Science conversation in your organization we've created a field guide for bridging the gap between practitioners and newbies.

    Let us know what you think!

    May 12, 2014

  • MaryBeth

    Looking forward to your future publications

    May 10, 2014

  • A former member

    Thank you very much for the opportunity to present. I'm grateful for the many discussions during the event today, and getting to meet so many wonderful people working in Data here in Atlanta. Here's a PDF of my slides from today:

    2 · May 10, 2014

  • Rakesh P.

    Great work in this talk.....thanks!

    May 10, 2014

    • A former member

      Thank you kindly Rakesh

      May 10, 2014

  • Travis T.

    Paco's deck will be posted to this meetup ASAP and video in 2-3 weeks. Thanks everyone who attended!

    May 10, 2014

  • Andrew G.

    Also, please don't forget to sign up for the new DSATL mailing list,

    May 10, 2014

  • Andrew G.

    Paco's workshops are still open! Sign up for 1-day sessions at the Perimeter if you are interested:

    Mon, 5/12: Hands-On Data Science, Tue, 5/13: Hands-On Machine Learning,

    1 · May 10, 2014

  • Travis T.

    Parking onsite in the garage is free!

    May 10, 2014

  • Travis T.

    Please check in with Rhea at the registration table when you arrive. Today is going to be awesome!

    1 · May 10, 2014

  • Rhea N.

    Be sure to keep up with the conversation via #dsatl, @datascienceATL, and #bdw14. See you all tomorrow :)

    1 · May 9, 2014
