Workflows for Understanding Big Data & Data Science
In this talk we'll present a newly emerging view of Big Data & Data Science centered around workflows. A workflow is a holistic view of data analysis which includes the software, people and processes required to generate data insights. We will propose a simple "scorecard" for assessing workflow technology options in the context of best-of-breed features, specific business use cases, infrastructure capabilities and data analysis goals. Additionally, we'll discuss the enormous need for human talent retooling that is occurring in the industry today -- what's driving it, how it will evolve and why it's so important. We'll tie these themes together with plenty of real-world examples and "war stories."
On the technology side, we will focus on Apache Spark and other open source (OSS) frameworks that build upon the notion of workflow. These platforms take organizations beyond Hadoop and allow them to shift focus from Data Center Computing -- which focuses on utilization, elasticity, latency and operating costs associated with Big Data -- to emphasize applications and automation. Many leading firms like Google, Twitter, Airbnb, Hubspot and eBay are already developing real-time Big Data applications in this fashion.
On the talent side, we will make the case that today's business leadership is poorly prepared to contend with enormous data rates, scalability and the core mathematics required to deploy high-ROI Big Data applications. For example: how and when can we leverage graph queries, sparse matrices, convex optimization, Bayesian statistics and other advanced topics? We'll present material from a new O'Reilly book called “Just Enough Math,” which introduces advanced math for Big Data Science to business people in the context of concrete business use cases. These examples contain plenty of illustrations and historical background, and include brief code examples in Python that are easily understood.
About the Speaker
Our keynote speaker Paco Nathan is a “player/coach” who's led innovative Data teams building large-scale apps for 10+ years. He is an expert in distributed systems, machine learning and Enterprise data workflows. Paco is an O'Reilly author and engineering consultant, and an advisor for several firms including The Data Guild. Paco received his BS Math Sci and MS Comp Sci degrees from Stanford University, and has 25+ years of technology industry experience ranging from Bell Labs to early-stage start-ups.
• We will kick off our 2nd annual Big Data Week event with a luncheon followed by a keynote address and Q&A.
• After the keynote we'll have a short break and a panel discussion with data science experts.
• Parking is available in the GCATT / GTRI parking garage.
Confirmed panel speakers include Michael Schmidt, CEO, Nutonian; Don Brown, former Director of Field Engineering at WibiData; Jonathan Lacefield, Solution Architect, DataStax; and other great experts to be announced soon.
11:30 Lunch is served
12:15 Welcome & announcements
12:30 Keynote with Paco Nathan
1:30 Panel discussion
3:00 Big Data Week Hackathon @ Hypepotamus
RSVP for hackathon here:
Our events are only made possible by the generous support of our sponsors. Sponsorship provides great visibility for your organization, and allows you to directly reach our growing 1400+ membership.
If you would like to sponsor this event, or one in the future, please contact Travis Turney.
About Michael Schmidt
Michael Schmidt's research focuses on "Machine Science" - a direction in artificial intelligence research to accelerate data-driven discovery. Over the past 6 years, he has worked on algorithms and techniques to automate knowledge discovery from data. In particular, he has published extensively on identifying mathematical relationships (such as laws of physics) in experimental data, and algorithms in evolutionary computation.
About Don Brown
Don is COO and co-founder of Scaling Data, a big data startup founded by three former Cloudera executives. He previously served as the Director of Field Engineering at WibiData, leading the Support, Training, Pre-Sales and Services teams. Prior to WibiData, he was Director of Architectural Services at Cloudera, leading the global post-sales team. In this role, Don worked as an advisor for many Fortune 100 companies, assisting in both strategic and tactical aspects of their Big Data deployments. Before assuming the leadership position at Cloudera, Don worked as a Principal Solution Architect on dozens of the earliest and largest Hadoop implementations in the world.
About Jonathan Lacefield
Jonathan is a technical architect focused on delivering data-driven systems at scale. With certifications in both Hadoop and Cassandra, Jonathan works with large and complex clients, helping them design, create, deploy, and support Big Data solutions across several different industries. Jonathan works for DataStax, the commercial provider of Apache Cassandra, focusing on integrating new and emerging technologies with the Apache Cassandra product suite.