We look forward to you joining our three quality speakers on bigger, better, and faster data science:
1. Speeding up the Data Scientist’s Discovery and Delivery Process by Jim Falgout, Chief Technologist at Pervasive
2. Crunching Big Data with Google BigQuery by Ryan Boyd, Developer Advocate at Google
3. The Practice of Predictive Analytics in a Heterogeneous World of Tools and Big Data Platforms by Mike Zeller, CEO of Zementis
1:45-2:30pm Session 1
Title: Speeding up the Data Scientist’s Discovery and Delivery Process
Speaker: Jim Falgout, Chief Technologist, Pervasive Big Data & Analytics
Abstract: Two of the major barriers to effective Hadoop deployments in the enterprise are the complexity and limited applicability of MapReduce. Software developers with Hadoop and MapReduce experience are in short supply, which slows big data initiatives. Delivering faster results across a broad range of analytic scenarios requires working at a higher level of abstraction, supported by new programming paradigms and tools.
In this talk we present one such approach, based on our experience developing a visual workbench for big data analytics on Hadoop. This approach enables data scientists and analysts to build and execute complex big data workflows for Hadoop with minimal training and without MapReduce knowledge. Libraries of pre-built operators for data preparation and analytics reduce the time and effort required to develop big data projects on Hadoop. The framework is extensible, allowing new operators to be added as needed. The efficiency of the underlying dataflow framework shortens run times, allowing faster iterations of discovery and analysis.
Speaker bio: As Chief Technologist for Pervasive’s Big Data products team, Jim Falgout is responsible for setting innovative design principles that guide Pervasive engineering teams as they develop new big data-focused releases and products. Jim has 20 years of large-scale software development experience in roles including development manager, software architect, and principal engineer. Prior to joining Pervasive, Jim was Software Development Manager for NexQL, Director of Software Architecture for Voyence, Software Development Principal for Net Perceptions/KD1, and Senior Software Engineer for Convex Computer. Jim holds a B.Sc. (Cum Laude) in Computer Science from Nicholls State University. Jim is a dataflow innovator and has authored articles including “Dataflow Programming: Handling Huge Data Loads Without Adding Complexity” in Dr. Dobb’s Journal, “How to Enhance Existing Applications with Embedded Analytics” in eWeek, and “Dataflow Programming: A Scalable Data-Centric Approach to Parallelism,” “Crunching Big Data with Java,” and “Let the Data Flow” in Java Developers Journal.
2:40-2:45pm Short break
2:45-3:30pm Session 2
Title: Crunching Big Data with Google BigQuery
Speaker: Ryan Boyd, Developer Advocate for BigQuery at Google
Abstract: Applications that grow to web scale generate massive amounts of data. Many developers end up throwing this data away because they lack the expertise or infrastructure needed to extract value from it. Google knows Big Data: our infrastructure processes 60 hours of YouTube video uploads every minute, and many of our products have hundreds of millions of users. Google has developed custom technologies to analyze this data and make intelligent product decisions. We’ve started to open up some of these technologies as APIs that allow developers to concentrate on their business problems while Google handles the underlying infrastructure.
Google BigQuery is a Big Data analysis tool born from an internal technology known as Dremel. BigQuery enables developers to analyze terabyte data sets in seconds using a RESTful API and a SQL-like query language. We’ll demonstrate how you can incorporate Google BigQuery into your own applications and how queries are processed under the covers. We’ll also show examples using public data sets and demos built by other developers like you.
Speaker bio: Ryan Boyd is a Developer Advocate at Google, focused on cloud data services such as Google BigQuery. He has been in the Developer Relations group for 5 years and previously helped build out the Google Apps ISV ecosystem. Ryan recently published his first book, “Getting Started with OAuth 2.0,” with O’Reilly.
3:40-3:45pm Short break
3:45-4:30pm Session 3
Title: Big Data and Predictive Analytics: Faster Insights through Open Standards
Speaker: Michael Zeller, CEO of Zementis
Abstract: While Hadoop, Cloud Computing, and database vendors provide excellent solutions for data aggregation and general analytics, these platforms are also increasingly being used for advanced predictive analytics against vast amounts of data, preferably with low latency or in real time. This drives the business need for data mining solutions that leverage big data and can execute on multiple platforms and architectures. Facilitating this convergence is the Predictive Model Markup Language (PMML), a vendor-independent standard to represent and exchange data mining models that is supported by all major data mining vendors and open source tools. This presentation will outline the main features of the PMML standard as a key element of data science best practices and its application in the context of distributed processing.
Speaker bio: Dr. Michael Zeller is the CEO of Zementis, a software company focused on the operational deployment and integration of predictive analytics and data mining solutions. In 2011 and 2012, Michael served as the co-chair for the Industry & Government Track of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, which is the premier international forum for data mining researchers and practitioners from academia, industry, and government. Michael received a Ph.D. in Physics from the University of Frankfurt (Germany), with an emphasis on the development of neural networks, robotics, and human-computer intelligent interaction. Michael also received a visiting scholarship from the Department of Physics at the University of Illinois at Urbana-Champaign and was the recipient of a Presidential Postdoctoral Fellowship from the Computer Science Department at the University of Southern California.
Thanks to Yue Cathy Chang for organizing this meetup event.
Bio of Yue Cathy Chang:
Yue Cathy Chang focuses on Big Data partner ecosystem strategy and go-to-market execution, and is currently working with Pervasive Software as a consultant. Cathy’s experience spans business development, software sales, and product design/management/marketing. She has held multiple roles at enterprise software companies and startups, including Director of Business Development at Datameer and senior product management, product marketing, and sales roles at Symantec and IBM. Early in her career, Cathy was a microprocessor design engineer, and she holds a patent for logic design. Cathy holds MS and BS degrees in Electrical and Computer Engineering from Carnegie Mellon University, as well as an MBA and an MS in Engineering Systems from MIT.
Coffee and light snacks will be available.