6:30-7:00: Meet fellow members, networking
7:00-7:15: Welcome, raffle for 1 registration to PASS Business Analytics Conference
7:15-8:00: Machine Learning: The Race for Greater Predictive Power
8:00-8:45: GridGain Open Source In-Memory Computing Platform
8:45-9:00: Raffle for DataEDGE Conference, discussion and networking
MACHINE LEARNING: THE RACE FOR GREATER PREDICTIVE POWER
Data modeling has long been constrained by scale; sampling still rules the day for ad hoc analytics. Scale brings much-needed change to the modeling world. In this talk we present the predictive power of using sophisticated algorithms on big datasets. With large data sizes comes the particularly hard problem of unbalanced data with multiple asymmetrically rare classes. Missing features pose unique problems for most classification and regression algorithms, and proper handling can lead to greater predictive power. In the race for better predictions, H2O makes practical techniques accessible to anyone through an easy-to-use software product.
H2O is an open source math & machine learning engine for big data that brings distribution and parallelism to powerful algorithms while keeping the widely used languages of R and JSON as an API. It integrates neatly into the popular data ecosystems of Hadoop, Amazon S3, NoSQL and SQL. We briefly discuss design choices in the implementation of Distributed Random Forest and Generalized Linear Modeling, and in bringing speed and scale to the vox populi of Data Science, R. We take a peek at the elegant Lego-like infrastructure that brings fine-grained parallelism to math over simple distributed arrays.
A short data hacking demo presents the life cycle of Data Science:
Powerful Data Manipulation via R at scale, Interactive Summarization over large datasets, Modeling using Elastic Net (GLM), Grid Search for best parameters & low-latency scoring.
SriSatish Ambati is Co-Founder and CEO of 0xdata (http://www.oxdata.com/) (@hexadata), the builders of H2O. H2O democratizes big data science and makes Hadoop do math for better predictions. Prior to 0xdata, Sri held a variety of leadership roles in the private and academic sectors. He co-founded Platfora and was the Director of Engineering at DataStax. At Azul Systems, a Java multi-core startup, Sri was Partner & Performance Engineer, where he got to work on the entire ecosystem of enterprise apps at scale. In academia, Sri worked with researchers at Purdue and Stanford to scale R over big data, and pursued Theoretical Neuroscience at Berkeley. Sri is known for his knack for envisioning killer apps in fast-evolving spaces and assembling stellar teams to productize that vision. A regular speaker on the BigData, NoSQL and Java circuit, Sri leaves a trail @srisatish.
GRIDGAIN OPEN SOURCE IN-MEMORY COMPUTING PLATFORM
Get an overview of GridGain 6.0, a Java-based Apache 2.0 licensed In-Memory Computing platform that combines clustering, high performance computing, streaming and Complex Event Processing (CEP), in-memory data grid, and Hadoop acceleration into one unified, easy to use platform. GridGain software is used by hundreds of companies around the world to deliver unprecedented performance and scalability gains in a variety of industries including finance, mobile payments, in-game merchant platforms, hyper-local advertising, medical imaging, cognitive analytics and natural language processing applications.
Nikita Ivanov is Founder and CTO of GridGain Systems (http://www.gridgain.com/), started in 2007 and funded by RTP Ventures and Almaz Capital. Nikita has led GridGain in developing advanced distributed in-memory data processing technologies: the top Java in-memory computing platform, now starting every 10 seconds around the world.
Nikita has over 20 years of experience in software application development, building HPC and middleware platforms, contributing to the efforts of other startups and notable companies including Adaptec, Visa and BEA Systems. Nikita was one of the pioneers in using Java technology for server side middleware development while working for one of Europe’s largest system integrators in 1996.
He is an active member of the Java middleware community, a contributor to the Java specification, and holds a Master’s degree in Electro Mechanics from Baltic State Technical University, Saint Petersburg, Russia.
A BIG THANK YOU TO OUR SPONSORS
Thanks to GridGain (http://www.gridgain.com/) for hosting the venue
Many thanks to the Professional Association for SQL Server (PASS) for donating a complimentary full-conference registration for the PASS Business Analytics Conference (http://www.passbaconference.com/) May 7-9, 2014 at the San Jose Convention Center. The PASS Business Analytics Conference is a professional, community-oriented gathering for business analytics professionals. This all-access pass, valued at $1795, covers your full attendance at the conference, which begins at the evening Welcome Reception, Wednesday, May 7. It includes all sessions Thursday-Friday, May 8-9, as well as all evening events and conference meals.
The Professional Association for SQL Server (PASS) is an independent, not-for-profit association dedicated to supporting, educating, and promoting the global Microsoft SQL Server community.
The pass will be raffled at the May 1 meet-up.
DISCOUNT CODE: PADSA members receive $300 off by using discount code: BACPA300
Learn More (http://www.passbaconference.com/)
Many thanks to the UC Berkeley School of Information for setting up a discount code, and for donating one complimentary registration for the DataEDGE Conference (http://dataedge.ischool.berkeley.edu/2014/) May 8-9, 2014 at UC Berkeley. The complimentary registration is valued at $650 and will be raffled at our May 1, 2014 meeting.
DataEDGE brings together social scientists, computer scientists, policy-makers, designers, and artists for an intimate two-day conference to assess the current state of data science and the data revolution. The conference will bring you up to speed quickly: you will hear from leading experts in the field about the way organizations are using data to address business and societal issues, about the challenges of working with data at scale, and about the most pressing questions and debates facing data scientists today.
DISCOUNT CODE: PADSA members receive 10% off by using discount code DE14-D9KB
Learn more (http://dataedge.ischool.berkeley.edu/2014/)
The ASE (Academy of Science and Engineering) is holding its “Big Data Conference” at Stanford May 27-31. PADSA members can now register for a full 5-day pass to the conference in exchange for volunteering some time to help at the conference. Volunteering is a great way to network with other technologists, and you get to attend the conference in exchange for your time.
Available shifts range from just a couple of hours to a full day. The organizers seem pretty flexible, so throw your hat in the ring and volunteer today!
Register to volunteer here (http://www.scienceengineeringacademy.org/asesite/events/volunteer-registration-system/) (please register as "ASE member $0" and note “PADSA” or “Palo Alto Data Science Association” when you register)
Program Schedule (http://www.scienceengineering.org/ase/conference/2014/bigdata/sanjose/website/pdf/PreliminaryProgram.pdf)