Past Meetup

Big Data Science Meetup @ Strata Conference


150 people attended


Link for Strata Conference:

5:30 P.M. - 6:00 P.M. Welcome

6:00 P.M. - 6:25 P.M. Session 1

Title: ADAM: Big Data Processing and Storage for Genomics

Speaker: Frank Austin Nothaft, Graduate Student at UC-Berkeley

Abstract: By using cloud computing services or large computing clusters to process genomic data, we can significantly decrease the cost and latency of genomic analysis. However, current genomics data formats and processing pipelines were introduced prior to many significant advances in cloud and cluster computing technologies. Through the careful design of new file formats, we can unlock the advantages of distributed computing and also ensure that the file formats can easily be optimized for future computing advances. In this talk, we introduce ADAM, a set of file formats and command line tools for processing genome data on clusters and in the cloud. Using 100 nodes from Amazon Web Services, ADAM performs genetic processing steps such as marking duplicates and sorting 40 to 50 times faster.

Speaker Bio: Frank Austin Nothaft is a graduate student in the AMP and ASPIRE labs at UC-Berkeley, advised by Prof. David Patterson. His current focus is high-performance computer systems for bioinformatics, and he is involved in the ADAM, avocado, and FireBox projects. Before Berkeley, Frank worked at Broadcom in Irvine, CA, on high-performance electronic design automation. Frank holds a Bachelor of Science with Honors in Electrical Engineering from Stanford University.

6:25 P.M. - 6:30 P.M. Q/A

6:30 P.M. - 7:15 P.M. Session 2

Title: All models are wrong, but some models are useful

Speaker: SriSatish Ambati, Founder and CEO of 0xdata (@hexadata)

Abstract: The promise of big data is better predictions. There is no single best model that works for all of your data: a model's predictive performance is domain specific, and what works in one data domain sometimes has very little bearing on another. Data science needs to get closer to the business and unlock value.

Ensembles are here to stay! Users want a buffet of algorithms that try to "lock-pick" the data for its secrets. Time is ultimately the key constraint. Data science efforts have to make the best of the budget for experimentation and use some kind of co-evolutionary technique that picks the "Champion" model of models for your data. Robust automation and fast analytics can speed up large parts of data smithy. In this talk we discuss ensemble techniques of boosting and trees that, when applied to real use cases, lead to substantially better predictions.

Speaker Bio: Sri is co-founder and CEO of 0xdata (@hexadata), the builders of H2O. H2O democratizes big data science and makes Hadoop do math for better predictions. Before 0xdata, Sri spent time scaling R over big data with researchers at Purdue and Stanford. Before that, Sri co-founded Platfora and was Director of Engineering at DataStax. Earlier, Sri was Partner & Performance Engineer at the Java multi-core startup Azul Systems, tinkering with the entire ecosystem of enterprise apps at scale. Before that, Sri was on sabbatical pursuing Theoretical Neuroscience at Berkeley. Prior to that, Sri worked on a NoSQL trie-based index for semi-structured data at the in-memory index startup RightOrder.

Sri is known for his knack for envisioning killer apps in fast-evolving spaces and assembling stellar teams to productize that vision. A regular speaker on the big data, NoSQL and Java circuit, Sri leaves a trail at @srisatish.

7:15 P.M. - 7:25 P.M. Q/A

7:25 P.M. - 7:30 P.M. Break

7:30 P.M. - 8:15 P.M. Session 3

Title: Real World Big Data Prescriptive Analytics

Speaker: Nick Gonzalez, Pentaho

Abstract: Today’s large and convoluted data landscape, coupled with the abundance of available computing resources, presents unique opportunities for data scientists around the world. To remain competitive in this landscape, we must go beyond generating predictions from big data to generating solutions: actions derived from data-driven predictions. And we have to do this as fast as possible. This is the real world of big data prescriptive analytics.

Performing prescriptive analytics that is both accurate and responsive on big data is simultaneously our most valuable tool and our biggest challenge. Solving this challenge involves building intelligence and automation into data preparation and acquisition, leveraging distributed architectures to increase accuracy while reducing processing time, automating the descriptive analytics process, building intelligent workflows that minimize human error while maximizing human creativity, plus a whole lot more.

This talk will address each of these challenges and present technical solutions and algorithms for them. By the end of the presentation, the individual solutions will come together in a symphony of code and hardware to form a unified automated process that is the backbone of a successful big data prescriptive analytics solution.

Speaker Bio: Nick Gonzalez wrote his first multiplayer video game at the age of 8 on a Tandy TRS-80. He was the youngest programmer ever to lead R&D efforts for one of the top video game publishers in the world, he built targeted advertising systems for the web before we knew what to call them, and he built several technology companies (one of which was acquired by Microsoft before he turned 25). Nick has held several titles over the past 18 years, including Chief Software Architect for EA Sports, where he architected Tiger Woods PGA Tour for Xbox 360 and PS3; Founder and CTO of two companies, where he built a big data behavioral analysis system that personalizes video games in real time, optimizing the user experience and driving revenue; and most recently VP & Chief Data Scientist at Pentaho, where he is laying the foundation for the next generation of prescriptive analytical technologies. Predictive algorithms, artificial intelligence, and large-scale distributed systems have played a central role in almost every project Nick has been involved in.

As a self-proclaimed “modern philosopher”, Nick spends his nights and weekends contemplating the simplicity of the human mind and devising ways to replicate it in a machine with as few lines of LISP code as humanly possible, without leaving Emacs. When he is not solving ridiculous problems or staring at a computer screen, he is reading math and philosophy books, “chillin” with his wife listening to music, or hanging out with his four kids playing basketball and watching kung-fu movies.

8:15 P.M. - 8:25 P.M. Q/A

8:25 P.M. - 8:30 P.M. Break

8:30 P.M. - 9:30 P.M. Networking

Sponsors for this event: