Big Data Science Meetup Event @ Strata Conference

Hosted by Sanhita Sarkar

Details

Sponsors for this event:

O'Reilly Media, SGI, MarkLogic, Alpine Data Labs, Cloudera

5:30 P.M. - 6:00 P.M. Networking

6:00 P.M. - 6:55 P.M. Session 1

Speaker Name: Eric Bloch, MarkLogic Corp.

Title: Achieving 100,000 Transactions a Second with a NoSQL Database

Abstract:

Eric will present a Big Data use case requiring 100K transactions per second. He will briefly cover the architecture of MarkLogic, an Enterprise NoSQL Database, and why it's ideal for this case. The session concludes with a detailed discussion of the techniques used, along with related performance analyses, to achieve the requirements.

Bio:

Eric Bloch

Director, Community

Eric Bloch is MarkLogic’s Community Director. He has over 20 years of experience developing software and has built systems used by millions of people, including apps, libraries, compilers, device drivers, and operating systems. He holds master’s degrees in Computer Science and Sociology from Stanford University and an Sc.B. in Mathematics and Computer Science from Brown University. Eric also runs MarkMail (http://markmail.org/).

6:55 P.M. - 7:00 P.M. Q/A

7:00 P.M. - 7:55 P.M. Session 2

Title: Predictive Analytics for Hadoop

Speaker Name: Phil Cooper, VP, Professional Services & Alliances at Alpine Data Labs

Abstract:

It's still early days for data mining on Hadoop. If you want to run advanced analytics or build predictive models on Hadoop data, you are going to have to get your hands dirty. There are very few applications to help you, and open-source libraries are still immature.

The engineering team at Alpine Data Labs has been working on a general framework for running statistical and modeling operations on large datasets. It's a 'polymorphic' approach: the implementation adjusts to the data environment so that it works equally well on MPP databases and on Hadoop, taking full advantage of the scalability of the underlying platform.

We'll discuss our approach, illustrate with examples and show you the product in action.

Bio:

Phil runs the field technical team at Alpine Data Labs and leverages over 15 years of enterprise software experience in selling, designing, and delivering advanced analytics, data warehouse, data discovery, master data management, eCommerce, and search solutions. Prior to joining Alpine, Phil was Senior Director, BI and Analytics at Oracle, where he acted as a product evangelist for the acquired Endeca Information Discovery technology. Previously, Phil held a number of management positions at Endeca Technologies, including leading the Western region consulting practice, where he delivered ground-breaking decision-support solutions to key accounts including Toyota, Boeing, and Apple. Before Endeca, Phil held key leadership positions at Kalido and Royal Dutch Shell. Phil holds a B.S. from the University of Exeter, U.K. and a master’s degree in GIS & Remote Sensing from the University of Cambridge, U.K.

Phil will be joined by Dr. Will Ford, Data Scientist at Alpine Data Labs.

Bio of Will Ford:

Dr. Ford specializes in data mining, machine learning, and optimization via the use of evolutionary algorithms. He has worked on projects involving traditional data analysis/regression/classification, automatic/aided target recognition, and software development of analytics solutions and tools. He develops solutions using Alpine, Matlab, Python, R, C#, Ruby, and other technologies as dictated by the needs of his projects.

As a graduate student in Computation and Neural Systems at the California Institute of Technology (Caltech), Will applied evolutionary algorithms to various tasks including the optimization of various computational chemistry packages. A significant part of his dissertation outlined his theory that quantal effects may play a role in synaptic plasticity, a process that is thought to play a central role in learning and memory. In addition, he has experience with various experimental techniques employed in biochemistry to study neural systems.

7:55 P.M. - 8:00 P.M. Q/A

8:00 P.M. - 8:30 P.M. Session 3

Title: Greenplum HD

Speaker Name: Sameer Tiwari, Hadoop Architect at EMC Greenplum

Abstract:

Greenplum HD is a 100 percent open-source certified and supported version of the Apache Hadoop stack that includes HDFS, MapReduce, Hive, Pig, HBase, and ZooKeeper. Backed by the world’s largest Hadoop support organization and tested at scale in Greenplum’s 1,000-node Analytics Workbench, Greenplum HD brings flexible storage options to an enterprise-ready Hadoop stack. Greenplum HD makes Hadoop faster, more dependable, and easier to use.

Bio:

Sameer has been building platform products for large deployments since the Application Server days at Sun Microsystems. He started working on big data before the invention of Hadoop, in the field of email archiving and search. Most recently he worked on Ad-Serving and User Platform systems at Yahoo. He is the Hadoop Architect at EMC Greenplum, building the next generation of systems for Big Data Analytics.

8:30 P.M. - 9:30 P.M. Networking

Food and Drinks will be available.

Big Data Science
Santa Clara Convention Center, Ballroom E
5001 Great American Pkwy · Santa Clara, CA