Rock Bottom Restaurant & Brewery - Tuesday May 19, 2015 @ 6:00pm MDT
NOTE: If you are unable to attend in person, register and we will email you a webinar link two hours before the event.
NOTE: The talk "Extending Word2Vec for Performance and Semi-supervised Learning" by Michael Malak will not be included in the webinar. It is a practice run for Michael Malak's Spark Summit presentation in June.
Location: Rock Bottom Brewery - 16th Street Mall #100, Denver, CO 80265 - Map: https://goo.gl/maps/Pphtt
6:00 - 6:20 Schmooze - beer & food will be served
6:20 - 6:30 Announcements
6:30 - 7:00 Extending Word2Vec for Performance and Semi-supervised Learning by Michael Malak
7:00 - 8:00 Intro to Apache Ignite: Distributed Framework for Unified In-memory Data Fabric by Nikita Ivanov
8:00 - 8:30 Networking
Extending Word2Vec for Performance and Semi-supervised Learning - Abstract
MLlib Word2Vec is an unsupervised learning technique that generates feature vectors which can then be clustered. The weakness of unsupervised learning is that although it can tell that an apple is close to a banana, it cannot put the label "fruit" on that group. We show how MLlib Word2Vec can be combined with the human-created data of YAGO2 (which is derived from crowd-sourced Wikipedia metadata), along with the NLP metrics Levenshtein distance and Jaccard similarity, to label categories properly. Even though YAGO2 is a graph, instead of GraphX we use Ankur Dave's powerful IndexedRDD, which is slated for inclusion in Spark 1.3 or 1.4. IndexedRDD is also used in a second way: to further parallelize MLlib Word2Vec. The use case is labeling columns of unlabeled data uploaded to the Oracle Data Enrichment Cloud Service (ODECS), a cloud app that processes big data.
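To make the two string metrics concrete, here is a minimal, non-distributed sketch in plain Python (not Spark, and not the presenter's implementation; the abstract does not say how the metrics are combined). It assumes a cluster of words has already been produced, for example by clustering Word2Vec vectors, and that a small table mapping category labels to known member words stands in for YAGO2. The helper names `fuzzy_normalize` and `label_cluster` and the toy data are hypothetical: Levenshtein distance is used to absorb small spelling differences, and Jaccard similarity scores how well the cluster overlaps each candidate category.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|; defined as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fuzzy_normalize(words, vocabulary, max_edits=1):
    """Hypothetical helper: map each word to the closest vocabulary entry within max_edits edits."""
    vocab = sorted(vocabulary)  # sorted so ties resolve deterministically
    out = set()
    for w in words:
        best = min(vocab, key=lambda v: levenshtein(w, v), default=None)
        out.add(best if best is not None and levenshtein(w, best) <= max_edits else w)
    return out


def label_cluster(cluster_words, categories, max_edits=1):
    """Hypothetical helper: return (label, score) of the category whose members have
    the highest Jaccard similarity with the fuzzily normalized cluster."""
    best_label, best_score = None, -1.0
    for label, members in sorted(categories.items()):
        normalized = fuzzy_normalize(cluster_words, members, max_edits)
        score = jaccard(normalized, members)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy stand-in for a knowledge base such as YAGO2 (made-up data).
categories = {
    "fruit": {"apple", "banana", "cherry", "grape"},
    "vehicle": {"car", "truck", "bus", "bicycle"},
}
print(label_cluster({"apple", "bananna", "grape"}, categories))  # ('fruit', 0.75)
```

Here the misspelled "bananna" is within one edit of "banana", so the cluster matches three of the four fruit members (Jaccard 3/4) and none of the vehicles. A real system would need a far larger category table and a distributed lookup structure, which is the role the abstract assigns to IndexedRDD.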
Michael Malak - Bio
Michael Malak has been implementing Spark solutions for two Fortune 200 companies since early 2013. He is currently at Oracle in Colorado, on a team developing a Spark-based big data cloud app. He holds an M.S. in Mathematics from George Mason University. His book Spark GraphX in Action is due to be published later in 2015.
Intro to Apache Ignite: Distributed Framework for Unified In-memory Data Fabric - Abstract
An introduction to Apache Ignite™ (incubating), an open-source, distributed framework for a unified in-memory data fabric. Ignite provides a high-performance, distributed in-memory data management layer designed to sit between applications and both new and existing data sources, boosting application performance and scale by orders of magnitude. We will start with a summary of the technical drivers and market forces, and then cover popular and emerging use cases for in-memory computing, from financial trading platforms to mobile payment processing, online advertising, online/mobile gaming back ends and more. Finally, we will present some foundational concepts and terminology, and discuss the architecture, capabilities and benefits of the Ignite In-Memory Data Fabric in detail.
Nikita Ivanov - Bio
Nikita Ivanov is founder and CTO of GridGain Systems, maker of the leading Java in-memory data fabric, and a PPMC member of the Apache Ignite™ (incubating) project. Nikita has over 20 years of experience in software application development, building HPC and middleware platforms and contributing to the efforts of other startups and notable companies including Adaptec, Visa and BEA Systems.