Online learning techniques such as Stochastic Gradient Descent (SGD) are powerful when applied to risk minimization and convex games on large problems. However, their sequential design prevents them from taking advantage of newer distributed frameworks such as Hadoop/YARN. In this session, we will introduce “Knitting Boar”, an open-source Java library for performing distributed online learning on a Hadoop cluster under YARN. We will give an overview of how Knitting Boar works and examine the lessons learned from building a YARN application.
Josh Patterson is a Principal Solution Architect at Cloudera. Prior to joining Cloudera, he was responsible for bringing Hadoop into the smart grid during his involvement in the openPDC project. His smart grid work with Hadoop and HBase focused on using machine learning to discover and index anomalies in time series data. Josh is a graduate of the University of Tennessee at Chattanooga with a bachelor's degree in Business Management and a master's degree in Computer Science; his thesis, "TinyTermite: A Secure Routing Algorithm", covered mesh networks and social insect swarm algorithms. Josh has over 15 years of experience in software development and continues to contribute to open source projects such as Apache Mahout, openPDC, and JMotif.