Location visible to members
Kernel machines, such as support vector machines, are among the most powerful and mathematically mature algorithms in machine learning. Despite the statistical maturity, however, the computational aspect involved with training and prediction suffer from inefficiency at scale. Besides, kernel machines in general, and Support Vector Machines in particular, are not well suited to the MapReduce paradigm, which is the dominant massive parallel processing platform nowadays. The computational complexity in implementing SVM arises from the requirements for dual representation of data and model during training and prediction as well as the iterative nature of the popular convex optimizers for approximating the solution.
There have been some endeavors towards making efficient implementations of SVM: some have led to fast, but unfortunately single-threaded, primal and dual solvers (LibSVM, LibLinear, basic VowpalWabbit, etc.); some have proposed more MapReduce-friendly methods for solving the primal formulation (the SVM implementation idea in and the AllReduce-based implementation of parallel VowpalWabbit) but with the price of sacrificing the non-linear separability power of SVM as a kernel machine.
In this work, we propose an outline for implementing non-linear SVM on the MapReduce framework using some interesting recent research in large-scale convex optimization and kernel computation theories. We exploit the idea of random features and shift-invariant kernels, as alternatives to approximate the Gram matrix representation of data in dual space, in order to relax the memory and CPU requirements for learning non-linear SVM. We also employ the parallelized stochastic gradient descent method as a fast and MapReduce-friendly online solver for training.