Szilard Pafka @ eHarmony


Details
Benchmarking Machine Learning Tools for Scalability, Speed and Accuracy
12:00 arrival and lunch served
12:30 talk starts
13:30 end of Q&A
Please park below 2401 Colorado Ave. and bring your tickets for validation.
Abstract
Binary classification is a fundamental topic in machine learning and is widely used in business applications. In practice one often has millions of observations with a mix of numerical and categorical features. If the number of features is not very large (that is, the data is not high-dimensional and sparse), algorithms such as random forests, gradient boosted trees or deep learning neural networks (and ensembles of those) are expected to perform best in terms of accuracy. (Non-linear support vector machines can be accurate as well, but they cannot scale to millions of observations.) There are countless off-the-shelf implementations of these algorithms, but which one should you use in practice? Surprisingly, there is huge variation between even the most commonly used implementations/tools of the same algorithm in terms of scalability, speed, accuracy and how they deal with peculiarities of the data such as categorical features or missing observations. To make things worse, most of the tools claim to be “high-performance, optimized, scalable, lightning-fast etc.”, claims that too often do not live up to reality. In this talk we’ll cut through the “big data” hype and discuss which open source tools work decently on largish datasets.
Bio
Szilard studied physics in the 90s and obtained a PhD using statistical methods to analyze the risk of financial portfolios. He then worked at a bank, quantifying and managing market risk. About a decade ago he moved to California to become the Chief Scientist of a credit card processing company, doing everything data (ETL, analysis, visualization, machine learning etc.). He is also the founder/organizer of several data science-related meetups in Los Angeles (LA R, DataVis LA, Data Science LA etc.).
LinkedIn: https://www.linkedin.com/in/szilard
Twitter: @DataScienceLA (https://twitter.com/DataScienceLA)

