Szilard Pafka @ eHarmony

Hosted by Jon Morra

Details

Benchmarking Machine Learning Tools for Scalability, Speed and Accuracy

12:00 arrival and lunch served
12:30 talk starts
13:30 end of Q&A

Please park below 2401 Colorado Ave. and bring your tickets for validation.

Abstract

Binary classification is a fundamental topic in machine learning and is widely used in business applications. In practice one often has millions of observations with a mix of numerical and categorical features. If the number of features is not very large (that is, the data is not sparse), algorithms such as random forests, gradient boosted trees or deep learning neural networks (and ensembles of those) are expected to perform best in terms of accuracy. (Non-linear support vector machines can be accurate as well, but they do not scale to millions of observations.) There are countless off-the-shelf implementations of these algorithms, but which one should you use in practice? Surprisingly, there is huge variation between even the most commonly used implementations/tools of the same algorithm in terms of scalability, speed, accuracy and how they deal with peculiarities of the data such as categorical features or missing values. To make things worse, most of the tools claim to be “high-performance, optimized, scalable, lightning-fast etc.”, claims that unfortunately too often do not match reality. In this talk we’ll cut through the “big data” hype and discuss which open source tools work decently on largish datasets.

Bio

Szilard studied physics in the 90s and obtained a PhD using statistical methods to analyze the risk of financial portfolios. He then worked in a bank, quantifying and managing market risk. About a decade ago he moved to California to become the Chief Scientist of a credit card processing company, doing everything data (ETL, analysis, visualization, machine learning etc.). He is also the founder/organizer of several data science meetups in Los Angeles (LA R, DataVis LA, Data Science LA etc.).

LinkedIn: https://www.linkedin.com/in/szilard

Twitter: @DataScienceLA (https://twitter.com/DataScienceLA)

LA Machine Learning
eHarmony Inc
2401 Colorado Ave, Suite A200 (2nd Floor) · Santa Monica, CA