Best Algo for Tabular/Business Data? Sorry, It’s Not Deep Learning…


Details
At this event we'll discuss which algorithms usually work best for machine learning on tabular/structured data (the type of data found in the majority of business applications). Is it deep learning (including specialized methods like TabNet, TabTransformer or SAINT)?
The event fits in a 1-hour slot: a 35-minute talk plus 20 minutes of Q&A (11:00-11:55am Central Time).
The Zoom link will be posted in the comments below 5 minutes before the event starts. Due to our Zoom's 100-attendee limit, only the first 100 people will be able to join the call.
Start: 9am Pacific, 11am Central, noon ET, 5pm UK, 6pm Europe
Best Algorithm for Tabular/Business Data? Sorry, It’s Not Deep Learning…
Szilard Pafka, PhD
Chief Scientist, Epoch
With all the hype about deep learning and "AI", it is not well enough publicized that for the structured/tabular data widely encountered in business applications, it is actually another machine learning algorithm, the gradient boosting machine/gradient boosted decision trees (GBM/GBDT), that most often achieves the highest accuracy in supervised learning/prediction tasks. In this talk we'll provide plenty of evidence of the vast superiority of GBMs over deep learning for tabular/business data, including deep learning methods “specialized” for tabular data such as TabNet, TabTransformer or SAINT. Next, we will present some of the major open source GBM implementations, such as xgboost, h2o, lightgbm and catboost (all of them available from R and Python), and compare their main performance characteristics: training speed, memory footprint, scaling to multiple CPU cores, GPU implementations etc. While deep learning is certainly the best algorithm available for computer vision (and has also shown some success in a few other rather specialized domains), in most business applications, where the data is most often tabular, gradient boosted decision trees are vastly superior to deep neural networks and should definitely be the algorithm of choice.
Bio:
Szilard studied physics in the 90s and obtained a PhD by using statistical methods to analyze the risk of financial portfolios. He worked in finance, then in 2006 moved to become the Chief Scientist of a tech company in Santa Monica, California, doing everything data (analysis, modeling, data visualization, machine learning, data infrastructure etc). For more than a decade, until he relocated to Texas in 2021, he was the founder/organizer of several meetups in the Los Angeles area (R, data science etc) and of the data science community website datascience.la. He is the author of a well-known machine learning benchmark on GitHub (1000+ stars), a frequent speaker at conferences (keynote/invited at KDD, R-finance, Crunch, eRum and contributed at useR!, PAW, EARL, H2O World, Data Science Pop-up, Dataworks Summit etc.), and he has developed and taught graduate data science and machine learning courses as a visiting professor at two universities (UCLA in California and CEU in Europe).
LinkedIn: https://www.linkedin.com/in/szilard
Twitter: https://twitter.com/SzilardPafka/
Github: https://github.com/szilard/
