Gradient Boosting Machines (GBMs) in the Age of LLMs and ChatGPT
Szilard Pafka, PhD
Chief Scientist, Epoch
For the last decade, Gradient Boosting Machines (GBMs) have been considered the most accurate machine learning algorithm for supervised learning/predictive analytics on structured/tabular data, the kind widely encountered in business applications. Are they still relevant in the age of Large Language Models (LLMs) and ChatGPT? This talk will tackle that very question and will also present updates to the author's GBM-perf benchmark (available on GitHub), including the newest results from training XGBoost and LightGBM on monster CPU servers (192 cores, c7i.metal-48xl and c7a.metal-48xl) and on powerful GPUs (A100, H100).
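For context, a benchmark run of this kind boils down to timing the same training job on different hardware. The sketch below is illustrative only, not the actual GBM-perf code (see the GitHub repo for that): it uses a synthetic dataset and placeholder hyperparameters, and it assumes XGBoost >= 2.0 (for the "device" parameter) and a GPU-enabled LightGBM build.

    import time
    import xgboost as xgb
    import lightgbm as lgb
    from sklearn.datasets import make_classification

    # Synthetic stand-in for a tabular dataset (GBM-perf uses its own data)
    X, y = make_classification(n_samples=1_000_000, n_features=100,
                               random_state=42)

    # XGBoost: same hist algorithm timed on CPU and GPU (XGBoost >= 2.0 API)
    dtrain = xgb.DMatrix(X, label=y)
    for device in ["cpu", "cuda"]:
        params = {"objective": "binary:logistic", "tree_method": "hist",
                  "max_depth": 10, "eta": 0.1, "device": device}
        t0 = time.time()
        xgb.train(params, dtrain, num_boost_round=100)
        print(f"XGBoost on {device}: {time.time() - t0:.1f}s")

    # LightGBM: "gpu" requires a GPU-enabled build; settings are placeholders
    for device in ["cpu", "gpu"]:
        params = {"objective": "binary", "num_leaves": 512,
                  "learning_rate": 0.1, "device_type": device}
        t0 = time.time()
        lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=100)
        print(f"LightGBM on {device}: {time.time() - t0:.1f}s")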
Bio:
Szilard studied Physics in the 90s and obtained a PhD by using statistical methods to analyze the risk of financial portfolios. He worked in finance, then in 2006 moved to Santa Monica, California, to become the Chief Scientist of a tech company, doing everything data: analysis, modeling, data visualization, machine learning, data infrastructure, etc. For more than a decade he was the founder/organizer of several meetups in the Los Angeles area (R, data science, etc.) and of the data science community website datascience.la, until he relocated to Texas in 2021. He is the author of a well-known machine learning benchmark on GitHub (1000+ stars) and a frequent speaker at conferences (keynote/invited talks at KDD, R/Finance, Crunch, and eRum, and contributed talks at useR!, PAW, EARL, H2O World, Data Science Pop-up, Dataworks Summit, etc.), and he has developed and taught graduate data science and machine learning courses as a visiting professor at two universities (UCLA in California and CEU in Europe).
LinkedIn: https://www.linkedin.com/in/szilard
Twitter: https://twitter.com/SzilardPafka/
GitHub: https://github.com/szilard/