///// Apache Spark meetup | Spark Performance Best Practices and ML with MLflow & MLlib /////
At the next Apache Spark meetup, we are hosting Henning Kropp from Databricks, the company founded by the original authors of Apache Spark. Henning will give us an introduction to managing the machine learning lifecycle with MLflow & MLlib. Besides Henning, Mate Gulyas will talk about Apache Spark performance best practices. If you are interested in Apache Spark, don't miss this awesome event!
6:00pm - Doors open
6:30pm - Talks
8:00pm - Networking
IBM Budapest Lab
Budapest, Andrássy út 39, 1061
ACCELERATING THE MACHINE LEARNING LIFECYCLE WITH MLFLOW & MLLIB by HENNING KROPP
Machine learning development creates multiple new challenges that are not present in traditional software development. These include keeping track of the myriad inputs to an ML application (e.g., data versions, code and tuning parameters), reproducing results, and production deployment. We describe MLflow, an open source platform to streamline the machine learning lifecycle, which we launched at Databricks in response to these challenges. MLflow addresses three key problems: experiment management, reproducibility, and model deployment, using generic APIs that work with any ML library, algorithm and programming language. The project has a rapidly growing open source community, with over 75 contributors representing more than 30 companies since its launch in June 2018.
Henning is a Solutions Architect at Databricks, the company founded by the creators of Apache Spark. In his previous roles, Henning worked as an Architect, Data Engineer, and Data Scientist at companies such as Hortonworks, SAP and GfK.
PERFORMANCE BEST PRACTICES WITH APACHE SPARK by MATE GULYAS
Through teaching and consulting with many companies over the last 4 years, Datapao has learned the hard way how developers and data scientists struggle with Spark. Slow queries, insufficient resource usage, exotic bugs and inconsistent behaviours are the key signs that developers lack a fundamental understanding of some of the underlying concepts. In this presentation, we show the top 5 issues we see in real-life Spark jobs, explain why they cause problems, and present their solutions.
Máté is CEO and Senior Instructor at Datapao, a Big Data and Cloud consultancy and training firm focusing on industrial applications (aka Industry 4.0). Datapao helps Fortune 500 companies kick off and mature their data analytics infrastructure by providing Apache Spark, Big Data and Data Analytics training and consultancy. Máté also serves as Senior Instructor and Consultant in the Professional Services Team at Databricks, the company founded by the authors of Apache Spark. Previously he was Co-Founder and CTO of enbrite.ly, an award-winning Budapest-based startup.
See you there!