Effective Machine Learning with Apache Spark
AGENDA
6.30 - 7.00 pm - Drinks & Networking
7.00 - 7.05 pm - Introduction
7.05 - 7.30 pm - Machine Learning in Finance
7.30 - 7.45 pm - Break/Q&A
7.45 - 8.30 pm - Lessons from the field - Databricks
Apache Spark is a powerful open source processing engine built around speed, ease of use, and sophisticated analytics. Spark has seen rapid adoption by enterprises across a wide range of industries. Internet powerhouses such as Yahoo, eBay and Netflix have deployed Spark at massive scale, processing multiple petabytes of data on clusters of over 8,000 nodes. Apache Spark has also become the largest open source community in big data, with over 1000 contributors from 250+ organizations.
The Databricks Unified Analytics Platform, from the original creators of Apache Spark™, unifies data science and engineering across the machine learning lifecycle, from data preparation to experimentation and deployment of ML applications. Come and get an idea of what Spark is about and how you can leverage big data and AI to build machine learning products with Holly Smith from Databricks. We also have Maša talking about how she is using Spark at HSBC during her data science internship.
TALKS
7.05 - 7.30 pm Machine Learning in Finance
Speaker: Maša Vujović
Talk: My core interest is at the intersection of how humans and machines learn. I am currently writing up my PhD thesis on the computational mechanisms of human language learning at UCL, and I am doing a Data Science and Engineering internship at HSBC, where my team and I are using big data and machine learning to understand client behaviour. We use Spark and Hadoop to comb through millions of data points and develop insights that help turn business problems in finance into intelligent, AI-powered data products.
7.45 - 8.30 pm Lessons from the field
Speaker: Holly Smith, Customer Success Engineer, Databricks
Talk: Here at Databricks we have a combined expertise of literally hundreds of years. Over this time we've accumulated a wealth of experience of what to do but, more importantly, what not to do. A terrifying 85% of data science projects fail, and our perspective means we see the whole spectrum. This talk will take you through a variety of stumbling points seen in data science projects, from the pitfalls of one-hot encoding and Bayes' theorem to the big-picture decisions about remediating model drift and making sure your model is used in the right way.
