PyData Meetup - November 2016 - PySpark/Predictive Modelling

Hosted By
Talha O.

Details

https://a248.e.akamai.net/secure.meetupstatic.com/photos/event/8/1/7/a/600_445653146.jpeg

Agenda

• 6:45pm - 7:00pm Networking

• 7:00pm - 8:00pm Presentation: PySpark case - using Random Forest for binary classification problem

• 8:00pm - 8:10pm Break

• 8:10pm - 9:00pm Presentation: Incremental Data Processing with Apache Spark on Azure HDInsight

Details:

PySpark case - using Random Forest for binary classification problem

Synopsis: This talk presents a binary classification problem (product recommendation) solved with PySpark on a Hadoop platform. Using an IPython notebook, the presentation will walk through: 1) data pre-processing, 2) using the MLlib random forest classifier for binary classification, 3) measuring performance with the AUC score, and 4) different strategies for handling an unbalanced dataset.
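
For readers new to points 3) and 4), here is a minimal sketch in plain Python, not the speaker's notebook and not PySpark code. It assumes labels are 0/1 and that a model has already produced one score per row. The function names `auc_score` and `undersample_majority` are hypothetical, and random undersampling is just one possible strategy for an unbalanced dataset; the talk may cover others.

```python
import random


def auc_score(labels, scores):
    """AUC: the probability that a random positive is scored above a random negative.

    Computed from rank sums (Mann-Whitney U); tied scores share their average rank.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of rows [i, j) that share the same score.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


def undersample_majority(rows, label_of, seed=0):
    """Randomly drop majority-class rows until both classes have the same size."""
    pos = [r for r in rows if label_of(r) == 1]
    neg = [r for r in rows if label_of(r) == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return minority + rng.sample(majority, len(minority))


if __name__ == "__main__":
    # Toy data (made up for illustration): 3 positives among 20 rows.
    rows = [(i, 1 if i < 3 else 0) for i in range(20)]
    balanced = undersample_majority(rows, label_of=lambda r: r[1])
    print(len(balanced), "rows after undersampling")
    print("AUC:", auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

AUC is a common choice for unbalanced problems because it depends only on how positives are ranked relative to negatives, not on a decision threshold or on the class proportions.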

Speaker: Weimin Wang works as a Data Scientist at Merck Singapore, where he focuses on Advanced Analytics and Bioinformatics Research. With solid knowledge of Data Mining and Machine Learning, Weimin is also actively involved in Data Science competitions such as Kaggle and Data Science Game. His interests lie in Machine Learning, Deep Learning and Natural Language Processing.

Incremental Data Processing with Apache Spark on Azure HDInsight

Synopsis: Social media, like Facebook and Twitter, have data feeds that contain a wealth of information that can aid in trend discovery. We recently worked with the United Nations to parse incoming social feed information to enable the UN to watch for trending keywords that could alert them to potential humanitarian crises like food shortages and terrorist attacks. One typical pattern for efficiently processing evolving datasets like this is to process the new slice of data incrementally and merge the results with previous results. In this talk, I will discuss how you can use incremental processing to make your data pipeline more efficient.
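
The incremental pattern described above can be sketched in a few lines of plain Python. This is an illustrative stand-in, not the speaker's pipeline and not Spark or HDInsight code: the keyword list, the sample posts, and the helper names `count_keywords` and `merge_counts` are all assumptions made up for the example.

```python
from collections import Counter


def count_keywords(posts, keywords):
    """Count keyword occurrences in one slice of posts.

    Matching is case-insensitive on whitespace-separated tokens.
    """
    wanted = {k.lower() for k in keywords}
    counts = Counter()
    for post in posts:
        for token in post.lower().split():
            if token in wanted:
                counts[token] += 1
    return counts


def merge_counts(previous, new_slice):
    """Merge the new slice's counts into the running totals.

    Only the new slice is processed; earlier data is never re-read.
    """
    merged = Counter(previous)
    merged.update(new_slice)
    return merged


# Hypothetical keywords and batches, for illustration only.
KEYWORDS = ["flood", "drought"]
batches = [
    ["Flood warning issued", "drought and flood risk"],
    ["Drought worsens"],
]

totals = Counter()
for batch in batches:
    totals = merge_counts(totals, count_keywords(batch, KEYWORDS))

print(totals)
```

The merge step works here because counts are additive: the total over all data equals the sum of the per-slice results. Aggregates without that property (a median, for example) need extra state to be updated incrementally.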

Speaker: Rita Zhang ( https://twitter.com/ritazzhang ) is an Open Source Engineer at Microsoft, based in San Francisco. She works with engineering teams, open source communities, and startups on emerging open source technologies, and shares technical collateral with the developer community. In her spare time, she develops new smart home gadgets for her startup.

Updates

• Our friends @Quantopian are holding QuantCon on Nov 10-12. The program includes interactive workshops and expert talks on algorithmic trading, machine learning, and Python. Special 10% discount code on any ticket for our members: QuantConMeetup. RSVP: http://quantcon.sg/

• We will give away one free pass for QuantCon at our meetup through a lucky draw :)

• Our good friends at O'Reilly are holding Strata + Hadoop World in Singapore on December 5-8, where cutting-edge data science and new business fundamentals merge. It’s a deep-immersion experience where data scientists, analysts, and business executives dissect case studies, develop new skills, share emerging best practices, and build the future of big data. Save 20% with discount code UGPYDASG. Check out the impressive agenda and speaker lineup.

Join us on Facebook and Twitter

https://www.facebook.com/groups/pydatasg/
https://www.twitter.com/pydatasg

Sponsors

https://a248.e.akamai.net/secure.meetupstatic.com/photos/event/3/2/c/c/600_445153004.jpeg

https://a248.e.akamai.net/secure.meetupstatic.com/photos/event/d/1/3/3/600_443873555.jpeg

https://a248.e.akamai.net/secure.meetupstatic.com/photos/event/3/9/c/3/600_445874787.jpeg

Group Sponsor

https://a248.e.akamai.net/secure.meetupstatic.com/photos/event/8/2/b/8/600_445653464.jpeg

PyData Singapore