42nd meetup



NOTE: a valid photo ID is required by building security. Please use your full real names when signing up, otherwise you may be refused entry!



As always, there'll be free beer and pizza, generously provided by our host AHL.

We are issuing tickets via a lottery. If you want to be in with a chance of a place, sign up for the waitlist! The lottery will be run approximately one week before the meetup. Closer to the event, we will re-run the lottery or draw on the waitlist to fill any spaces that free up.


Soledad Galli on "Feature Engineering for Machine Learning"

Machine learning algorithms can determine patterns in past data and use them to predict behaviour in future data. To this end, machine learning models learn from existing data. However, the data available in business is generally not ready for use in machine learning modelling. Instead, an extensive amount of time is often devoted to pre-processing the data to make it suitable for machine learning.

In this talk, I will cover some of the typical problems found in data, how they affect the different machine learning algorithms, and how we can pre-process the data in order to account for these problems and minimise their impact on the algorithms' predictive performance.

I will begin by describing common data preparation issues, typically related to processing numerical and categorical variables. I will highlight which machine learning models are susceptible to which variable problems. Finally, I will introduce and compare different techniques for imputation of missing data, processing of outliers and encoding of categorical variables.
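As a rough illustration of the three techniques named above (not taken from the talk or the course), here is a minimal sketch using pandas on a small made-up table. The column names, the median imputation, the 1.5 x IQR capping rule and the one-hot encoding are all assumptions chosen for the example; other choices may suit real data better.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one numerical and one categorical variable.
df = pd.DataFrame({
    "income": [30.0, 32.0, np.nan, 35.0, 31.0, 400.0],
    "city": ["London", "Leeds", "London", None, "Leeds", "London"],
})

# 1. Imputation: fill missing numbers with the median and
#    missing categories with an explicit "Missing" label.
df["income"] = df["income"].fillna(df["income"].median())
df["city"] = df["city"].fillna("Missing")

# 2. Outliers: cap values lying more than 1.5 * IQR beyond the quartiles.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df["income"] = df["income"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# 3. Categorical encoding: one indicator column per category.
encoded = pd.get_dummies(df, columns=["city"])
print(encoded)
```

Capping keeps the extreme row in the data while limiting its influence, which tends to matter more for linear and distance-based models than for tree-based ones.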

I have gathered the techniques described here into a recently launched course on Udemy (www.udemy.com/feature-engineering-for-machine-learning). The course involves code in Python for feature engineering, using pandas, NumPy and scikit-learn. Throughout the talk, we will focus both on the limitations and advantages of each technique, highlighting how we usually approach feature engineering within a business setting.


Richard J Brooker on "Trump, Alzheimer's and AI"

Donald Trump is the oldest US president to assume office, and some people have started to worry about his health. There are even concerns that he is showing early signs of Alzheimer’s.
Recent explorations in machine learning have been able to predict Alzheimer’s by looking at patterns in the patient’s speech. Could we use these techniques to learn something about Trump?
In this talk we look at a dataset provided by DementiaBank. We use machine learning and natural language processing to see how well we can predict Alzheimer’s. We then look at a set of interviews with Trump and see what we can learn from there.

We will primarily be using spaCy and scikit-learn.

Lightning talks:

James Salsman on "Intelligibility Prediction for Pronunciation Remediation"

Speech recognition assesses English learners' pronunciation using authentic intelligibility to predict whether transcriptionists can type what they are supposed to say using Python's scikit-learn support vector machine classifier (SVC). Carnegie Mellon PocketSphinx is used in alignment mode with many recognition passes to find substitution and deletion of expected phonemes and insertion of unexpected phonemes along with differences of phonetic place, closure, roundedness, voicing, and the proportion of physiologically neighboring phonemes less likely in PocketSphinx's n-best results. SVC models achieve 82% agreement with the accuracy of crowdworkers' transcriptions, up from 75% reported by the inventor of the technique and Educational Testing Service. After asking learners to pronounce a word, if the SVC model suggests it was pronounced intelligibly, we then ask them to repeat it after offering audio with the word pronounced correctly and their worst phoneme amplified and extended in duration.
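For readers unfamiliar with SVC, the classification step might look roughly like the sketch below. It is not the speaker's pipeline: the features are synthetic error counts generated at random (no PocketSphinx output or real transcriptions are involved), and the kernel and split settings are assumptions, so the printed accuracy says nothing about the figures quoted above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic, made-up labels (NOT real data): 1 = utterance judged intelligible.
n = 200
intelligible = rng.integers(0, 2, size=n)

# Hypothetical features per utterance: counts of phoneme substitutions,
# deletions and insertions. We assume unintelligible utterances have more.
rate = np.where(intelligible == 1, 0.5, 3.0)[:, None]
errors = rng.poisson(lam=rate, size=(n, 3))

X_train, X_test, y_train, y_test = train_test_split(
    errors, intelligible, test_size=0.25, random_state=0
)

# Fit a support vector classifier and score it on the held-out utterances.
clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```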


Doors open at 6.30 pm (get there early as you have to sign in via AHL's security), talks start at 7 pm, beers from 9 pm in the bar. We normally have > 200 folks in the room so there are plenty of people to discuss data science questions with!

Please unRSVP in good time if you realise you can't make it. We're limited by building security on the number of attendees, so please free up your place for your fellow community members!

Follow @pydatalondon (https://twitter.com/pydatalondon) for updates and early announcements. See you on the 6th!