33rd meetup


Note: Please use your full real name when signing up; otherwise we have problems with building security.

As always, there'll be free beer and pizza, generously provided by AHL.


Main speakers:

Ju Liu on Predicting Titanic survivors with machine learning

What's a better way to understand machine learning than a practical example? And who hasn't watched the classic 1997 movie? In this session, we will implement various machine learning techniques step-by-step to predict the chance of survival of Titanic passengers, backed by real historical data and some amazing Python libraries.

Egor Kraev on Asynchronous and streaming Python for real-time analytics

Anyone who has built a reasonably-sized analytics stack knows the 'smart' stuff is only a tiny part of it - most of the effort goes into shoveling data around and getting it into proper shape. That challenge is even harder in real-time systems - out-of-sync data streams must be aligned, data must get to where it's needed and be processed as soon as it's available, yet fast upstream data sources must not overwhelm slower downstream processing. Most importantly, all that must happen invisibly to the guy who writes the business logic, yet require no developer support.

This talk will go over frameworks, libraries and built-in features for achieving all that in Python. We'll introduce the powerful patterns of asynchronous processing, actors, and streaming graphs, and key concepts such as backpressure. We'll illustrate all this with a toy application.
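For readers new to the term, backpressure can be sketched with nothing more than a bounded `asyncio.Queue` from the standard library. This is only an illustration under our own assumptions, not code from the talk: the names `producer`, `consumer` and `run_pipeline`, the queue size and the sleep time are all hypothetical.

```python
import asyncio


async def producer(queue, n_items, depth_log):
    """Fast upstream source: emits items as quickly as the queue allows."""
    for i in range(n_items):
        # put() suspends while the queue is full, so the producer is slowed
        # to the consumer's pace instead of buffering without limit.
        await queue.put(i)
        depth_log.append(queue.qsize())
    await queue.put(None)  # sentinel: no more items


async def consumer(queue, results):
    """Slow downstream stage: a short sleep stands in for real work."""
    while True:
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0.001)
        results.append(item * 2)


async def run_pipeline(n_items=20, maxsize=3):
    # The bound on the queue is what creates the backpressure.
    queue = asyncio.Queue(maxsize=maxsize)
    results, depth_log = [], []
    await asyncio.gather(
        producer(queue, n_items, depth_log),
        consumer(queue, results),
    )
    # Return the processed items and the deepest the queue ever got.
    return results, max(depth_log)


if __name__ == "__main__":
    results, peak_depth = asyncio.run(run_pipeline())
    print(len(results), "items processed; peak queue depth:", peak_depth)
```

However fast the producer is, the queue never holds more than `maxsize` items, so memory use stays bounded while the slower stage catches up.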

The presenter: After studying theoretical math and economic modeling, Dr. Egor Kraev has worked for nine years as a quant in top-tier investment banks, designing full-stack pricing/quoting/post-trade analytics systems. He now runs his own company, Dagon Analytics, providing consulting in Machine Learning.


Lightning talks:

Celine Boudier on Dashboard for Code For Life education tools data with Google Data Studio Beta

Code For Life is a non-profit initiative from Ocado Technology creating free, open source games to teach all students computing. We currently have a main portal website, codeforlife.education, that we are redesigning, and the first game for primary schools, Rapid Router: https://www.codeforlife.education/rapidrouter/. We started to use the new data visualization dashboards from Google Data Studio, to help us make sense of our analytics and viewers' behavior. This talk briefly covers how to set up a dashboard and what benefits we can get from such a tool.

Matti Lyra on Hands on with GloVes

Distributed word representations in the form of word2vec and GloVe have become very popular in the last 5 years. While the Stanford NLP group has published their trained GloVe vectors, the data format isn't exactly convenient. I'll present a small utility I wrote, `GloVe2H5`, that makes accessing the pre-trained vectors much easier.

Adam Hill on Building a graph model of corporate ownership data to uncover potential corruption

In June 2016, Companies House started publishing the world’s first open data register of “beneficial owners” or “people with significant control” of companies registered in the UK. In November, DataKind UK ran a weekend DataDive in cooperation with Global Witness and OpenCorporates to explore this new data for the first time and see whether the data points to any promising leads for further investigation into cases of tax evasion and corruption.
One of the three teams at the event was tasked with building up a graph of the data and performing network analysis to produce easily reproducible analysis and queries. In this talk, we will demonstrate the general graph model we built, what we found out and the impact the analysis has already had.



Doors open at 6.30 pm (get there early as you have to sign in via AHL's security), talks start at 7 pm, beers from 9 pm in the bar. We normally have > 200 folks in the room so there are plenty of people to discuss data science questions with!

Please unRSVP in good time if you realise you can't make it. We're limited by building security on the number of attendees, so please free up your place for your fellow community members!

Follow @pydatalondon (https://twitter.com/pydatalondon) for updates and early announcements. See you on the 7th!