
26th Meetup

Hosted by Ian O. and 11 others


Note: Please use your full real name when signing up; otherwise we have problems with building security.

Main speakers:

Pav Andre on WebApps using Jupyter and other Jovian tricks

Jupyter and JupyterLab are great for tinkering and research. But what if you want to re-use Jupyter notebooks in the future? What if you want to share them with colleagues who may run them concurrently? Productionizing Jupyter notebooks is the new black, and people have developed various hacks to achieve this. Prefer clean solutions? Then come and join my talk.

Roberto Vitillo on Growing a Data Pipeline

When dealing with terabytes or petabytes of data, even simple problems such as counting become challenging. Nowadays, there are a myriad of open-source tools available to solve this problem at scale, but it isn't always obvious beforehand what the various trade-offs are. Additionally, once the data has been collected, encouraging and promoting access for non-experts can be challenging. If you are planning to build your own data infrastructure, then this talk is for you. I will discuss some of the hurdles Mozilla faced when building its data pipeline and what was learned along the way.

Lightning Talks:

Jake Coltman on Using Hidden Markov Models to predict churn

Hidden Markov models are powerful tools for modeling client engagement and churn. Like many Bayesian models, they combine accuracy with results that are simple to interpret and communicate to other teams in the business. By the end of the talk, you will be able to run simple HMMs on your own data using hmmlearn.
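
For a flavour of the idea before the talk, here is a minimal sketch of HMM filtering (the forward algorithm) in plain Python. It is not the speaker's model and does not use hmmlearn; the two hidden states, the "active"/"idle" observations, and every probability below are made-up assumptions for illustration only.

```python
# Hypothetical two-state engagement model; all numbers are invented for illustration.
STATES = ("engaged", "churning")
START = {"engaged": 0.8, "churning": 0.2}
TRANS = {
    "engaged": {"engaged": 0.9, "churning": 0.1},
    "churning": {"engaged": 0.2, "churning": 0.8},
}
EMIT = {
    "engaged": {"active": 0.7, "idle": 0.3},
    "churning": {"active": 0.1, "idle": 0.9},
}


def filter_states(observations):
    """Return P(hidden state | observations so far) after each observation."""
    beliefs = []
    prior = START
    for obs in observations:
        # Update step: weight the prior by how likely each state is to emit obs.
        unnormalised = {s: prior[s] * EMIT[s][obs] for s in STATES}
        total = sum(unnormalised.values())
        posterior = {s: p / total for s, p in unnormalised.items()}
        beliefs.append(posterior)
        # Predict step: push the posterior through the transition matrix.
        prior = {
            s: sum(posterior[r] * TRANS[r][s] for r in STATES) for s in STATES
        }
    return beliefs


# Hypothetical weekly activity for one client.
weekly_activity = ["active", "active", "idle", "idle", "idle"]
beliefs = filter_states(weekly_activity)
for week, belief in enumerate(beliefs, start=1):
    print(week, round(belief["churning"], 3))
```

Each printed value is the model's belief that the client is in the "churning" state after that week, which is the kind of easily communicated output the abstract refers to. A library such as hmmlearn would additionally learn the probabilities from data rather than having them fixed by hand.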

Rehan Ali on Predicting user purchasing intent using Keras and LSTMs

We show how we built a model of customer behaviour for predicting their intent to buy a photo book from our app, Printastic, using an LSTM implemented in Keras. We train and test the model using user app usage data obtained from MixPanel. The model allows us to identify users who are close to buying and could benefit from more personalised marketing attention.

Tariq Rashid on collections.defaultdict or how to index stuff in 3 lines of code

Counting data items is a common task for data scientists. Python has a little-known standard-library collections module that makes this easy. collections.defaultdict makes more complex indexing really concise! Find out how.
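
As a taste of what "indexing in 3 lines" can look like, here is a minimal sketch that groups words by their first letter. The word list and the choice of key are assumptions for illustration, not taken from the talk.

```python
from collections import defaultdict

# Hypothetical data to index.
words = ["python", "pandas", "numpy", "jupyter", "julia", "keras"]

# The three lines: a missing key is created as an empty list on first access.
index = defaultdict(list)
for word in words:
    index[word[0]].append(word)

print(dict(index))
```

Because defaultdict(list) supplies the empty list automatically, there is no need to check whether a key already exists before appending. Swapping list for int turns the same pattern into a counter.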


Doors open at 6.30 (get there early as you have to sign in via AHL's security), talks start at 7pm, beers from 9pm in the bar. We normally have more than 200 folk in the room, so there are plenty of people to discuss data science questions with!

Please unRSVP if you realise you can't make it. We're limited by building security on number of attendees, so please free up your place for your fellow community members!

Follow @pydatalondon for updates and early announcements. See you on the 6th!