Note: Please use your full real name when signing up, otherwise we have problems with building security.
As always, there'll be free beer and pizza, generously provided by AHL.
Jason McFall (https://twitter.com/JasonMcFall) on Privacy and data science
Data science on customer data opens up huge opportunities, both for economic benefit and social good. But as datasets become richer, individual privacy comes under threat, and indeed responsible organisations are blocked from innovating because they have no way to guarantee privacy.
Technology has created this problem, and technology can solve it.
I'll talk about Privacy Engineering techniques that enable the safe and effective use of data, including tokenisation and masking, statistical generalisation and blurring of data (such as k-anonymity), controlled privacy-preserving querying of data (such as differential privacy), homomorphic encryption and randomised response. I'll describe the state of the art, and outline the hard problems that must be solved next.
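As a taste of one of the techniques named above, here is a minimal sketch of randomised response using only the Python standard library. It is not taken from the talk: the coin-flip scheme, the function names and the simulated 30% population are all assumptions chosen for illustration.

```python
import random


def randomised_response(truth: bool, rng: random.Random) -> bool:
    """Report the true answer with probability 0.5, otherwise a random one."""
    if rng.random() < 0.5:
        return truth             # first coin: answer honestly
    return rng.random() < 0.5    # second coin: answer yes/no at random


def estimate_true_rate(responses) -> float:
    """Recover the population rate p from the noisy answers.

    Under this scheme P(yes) = 0.5 * p + 0.25, so p = 2 * P(yes) - 0.5.
    """
    observed = sum(responses) / len(responses)
    return 2 * observed - 0.5


# Hypothetical population: 30% of 10,000 people would truthfully say "yes".
rng = random.Random(0)
true_answers = [i < 3000 for i in range(10000)]
responses = [randomised_response(t, rng) for t in true_answers]
estimate = estimate_true_rate(responses)
print(f"estimated rate: {estimate:.3f}")
```

No single reported answer reveals what that person would truthfully say, yet the aggregate estimate stays close to the real rate.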
Andraz Hribernik (https://twitter.com/ahribo) on NLP In 10 Lines of Code
At Cytora, our production system works 24/7 to transform billions of pieces of unstructured web data into structured data sets. This is a huge job, and we use spaCy to help us on a daily basis.
spaCy is an easy-to-use open source Python NLP library that excels at large-scale information extraction. It supports tokenization, sentence segmentation, named entity recognition, part-of-speech tagging and dependency parsing.
During this talk, we are going to demonstrate some of spaCy's core functionalities by performing a simple NLP analysis on Jane Austen's Pride and Prejudice.
Here's what we will achieve during this analysis:
- Extract the character names from the book (e.g. Elizabeth, Darcy, Bingley)
- Visualise character occurrences with regard to their relative position in the book (e.g. are specific characters mentioned more at the beginning of the book and others more towards the end?)
- Describe Mr Darcy's character using syntactic dependencies
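To give a rough idea of the second step, here is a small sketch that records where each character name occurs as a fraction of the way through a text. It is an assumption-laden stand-in, not the talk's code: it uses a plain regular expression instead of spaCy's named entity recognition, the function name `relative_positions` is hypothetical, and the sample sentence is made up rather than quoted from the book.

```python
import re
from collections import defaultdict


def relative_positions(text, names):
    """Map each name to the relative positions (0.0 to 1.0) of its mentions."""
    tokens = re.findall(r"[A-Za-z']+", text)  # crude word tokenizer
    total = len(tokens)
    wanted = set(names)
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        if token in wanted:
            positions[token].append(index / total)
    return dict(positions)


# Made-up sample text; in the talk the input is the full novel.
sample = "Elizabeth walked. Darcy spoke. Bingley smiled. Darcy left."
positions = relative_positions(sample, ["Elizabeth", "Darcy", "Bingley"])
for name, where in positions.items():
    print(name, where)
```

Plotting these positions as a histogram per character would show who dominates which part of the book.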
Lev Konstantinovskiy (https://twitter.com/teagermylk) on NLP in Python: next gen of word embeddings
Word embeddings represent words as numeric vectors, which makes it possible to measure whether two text documents are on the same topic or completely different. The most popular is Google’s word2vec, but newer ones such as WordRank and FastText have appeared. I will explain the differences between them.
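To illustrate how embeddings are compared, here is a minimal sketch of cosine similarity between word vectors. The three-dimensional vectors below are invented toy values, not the output of word2vec, WordRank or FastText, whose real vectors typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, chosen by hand for illustration only.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "pizza": [0.1, 0.2, 0.95],
}

royal = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
food = cosine_similarity(toy_vectors["king"], toy_vectors["pizza"])
print(f"king vs queen: {royal:.2f}, king vs pizza: {food:.2f}")
```

Words used in similar contexts end up with vectors that point in similar directions, so their cosine similarity is higher.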
Doors open at 6.30pm (get there early as you have to sign in via AHL's security), talks start at 7pm, beers from 9pm in the bar. We normally have > 200 folk in the room so there are plenty of people to discuss data science questions with!
Please unRSVP in good time if you realise you can't make it. Building security limits the number of attendees, so please free up your place for your fellow community members!
Follow @pydatalondon (https://twitter.com/pydatalondon) for updates and early announcements. See you on the 7th!