6.30pm Welcome, free pizza + beer
Talks start at 7pm
"Creating Usable Customer Intelligence from Social Media Data: Network Analytics meets Text Mining" by Rosaria Silipo, Sr. Data Scientist @Knime
Rosaria will show an integrated approach that uses open data and open source tools to retrieve information from a public forum and combine it using text mining and network analysis techniques. Each forum user is described in terms of a leadership score, calculated using a network analytics algorithm, and an attitude value, produced by a text-mining-based sentiment analysis procedure.
"Scraping and parsing PDFs in Python" by Ian Hopkinson, Sr. Data Scientist @Scraperwiki
In this talk Ian will show a specific example of scraping and structuring the UN's verbatim records, which are provided as PDF files, and demonstrate some techniques using Python. He will also comment on the coding process for data scientists and how it contrasts with conventional software development.
Beer Break, Networking, and Community Update
"Lucene & Hadoop: Together at Last" by Doug Cutting, Chief Architect @Cloudera
Fifteen years ago, Doug wrote a search engine called Lucene. Seven years ago, he helped to found the Hadoop project. Now these two projects are combined in a platform that, among other things, supports scalable, distributed search using Lucene on top of Hadoop. Doug will tell the story of how these projects grew and how they finally came together.
"Disambiguating brands in social media: detecting the right 'apples' and 'oranges' with Python and scikit-learn" by Ian Ozsvald, independent Data Scientist
Existing Named Entity Recognition APIs are poorly suited to recognising brands in tweets. Ian is building an open source Python tool using scikit-learn and NLTK to classify brands better than the existing APIs. This will be an early report on his progress, discussing the linguistic difficulties faced by existing tools and showing the prototype to date.
Session ends by 9.30pm-ish