The Essentials of Text-Mining and Sentiment Analysis

Details
Tatiana Meleshko and Alex Tennant, Data Scientists at Cybera (https://www.cybera.ca/), will share their experience and knowledge on text-mining and sentiment analysis. We appreciate their generous contribution to our community.
Please use the University of Calgary interactive room finder to locate ST147: http://ucmapspro.ucalgary.ca/RoomFinder/
Building: Science Theatres
Room: 147
As an introduction to text-mining and sentiment analysis, we will outline our workflow, and the trials and tribulations we experienced, during our analysis of 65,000 pages of documents submitted to the CRTC as part of its Basic Service Objective consultation in 2015. These documents contained questions and answers from a number of interested parties, ranging from personal letters to official responses from various telecom service providers, from which we hoped to extract useful information. Faced with many document types and unpredictable formats, we used a range of tools to sort and process the documents.
We will outline our use of the Neo4j graph database to categorize each document and visualize the relationships between them. We will describe the use of “fuzzy” text searches with Solr, as well as more abstract searches using gensim’s doc2vec, to locate and extract elements of text relevant to the questions we set out to answer. After outlining our process of text extraction, we will discuss our approach using sentiment, N-gram, text2vec word filters, and LDA topic analysis, and what can be learned from visualizing the hidden relationships between words. Finally, we will present the resulting tool, which we made available for anyone to browse and explore the documents submitted to the consultation on their own. This introductory talk will provide you with a basic understanding of text-mining, which will aid you in your own document processing expeditions.
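For attendees new to the N-gram idea mentioned above, here is a minimal Python sketch of counting word pairs across a few documents. It is not the speakers' code. The sample sentences and the function names `ngrams` and `top_ngrams` are made up for illustration, and it uses only the standard library rather than any of the tools named in the abstract.

```python
import re
from collections import Counter

def ngrams(text, n=2):
    """Return the list of word n-grams (as tuples) in a piece of text."""
    # Lowercase and keep runs of letters, digits and apostrophes as tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(docs, n=2, k=3):
    """Count n-grams over all documents and return the k most common."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(doc, n))
    return counts.most_common(k)

# Hypothetical sample text, not taken from the consultation documents.
sample_docs = [
    "Rural broadband service is too slow.",
    "We need affordable broadband service in rural areas.",
]

print(top_ngrams(sample_docs, n=2, k=1))
```

Phrases that recur across many submissions, such as "broadband service" in this toy sample, are the kind of signal an N-gram filter is meant to surface.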
This event is supported by the Pacific Institute for the Mathematical Sciences (https://www.pims.math.ca/) and Cenovus Energy (http://www.cenovus.com/). More information can be found at http://people.ucalgary.ca/~chelhee.lee/pages/crug.html
Note: The event starts at 6 PM with pizza and soda for social networking. The talk itself starts at 6:30 PM.
