NLP Basics - Part 2


Details
Welcome back, everyone. We are really excited to start off 2017 with local data scientist Michael Capizzi of Nextiva presenting more topics in natural language processing. These include how to encode documents and how to use scikit-learn to perform various data science tasks.
Summary:
This talk will be a continuation of NLP Basics Part 1 (https://www.meetup.com/Data-Science-Phoenix/events/233757492/). Last time we looked at the ways in which Natural Language Processing attempts to encode information at the word level. This time we'll look at how we encode information at the document level. We will look at the tools available in scikit-learn (http://scikit-learn.org/stable/tutorial/) to encode a document into a vector, which will allow us to determine when two or more documents are similar as well as build models for classification and other machine-learning tasks.
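To make "encoding a document into a vector" concrete, here is a minimal, dependency-free sketch. It uses a bag-of-words encoding followed by cosine similarity, which is one common way to measure how similar two documents are. It rests on simplifying assumptions: a lowercase regex tokenizer, raw word counts, and the hypothetical helper names `tokenize`, `build_vocabulary`, `encode`, and `cosine`. It is not the speaker's code or scikit-learn's implementation.

```python
import math
import re


def tokenize(text):
    # Assumption: a deliberately simple tokenizer that lowercases the text
    # and keeps runs of letters/digits, discarding punctuation.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(docs):
    # Map every distinct token in the corpus to a column index
    # (sorted so the column order is deterministic).
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}


def encode(doc, vocab):
    # Bag-of-words: one count per vocabulary term; word order is discarded.
    vec = [0] * len(vocab)
    for tok in tokenize(doc):
        if tok in vocab:  # tokens not seen when building the vocabulary are ignored
            vec[vocab[tok]] += 1
    return vec


def cosine(u, v):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)


docs = [
    "The cat sat on the mat.",
    "A cat sat on a mat.",
    "Stock prices fell sharply today.",
]
vocab = build_vocabulary(docs)
vectors = [encode(d, vocab) for d in docs]

sim_01 = cosine(vectors[0], vectors[1])  # 0.5
sim_02 = cosine(vectors[0], vectors[2])  # 0.0
print(sim_01, sim_02)
```

The first two sentences share "cat", "sat", "on", and "mat", so they score 0.5. The third shares no words with the first and scores 0. In scikit-learn, `CountVectorizer` (or `TfidfVectorizer` for TF-IDF weighting) builds this kind of document-term matrix as a sparse matrix using its own default tokenizer. `sklearn.metrics.pairwise.cosine_similarity` computes the same similarity measure, and the resulting vectors can also serve as features for a classifier.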
Like last time, we will use Jupyter (https://jupyter.readthedocs.io/en/latest/index.html) (formerly known as the IPython Notebook), so feel free to bring your laptop. Instructions for setting up the tools needed to follow along will be available in the comments below.
Note: We will do a very quick review of some of the concepts from Part 1. You can also review them on your own in the Part 1 Jupyter notebook (https://github.com/michaelcapizzi/nlp-basics/blob/master/Word-Embeddings_Demo.ipynb), which renders in your browser.
Speaker Bio:
Michael Capizzi comes to data science as a second career. A high school English teacher for over 10 years, he recently graduated with a Master's degree in Human Language Technology from the University of Arizona. He now works as a Natural Language Processing Engineer at Nextiva in Scottsdale. His responsibilities range from building and evaluating classification models to building an automatic speech recognition system for voicemail transcription and keyword search. His general interest is in how to harness the complexities of language so that they can be leveraged as "data".
Agenda:
• 6:30-6:45pm: Sign-in & network (with Food & Beverages from ASU Research Computing)
• 6:45-7:00pm: Introduction & announcements
• 7:00-8:15pm: Presentation by Michael Capizzi
• 8:15-8:30pm: Q&A & wrap-up
Sponsor:
We appreciate ASU's Research Computing for sponsoring food and refreshments for this event.
Parking is free:
- Free covered parking after 5:30 PM! Address: 1551 S. Rural Rd., Tempe, AZ (https://www.google.com/maps?q=1551+S.+Rural+Rd.,+Tempe,+AZ)
