Intro to Natural Language Text Mining - Short Course


Details
This class covers machine learning applied to natural language text documents, with a focus on statistical algorithms for machine learning tasks on text. We won't cover the more traditional rule-based approaches (semantics, parsing, etc.).
We'll start with an introduction to the subject matter, a comparison of statistical techniques with semantic approaches, definitions of the main problems in text mining, and simple text manipulations. We'll then cover various algorithms for standard text mining problems, such as indexing, automatic classification (e.g. spam filtering), and topic modeling.
Course Outline
I. Intro to text mining problems
II. R language background
III. Basic text manipulations
-Normalization
-Stop words
-Stemming
IV. Document-Term Matrix Processing
-Formation and Basic Manipulations of Document-Term Matrix
-Latent Semantic Indexing - Search
-Topic Modelling - Clustering and Classification
-Spam Detection
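To give a flavor of items III and IV in the outline, here is a minimal sketch of normalization, stop-word removal, stemming, and building a document-term matrix. It is written in Python rather than R purely for illustration and is not course material. The stop-word list, the `crude_stem` suffix stripper (a toy stand-in, not a real stemmer such as Porter's), the function names, and the sample documents are all assumptions made up for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "now"}


def normalize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip a few common suffixes. A toy stand-in, not the Porter stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Normalize, remove stop words, then stem each remaining token."""
    return [crude_stem(t) for t in remove_stop_words(normalize(text))]


def document_term_matrix(docs):
    """Return (vocab, matrix): one row per document, one column per term."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


# Hypothetical sample documents.
docs = [
    "Win a free prize now!",
    "The meeting is rescheduled to Monday.",
    "Free prizes for lucky customers",
]
vocab, dtm = document_term_matrix(docs)

for term, column in zip(vocab, zip(*dtm)):
    print(term, list(column))
```

Each row of `dtm` is one document and each column is one term in `vocab`. Matrices of this shape are the kind of input that the indexing, topic modeling, and spam detection topics in item IV operate on.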
Prerequisites - Programming experience is required. We'll work through the material using code examples in the R programming language, so you should have R and RStudio installed. There will be a short intro to R for those who haven't used it. Other than that, you'll only need a general undergraduate-level math background.
Class Registration
http://textmining.eventbrite.com
There's a $100 discount if you sign up at least 5 days before the class starts.
Those who don't register on Eventbrite can register and pay by check or cash on the day of class. In-class registration will run from 9:00 am until 9:30 am.
Webcast
The class will be webcast for those who want to view it remotely. You'll need to sign up on Eventbrite if you want to receive the webcast.
