NLP Workshop for Beginners
Details
This is a hands-on workshop for total beginners in Natural Language Processing (NLP) who are already proficient in Python.
Intended audience: anyone interested in experimenting with NLP, whether you are a professional data scientist with no NLP experience or a non-developer (PM, bizdev, ...) who can code in Python.
Please register here: http://yarok-hok.com/events/nlp-workshop/
We will cover text classification in Python, using sklearn, spacy, nltk and pandas, through a challenge (and competition) on real data.
This event is hands-on. All attendees MUST bring a laptop and prepare it in advance:
- Install Python 3.6 (preferably Anaconda for Python 3)
- Download the data: http://goren.ml/pdnlp
- Clone the repository: https://github.com/urigoren/nlp_classification
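To check a laptop before the event, something like the following sketch may help. It is not part of the workshop repository. The function name `check_environment` is made up for illustration, and the package list is simply the set of libraries named above.

```python
import importlib.util
import sys

# Import names of the libraries mentioned in the announcement.
REQUIRED = ["sklearn", "spacy", "nltk", "pandas"]

def check_environment(packages=REQUIRED):
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    status = {"python>=3.6": sys.version_info >= (3, 6)}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        status[name] = importlib.util.find_spec(name) is not None
    return status

if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'OK' if ok else 'MISSING'}")
```

Anything reported as MISSING can usually be installed with pip or conda ahead of time.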
Agenda (Short):
16:30-17:00 - Gathering
17:00-18:30 - Natural language processing introduction (3 short lectures)
18:30 - Hands-on workshop and competition start
21:00 - Competition end, 'and the winner is...'
Agenda (Long):
16:30-17:00 - Gathering
17:00-17:15 - Natural language processing intro
- A brief overview of NLP tasks
- Supervised tasks (Named entity recognition, Sentiment analysis, classification)
- Unsupervised tasks (Text generation, machine translation, topic modeling)
- Why NLP is harder in Hebrew
- Overview of the data set
17:15-18:00 - From textual documents to vectors
- Preprocessing (string manipulation, defining word boundaries and tokens)
- Word stems / lemmas
- Identifying phrases
- Generating custom vocabularies
- Transforming a document to a vector
- One-hot word encodings
- Word vectors (word2vec, GloVe)
- Combining word vectors into document vectors under the bag-of-words assumption
18:00-18:30 - Modelling in depth
- Logistic regression for document classification
- Naive Bayes modeling
- Model evaluation
- Training and testing
- Metrics
18:30 - Hands-on workshop and competition start
20:30 - Competition end
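
For a taste of what the lectures lead up to, here is a minimal sketch of bag-of-words text classification with Naive Bayes. It is not the workshop's code. It uses only the Python standard library so that it runs without any installation, whereas the workshop itself uses sklearn, and the toy sentences, labels, and function names are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; real tokenizers handle far more."""
    return text.lower().split()

def train_naive_bayes(docs, labels):
    """Collect per-class word counts (a bag-of-words view of each class)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the class with the highest log-probability for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # class prior
        for token in tokenize(text):
            if token not in vocab:
                continue  # skip words never seen in training
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy data, invented for illustration only.
train_docs = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "boring plot awful acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = train_naive_bayes(train_docs, train_labels)
prediction = predict(model, "loved the wonderful plot")
print(prediction)
```

In the hands-on part, the same idea would more likely be expressed with sklearn's vectorizers and classifiers, evaluated on a held-out test set rather than a single sentence.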
