When you start to analyze free text, often the first task is to figure out what language it's in. The tools that help with that are also widely used for other tasks in text understanding. We will touch on advanced algorithms, data structures, and numerical methods, but the workshop is meant to be accessible to anyone comfortable with dictionaries in Python and basic probability.
Preparation: Know about Python dictionaries and have some familiarity with basic probability. Bring a laptop with Python 3 and git installed. Please make sure your laptop is wireless-enabled and fully charged.
7-8pm Roll your own character bigram language identifier.
8-9pm Show and tell with scikit-learn, langid, and the Google Translate API.
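For a taste of the first hour, here is a minimal sketch of what a character bigram language identifier might look like, using only dictionaries and basic probability. It is not the workshop's own code: the function names, the two tiny training sentences, and the choice of add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def bigrams(text):
    """Return the adjacent character pairs of the lowercased text."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def train(samples):
    """Map each language name to a Counter (a dict) of bigram counts."""
    return {lang: Counter(bigrams(text)) for lang, text in samples.items()}


def score(counts, text, vocab_size):
    """Log-probability of the text's bigrams under one language's counts.

    Add-one smoothing (an assumption of this sketch) keeps unseen
    bigrams from getting probability zero.
    """
    total = sum(counts.values())
    return sum(
        math.log((counts[bg] + 1) / (total + vocab_size))
        for bg in bigrams(text)
    )


def identify(models, text):
    """Return the language whose bigram model scores the text highest."""
    vocab = set().union(*models.values())
    vocab_size = len(vocab) + 1  # one extra slot for never-seen bigrams
    return max(models, key=lambda lang: score(models[lang], text, vocab_size))


# Toy training data, made up for this example; real use needs far more text.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog "
               "and then the other thing with this and that",
    "spanish": "el rápido zorro marrón salta sobre el perro perezoso "
               "y luego la otra cosa con esto y aquello",
}
MODELS = train(SAMPLES)

print(identify(MODELS, "the lazy dog and that"))
print(identify(MODELS, "el perro y la cosa"))
```

With training text this small the guesses are only reliable on phrases close to the samples; more text per language and more languages are the obvious next steps.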
Instructor Bio: Gregory Marton is a software engineer doing natural language processing at Google. He was a CS major with a linguistics concentration at Maryland, and studied natural language technologies at MIT.
**Food and drink provided**