KnoxData - a PyData group - is for anyone interested in the multitude of fields that rely on data as their lifeblood. The group is open to novices and experts from the business and academic worlds who have an interest in anything data-related: science, uses, analysis, mathematics, methodologies, tools, visualization, and more. The aim is to maintain a forum for connecting people around data-specific topics such as tutorials and their applications, local success stories, discussions of new technologies, and best practices. All are welcome to attend, network, and present!
Text data is meaningful in many contexts, but it can also be difficult to work with for machine learning because of its complexity and ambiguity. Until recently, the state of the art for natural language processing was to use word embeddings, converting each word to a fixed-length vector that preserves its semantic meaning. This past year, researchers found that they could train general-purpose deep language models and then apply them, using transfer learning, to a variety of NLP tasks such as summarization, text classification, named entity recognition, question answering, and many more. This is a watershed moment for NLP, on par with the impact of pre-trained deep neural classifiers for ImageNet that revolutionized the field of computer vision over the last 7 years.
This talk will provide some historical context on embeddings and language modeling, as well as a demo of how to adapt a deep language model pre-trained on millions of sentences from Wikipedia to perform sentiment analysis and intent classification with very small sets of novel training data.
Nikhil Deshmukh is an independent data science consultant who has applied his expertise to everything from embedded low-power computer vision for autonomous vehicles to optimizing HR processes for Fortune 500 companies and developing IoT solutions for precision agriculture in rural India. Nikhil received his PhD in Neuroscience and Molecular Biology from Princeton University, where he studied the neural circuitry in the retina that converts light into electrical signals.
Cover photo from Arisoy, E., Sainath, T. N., Kingsbury, B., & Ramabhadran, B. (2012, June). Deep neural network language models.