Introducing Spark-Native Natural Language Processing
Details
This talk starts by explaining common NLP use cases and tasks, how they typically fit into machine learning and deep learning pipelines, and which popular open source libraries you can use to build them today (spaCy, OpenNLP, NLTK). The speakers then describe the gap these libraries leave: an NLP library that is Spark-native, that is, one that runs directly on DataFrames in the JVM, is a natural extension of the Spark ML APIs, and provides an easily extensible NLP annotation framework. They will then walk through live code examples of a newly open-sourced Spark NLP library and discuss what you can expect from it in terms of performance, accuracy, scalability, and extensibility.
Note: Refreshments will be provided, so plan to come early.
Speakers:
David Talby is Usermind's chief technology officer. David has extensive experience in building and operating web-scale data science and business platforms, as well as building world-class, Agile, distributed teams. Previously, he was with Microsoft’s Bing group, where he led business operations for Bing Shopping in the US and Europe. Earlier, he worked at Amazon both in Seattle and the UK, where he built and ran distributed teams that helped scale Amazon’s financial systems. David holds a PhD in computer science and master’s degrees in both computer science and business administration.
Alex Thomas is a data scientist at Indeed. Over his career, Alex has used natural language processing (NLP) and machine learning with clinical data, identity data, and (now) employer and jobseeker data. He has worked with Apache Spark since version 0.9 as well as NLP libraries and frameworks including UIMA and OpenNLP.
