Deep Learning to Build a Global Named Entity Recognizer + An Intro to Spark NLP

Details

Our December meetup features two speakers. Kfir Bar, chief scientist at Basis Technology, will present "AI in Any Language: Using Deep Learning to Build a Truly Global Named Entity Recognizer," and Chris Marciniak, a data scientist with John Snow Labs, will present "An Introduction to Spark NLP."

Doors open at 6 pm and our program starts at 6:30 pm. Thanks to WeWork Navy Yard for hosting us.

Chris Marciniak will introduce Spark NLP, an open-source library built on Apache Spark for natural language processing at scale. Spark NLP natively extends the Spark ML pipeline APIs, which enables zero-copy, distributed pipelines that combine NLP and ML. The library also leverages Spark's built-in performance optimizations. It implements core NLP algorithms including lemmatization, part-of-speech tagging, dependency parsing, named entity recognition, spell checking, and sentiment detection. The talk will demonstrate how to use these algorithms to build commonly used pipelines with PySpark in Jupyter notebooks.

Kfir Bar has spent many years working in a wide range of NLP disciplines, including statistical machine translation, named entity recognition, and digital humanities applications. He's a big fan of combining linguistic knowledge with sophisticated AI algorithms to extract the most important information from a piece of text.

His talk: Vital for medical record analysis, customer relationship management, and chatbot analytics, Named Entity Recognition (NER) is a key NLP component. The ability to accurately identify when entities (such as people, organizations, and locations) are mentioned in lakes of text data helps AI applications complete knowledge tasks with human-like accuracy. To leverage this powerful capability, however, challenges related to language representations, data quality, and scalability have to be overcome. Most existing solutions focus on English and a few other Latin-based languages, and the linguistic diversity of human languages introduces additional challenges.

In this talk, I will present our latest research results on NER and provide real-life examples of how we are applying these cutting-edge techniques to more than 20 different languages, including Spanish, English, Arabic, Hebrew, Persian, Korean, and Japanese.

This talk focuses on practical methods for deploying deep, multilingual NLP solutions at the enterprise level. We will discuss the differences we see in accuracy, speed, and memory footprint when comparing some of the best-known deep architectures, including variations of recurrent networks as well as self-attention networks.