What we're about

This group is for everyone working in Natural Language Processing technologies and applications. Our online meetings (for now) and Philadelphia meetings (eventually) will provide opportunities to hear about and present innovative work and research, learn about emerging technologies, network, exchange ideas, and brainstorm. Topics will include machine learning, computational linguistics, text analytics, speech processing, conversational systems, sentiment and emotion AI, and search, as well as applications in finance, customer experience, online and social media, health sciences, and more.

If you're actively involved in NLP or just want to learn more, please join us, and follow us on Twitter at @PHLNLP (https://twitter.com/PHLNLP).

Upcoming events (2)

Amazon DataTuner: End-to-End Neural Data-to-Text Generation

Hamza Harkous and Isabel Groves will present Amazon DataTuner as described in their COLING 2020 paper written with Amir Saffari, "Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity," https://www.aclweb.org/anthology/2020.coling-main.218/

Abstract: Data-to-text generation converts information from a structured format such as a table into natural language. This allows structured information to be read or listened to, as when a device displays a weather forecast or a voice assistant answers a question.
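To make the setting concrete, here is a toy, template-based illustration in Python (the weather record and wording are invented for this example; the talk is about a neural approach rather than hand-written templates):

# Toy data-to-text example: a structured weather record rendered as a sentence.
# Values and phrasing are invented; DataTuner replaces hand-written templates
# like this with a learned, end-to-end neural generator.
record = {"city": "Philadelphia", "high_f": 78, "low_f": 61, "condition": "partly cloudy"}

sentence = (
    f"In {record['city']}, expect a {record['condition']} day "
    f"with a high of {record['high_f']}°F and a low of {record['low_f']}°F."
)
print(sentence)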

Language models trained on billions of sentences learn common linguistic patterns and can generate natural-sounding sentences by predicting likely sequences of words. However, in data-to-text generation we want to generate language that not only is fluent but also conveys content accurately.
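As a small illustration of that idea (a generic pretrained model loaded through the Hugging Face transformers library; this is not DataTuner), a language model can continue a prompt by repeatedly predicting the next most likely token:

# Minimal sketch: a generic pretrained language model (GPT-2) continuing a prompt.
# It demonstrates fluency, not semantic fidelity; nothing here constrains the
# output to match an underlying data record.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The forecast for Philadelphia tomorrow is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))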

Some approaches to data-to-text generation use a pipeline of machine learning models to turn the data into text, but this can be labor intensive to create, and pipelining poses the risk that errors in one step will compound in later steps.

In this talk, we present a neural, end-to-end, data-to-text generation system called DataTuner, which can be used for a variety of data types and topics to generate fluent and accurate texts. We also show how we evaluated this system’s performance, comparing it to the state of the art in this domain.

This work was done at Amazon Alexa.

Bios:

Isabel Groves is a Research Scientist at Amazon Alexa, and previously studied linguistics at the University of Edinburgh and Aix-Marseille Université.

Hamza Harkous is currently a Research Scientist at Google, working at the intersection of NLP and data privacy. He previously worked at Amazon Alexa and received his PhD from EPFL in Switzerland.

--
By responding here, you acknowledge and consent to our Code of Conduct: We seek to provide a respectful, friendly, professional experience for everyone regardless of gender, sexual orientation, physical appearance, disability, age, race, or religion. We do not tolerate behavior that is harassing or degrading to any individual, in any form. Participants are responsible for knowing and abiding by these standards. We encourage all attendees to assist in creating a welcoming, safe, and respectful experience.

We are grateful for meetup support provided by Kensho (https://www.kensho.com/), AI & Machine Learning Driving Essential Intelligence, and by John Snow Labs (https://www.johnsnowlabs.com/), publisher of Spark NLP, an open-source text processing library for Python, Java, and Scala.

Aggregating Big Topic Models At Scale + Anacode's AI Trends&News Monitor

Our July 6, 2021 program features two speakers: Jason Lee on Aggregating Big Topic Models At Scale, and Janna Lipenkova of Anacode on the company's AI Trends&News Monitor. The program starts at 12 noon US-Eastern / 9 am US-Pacific / 5 pm UK / 6 pm CEST.

Presentation #1 is a solution spotlight by Anacode CEO Janna Lipenkova on the company's AI Trends&News Monitor, which helps you stay up to date on all things AI, and on the NLP pipeline behind the solution. The aim is to disentangle the global AI landscape. The AI Trends&News Monitor collects large quantities of online news and articles from the web, then uses NLP and a specialised AI ontology to extract relevant categories, aggregate related information, and provide insights at your desired level of detail. After a brief demo, Janna will dive into the underlying technology.
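As a hypothetical, heavily simplified sketch of the kind of categorization step such a pipeline performs (the ontology and category names below are invented; this is not Anacode's implementation):

# Hypothetical sketch: matching article text against a tiny hand-made "ontology"
# of AI categories. The real system uses a much richer ontology and NLP stack.
AI_ONTOLOGY = {
    "natural language processing": ["language model", "chatbot", "translation"],
    "computer vision": ["image recognition", "object detection", "segmentation"],
    "autonomous driving": ["self-driving", "lidar", "driver assistance"],
}

def categorize(article_text):
    """Return the ontology categories whose terms appear in the article."""
    text = article_text.lower()
    return [category for category, terms in AI_ONTOLOGY.items()
            if any(term in text for term in terms)]

print(categorize("Startup releases a multilingual language model for chatbots."))
# ['natural language processing']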

Dr. Janna Lipenkova (www.jannalipenkova.com) holds a PhD in Computational Linguistics from the Free University of Berlin. She currently runs two analytics companies, Anacode and Equintel, and focuses on distilling hidden knowledge from unstructured data, helping businesses generate a unique information advantage.

Presentation #2 is Aggregating Big Topic Models At Scale with Jason Lee.

Abstract: As useful as topic modeling is, it is still often like reading tea leaves. This presentation describes proposed improvements in which multiple topic model runs are aggregated to produce more coherent, consistent topics, with a measure of relative confidence in each topic and without having to define a target topic count in advance. This approach was used to generate the models behind a recent paper in HDSR on COVID-19 research [hdsr.mitpress.mit.edu]. The talk also suggests approaches for making the associated documents more relevant and describes a pipeline by which the models can be generated efficiently at scale.
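A hedged sketch of the general idea (not necessarily the method presented in the talk): run LDA several times with different seeds, then cluster the resulting topic-word distributions so that topics recurring across runs collapse into aggregate topics, whose cluster sizes hint at relative confidence:

# Sketch: aggregate topics from multiple LDA runs by clustering their
# topic-word distributions (gensim + scikit-learn). Toy corpus for illustration.
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from sklearn.cluster import AgglomerativeClustering

docs = [["virus", "vaccine", "trial"], ["patent", "battery", "chemistry"],
        ["virus", "spread", "model"], ["battery", "storage", "grid"]]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

# Each run with a different seed yields a (num_topics x vocabulary) matrix of word weights.
topic_rows = []
for seed in range(5):
    lda = LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=seed)
    topic_rows.extend(lda.get_topics())

# Topics that land in the same cluster across runs are treated as one aggregate
# topic; the number of clusters need not match the per-run topic count.
labels = AgglomerativeClustering(n_clusters=3).fit_predict(np.array(topic_rows))
print(labels)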

This was a personal project: a pipeline that generates topic models with Apache Spark, aggregates the topics, and then creates unique visualizations to better understand the data. Experiments were run on various datasets, including PubMed abstracts, US Patent Office grants, and the CORD-19 dataset of COVID-19 research; the first dataset experimented on was the EEBO (Early English Books Online) corpus.
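For readers unfamiliar with topic modeling in Spark, here is a minimal sketch of a single run using PySpark's built-in components (the toy DataFrame and parameters are invented; the speaker's actual pipeline is not reproduced here):

# Minimal sketch: one topic-model run with PySpark's CountVectorizer + LDA.
from pyspark.sql import SparkSession
from pyspark.ml.feature import Tokenizer, CountVectorizer
from pyspark.ml.clustering import LDA

spark = SparkSession.builder.appName("topic-model-sketch").getOrCreate()
df = spark.createDataFrame(
    [("coronavirus vaccine trial results",),
     ("patent filed for new battery chemistry",),
     ("early english printed books corpus",)],
    ["text"],
)

tokens = Tokenizer(inputCol="text", outputCol="tokens").transform(df)
vectors = CountVectorizer(inputCol="tokens", outputCol="features").fit(tokens).transform(tokens)

# One run; repeating with different seeds produces the runs to aggregate.
lda_model = LDA(k=2, maxIter=10, seed=42).fit(vectors)
lda_model.describeTopics(3).show(truncate=False)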

Jason Lee is currently a Staff Software Engineer at Flatiron Health, working on the integration of radiology scans into the EHR and other pipelines. He previously worked in financial services. This project is independent of his work at his current employer.
