NLP in practice - From bank transactions to non-profit organizations

PyData Amsterdam
Public group


Nieuwendammerkade 26A-5 · Amsterdam

How to find us

For directions visit


What we do

Hi folks!
This time we're co-organising a meetup with PyLadies Amsterdam, and it's all about applications of NLP.

# The Program
18:00 - 18:20 walk-in (food and drinks)
18:20 - 18:30 Introduction by Shubha Guha from Textkernel
18:30 - 19:00 Talk 1: Determining translation urgency by Saskia Lensink
19:00 - 19:15 break
19:15 - 19:45 Talk 2: Bank Import Classification by Estelle Rambier
19:45 - 20:00 break
20:00 - 20:30 Talk 3: Matching people and jobs: a neural IR approach by Textkernel
20:30 - closing: mingle

# Talk 1: Determining translation urgency: Mining relevant information in times of crisis
During the Hackathon for Peace, Justice, and Security 2019, the non-profit Translators without Borders posed a big challenge: how do you determine which documents need to be translated first in times of crisis? This non-profit provides translation services for humanitarian organizations and often has too few translators for too many documents. They were therefore looking for ways to automatically prioritize translations. Although they did not have an annotated corpus, they did have clear ideas about the most important themes for different types of events. By combining text mining and natural language processing tools in Python, we helped build a system that aids in prioritizing texts and gauging the impact a translation would make, and we won the challenge.
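In the absence of an annotated corpus, prioritization can start from the known themes alone. A minimal sketch of that idea, with made-up theme keyword lists (the real themes and terms are not in this abstract):

```python
# Illustrative only: score each document by how many crisis-theme
# terms it mentions, then rank most-urgent first. Theme vocabularies
# here are invented placeholders, not Translators without Borders' lists.

THEMES = {
    "health": ["cholera", "outbreak", "clinic", "vaccination"],
    "shelter": ["displaced", "camp", "housing", "evacuation"],
}

def urgency_score(text: str) -> int:
    """Count theme-keyword hits in a document (case-insensitive)."""
    tokens = text.lower().split()
    return sum(tokens.count(term) for terms in THEMES.values() for term in terms)

def prioritize(docs: list[str]) -> list[str]:
    """Return documents sorted from most to least urgent."""
    return sorted(docs, key=urgency_score, reverse=True)
```

A real system would add normalization, weighting per event type, and multilingual term lists, but the ranking-by-theme-coverage core stays the same.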

Bio Saskia Lensink
After finishing her Ph.D. in experimental linguistics, Saskia has been working as a data scientist at CGI, an IT consultancy, on a wide array of data science projects, ranging from incident management on the road to mining governmental reports.

# Talk 2: Bank Import Classification
One of the most time-consuming tasks for accountants is classifying bank transactions into the right categories. As data scientists at Exact, we thought our computers could do it for them. It turned out to be a challenging 900-class classification problem with only a short bank description as predictor. We will dig deep into our text-processing techniques, our own word embeddings, and the production model that we are running today.
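To make the setting concrete, here is a toy baseline for this kind of short-text classification, not Exact's production model: character n-gram TF-IDF with a linear classifier, which copes reasonably well with the abbreviations and noise typical of bank descriptions. Data and categories below are invented.

```python
# Hypothetical baseline for transaction classification (illustration only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data; the real problem has ~900 categories and far more rows.
descriptions = [
    "albert heijn 1234 amsterdam",
    "shell station a2",
    "ns groep reizen",
]
categories = ["groceries", "fuel", "travel"]

model = make_pipeline(
    # Character n-grams are robust to truncated, abbreviated descriptions.
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(descriptions, categories)
print(model.predict(["shell tankstation utrecht"]))
```

With 900 classes and custom word embeddings, the talk's actual approach is necessarily more involved, but the framing (short string in, category out) is the same.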

Bio Estelle Rambier
Estelle Rambier is a data scientist at Exact, an accountancy software company that aims to make bookkeeping automatic. After graduating with honors with a Master's in mathematics in 2017, she joined Exact last February and works with inconsistent time series, challenging NLP problems, undetectable outliers, and data-maverick colleagues.

# Talk 3: Matching people and jobs: a neural information retrieval approach
Neural IR approaches are gaining popularity as methods to overcome the limitations of exact term matching presented in many search systems. An essential ingredient here is the representation of queries and documents in a shared embedding space.
Matching people and jobs can be cast as an IR problem with a CV as the query and job ads as documents, or vice versa. Successful neural IR in this domain brings two additional challenges. First, the queries are entire documents rather than user-entered terms. Second, there is a specific "vocabulary gap" between the query documents and the result documents: job ads tend to be shorter than CVs and use different language/terms (e.g. "You should be familiar with ..." vs "I have experience with ..."). Due to these challenges, standard document embedding techniques (e.g. doc2vec) failed to accomplish our goal. We want to build document embeddings from entire CVs and job ads such that, within this continuous semantic space, finding good matching jobs for a CV becomes a nearest-neighbor search. To solve this, we developed a new embedding approach using a Siamese CNN architecture trained on a dataset of real examples of people applying to jobs.
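Once CVs and job ads live in one shared embedding space, the retrieval step itself is simple. A sketch with random toy vectors standing in for Siamese-CNN outputs (the vectors and dimensions here are placeholders, not Textkernel's):

```python
# Nearest-neighbor matching in a shared embedding space (illustration only).
import numpy as np

rng = np.random.default_rng(0)
cv_vec = rng.normal(size=128)            # embedding of one CV
job_vecs = rng.normal(size=(1000, 128))  # embeddings of 1000 job ads

def top_k_jobs(cv: np.ndarray, jobs: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k job ads closest to the CV by cosine similarity."""
    cv_n = cv / np.linalg.norm(cv)
    jobs_n = jobs / np.linalg.norm(jobs, axis=1, keepdims=True)
    sims = jobs_n @ cv_n               # cosine similarity per job ad
    return np.argsort(sims)[::-1][:k]  # highest similarity first

print(top_k_jobs(cv_vec, job_vecs))
```

The hard part, which the talk covers, is training the encoder so that a CV and the job ads it genuinely matches actually end up close together despite the vocabulary gap.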

Bio Mihai Rotaru
Head of R&D at Textkernel

Bio Luigi Lorato
Research Engineer at Textkernel