PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other.
The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
The PyData Code of Conduct (http://pydata.org/code-of-conduct.html) governs this meetup. To discuss any issues or concerns relating to the code of conduct or the behavior of anyone at a PyData meetup, please contact NumFOCUS Executive Director Leah Silen (+1 512-222-5449; email@example.com) or the group organizer.
We run monthly meetups at changing locations and have organized four conferences, in 2014, 2015, 2016 and 2017. You can see our latest meetups, submit a talk idea and read PyData blog posts on our site: https://berlin.pydata.org.
Our October meetup is hosted by Takeaway and has a strong focus on NLP. Drinks and food are kindly sponsored by Takeaway.
Evelyn Trautmann, Sebastian Hansmann and Sumit Sidana
We demo a text classification approach based on an open dataset from https://world.openfoodfacts.org/. The goal is to categorize food items with various tags, where a particular food item is defined by a product’s name, a generic name and a brand. For classification, we use Facebook's NLP library fastText, which provides a text classification model based on word embeddings as well as character n-gram embeddings. In a first experiment, we keep only a single tag per data point and remove any additional ones. In this case, the evaluation is straightforward. However, since some classes are more closely related than others, we don’t want to score predictions as simply right or wrong, as one would typically do. To this end, we implement a similarity concept and a multilabel classification approach. Additionally, we present some applications of a standardised food catalog, for instance search and recommendations.
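To give a feel for what a non-binary evaluation of tag predictions might look like, here is a minimal sketch. It is not the speakers' similarity concept, which the abstract does not specify. As a stand-in, it assumes Jaccard overlap between the predicted and the true tag sets, so a prediction that shares some tags with the truth earns partial credit. The function names and the example tags are hypothetical.

```python
def jaccard(predicted, actual):
    """Overlap between two tag sets, from 0.0 (disjoint) to 1.0 (identical)."""
    predicted, actual = set(predicted), set(actual)
    if not predicted and not actual:
        return 1.0  # two empty tag sets count as a perfect match
    return len(predicted & actual) / len(predicted | actual)


def mean_partial_credit(predictions, truths):
    """Average Jaccard score over all items (assumed stand-in metric)."""
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not predictions:
        return 0.0
    total = sum(jaccard(p, t) for p, t in zip(predictions, truths))
    return total / len(predictions)


# Hypothetical tags, made up for illustration only.
truths = [{"beverages", "juices"}, {"snacks", "chocolates"}]
predictions = [{"beverages", "sodas"}, {"snacks", "chocolates"}]

score = mean_partial_credit(predictions, truths)
print(f"mean partial credit: {score:.3f}")
```

A strict exact-match metric would score the first item as a plain miss, while this sketch still rewards the shared "beverages" tag.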
Transfer learning has taken the NLP community by storm in the recent past. Large-scale language models based on the Transformer architecture are largely behind this development. I'll give an overview of the Transformer architecture and show how to utilize the recent academic advances in practice with PyTorch. Finally, I'll provide some concrete examples of how these advances are helping Comtravo automate the travel booking process.