PyData @ Picnic: NLP in Action & Dockering


Details
PyData is back in the new year! We are kicking off the meetup season on Wednesday, the 18th of January, from 17:30 until 21:30. The first PyData Meetup of 2023 will be at the Picnic office (Van Marwijk Kooystraat 15, right next to Overamstel metro station). It will feature two inspiring talks to satisfy your data needs, and food and drinks will be provided. Öykü Kapcak will explain how NLP can be leveraged to bring structure to massive amounts of customer feedback, and Matthijs Brouns will give a primer on how to keep your Docker containers as small as is practically feasible.
Schedule:
17:30-18:30: Welcome (🍕/🍺)
18:30-19:15: Talk 1 - “Supporting Customer Success with NLP”, Öykü Kapcak
19:15-19:30: Break
19:30-20:15: Talk 2 - “How small can I get that Docker container?”, Matthijs Brouns
20:15-21:30: Networking
“Supporting Customer Success with NLP” by Öykü Kapcak
At Picnic, customers love to interact, and we receive large amounts of feedback through various channels. We use NLP to help the Customer Success (CS) team keep up with our fast growth by streamlining parts of their job. In this presentation, we will talk about how we implement text classification models to handle large numbers of customer messages and reduce the workload of CS agents. We will discuss the different text classification methods we use for different types of input channels. We will also talk about putting these models into production and the lessons we learned along the way.
Öykü Kapcak is a Data Scientist at Picnic. She holds a Master’s in Computer Science from the Delft University of Technology. Her current focus is to improve demand forecasting of deliveries at Picnic using transformer models.
“How small can I get that Docker container?” by Matthijs Brouns
If you work with Docker on a regular basis, you've probably been told that you should try to keep your container images small. We generally prefer smaller images because they upload faster and take up less disk space.
In this talk we'll try to build the smallest possible Docker image containing a basic PyData stack that includes Matplotlib, SciPy, NumPy and scikit-learn. In particular, we'll discuss:
- The basics, such as using .dockerignore to ensure we don't copy data into the image
- Squashing layers to ensure we don't end up with redundant cached files in layers
- Selecting the base image, and why the often-heard advice to use an Alpine base image doesn't actually help
- Seeing what is actually taking up space in our images with a tool called dive
- Using multi-stage builds to make sure the image contains only the files we need to run our application
- Optimizing package sizes by compiling them ourselves
- How Python wheels work and how they can help
By the end of this talk, you'll know some practical, and some very impractical, methods you can apply to build tiny containers for the PyData stack.
