PyData Montreal Meetup #29 (in-person | en personne)


Details
🎒📚 It’s back-to-school season, and you know what that means - a brand new PyData meetup! We’re back at it again for yet another season of talented speakers 🔈, interesting subjects 🤌 & incredible fellow attendees 🤜🤛.
Our first event of the season will be taking place on September 19th at the Montreal Rio Tinto offices 🎉.
We have a great roster of presenters for this event: Haleemur Ali, Principal Data Engineering at Infostrux & Ishika Dhall, AI Scientist at the National Bank of Canada. See a description of their talks below 👇.
AGENDA
- 17h30 - Doors open
- 18h00 - Introduction
- 18h10 - Talk #1
- 18h50 - Break
- 19h10 - Talk #2
- 19h50 - Networking
- 20h30 - End of event
TALKS
1. Singer-spec & Meltano: code first open source ETL
By Haleemur Ali
Description of the talk:
Almost every company needs to centralize data and build EL (extract-load) pipelines. Modern SaaS solutions may be too costly or may not implement the required connectivity. Organizations often have internal tools with custom APIs that SaaS offerings are unable to interact with. The solution to these challenges has traditionally involved writing custom connectors. The singer-spec is an open source standard describing how EL connectors should be composed to generate data pipelines. The community has developed an excellent SDK enabling rapid plugin development.
In the talk, we’ll take a brief walk through the singer-spec's history, how to create new plugins using the SDK and how to use existing plugins to create data pipelines quickly in Meltano.
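For readers new to the spec, here is a minimal sketch of what a Singer tap produces: a stream of JSON messages of type SCHEMA, RECORD and STATE, written one per line to standard output. This is not taken from the talk and does not use the SDK; the `singer_messages` helper, the `users` stream and the sample rows are hypothetical names chosen for illustration, and the shape of the STATE value is an assumption (the spec leaves it up to the tap).

```python
import json


def singer_messages(stream, schema, key_properties, rows, bookmark_key):
    """Yield Singer-style messages: one SCHEMA, one RECORD per row, then a STATE."""
    yield {
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": key_properties,
    }
    last_seen = None
    for row in rows:
        yield {"type": "RECORD", "stream": stream, "record": row}
        last_seen = row[bookmark_key]
    # The STATE value is free-form; this layout is an assumption for the sketch.
    yield {"type": "STATE", "value": {stream: last_seen}}


# Hypothetical source data standing in for an internal API response.
users_schema = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
}
users_rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]

messages = list(singer_messages("users", users_schema, ["id"], users_rows, "id"))

# A tap writes each message as a single line of JSON on stdout.
for message in messages:
    print(json.dumps(message))
```

A target reads these lines from standard input and loads the records, which is what lets any tap be paired with any target in a pipeline.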
2. Tackling the challenges of Text Annotation using Active Learning
By Ishika Dhall
Description of the talk:
The exponential growth of digital communication channels has resulted in an increase in unstructured text data, highlighting the need for NLP techniques. NLP-driven applications are made possible by advances in deep learning models. However, training these models in a supervised setting requires a large amount of labelled data, which makes labelled data an indispensable component of the process. Obtaining it can be a major challenge, as annotating large amounts of data is laborious and error-prone. One way to mitigate this is Active Learning, where the model selectively queries the most informative and uncertain examples for annotation, thus optimising the data labelling process. In this talk, our main objective is to explore the concept of Active Learning, identify potential obstacles, and discuss strategies for implementing it effectively.
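To make the idea of querying "the most uncertain examples" concrete, here is a minimal sketch of one common query strategy, entropy-based uncertainty sampling. It is not the speaker's method, only an illustration; the function names and the predicted probabilities are hypothetical, and a real setup would get those probabilities from a trained classifier.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions, k):
    """Return the indices of the k examples with the highest predictive entropy."""
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical class probabilities for four unlabelled texts (two classes).
unlabelled_predictions = [
    [0.98, 0.02],  # model is confident
    [0.55, 0.45],  # model is unsure
    [0.70, 0.30],
    [0.50, 0.50],  # model is maximally unsure
]

# Send only the two most uncertain texts to the annotators.
to_annotate = select_most_uncertain(unlabelled_predictions, k=2)
print(to_annotate)  # [3, 1]
```

In an active learning loop, the selected examples would be labelled, added to the training set, and the model retrained before the next round of selection.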
We’re excited to see you there 👋
