SF Python: Measuring garbage collection latency impact and Tokenization in NLP


Details
Want to learn more about Python and meet other Pythonistas?
Please register here: https://ti.to/sfpython/sfpython-github-08142024
Submit your 5-, 15-, or 25-minute talk proposals here: https://bit.ly/sfpython-cfp
SCHEDULED ANNOUNCEMENTS/TALKS
Opening Remarks/Sponsor Acknowledgement
Thanks to our sponsors - Neo4j - Yolande Poirier
Measuring Garbage Collection Latency Impact (~15 mins + Q&A)
Oleksandr Pryimak - Staff Software Engineer at Thumbtack
Abstract: Python is often used for building web services, especially when they need to access machine learning models (see SageMaker Endpoints, for example). Low-latency responses are paramount for such applications. When engineers face poor long-tail latency in their Python web applications, they often blame garbage collection. After all, it does not have a great reputation, and it is a stop-the-world GC. In this talk I will show how one can measure the GC impact on latency. I will briefly share how we use this technique at Thumbtack to show that GC is not to blame most of the time!
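For readers who want to try the idea before the talk, here is a minimal sketch of one way to time CPython's cyclic garbage collector using the standard `gc.callbacks` hook. It is not necessarily the technique the speaker will present; `GCPauseTracker` and `handle_request` are hypothetical names made up for this illustration, and the "request" is just a loop that builds reference cycles.

```python
import gc
import time


class GCPauseTracker:
    """Accumulates wall-clock time spent inside CPython's cyclic GC."""

    def __init__(self):
        self.total_pause = 0.0
        self.collections = 0
        self._start = None

    def _callback(self, phase, info):
        # CPython calls every entry in gc.callbacks with phase "start"
        # just before a collection and "stop" just after it.
        if phase == "start":
            self._start = time.perf_counter()
        elif phase == "stop" and self._start is not None:
            self.total_pause += time.perf_counter() - self._start
            self.collections += 1
            self._start = None

    def install(self):
        gc.callbacks.append(self._callback)

    def uninstall(self):
        gc.callbacks.remove(self._callback)


def handle_request(n=20000):
    # Hypothetical stand-in for a request handler: builds many
    # reference cycles so the cyclic collector has work to do.
    nodes = []
    for _ in range(n):
        a = {}
        b = {"peer": a}
        a["peer"] = b
        nodes.append(a)
    return len(nodes)


tracker = GCPauseTracker()
tracker.install()
t0 = time.perf_counter()
handle_request()
gc.collect()  # force at least one collection so the sketch always records one
elapsed = time.perf_counter() - t0
tracker.uninstall()

print(f"collections: {tracker.collections}")
print(f"GC time: {tracker.total_pause * 1000:.2f} ms of {elapsed * 1000:.2f} ms")
```

Comparing the accumulated GC time with the total request time gives a rough upper bound on how much of the latency the cyclic collector can explain. This sketch only sees the cyclic collector; memory freed by reference counting is not captured.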
Announcement: PyBay
Christopher Brousseau - PyBay Conference Chair
Unlocking Language Understanding: A Hands-on Guide to Tokenization in NLP (~15 mins + Q&A)
Suman Debnath - Principal Developer Advocate for Machine Learning at Amazon Web Services
Suman is passionate about deep learning, natural language understanding, and large-scale distributed systems. He is also an avid fan of Python.
Abstract: In this hands-on session, we will explore the critical role of tokenization in Natural Language Processing (NLP). It will cover how tokenization serves as the first step in helping machines understand human language, breaking down text into manageable segments for deeper analysis. The talk will also discuss advanced techniques such as Byte Pair Encoding and sliding windows, demonstrating how they contribute to more efficient language processing and better training data. Whether you're a beginner or an experienced practitioner, this session will provide both foundational knowledge and cutting-edge insights in an engaging manner.
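As a rough preview of the two techniques named in the abstract, the sketch below shows a single Byte Pair Encoding merge step on characters and one common reading of "sliding windows": cutting a token-id sequence into overlapping input/target pairs for next-token training. The function names and the toy inputs are assumptions made for this illustration, not material from the talk.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the most common adjacent pair of tokens, or None."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(tokens, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def sliding_windows(ids, context, stride):
    """Build (input, target) pairs; each target is its input shifted right by one."""
    windows = []
    for start in range(0, len(ids) - context, stride):
        inputs = ids[start:start + context]
        targets = ids[start + 1:start + context + 1]
        windows.append((inputs, targets))
    return windows


tokens = list("aaabdaaabac")
pair = most_frequent_pair(tokens)
print(pair)                      # ('a', 'a')
print(merge_pair(tokens, pair))  # ['aa', 'a', 'b', 'd', 'aa', 'a', 'b', 'a', 'c']
print(sliding_windows([1, 2, 3, 4, 5, 6], context=3, stride=2))
# [([1, 2, 3], [2, 3, 4]), ([3, 4, 5], [4, 5, 6])]
```

A full BPE tokenizer repeats the find-and-merge step until it reaches a target vocabulary size; production tokenizers typically operate on bytes and add further rules that this sketch leaves out.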
Note: for this meetup we're going to have more "networking" time. So come on out, have some food/drink, and chat with other Pythonistas!
AGENDA
6:30p Reconnect with friends!
7:00p Opening remarks, sponsors acknowledgement
7:10p Scheduled talks and Q&A + networking break
8:30p Wrap up last talk, more networking
THIS EVENT IS PRODUCED BY
SF Python, a volunteer-run organization aiming to foster the Python community in the Bay Area