PyData Berlin October Meetup

Hosted By
Adrin and 4 others
Details

Welcome to the October Virtual Meetup

The talks will start at 19:00

The link to the Zoom meeting will be sent to all attendees about an hour before the meetup, and there will be a YouTube live stream for those not on the Zoom call.

Talk 1 by Limor Gultchin: Long Story Short: Using BERT for abstractive text summarization on a small, curated corpus

Abstract:
Machine Learning provides a myriad of exciting new ways to extract and analyze data from the ever-growing number of information sources we have today. While the internet indeed provides vast amounts of high-quality data, a lot of information is still enclosed in documents, and PDF documents in particular. To unlock their potential, OCR and other information retrieval tools already provide a convenient way to extract knowledge from well-structured files. An obstacle remains in the realm of tables: the format in which most quantitative information in documents is stored. And tables, while they can be extracted, usually only make sense in the context of their original document. In this talk, I'm going to share my experience working on a project to automatically compose informative table titles using the powerful NLP model BERT, and connect the task to generative abstractive text summarization for a specialized domain with limited amounts of data.

Bio:
Limor is a PhD student in Machine Learning and Causal Inference in the computer science department at the University of Oxford, and at the Alan Turing Institute. Her current research interests are in Causal Inference in the service of Responsible ML, but she previously worked on Natural Language Processing, ML for social science research, and computational humor. Limor will be very happy to discuss any of those topics in the Q&A.

Talk 2 by Arnault Chazareix: Building an NLP pipeline to detect relationships between fictional characters

Abstract:
Who's this guy again? Every time we start watching the new season of our favorite show, we find ourselves asking this question. What if we didn't need to binge-watch all the previous seasons? What if we could just look at a graph summarizing all the characters and their relationships to one another? All this information is available on the Internet, but there is no easy way to use it because the data is unstructured.
This talk will demonstrate how to use Natural Language Processing to extract a relationship graph from any TV show.

Bio:
Arnault studied Computer Science & AI at Centrale Paris (a French engineering Grande Ecole). He interned as an NLP data scientist at Feedly in Palo Alto. Arnault now works as a lead data scientist at Sicara, a data consulting startup specialized in Computer Vision.
He specializes in Detection, Few-Shot Learning, and building great data sets. He is interested in the ability to transform unstructured "human" data (text, images, video, sound...) into structured data.

----------------------------------------------------------------------------------------------------
NumFOCUS Code of Conduct
https://numfocus.org/code-of-conduct
Please have a look at the comment section for the short version of our Code of Conduct.

PyData Berlin
Online event