DSPT#71 - Memes to Genes: Cracking the Human Code

Data Science Portugal (DSPT)


New decade, new codes? Information is at the core of the human species, and cracking the code that makes us... well, us is at the core of our curiosity. To welcome the new year, we’ll dedicate this meetup to the two languages that constitute humans. First, Fernando Pais from Talkdesk will talk to us about how Talkdesk is dealing with information retrieval and the very important task of transmitting memes, our culture’s code. Then, Nuno Barbosa-Morais from iMM will take us deep into our genes and uncover how our own cells’ code can tell us a lot about our own health. Ready to crack some codes into the new year? Join us!
=== SCHEDULE ===
• 18:30-19:00: Welcome and get together
• 19:00-19:30: Talk 1
• 19:40-19:45: Group photo
• 19:45-20:15: Networking / Coffee Break
• 20:15-20:45: Talk 2
• 20:50: Closing
• 21:00: Dinner is optional but it might be an excellent opportunity for networking (https://doodle.com/poll/te9qv3htdisgf77v).

This meetup is sponsored by Talkdesk (https://www.talkdesk.com/). Thank you for your support!

Talk 1: Embeddings applied to information retrieval

In this talk, I will present how we are tackling information retrieval at Talkdesk, using embeddings to perform sentence and topic classification. We will explore how to train embeddings using state-of-the-art sentence2vec approaches like BERT, InferSent, and Universal Sentence Encoder. We will showcase how to use those embeddings to perform intent classification and identify sentences that represent changes in topics (aka moments). With sentence embeddings, it is also possible to cluster moments that relate to specific topics. We will go over some semi-supervised approaches that group together captured moments of an interaction with manually annotated moments of previous conversations.

Fernando Pais is a Data Scientist in the Talkdesk Agent Assist team, with a background in Robotics and Computer Vision. During his master's degree, he developed an artificial emotion system, applied to artificial agents in a virtual reality environment, for eliciting human emotions. Before joining Talkdesk, Fernando was a researcher at the University of Coimbra, in the Institute of Systems and Robotics. He worked as part of the EuroAge project, developing systems for guiding elderly activity using deep learning and social robotics. His current work at Talkdesk consists of using Natural Language Processing for information retrieval, in the context of recommendation systems and data analytics.

Talk 2: What transcriptomic data tell us about disease

Most molecular mechanisms of disease involve strong alterations in how genes are expressed. Next-generation sequencing technologies enable us to characterise the primary products (transcripts) of most genes in biological samples and to explore them as molecular portraits of tissues’ health. These so-called transcriptomic data therefore give us an early and accurately quantifiable measure of cells’ response to physiological or pathological stimuli. In this talk, we will discuss how analyses of transcriptomes of large human tissue sample cohorts can help us, for example, to elucidate the biology of ageing or to unveil the functional interplay between cell types and the molecular complexity of immunity along the progression of neurodegenerative diseases and cancer.

Nuno Barbosa-Morais leads the Disease Transcriptomics research group at Instituto de Medicina Molecular (iMM) in Lisbon. His lab uses (and often develops) computational biology approaches to the analysis of next-generation sequencing data to understand how ageing-associated molecular changes in human tissues increase their susceptibility to diseases like cancer and neurodegenerative disorders. Originally trained as a physics engineer at Instituto Superior Técnico, Nuno became a computational biologist through a decade of international experience as a doctoral and postdoctoral researcher in Cambridge and Toronto, before joining iMM.