Applying Bayesian Methods to Clustering Models (feat. Memorial Sloan Kettering)

This is a past event

184 people went

General Assembly

902 Broadway 4th Floor · New York, NY

How to find us

4th Floor - Please note this is the General Assembly at 902 Broadway (on 20th and Broadway), not the one on E 21st.

Details

Please RSVP on both the Meetup site and General Assembly here: https://generalassemb.ly/education/applying-bayesian-methods-to-clustering-models-feat-memorial-sloan-kettering/new-york-city/84806

Tentative Schedule:
6:30pm: Pizza + Beer networking
7:00pm: TBD with Data Scientist at Dataiku
7:30pm: A Bayesian Approach To Model Overlapping Objects Available As Distance Data with Sandhya Prabhakaran, Research Fellow at Memorial Sloan Kettering Cancer Center

Talk Abstracts:
A Bayesian Approach To Model Overlapping Objects Available As Distance Data with Sandhya Prabhakaran, Research Fellow at Memorial Sloan Kettering Cancer Center:
Traditional clustering methods often partition objects into mutually exclusive clusters; however, it is more realistic that objects may belong to multiple, overlapping clusters. When healthcare data is available as pairwise distances (such as genomic string alignments, protein contact maps, or pairwise patient similarities), there is no probabilistic clustering model that allows such overlap, and solutions for these types of models are often noisy and heavily biased. It would therefore be advantageous to have a model that clusters distance data directly.

In this talk, we'll address this problem and introduce a Probabilistic model for Overlapping Clustering on Distance data (POCD) that gives objects the freedom to belong to one or more clusters at the same time. Since POCD is a probabilistic model, its output is a set of samples from a distribution over partitions, and it uses an Indian Buffet Process (IBP) prior to remove the need to fix the number of overlapping clusters in advance. We will demonstrate the benefits of working with distances directly and the utility of POCD on both simulated and real-world distance data of neonatal patients and HIV-1 protease inhibitor contact maps.
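
To give a feel for how an IBP prior leaves the number of overlapping clusters open, here is a minimal sketch of the standard "restaurant" construction of the Indian Buffet Process. It is not the POCD model from the talk: it only draws a binary object-by-cluster membership matrix from the prior, with no distance data or inference involved. The names sample_ibp, sample_poisson, num_objects and alpha are illustrative assumptions, not taken from the speaker's work.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_objects, alpha, seed=0):
    """Draw a binary membership matrix Z from an IBP prior (illustrative sketch).

    Z[i][k] == 1 means object i belongs to cluster k. The number of columns
    (clusters) is not fixed in advance; it is part of the random draw.
    """
    rng = random.Random(seed)
    rows = []           # one membership list per object, padded at the end
    column_counts = []  # column_counts[k] = objects assigned to cluster k so far

    for i in range(1, num_objects + 1):
        row = []
        # Join each existing cluster k with probability m_k / i.
        for k, m_k in enumerate(column_counts):
            take = 1 if rng.random() < m_k / i else 0
            row.append(take)
            column_counts[k] += take
        # Open Poisson(alpha / i) brand-new clusters for this object.
        new_clusters = sample_poisson(alpha / i, rng)
        row.extend([1] * new_clusters)
        column_counts.extend([1] * new_clusters)
        rows.append(row)

    # Pad earlier rows with zeros so every row has one entry per cluster.
    num_clusters = len(column_counts)
    return [row + [0] * (num_clusters - len(row)) for row in rows]


if __name__ == "__main__":
    for row in sample_ibp(num_objects=8, alpha=2.0, seed=1):
        print(row)
```

Each row may contain several ones, which is what allows an object to sit in more than one cluster. Larger values of alpha tend to produce more clusters. Note that a plain IBP draw can also leave a row with no ones at all, so a model that requires every object to belong to at least one cluster would need to handle that case separately.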

(This is joint work with Julia E. Vogt, Department of Computer Science, ETH, Switzerland, and Swiss Institute of Bioinformatics (SIB), Basel, Switzerland.)

Speaker bios:
Sandhya has been a Research Fellow at Memorial Sloan Kettering Cancer Center since December 2016. Before that, from October 2014, she was a Research Scientist at the same lab at Columbia University in the City of New York. Sandhya received her Ph.D. from the Department of Mathematics and Computer Science, University of Basel, Switzerland, and her Master's in Intelligent Systems (Robotics) from the University of Edinburgh, Scotland. Her research deals with developing statistical theory and inference models, particularly for problems in cancer biology. Prior to academia, she was an Assembler programmer working with the Mainframe Operating System (z/OS) at IBM Software Laboratories, Bangalore, and developed Mainframe applications at UST Global, Kerala. She is an avid hiker and distance runner and has completed 4 of the 6 World Marathon Majors.
Webpage: www.sandhyaprabhakaran.com
Twitter: @sandhya212