
Scalable clustering and normalisation of single-cell RNA-sequencing data

Hosted By
Erik B. and Max K.

Details

Abstract:

The emerging technology of single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterisation of cell types via clustering. However, the data is confounded by technical variation emanating from experimental errors and cell type-specific biases. Current clustering approaches perform a global normalisation prior to analysing biological signals, which does not resolve missing data or variation dependent on latent cell types.

In this talk, I will discuss BISCUIT (ICML 2016, Cell 2018), an iterative normalisation and clustering method for single-cell gene expression data. The model is formulated as a hierarchical Bayesian mixture model with cell-specific scalings that aid the iterative normalisation and clustering of cells, teasing apart technical variation from biological signals. I will present identifiability and weak convergence guarantees and a scalable Gibbs inference algorithm. Compared with previous methods, including global normalisation followed by clustering, this approach improves cluster inference in both synthetic and real single-cell data, and allows easy interpretation and recovery of the underlying structure and cell types.
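To make the idea of alternating between normalisation and clustering concrete, here is a minimal Python sketch. It is not BISCUIT: it swaps the hierarchical Bayesian mixture model and Gibbs sampler for a much simpler hard-assignment scheme (k-means-style updates with a least-squares scaling per cell). The function name `iterative_normalise_cluster`, its parameters, and the assumption that each cell is roughly a scaled copy of its cluster mean are all illustrative choices, not taken from the talk or the papers.

```python
import numpy as np


def iterative_normalise_cluster(X, n_clusters, n_iter=20):
    """Toy alternation of per-cell scaling and clustering (NOT BISCUIT).

    X is a (cells x genes) array of non-negative expression values.
    Assumes x_j is approximately alpha_j * mu_k for the cell's cluster k.
    Returns (labels, alpha, mu).
    """
    X = np.asarray(X, dtype=float)

    # Start from a global library-size normalisation.
    totals = X.sum(axis=1)
    alpha = totals / totals.mean()
    Y = X / alpha[:, None]

    # Farthest-point initialisation of the cluster means (deterministic).
    centres = [Y[0]]
    for _ in range(1, n_clusters):
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(Y[np.argmax(d)])
    mu = np.array(centres)

    for _ in range(n_iter):
        # 1) Normalise each cell with its current scaling.
        Y = X / alpha[:, None]

        # 2) Assign each cell to the nearest cluster mean.
        dist = ((Y[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)

        # 3) Update cluster means from the normalised cells.
        for k in range(n_clusters):
            members = labels == k
            if members.any():
                mu[k] = Y[members].mean(axis=0)

        # 4) Re-estimate each cell's scaling by least squares against
        #    the mean of its own cluster (cluster-dependent normalisation).
        m = mu[labels]
        alpha = (X * m).sum(axis=1) / ((m * m).sum(axis=1) + 1e-12)
        alpha = np.clip(alpha, 1e-8, None)

        # Fix the scale ambiguity between alpha and mu.
        alpha /= alpha.mean()

    return labels, alpha, mu
```

Calling `iterative_normalise_cluster(X, n_clusters=2)` on a cells-by-genes matrix returns hard cluster labels, one scaling per cell, and the cluster means. The point of the sketch is step 4: the scaling of a cell depends on the cluster it is currently assigned to, which a single global normalisation pass cannot capture.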

With the launch of the Human Cell Atlas (HCA) and Human Tumor Atlas (HTA) consortia, vast amounts of public single-cell data will be generated in the next decade, presenting opportunities for developing and applying ML and DL techniques appropriate to the complexity of biological systems and the challenges inherent to single-cell data. With this talk, I also aim to raise awareness of, and encourage, interdisciplinary efforts between theory and application in Computational Biology.

Speaker bio

Sandhya Prabhakaran (https://sandhya212.github.io) is a Research Fellow at Memorial Sloan Kettering Cancer Center, NYC. Her research deals with developing and applying statistical models to problems in Computational Biology, particularly in analysing both single-cell sequencing and imaging data. She works on clustering, network inference and sparsity selection models. She obtained her Ph.D. from the Department of Mathematics and Computer Science, University of Basel, Switzerland, and her Masters in Artificial Intelligence from the University of Edinburgh, Scotland. Sandhya received the Best Student Paper Award at ACML 2012 and the Best Paper Award Runner-Up at ICML 2010, and was one of the 23 global recipients of the Scottish International Scholarship 2008. She is also an avid hiker and runner.

NYC Machine Learning
eBay
625 6th Ave (between 18th & 19th), 3rd floor · New York, NY