In the past five years or so, major new sources of genomic and epigenomic information have arisen (ENCODE, the Roadmap Epigenomics project, the Cancer Genome Atlas, TARGET, the 1000 Genomes Project and the Personal Genome Project), along with the tools required to grapple with and integrate the resulting data. My particular case studies begin, as always, with leukemias, but have grown to encompass thousands of tumors within the Cancer Genome Atlas project. One of the capstone papers for the project now focuses on a mechanism by which cancer cells lose developmental plasticity in a fashion that mirrors accelerated aging.
Further support for this mechanism comes from directly studying both aging stem cells and stem cells from aging individuals. In all cases, the interpretation of the data has been aided immeasurably by using hidden Markov models and, more recently, Bayesian changepoint models that take advantage of results from infinite factorial HMMs. In order to fit the models, sequence reads must often be realigned, which is itself a challenge unless done efficiently. Therefore, in the course of the presentation, we will go from raw reads to conclusions about common mechanisms at play in aging, cancer, and immunity, all originating from (and returning to) the most accessible tissue in the human body: our blood.
I will demonstrate how relatively simple methods such as the Lasso can yield fast, accurate predictors of age and disease based on DNA methylation marks, and progressively build up to the nearly inevitable, yet novel, conclusions. At the end, I will introduce a number of Kaggle-like resources, such as Sage Synapse, where real prediction models can compete publicly for visibility and prizes in applications such as breast cancer recurrence modeling.
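As a rough illustration of the Lasso idea mentioned above, the following sketch fits an L1-penalized linear model by cyclic coordinate descent to simulated methylation values. Everything here is an assumption made for illustration: the data are synthetic (three age-associated sites among twenty, with made-up slopes and noise), the penalty value is picked by hand, and the function names are hypothetical. It is not the model or the data used in the talk.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize (1/2n)||y - b0 - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centered columns, so the intercept can be recovered afterwards.
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    col_sq = [sum(v * v for v in col) / n for col in cols]
    beta = [0.0] * p
    resid = [yi - y_mean for yi in y]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            col = cols[j]
            # Covariance of column j with the partial residual (j excluded).
            rho = sum(c * r for c, r in zip(col, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, means))
    return intercept, beta


def predict(intercept, beta, X):
    return [intercept + sum(b * v for b, v in zip(beta, row)) for row in X]


# Hypothetical simulation: sites 0-2 drift with age, the rest are pure noise.
SLOPES = [0.005, -0.004, 0.006]  # assumed change in methylation per year
N_SITES = 20


def simulate(n, rng):
    X, ages = [], []
    for _ in range(n):
        age = rng.uniform(20, 80)
        row = []
        for j in range(N_SITES):
            slope = SLOPES[j] if j < len(SLOPES) else 0.0
            row.append(0.5 + slope * (age - 50) + rng.gauss(0, 0.02))
        X.append(row)
        ages.append(age)
    return X, ages


rng = random.Random(0)
X_train, age_train = simulate(150, rng)
X_test, age_test = simulate(50, rng)

intercept, beta = lasso_cd(X_train, age_train, lam=0.02)
pred = predict(intercept, beta, X_test)
mae = sum(abs(p - a) for p, a in zip(pred, age_test)) / len(age_test)
selected = [j for j, b in enumerate(beta) if b != 0.0]

print("selected sites:", selected)
print("held-out mean absolute error (years): %.2f" % mae)
```

The L1 penalty sets most coefficients exactly to zero, which is why a model of this kind can stay small and fast even when the number of candidate sites is far larger than the number of samples. In practice the penalty would be chosen by cross-validation rather than fixed by hand as it is here.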
Tim Triche graduated from college with a degree in chemistry and vague ideas about a career. A job at an NSF supercomputing center brought him into contact with applied mathematicians, physicists, and engineers who created tools to discover and visualize patterns in complex data, and gave him a concrete direction to pursue. Years later, advice from a more senior colleague at Google led him to pursue a graduate degree in computational biology (now, more specifically, in statistical genetics).