We are taking August off, but our talk series and networking event will be back in full force in September. We are excited to announce the guest speaker for this session of Boston Data Mining: Matthew Eaton, PhD, a researcher at MIT's Computer Science and Artificial Intelligence Lab and the Broad Institute. Matt will give a general introduction to matrix decomposition, focusing on what sorts of inputs you should expect to supply and what sort of output you'll get. He will cover some introductory material but quickly dive into the details, so this presentation is also aimed at advanced practitioners. Matt will also walk through some real-world examples and provide code samples in R.
Here’s how Matt describes his work in relation to matrix decomposition:
“The real-world data that we collect for the purposes of data mining are more often than not generated by the combined effect of multiple unseen, latent processes. Matrix decompositions are a class of algorithms that attempt to separate a matrix of information into its constitutive components, the distinct effects that generated it. Take stock market data as an example: each stock price is the additive combination of a number of different factors. Recent company news, overall market and sector health, and competitor performance can all contribute to the final stock price, and matrix decomposition attempts to separate those factors out from the overall price. It can also be used in internet link mining, market basket prediction, and any other application where large amounts of data are generated by the combination of a number of unseen factors. In my own research I use matrix decomposition as a data cleaning tool. I collect hundreds of thousands of data points on the molecular state of hundreds of Alzheimer's disease patient brains. I then use matrix decomposition to break those measurements down into the technical and disease-related factors that underlie them, so that I can better understand which processes are at work in my samples and strip away the variation in the data that isn't related to Alzheimer's disease.”
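For readers who want a feel for the stock example before the talk, here is a minimal sketch in Python. It is not one of Matt's R samples and does not represent his method. It assumes a single hidden "market" factor driving four made-up stocks, and recovers that factor with power iteration, one simple way to compute the leading component of a singular value decomposition. The data, the loadings, and the helper names `leading_factor` and `cosine` are all hypothetical.

```python
import math
import random

random.seed(0)  # fixed seed so the synthetic data is reproducible

# Assumption: one hidden "market" factor drives four made-up stocks over 30 days.
n_days = 30
market = [random.gauss(0, 1) for _ in range(n_days)]
loadings = [1.0, 0.8, 1.2, 0.5]  # hypothetical sensitivity of each stock

# Observed matrix (stocks x days): loading * hidden factor + small noise.
X = [[w * m + random.gauss(0, 0.1) for m in market] for w in loadings]


def leading_factor(X, iters=200):
    """Leading singular triplet (u, s, v) of X via power iteration."""
    n_rows, n_cols = len(X), len(X[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # starting guess for the day-level factor
    for _ in range(iters):
        # u = X v (per-stock scores), then v = X^T u (per-day scores), normalized
        u = [sum(X[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        v = [sum(X[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(X[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    s = math.sqrt(sum(x * x for x in u))  # strength of the recovered factor
    u = [x / s for x in u]
    return u, s, v


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


u, s, v = leading_factor(X)

# The sign of a recovered factor is arbitrary, so compare absolute similarity.
factor_match = abs(cosine(v, market))     # recovered daily factor vs. true factor
loading_match = abs(cosine(u, loadings))  # recovered stock weights vs. true loadings
print(f"factor similarity:  {factor_match:.3f}")
print(f"loading similarity: {loading_match:.3f}")
```

Both similarities should come out close to 1, meaning the decomposition has pulled the hidden factor and each stock's sensitivity to it back out of the observed prices alone. Real data would involve several factors and call for a full decomposition routine rather than this one-component sketch.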
Matt is currently a postdoctoral researcher in computational biology at MIT in the lab of Prof. Manolis Kellis. He works on integrating genetic, epigenetic, and genomic data to elucidate the mechanisms of complex human disease. The toolbox that he employs for these studies draws from the intersection of statistics, computer science, and biology. His main project involves profiling genome-wide DNA methylation patterns in the brains of 750 elderly individuals, approximately half of whom had Alzheimer's disease, to detect changes in the epigenome associated with the disease. He also works on large-scale data integration projects such as the ENCODE project and the NIH Roadmap Epigenomics project. He received his PhD from Duke University's Program in Computational Biology and Bioinformatics, where he examined the influence of chromatin structure on the DNA replication program in eukaryotic organisms and also worked on the modENCODE project. Matt grew up in Maine and received a BA in Computer Science and Philosophy from Wesleyan University in Connecticut.
Our usual networking event will follow Matt's talk. Hope to see you there.