Dual geometries of high-dimensional data sets


Details
This month we have Sarah Constantin presenting "Dual geometries of high-dimensional data sets".
Abstract:
When a dataset consists of many samples, each with many features (for example, a database of questionnaire responses), there are two complementary ways to organize its structure: similar samples are those whose features are similar, and related features are those shared between similar samples. An iterative procedure that alternates between these dual geometries clusters features and samples simultaneously and robustly. This method can compress, denoise, and organize large, high-dimensional datasets. Relatedly, there is a duality between the geometry of a surface and the geometry of the Laplacian eigenfunctions on that surface, though many open problems surround the nature of that relationship. This talk will be based on past work by Dr. Ronald Coifman and on my own research in progress.
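To make the alternating idea in the abstract concrete, here is a minimal sketch in Python. It is not the method from the talk, whose details the abstract does not give. It swaps in plain k-means as a stand-in clustering step, and the names `kmeans`, `pool`, and `alternate_clustering` are hypothetical helpers made up for this illustration. Samples are clustered using features averaged within the current feature clusters, then features are clustered using samples averaged within the current sample clusters, and the two steps repeat.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (a stand-in); returns one integer label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared distance from every row to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def pool(X, labels, k):
    """Average the columns of X within each label group: one pooled column per group."""
    cols = []
    for j in range(k):
        mask = labels == j
        cols.append(X[:, mask].mean(axis=1) if mask.any() else np.zeros(X.shape[0]))
    return np.stack(cols, axis=1)

def alternate_clustering(data, k_samples, k_features, n_rounds=5):
    """Alternate between clustering features (columns) and samples (rows)."""
    data = np.asarray(data, dtype=float)
    # Assumption: start from a clustering of the samples on the raw features.
    sample_labels = kmeans(data, k_samples)
    for _ in range(n_rounds):
        # Features are compared through samples pooled by the current sample clusters.
        feature_labels = kmeans(pool(data.T, sample_labels, k_samples), k_features)
        # Samples are compared through features pooled by the current feature clusters.
        sample_labels = kmeans(pool(data, feature_labels, k_features), k_samples)
    return sample_labels, feature_labels

# Toy "questionnaire": 6 respondents, 4 questions, two answer patterns plus noise.
rng = np.random.default_rng(1)
answers = np.zeros((6, 4))
answers[:3, :2] = 1.0
answers[3:, 2:] = 1.0
answers += 0.05 * rng.standard_normal(answers.shape)
sample_labels, feature_labels = alternate_clustering(answers, k_samples=2, k_features=2)
print(sample_labels, feature_labels)
```

Pooling is what couples the two geometries in this sketch: each side is compared only through the other side's current clusters, which averages out noise in individual entries. The number of clusters on each side and the number of rounds are arbitrary choices here.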
Bio:
Sarah Constantin is a PhD student in mathematics at Yale University. Her research interests are in harmonic analysis with applications to machine learning. She has published in Communications on Pure and Applied Analysis and presented at several conferences. She received her BA from Princeton in 2010.
