Apr 9, 2013 · 5:30 PM
Datasets come in all shapes and sizes. Some are tall and skinny (lots of samples), some are short and wide (fewer samples, more variables), and some are both tall and wide (common for network graphs). In many cases, the number of variables becomes too large to manage effectively in memory.
Dimensionality Reduction is a common method of modeling data as a smaller representation, which maintains item similarity (as measured by the "distance" between two samples).
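As a taste of the idea (an assumed example, not from the talk itself), here is a minimal random-projection sketch: high-dimensional points are mapped to a much smaller representation, and the "distance" between two samples comes out roughly the same before and after.

```python
# Minimal sketch of dimensionality reduction via a Gaussian random
# projection: 1000-dimensional samples squeezed into 50 dimensions,
# with pairwise distances approximately preserved.
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_features, n_components = 100, 1000, 50
X = rng.normal(size=(n_samples, n_features))

# Random projection matrix, scaled so expected distances are preserved.
R = rng.normal(size=(n_features, n_components)) / np.sqrt(n_components)
X_low = X @ R  # the smaller representation: shape (100, 50)

# Compare the distance between the first two samples before and after.
d_high = np.linalg.norm(X[0] - X[1])
d_low = np.linalg.norm(X_low[0] - X_low[1])
ratio = d_low / d_high  # should land near 1.0
```

The surprise (formalized by the Johnson-Lindenstrauss lemma) is that a completely random projection is enough to keep distances approximately intact, which is why Random Projections earn a spot alongside the fancier techniques below.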
Unfortunately, the techniques generally employed in Dimensionality Reduction come with intimidating and unwieldy names such as Singular Value Decomposition, Principal Component Analysis, Latent Dirichlet Allocation, K-Means Clustering, Latent Semantic Analysis, Random Projections, etc. Many of the techniques involve the same underlying mathematics; they were simply developed independently for different domains.
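To illustrate that shared underlying mathematics with an assumed (not talk-provided) example: Principal Component Analysis can be computed directly from the Singular Value Decomposition of the centered data, so two items on that intimidating list are really one computation.

```python
# Sketch: PCA via SVD. The right singular vectors of the centered data
# matrix are exactly the principal components.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
Xc = X - X.mean(axis=0)          # center each variable

# SVD of the centered data: Xc = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Projecting onto the top-k right singular vectors gives the PCA
# scores -- the k-dimensional representation of each sample.
k = 2
pca_scores = Xc @ Vt[:k].T       # shape (200, 2)
```

The singular values come out sorted, so the first column of `pca_scores` captures the most variance, the second the next most, and so on.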
Jeff will attempt to demystify the topic a bit by explaining what it is, why you would use it, and many of the common underlying themes, terms, and approaches in everyday English.
Jeff Hansen is a Senior Data Engineer at Think Big Analytics. Yet another physicist gone rogue, when Jeff isn't busy building Big Data systems he enjoys herding goats in France, scuba diving in Thailand, and factoring prime numbers on his couch. Jeff received a BA in Physics and German from Washington University in St. Louis. As a frugal autodidact, Jeff has since paid institutions to take a number of courses for fun, but he now prefers to do most of his learning for free with the help of coursera.com, oyc.yale.edu and webcast.berkeley.edu.