Mining of Massive Datasets using Locality Sensitive Hashing (LSH) presented by J Singh and David Weisman
In the 50th Anniversary issue of Communications of the ACM in 2008, two pieces of "Breakthrough Research" were cited. One was MapReduce; the other was clustering based on Locality Sensitive Hashing (LSH).
Locality Sensitive Hashing is a technique for finding similar items in large data sets. Want to know if a piece of writing was plagiarized from the web and modified slightly so as not to be an exact match? Want to see if you have pictures of a suspect in your archives? Curious about where a fragment of fruit fly DNA might occur in humans? LSH will get you there faster than most other techniques.
J Singh has been the CTO of various startups and early-stage companies in the Boston area, architecting cloud-based platforms and helping bring them to market. He has been an invited speaker at many technology seminars and venture forums, generally speaking on cloud computing and Big Data. J is a Principal at DataThinks and also teaches part time at WPI.
David Weisman is a data science consultant with over 35 years of experience in the software field. In addition to consulting, he is a researcher at the University of Massachusetts Boston, working at the intersection of molecular biology and data mining. David is searching for cancer biomarkers in enormous volumes of DNA sequence data, identifying biosensors of environmental pollutants in bacterial and plant transcriptomic data, and teaching bioinformatics courses. Prior to obtaining his recent Ph.D. in molecular biology, David ran a successful, long-running software consulting firm specializing in distributed system development, compiler design, operating system development, quantitative finance, network security, and health care informatics.