In this session we will take a broad survey of clustering, one of the foundational ideas of data science. Clustering is an enormously practical approach for segmenting markets and products, analyzing health care data, recognizing patterns in financial instruments, detecting insurance fraud, grouping unstructured text documents, and identifying communities in social networks. We'll focus on several core ideas that apply across diverse applications, and we'll demonstrate practical clustering using RapidMiner. We want this to be an interactive session, so please bring your questions!
David Weisman is a data science consultant with over 35 years of experience in the software field. In addition to consulting, he is a researcher at the University of Massachusetts Boston, working at the intersection of molecular biology and data mining. David is searching for cancer biomarkers in enormous volumes of DNA sequence data, identifying biosensors of environmental pollutants in bacterial and plant transcriptomic data, and teaching bioinformatics courses. Prior to obtaining his recent Ph.D. in molecular biology, David ran a successful, long-running software consulting firm specializing in distributed system development, compiler design, operating system development, quantitative finance, network security, and health care informatics.