Distributed computing frameworks like Hadoop have revolutionized our ability to process large amounts of data. Using these tools typically requires writing complex programs in lower-level languages like Java; however, data scientists and analysts prefer to spend time in higher-level languages such as Python. To address this gap, multiple open-source Python frameworks have been built to enable simple, user-friendly access to Hadoop’s underlying systems. This talk will review the available frameworks, comparing their performance, ease of use and installation, implementation differences, and other features.
Uri Laserson is a data scientist at Cloudera. Previously, he received his PhD from MIT, where he developed applications of high-throughput DNA sequencing to immunology. During that time, he co-founded Good Start Genetics, a next-generation diagnostics company focused on genetic carrier screening. In 2012, he was selected for Forbes's 30 Under 30 list.