A guide to Python frameworks for Hadoop

Details

Distributed computing frameworks like Hadoop have revolutionized our ability to process large amounts of data. Using these tools typically requires writing complex programs in lower-level languages like Java; however, data scientists and analysts prefer to spend time in higher-level languages, such as Python. To address this gap, multiple open-source Python frameworks have been built to provide simple, user-friendly access to Hadoop’s underlying systems. This talk will review the available frameworks and compare their performance, ease of use and installation, implementation differences, and other features.

Bio
Uri Laserson is a data scientist at Cloudera. Previously, he earned his PhD at MIT, where he developed applications of high-throughput DNA sequencing to immunology. During that time, he co-founded Good Start Genetics, a next-generation diagnostics company focused on genetic carrier screening. In 2012, he was named to Forbes's 30 Under 30 list.

New York Hadoop User group
foursquare freight entrance
110 Crosby St (between Houston and Prince), 10th floor · New York, NY