Our February meeting will be at Sidecar with food/beverage provided by IBM.
It Came from the Data Lake: Best Practices for Hadoop by Vicki Boykis
Hadoop is now popular enough that many companies are on the second iteration of their Hadoop architecture, and many more are just getting started with it. In this talk, we'll cover current best practices for the Hadoop platform, starting with whether you should use Hadoop at all, then moving on to Hadoop-native file formats and the state of Hadoop and Spark development.
Vicki Boykis has worked at all ends of the data science spectrum, including predictive analytics, data engineering, and data visualization. She has experience in healthcare, education, telecommunications, and finance. While not data science-ing, she enjoys writing and Nutella.
A second talk will be presented by Rafi Kurlansik.
Analyze over 30 years of data from the National Highway Traffic Safety Administration to better understand fatal accidents, focusing on the following two questions: A) How do the safety profiles of different manufacturers compare to each other? B) In the event of a crash, which factors are associated with fatality?
We will explore and visualize the data with SparkR, PostgreSQL, and Object Storage.
Before IBM, Rafi completed the data science specialization by Johns Hopkins University on Coursera.org and built predictive models for Virtua Health System in New Jersey. When he's not analyzing publicly available data for fun, he enjoys homebrewing, PC gaming, and exploring nature.