BAM - Hortonworks Data Flow and A Tidy Text Analysis of The Simpsons in R

Details
Hortonworks Data Flow, presented by Jon Ingalls from Hortonworks
Abstract:
As the world around us becomes increasingly instrumented and connected, managing streaming data effectively is one of the major challenges facing data architects and engineers. In this presentation, we will discuss how the components of Hortonworks Data Flow (HDF), including Apache NiFi, Apache Kafka, Apache Storm, Hortonworks Schema Registry, and Streaming Analytics Manager, can be used together to address data flow management and streaming analytics. We will review an end-to-end solution focused on alarm fatigue in the healthcare space that demonstrates the capabilities of a comprehensive data-in-motion platform like HDF.
About the speaker:
Jon Ingalls is a Software Engineer at Hortonworks with almost 30 years of experience in various IT capacities including Consultant, Programmer, Quality Assurance/Tester, Software Developer, and Software Engineer. As a software engineer over the past 18 years, Jon has worked with several large and small software organizations including Oracle, Endeca, IBM, Cognos, and ParAccel, selling software for MPP Database, Business Intelligence, Search, and Data Discovery. Jon has focused on the Big Data/Hadoop space since 2014, bringing solutions to his customers’ real-world use cases using several of the major Hadoop distributions, including Hortonworks. Jon holds a B.S. in Computer Sciences from Northern Illinois University (1991).
A Tidy Text Analysis of The Simpsons in R, presented by Seamus Wedge
In this presentation, Seamus will illustrate the benefits of adopting "tidy" principles for data analysis. This approach can be put to great use with unstructured data, such as text. As an example, he will use over 500 scripts from the first 27 seasons of The Simpsons to walk through the process of transforming largely unstructured data into a structured format. Building on the work of others, and primarily using R, the analysis will establish the concept of a narrative arc to look at the sentiment and story structure of individual characters, specific writers, and the nuanced interactions between characters when they appear together. The presentation is intended to be a fun and interactive example of a project whose principles might be applied to more practical real-world problems.
About the speaker:
Seamus Wedge is a Data Scientist at Jewelers Mutual Insurance Group. With a B.S. in Chemical Engineering from UW-Madison, and after 10 years in product development and marketing in the plastics converting industry, he embarked on a career change by completing the master's program in data science through UW-Eau Claire, leading to his current role. His current work projects include using data to model and improve the customer experience across channels. His passion is finding hidden insights in data and using creative ways to communicate those results clearly.
