Apache Drill & Analyzing Text and Building Predictive Models with Greenplum


Details
Apache Drill
Keys Botzum, Technology Evangelist, MapR
Apache Drill is a new Apache Incubator project for interactive analysis of large-scale data sets, inspired by Google's Dremel. It will allow users to query terabytes of data in seconds, as opposed to minutes or hours.
Bio:
Keys Botzum is a Senior Principal Technologist with MapR Technologies. He has over 15 years of experience in large-scale distributed system design. Mr. Botzum has worked with a variety of distributed technologies, including Sun RPC, DCE, CORBA, Java EE, AFS, and DFS. Recently, he has been focusing on Hadoop and related technologies. Previously, he was a Senior Technical Staff Member with IBM and a respected author of many articles on WebSphere Application Server as well as a book. He holds a Master's degree in Computer Science from Stanford University and a B.S. in Applied Mathematics/Computer Science from Carnegie Mellon University.
Analyzing Text and Building Predictive Models with the Greenplum Unified Analytics Platform
Niels Kasch, Greenplum
In this talk, Niels Kasch will present how to develop a language-processing pipeline on top of Hadoop to facilitate a wide range of text analytics tasks. Specifically, he will demonstrate how to use Pig, OpenNLP (an open-source language processing toolkit), Mahout, and the Greenplum Data Platform to perform sentiment analysis on unstructured text sources. Using practical examples, the talk covers the tools and steps involved in developing a predictive model for this task. He will also illustrate how these techniques extend to other application areas in the machine vision (security analytics) and public health (patient care) domains.
Bio:
Niels Kasch is a Senior Data Scientist at EMC Greenplum, where he focuses on machine learning, natural language processing, and information retrieval to develop large-scale data analytics solutions. Before coming to Greenplum, he developed delay-tolerant networking and routing protocols for the Interplanetary Internet and mission-critical space flight software at the Johns Hopkins Applied Physics Laboratory. Kasch received his Ph.D. in Computer Science from the University of Maryland, Baltimore County, where he specialized in natural language processing. In his dissertation, he developed novel algorithms to mine and construct commonsense knowledge from large-scale data sources to support cognitive tasks in the area of artificial intelligence.
Agenda
6:00-7:00 - Snacks and Networking
7:00-7:15 - Announcements
7:15-7:45 - First Speaker
7:45-7:50 - Break
7:50-8:20 - Second Speaker
8:20-8:50 - Meet with Donald Miner, Author of the new book MapReduce Design Patterns
