Apache Drill & Analyzing Text and Building Predictive Models with Greenplum

Apache Drill

Keys Botzum
Technology Evangelist, MapR

Apache Drill is a new Apache Incubator project for interactive analysis of large-scale data sets, inspired by Google's Dremel. It will allow users to query terabytes of data in seconds, as opposed to minutes or hours.

Keys Botzum is a Senior Principal Technologist with MapR Technologies. He has over 15 years of experience in large-scale distributed system design. Mr. Botzum has worked with a variety of distributed technologies, including Sun RPC, DCE, CORBA, Java EE, AFS, and DFS. Recently, he has been focusing on Hadoop and related technologies. Previously, he was a Senior Technical Staff Member with IBM, and he is a respected author of many articles, as well as a book, on WebSphere Application Server. He holds a master's degree in Computer Science from Stanford University and a B.S. in Applied Mathematics/Computer Science from Carnegie Mellon University.

Analyzing Text and Building Predictive Models with the Greenplum Unified Analytics Platform

Niels Kasch

In this talk, Niels Kasch will show how to develop a language-processing pipeline on top of Hadoop to support a wide range of text analytics tasks. Specifically, he will demonstrate how to use Pig, OpenNLP (an open-source language-processing toolkit), Mahout, and the Greenplum Data Platform to perform sentiment analysis on unstructured text sources. Using practical examples, the talk covers the tools and steps involved in developing a predictive model for this task. Finally, he will illustrate how these techniques extend to other application areas, such as machine vision (security analytics) and public health (patient care).


Niels Kasch is a Senior Data Scientist at EMC Greenplum, where he focuses on machine learning, natural language processing, and information retrieval to develop large-scale data analytics solutions. Before joining Greenplum, he developed delay-tolerant networking and routing protocols for the Interplanetary Internet, as well as mission-critical space flight software, at the Johns Hopkins Applied Physics Laboratory. Kasch received his Ph.D. in Computer Science from the University of Maryland, Baltimore County, where he specialized in natural language processing. In his dissertation, he developed novel algorithms to mine and construct commonsense knowledge from large-scale data sources to support cognitive tasks in artificial intelligence.


6:00-7:00 - Snacks and Networking
7:00-7:15 - Announcements
7:15-7:45 - First Speaker
7:45-7:50 - Break
7:50-8:20 - Second Speaker
8:20-8:50 - Meet with Donald Miner, author of the new book MapReduce Design Patterns


  • David M.

    Hi guys,

    I dropped my MiFi 4620L Jetpack on the floor in the meeting room. Did anyone happen to find it? It was under one of the seats on the right-hand side of the room, toward the front.


    November 29, 2012

    • A former member

      David, bummer... I'd recommend getting an email out (with the help of the HUG organizers?) directly to the HUG subscriber list. Many folks may miss or ignore a "Meetup" mail, or delay reading it.

      November 30, 2012

    • A former member

      ...btw, really sorry you're having to deal with this. I had my 'puter and hotspot there last night and had a couple of moments of nervousness where I thought I wasn't paying enough attention to them. Argh.

      November 30, 2012

  • A former member

    This was a great session (love multi-talk sessions) and the hospitality was out of the park. Thanks to EMC/Greenplum and Booz for their generosity and classy event. Thanks, too, to the authors and Greenplum for the bulk autographed book giveaway. *That's* the kind of marketing collateral that makes a dent!

    November 29, 2012

  • Shawn H.

    Great session! Thanks to the sponsors for the food and spacious venue (rare finds in tech meetups).

    November 29, 2012

  • Brian V.

    Since there doesn't appear to be one, I hereby declare the hashtag for this event to be #hadoopdc.

    November 29, 2012

  • Alps

    Hoping to make the next one.

    November 29, 2012

  • Amine R.

    Hey there, does anybody know if there is parking at this location?


    November 29, 2012

Our Sponsors

  • Tetra Concepts

    Thank you to Tetra Concepts for sponsoring this meetup.

  • BAE Systems

    Thank you to BAE Systems for sponsoring this meetup.

  • Booz Allen

    Thank you to Booz Allen for sponsoring this meetup.
