June 2016 Meetup

Details

To expedite check-in, please add your name to the GA list:

https://generalassemb.ly/education/sf-hadoop-users-meetup/san-francisco/25069

Agenda:
6-6:30pm Food and Drinks
6:30-7:15pm Tech Talks with Q&A
7:15-8pm Networking

Tech Talk Description: Real-time stream analysis starts with ingesting raw data and extracting structured records. While stream-processing frameworks such as Apache Spark and Apache Storm provide primitives for processing individual records, processing windows of records, and grouping/joining records, common actions such as filtering, applying regular expressions to extract data, and converting records from one schema to another are left to developers writing business logic.

Joey Echeverria presents an alternative approach based on a reusable library that provides configuration-based data transformation. This allows users to write common data-transformation rules once and reuse them in multiple contexts. A common pattern is to consume a single, raw stream and transform it using the same rules before storing it in different repositories such as Apache Solr for search and Apache Hadoop HDFS for deep storage.

Topics include:

*An overview of the stream-processing landscape
*The common analysis phases in stream-processing applications (filter, extract, transform, group, aggregate, and join)
*Existing solutions for data transformation in a streaming context
*The solution for common data filtering, transformation, and extraction
*The extensibility of this solution with new, custom transformation actions that can be driven by configuration
*How to use the transformation library for log analytics and IT operations events

Speaker: Joey Echeverria is the director of engineering at Rocana, where he builds applications for scaling IT operations built on the Apache Hadoop platform. Joey is a committer on the Kite SDK, an Apache-licensed data API for the Hadoop ecosystem. Joey was previously a software engineer at Cloudera, where he contributed to several ASF projects including Apache Flume, Apache Sqoop, Apache Hadoop, and Apache HBase. Joey is also a coauthor of Hadoop Security, published by O’Reilly Media.

Notes: The nearest BART station is Montgomery

San Francisco Hadoop Users
General Assembly
225 Bush Street, 5th Floor · San Francisco, CA