Real-time data ingestion & streaming: talks from Avvo, Expedia and Confluent


Details
Happy New Year 2017 to everyone! Hope all of you had a great holiday and are back re-energized. Over the last year we met about once a quarter, which was a good rhythm given the great content and presenters we had. Let's keep that momentum going in 2017. Kicking off this year's series, we have three great talks lined up, with presenters from Avvo, Expedia and Confluent. Venue and food are sponsored by Avvo. I'd also like to invite all of you to plan ahead and reach out to the organizers with talk and hosting proposals for future meetups.
Time: 5:30-8:30 PM
Venue: Avvo (720 Olive Way, Floor 14, Seattle, WA).
Apache Flume – Real-time data ingestion into HDFS
Abstract: I will give a technical overview of how to use Flume for real-time data ingestion into HDFS and HBase.
Speaker: Tanuj Mehta, Director - BI/Machine Learning/Engineering, Avvo
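If you'd like a feel for what this looks like before the talk, here is a minimal sketch of a Flume agent configuration wiring an exec source through a memory channel into an HDFS sink. The agent, source, channel, and sink names, plus the log and HDFS paths, are illustrative and not taken from the talk:

    # flume.conf - single-agent pipeline: tail a log file into HDFS
    agent1.sources = src1
    agent1.channels = ch1
    agent1.sinks = sink1

    # Exec source tails an application log in near real time
    agent1.sources.src1.type = exec
    agent1.sources.src1.command = tail -F /var/log/app/app.log
    agent1.sources.src1.channels = ch1

    # In-memory channel buffers events between source and sink
    agent1.channels.ch1.type = memory
    agent1.channels.ch1.capacity = 10000

    # HDFS sink writes events into time-bucketed directories
    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.channel = ch1
    agent1.sinks.sink1.hdfs.path = /flume/events/%Y/%m/%d
    agent1.sinks.sink1.hdfs.fileType = DataStream
    agent1.sinks.sink1.hdfs.useLocalTimeStamp = true

Started with flume-ng agent -n agent1 -c conf -f flume.conf, the agent tails the log and lands events in time-bucketed HDFS directories; an HBase sink can be swapped in the same way.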
Streaming Data Ecosystems with Brandon O'Brien
Abstract: Expedia, Inc. is a global e-commerce company that runs a high-traffic website (and mobile app) for finding and purchasing travel products. We generate a lot of data, so we've built a world-class streaming data platform with a supporting ecosystem of tools designed to make it simple for our teams to produce and consume streaming data, which lets us ship new data products fast. The main components of the platform are a common Kafka cluster, a simple HTTP data ingestion facade, and a tool called Primer that makes it easy to create and deploy new Storm and Spark apps as Kafka consumers. In the talk, I'll explain what that ecosystem looks like, how teams are using Kafka, and more.
Speaker: Brandon O'Brien, Principal Software Engineer at Expedia, Inc ( https://twitter.com/hakczar )
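Primer and the HTTP ingestion facade are Expedia-internal tools, so as general background only, here is a minimal sketch of the kind of plain Kafka consumer such tooling wraps, written in Java against a recent kafka-clients API. The broker address, group id, and topic name are made up for illustration:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class ClickstreamConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka.example.com:9092"); // illustrative broker
            props.put("group.id", "clickstream-reader");              // illustrative group id
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("clickstream")); // illustrative topic
                while (true) { // poll loop: fetch batches of records as they arrive
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                record.partition(), record.offset(), record.value());
                    }
                }
            }
        }
    }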
State of the Streaming Platform 2017: An Overview of Apache Kafka and the Confluent Platform
Abstract: In the past few years Apache Kafka has emerged as the world's most popular real-time data streaming platform. In this talk, we introduce some key additions to the Apache project from 2016: Kafka Connect and Kafka Streams.
Kafka Connect is the community’s framework for scalable, fault-tolerant data import and export into your streaming platform. By standardizing the most common design patterns seen in large Kafka deployments, Kafka Connect dramatically simplifies the development of ETL pipelines and the integration of disparate data systems across the Kafka backbone. We’ll discuss the Kafka Connect architecture and how you can publish or subscribe to Kafka topics by simply configuring a standard Connector.
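As a taste of how little configuration that is, here is a standalone-mode properties file for the FileStreamSource example connector that ships with Apache Kafka, which streams lines from a local file into a topic. The file path and topic name are illustrative:

    # connect-file-source.properties - standalone-mode config for the
    # FileStreamSource example connector that ships with Apache Kafka
    name=local-file-source
    connector.class=FileStreamSource
    tasks.max=1
    # local file to stream from (illustrative path)
    file=/var/log/app/events.log
    # Kafka topic to publish the lines into (illustrative name)
    topic=raw-events

Passed to connect-standalone alongside the worker properties, this starts the pipeline without writing any code; a sink connector exporting a topic to an external system is configured the same way.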
Kafka Streams provides a natural DSL for writing stream processing applications and a lightweight deployment model that integrates with any execution framework. As such, it is the most convenient yet scalable option for analyzing, transforming, or otherwise processing data that is streaming through Kafka.
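For a sense of the DSL, here is the canonical word-count sketch in Java, assuming a recent Kafka Streams release (the 0.10.x API current at the time of this talk differs slightly); topic names and the application id are illustrative:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.Produced;

    import java.util.Arrays;
    import java.util.Properties;

    public class WordCountApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-demo");    // illustrative id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative broker
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> lines = builder.stream("text-input"); // illustrative topic
            lines.flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
                 .groupBy((key, word) -> word)  // re-key by word so counts are per word
                 .count()                       // continuously updated KTable<String, Long>
                 .toStream()
                 .to("word-counts", Produced.with(Serdes.String(), Serdes.Long()));

            new KafkaStreams(builder.build(), props).start();
        }
    }

Because it is a plain Java application, it deploys like any other service; no dedicated processing cluster is required.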
We'll round out the discussion with a brief demonstration of a data pipeline illustrating all of these components along with the latest monitoring and alerting capabilities of the Confluent Enterprise offering.
Speaker: David Tucker, Director, Partner Engineering and Alliances, Confluent
David Tucker is a solution architect specializing in complex deployments of enterprise software in physical and virtual environments. He is experienced in product development and solution design across the full range of classic business applications (RDBMS and ERP) as well as the latest in big data technologies (Hadoop platforms and Kafka streams). Prior to Confluent, David was at MapR and Hewlett-Packard.

Sponsors
Avvo