
Real time data ingestion & streaming: talks from Avvo, Expedia and Confluent

  • January 25 · 5:30 PM

Happy New Year 2017 to everyone! Hope all of you had a great holiday and are back re-energized. Over the last year we met about once a quarter, which was a good rhythm given the great content and presenters we had. In 2017, let's keep up the momentum. To kick off this year's series, we have three talks lined up with presenters from Avvo, Expedia and Confluent. Venue and food are sponsored by Avvo. I'd also like to invite all of you to plan ahead and reach out to the organizers with talk or hosting proposals for future meetups.

Time: 5:30-8:30 PM

Venue: Avvo (720 Olive Way, Floor 14, Seattle, WA).

Apache Flume – Real time data ingestion into HDFS

Abstract: I will give a technical overview of how to use Flume for real-time data ingestion into HDFS and HBase.

Speaker: Tanuj Mehta, Director - BI/Machine Learning/Engineering, Avvo

Streaming Data Ecosystems with Brandon O'Brien

Abstract: Expedia, Inc is a global e-Commerce company that runs a high-traffic website (and mobile app) for finding and purchasing great travel products. We generate a lot of data, so we've built out a world-class streaming data platform with a supporting ecosystem of tools designed to make it super simple for our teams to produce and consume streaming data, so that we can deploy new data products fast. The main components of our streaming data platform are the common Kafka cluster, our simple HTTP data ingestion facade, and our tool called Primer, which makes it easy to create and deploy new Storm and Spark apps as Kafka consumers. In the talk, I'll explain what that ecosystem looks like, how teams are using Kafka, and more.

Speaker: Brandon O'Brien, Principal Software Engineer at Expedia, Inc

State of the Streaming Platform 2017: An Overview of Apache Kafka and the Confluent Platform

Abstract: In the past few years Apache Kafka has emerged as the world's most popular real-time data streaming platform. In this talk, we introduce some key additions to the Apache project from 2016:  Kafka Connect and Kafka Streams.  

Kafka Connect is the community’s framework for scalable, fault-tolerant data import and export into your streaming platform. By standardizing the most common design patterns seen in large Kafka deployments, Kafka Connect dramatically simplifies the development of ETL pipelines and the integration of disparate data systems across the Kafka backbone.   We’ll discuss the Kafka Connect architecture and how you can publish or subscribe to Kafka topics by simply configuring a standard Connector.
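As a rough illustration of what "configuring a standard Connector" can look like, here is a minimal standalone source connector configuration, modeled on the file-source example that ships with Apache Kafka. It is only a sketch: the connector name, file path, and topic name are placeholders chosen for this example and are not taken from the talk.

```properties
# Illustrative values only; name, file, and topic are hypothetical.
name=local-file-source
# FileStreamSource is the simple file connector bundled with Apache Kafka
connector.class=FileStreamSource
tasks.max=1
# Each line appended to this file is published as one record
file=/tmp/events.txt
# Destination Kafka topic for those records
topic=events
```

A standalone Connect worker started with `connect-standalone.sh`, a worker properties file, and a connector file like this one would tail the file and publish its lines to the topic without any custom producer code. Depending on the Kafka version, the file connector may first need to be added to the worker's plugin path.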

Kafka Streams provides a natural DSL for writing stream processing applications and a light-weight deployment model that integrates with any execution framework.  As such, it is the most convenient yet scalable option to analyze, transform, or otherwise process data that is streaming through Kafka.

We'll round out the discussion with a brief demonstration of a data pipeline illustrating all of these components along with the latest monitoring and alerting capabilities of the Confluent Enterprise offering.

Speaker: David Tucker, Director, Partner Engineering and Alliances, Confluent

David Tucker is a solution architect specializing in complex deployments of enterprise software in physical and virtual environments. He is experienced in product development and solution design across the full range of classic business applications (RDBMS and ERP) as well as the latest in big data technologies (Hadoop platforms and Kafka streams). Prior to Confluent, David was at MapR and Hewlett-Packard. 


  • Nitin K.

    1) State of the Streaming Platform 2017: An Overview of Apache Kafka and the Confluent Platform
    Speaker: David Tucker, Director, Partner Engineering and Alliances, Confluent
    Slides -

    2) Streaming Data Ecosystems with Brandon O'Brien
    Speaker: Brandon O'Brien, Principal Software Engineer at Expedia, Inc
    Slides -

    3) Apache Flume – Real time data ingestion into HDFS
    Speaker: Tanuj Mehta, Director - BI/Machine Learning/Engineering, Avvo
    Slides -

    1 · January 28

  • Harry

    It was very informative. Being able to hear about real-world deployments and the issues and benefits was great. I also liked the future of streaming talk, and how it showed such an extensive view of the future of enterprises. A big thank you to all three speakers and to the organizers.

    1 · January 26

  • Jeff T.

    Great talk last night, very informative, and beautiful demos. Thank you! Will a video or materials be posted for last night's session?

    1 · January 26

    • Nitin K.

      Thanks everyone for your participation. Slides and videos will be posted soon.

      January 26

  • Brandon O.

    Thanks all for attending, I hope it was interesting. Please feel free to reach out with any more questions (@hakczar). For those interested in a deep dive on processing Kafka data using Spark Streaming, I have an upcoming talk on March 23rd that will focus on this:

    1 · January 25

Our Sponsors

  • Microsoft

    Food and venues for various meetup events.
