Stream Processing with Apache Kafka & Apache Samza

Details
Welcome:
Welcome to the May 2017 Stream Processing Meetup hosted by LinkedIn in Sunnyvale.
This meetup focuses on Apache Kafka, Apache Samza, and related streaming technologies.
Location:
Our new Corporate HQ in Sunnyvale. We will be in a 300-person auditorium named Unity at 950 W Maude Ave in Sunnyvale.
Agenda:
6PM: Doors open
6-6:35PM: Networking & Welcome
6:35-7:10PM: Streaming Data Pipelines with Brooklin (Samarth Shetty, LinkedIn)
In recent years, data and streaming applications have grown by leaps and bounds, and streaming data quickly and reliably from the storage layer to the streaming applications has become a non-trivial problem. Building one-off data pipelines that serve the requirements of every application and dataset combination is not sustainable.
At LinkedIn, we’ve developed a system called Brooklin to create data pipelines connecting streaming data sources (e.g. Kafka, EventHubs, Change-Capture streams) with nearline applications. In this talk, we will discuss Brooklin, the problems it addresses, its design, its usage, and its future directions.
7:15-7:50PM: Kafka at Half the Price (Dong Lin, LinkedIn)
At LinkedIn, we run Kafka on more than 1,500 machines, which costs millions of dollars in operation and maintenance. As our clusters have grown and the hardware has aged, we have observed an increasing number of double-broker failures over the last year, which motivates us to increase the replication factor from 2 to 3 to keep our data available to users. However, this change is prohibitively expensive: it increases our hardware cost by another 50%, which means millions of dollars a year. In this talk, we present our work on supporting a JBOD setup in Kafka, which allows us to either save 50% of the cost or increase the replication factor to 3 and still save 25% of the hardware cost. We will compare JBOD with alternatives, including RAID and one-broker-per-disk, explain its high-level design, and discuss possible future work to further reduce Kafka's operational cost.
7:55-8:30PM: Managed or stand alone, streaming or batch; Unified processing with the Samza Fluent API (Yi Pan, LinkedIn)
Samza 0.13 improves the simplicity and portability of Samza applications. The new fluent API supports common operations like windowing, map, and join on streams. Developers can now express application logic concisely in a few lines of code and accomplish what previously required several jobs. The other exciting Samza 0.13.0 feature is Standalone Deployment. It empowers developers to deploy and scale Samza applications as a simple embedded library, which is much more flexible than the original YARN deployment model. This talk will cover the new Fluent API and Standalone Deployment, as well as batch processing, both in terms of what is available in the 0.13.0 release and what is coming in the future.
RSVP:
Please RSVP only if you plan to attend in person. Our facility can host 300 guests.
Parking & Entrance:
You can park in the uncovered parking along 950 W Maude Ave or in the parking garage behind the building. Street parking is also available for overflow.
NDA:
You will need to sign a standard NDA when you enter the lobby.
Food & Drink:
Food & drink will be provided.
Can’t join us live?:
The live stream will be at https://primetime.bluejeans.com/a2m/live-event/ow53483.
A recording will be posted in a few days.
Want to talk at a future meetup?:
Please contact us via the “Contact” button on meetup.com.

