Stream Processing with Apache Kafka & Real-time Data Integration at Scale


Details
Join us for an Apache Kafka meetup on July 4th from 6:30pm, hosted by Zalando in Dublin. The address is 3 Grand Canal Quay, Dublin 2, D02 WC65. The agenda and speaker information can be found below. See you there!
-----
Agenda:
6:30pm: Doors open
6:30pm - 7:00pm: Networking, Pizza and Drinks
7:00pm - 7:30pm: Presentation #1: Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, Streams vs. Databases, Michael Noll, Confluent
7:30pm - 8:00pm: Presentation #2: Real-time Data Integration at Scale with Kafka Connect, Robin Moffatt, Confluent
8:00pm - 8:15pm: Additional Q&A and Networking
-----
First Talk
Speaker:
Michael Noll
Bio:
Michael Noll is a product manager at Confluent, the company founded by the creators of Apache Kafka. Previously, Michael was the technical lead of DNS operator Verisign’s big data platform, where he grew the Kafka, Hadoop, and Storm-based infrastructure from zero to petabyte-sized production clusters spanning multiple data centers—one of the largest big data infrastructures in Europe at the time. He is a well-known tech blogger in the big data community (www.michael-noll.com). In his spare time, Michael serves as a technical reviewer for publishers such as Manning and is a frequent speaker at international conferences, including ACM SIGIR, ApacheCon, and Strata. Michael holds a PhD in computer science.
Title:
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, Streams vs. Databases
Abstract:
Modern businesses have data at their core, and this data is changing continuously. How can we harness this torrent of information in real time? The answer is stream processing, and the technology that has become the core platform for streaming data is Apache Kafka. Among the thousands of companies that use Kafka to transform and reshape their industries are the likes of Netflix, Uber, PayPal, and Airbnb, as well as established players such as Goldman Sachs, Cisco, and Oracle.
Unfortunately, today’s common architectures for real-time data processing at scale suffer from complexity: many technologies must be stitched together and operated side by side, and each individual technology is often complex by itself. This has led to a strong discrepancy between how we, as engineers, would like to work and how we actually end up working in practice.
In this session we talk about how Apache Kafka helps you radically simplify your data architectures. We cover how you can now build normal applications to serve your real-time processing needs, rather than building clusters or similar special-purpose infrastructure, and still benefit from properties such as high scalability, distributed computing, and fault tolerance that are typically associated exclusively with cluster technologies. We discuss common use cases to show that stream processing in practice often requires database-like functionality, and how Kafka allows you to bridge the worlds of streams and databases when implementing your own core business applications (inventory management for large retailers, patient monitoring in healthcare, fleet tracking in logistics, etc.), for example in the form of event-driven, containerized microservices.
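To give a flavour of the "applications, not clusters" idea ahead of the talk, here is a minimal, hypothetical sketch of a Kafka Streams application in Java. The topic names, the per-product count, and all configuration values are invented for illustration and are not taken from the talk; the API shown is the standard Kafka Streams DSL in recent Kafka versions.

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Produced;

    public class InventoryApp {

        public static void main(String[] args) {
            // Plain application configuration: no cluster to provision or operate.
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "inventory-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();

            // The "streams" side: inventory change events, keyed by product id.
            KStream<String, String> changes = builder.stream("inventory-changes");

            // The "database" side: a continuously updated count per product,
            // backed by a fault-tolerant local state store.
            KTable<String, Long> changesPerProduct = changes.groupByKey().count();

            // Publish the table's changelog back to a topic for downstream consumers.
            changesPerProduct.toStream()
                .to("inventory-change-counts", Produced.with(Serdes.String(), Serdes.Long()));

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }

Scaling out is then just a matter of starting more instances of the same program with the same application.id: Kafka rebalances the processing work and the state among them.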
---
Second Talk
Speaker:
Robin Moffatt
Bio:
Robin is a Partner Technology Evangelist at Confluent, the company founded by the creators of Apache Kafka, as well as an Oracle ACE Director. His career has always involved data, from the old worlds of COBOL and DB2, through the worlds of Oracle and Hadoop, and into the current world with Kafka. His particular interests are analytics, systems architecture, performance testing, and optimization. He blogs at http://rmoff.net/ (and previously http://ritt.md/rmoff) and can be found tweeting grumpy geek thoughts as @rmoff. Outside of work he enjoys drinking good beer and eating fried breakfasts, although generally not at the same time.
Title:
Real-time Data Integration at Scale with Kafka Connect
Abstract:
Apache Kafka is a streaming data platform. It enables integration of data across the enterprise, and ships with its own stream processing capabilities. But how do we get data in and out of Kafka in an easy, scalable, and standardised manner? Enter Kafka Connect. Part of Apache Kafka since version 0.9, Kafka Connect defines an API that enables the integration of data from multiple sources, including MQTT, common NoSQL stores, and change data capture (CDC) from relational databases such as Oracle. By "turning the database inside out" we can enable an event-driven architecture in our business that reacts to changes made by applications writing to a database, without having to modify those applications themselves. As well as ingest, Kafka Connect offers connectors for numerous targets, including HDFS, S3, and Elasticsearch.
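As a taste of what "data out of Kafka" looks like in practice, below is a sketch of a connector configuration in the properties format read by Connect's standalone mode. The connector class and settings follow Confluent's Elasticsearch sink connector, while the connector name, topic, and URL are invented for the example.

    # Invented example: stream the "orders" topic into a local Elasticsearch instance.
    name=orders-elasticsearch-sink
    connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
    tasks.max=1
    # Topic(s) to read from, and the Elasticsearch endpoint to write to.
    topics=orders
    connection.url=http://localhost:9200
    type.name=kafka-connect
    key.ignore=true

No application code is involved: the same declarative approach, with a different connector class, also covers sources such as MQTT or CDC feeds.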
This presentation will briefly recap the purpose of Kafka, and then dive into Kafka Connect, with practical examples of data pipelines that are already in production at companies around the world. We'll also look at the Single Message Transform (SMT) capabilities introduced with Kafka 0.10.2 and how they can make Kafka Connect even more flexible and powerful.
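For a hint of what SMTs can do, the lines below extend the hypothetical sink configuration above with InsertField, one of the transforms that ships with Apache Kafka, to stamp every record with a static field as it passes through the connector. The field name and value are again invented for the example.

    # Add a static "source" field to every record flowing through the connector.
    transforms=addSourceMeta
    transforms.addSourceMeta.type=org.apache.kafka.connect.transforms.InsertField$Value
    transforms.addSourceMeta.static.field=source
    transforms.addSourceMeta.static.value=orders-pipeline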
-----
Special thanks to Zalando for hosting this event.
Don't forget to join our Community Slack Team (https://slackpass.io/confluentcommunity)!
If you would like to speak or host our next event please let us know! community@confluent.io
NOTE: We are unable to cater for any attendees under the age of 18. Please do not sign up for this event if you are under 18.

