
ETL, Kafka and Elasticsearch combined!

Hosted By
James C. and 2 others

Details

This month, we bring you a full-length talk.

== Scalable realtime ETL streaming using Apache Kafka stack & Elasticsearch
Anil Surmeli (@surmelianil)

Traditionally, ETL (extract, transform, load) has been done with batch processing, in which part of a system periodically connects to a source data store, reads and manipulates the data, and persists it in another data store. Though this is a pretty straightforward approach, it doesn't meet the needs of today's fast-paced workflows. We need realtime streaming that can scale horizontally.

The focus of this talk is to demonstrate how easily you can implement these streaming ETL pipelines in Apache Kafka. In this session, I will try to simulate a product update process for an e-commerce platform using Apache Kafka Stack & Elastic.

We will assume that the platform uses Elastic to list its products on the website. Our main goal is to make the update process as fast as possible so that customers see up-to-date information about what they're browsing in the store.

First, we will simulate our streaming source with a simple Python application, then activate our producers so that the streaming data is sent to the product topic in Kafka. A Kafka Streams application will read the data from the topic, enrich some of its properties and write it to another Kafka topic. Finally, Kafka consumers will load the updated product information into Elastic, where we will be able to see it. We may use a Kafka Connect sink connector instead of consumers during the demo.

Monitoring the data flowing through Kafka is one of the most satisfying things I've done so far. Seamless joy. I hope you'll like it too :)
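
As a rough illustration of the flow described above (not the demo code from the talk), here is a minimal in-memory sketch in Python. Everything in it is an assumption made for illustration: the deques stand in for Kafka topics, the dict stands in for an Elastic index, and the field names and the `price_with_tax` enrichment are invented. A real pipeline would use Kafka producers, a Kafka Streams application and consumers (or a sink connector) in place of these functions.

```python
import json
import random
from collections import deque

# Hypothetical stand-ins: each deque plays the role of a Kafka topic.
product_topic = deque()
enriched_topic = deque()
# Hypothetical stand-in for an Elastic index, keyed by document id.
product_index = {}


def simulate_source(n, seed=0):
    """Yield n fake product updates (the 'extract' side)."""
    rng = random.Random(seed)
    for _ in range(n):
        product_id = rng.randint(1, 5)
        yield {
            "id": product_id,
            "name": f"product-{product_id}",
            "price": round(rng.uniform(5, 100), 2),
        }


def produce(topic, records):
    """Serialise each record to JSON and append it to the topic."""
    for record in records:
        topic.append(json.dumps(record))


def enrich(record):
    """The 'transform' step: add a derived field (an invented example)."""
    enriched = dict(record)
    enriched["price_with_tax"] = round(record["price"] * 1.1, 2)
    return enriched


def stream_process(source, sink):
    """Drain the source topic, enrich each message, write it to the sink topic."""
    while source:
        record = json.loads(source.popleft())
        sink.append(json.dumps(enrich(record)))


def consume_into_index(topic, index):
    """The 'load' step: upsert each message into the index by product id."""
    while topic:
        doc = json.loads(topic.popleft())
        index[doc["id"]] = doc


produce(product_topic, simulate_source(20))
stream_process(product_topic, enriched_topic)
consume_into_index(enriched_topic, product_index)
```

Because the last stage upserts by product id, a later update for the same product overwrites the earlier document, which is the behaviour a product listing would want from a stream of updates.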

About Anil:
Anil is a senior software developer at CBA. He achieved one of his life goals last year and moved to Sydney from Istanbul. He had a chance to work with Martin Fowler's team, which helped him practice DDD, TDD, and clean code architecture. While working in the high-traffic e-commerce domain, Anil gained experience with real-time ETL processing with Kafka, dynamic e-commerce filtering with Elastic, and CQRS with .NET Core/Python.
http://medium.com/@surmelianil | https://github.com/skynyrd

NOTE: The group meets in the Sydney CBD, near Wynyard station, not in Revesby (there have been some map bugs around this)!
Address for copy paste: level 10/50 Carrington St, Sydney NSW 2000
Google maps link: https://goo.gl/maps/oaBFbAYyjGB2

Sydney Alt.Net User Group