Building Scalable Data Pipelines with Kafka and Apache Spark
Hello,
Let's meet and discuss building data pipelines using Kafka and Spark.
This talk covers the process of building data pipelines: extraction, cleaning, integration, and pre-processing of data, and in general all the steps needed to prepare your data for your data-driven product. In particular, the focus is on data plumbing and on the practice of going from prototype to production.
Some of the topics we will cover include:
- What are data pipelines?
- Use cases for data pipelines
- What does it mean for a data pipeline to be scalable?
- A generic data pipeline architecture
- What Kafka is and where it fits in the data pipeline architecture
- Kafka architecture and components
- Examples of Kafka client API usage (Producer, Consumer, and Connect APIs), as sketched after this list
- Walkthrough of the codebase and the Kafka API
- What Spark is and where it fits in the architecture
- Spark architecture and components
- An example of a Spark job (reading from Kafka, performing some use-case-driven computation, and writing to a data store such as Redis), as sketched after this list
- Walkthrough of the codebase and the Spark API
- Q&A
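
To give a taste of the Kafka portion, here is a minimal sketch of the Producer and Consumer client APIs in Scala. The broker address (localhost:9092), the topic name ("events"), and the consumer group id ("pipeline-demo") are illustrative placeholders, not taken from the talk's codebase. The Connect API is omitted here since it is typically driven by configuration files rather than application code.

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object KafkaSketch {
  def main(args: Array[String]): Unit = {
    // Producer: publish one record to the (hypothetical) "events" topic
    val producerProps = new Properties()
    producerProps.put("bootstrap.servers", "localhost:9092") // assumed local broker
    producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](producerProps)
    producer.send(new ProducerRecord("events", "user-42", """{"action":"click"}"""))
    producer.close()

    // Consumer: subscribe to the same topic and print whatever arrives
    val consumerProps = new Properties()
    consumerProps.put("bootstrap.servers", "localhost:9092")
    consumerProps.put("group.id", "pipeline-demo") // assumed group id
    consumerProps.put("auto.offset.reset", "earliest")
    consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    val consumer = new KafkaConsumer[String, String](consumerProps)
    consumer.subscribe(Collections.singletonList("events"))
    val records = consumer.poll(Duration.ofSeconds(5))
    records.forEach(r => println(s"${r.key} -> ${r.value} (offset ${r.offset})"))
    consumer.close()
  }
}
```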
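And here is a sketch of the kind of Spark job the last walkthrough describes: a Structured Streaming query that reads from Kafka, runs a computation, and pushes results to Redis. The computation (a per-user event count) and the Redis write via a plain Jedis client are stand-ins chosen for illustration; it assumes the spark-sql-kafka connector and the Jedis client on the classpath, and a local Redis at the default port.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import redis.clients.jedis.Jedis

object KafkaToRedisSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-redis-sketch")
      .master("local[*]") // local mode for demonstration only
      .getOrCreate()
    import spark.implicits._

    // Read the (hypothetical) "events" topic; Kafka hands key/value over as bytes
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING) AS userId", "CAST(value AS STRING) AS payload")

    // The use-case-driven computation: count events per user
    val counts = events.groupBy($"userId").count()

    // Write each micro-batch to Redis; the aggregate is small,
    // so collecting it to the driver is acceptable here
    val writeToRedis: (DataFrame, Long) => Unit = (batch, _) => {
      val jedis = new Jedis("localhost", 6379) // assumed local Redis
      batch.collect().foreach { row =>
        jedis.set(s"count:${row.getString(0)}", row.getLong(1).toString)
      }
      jedis.close()
    }

    counts.writeStream
      .outputMode("complete") // re-emit the full aggregate on each trigger
      .foreachBatch(writeToRedis)
      .start()
      .awaitTermination()
  }
}
```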
