Apache Kafka® and R: Real-time prediction and model (re)training


Details
Hello Streamers!
Please find the details to join this fun and informative meetup below.
Find information about upcoming meetups and tons of content from past Kafka Meetups all over the world:
cnfl.io/meetup-hub
-----
Agenda:
6:00pm-6:10pm: Online Networking (feel free to BYOB!!)
6:10pm-6:55pm: Apache Kafka® and R: Real-time prediction and model (re)training, Patrick Neff, Data Scientist, Baader
6:55pm-7:10pm: Q&A
Joining our Slack space is not instant. To make sure you are in before the event starts, follow the steps at this link ahead of the day of the event if you can: cnfl.io/slack
-----
Title:
Apache Kafka and R: Real-time prediction and model (re)training
Speaker:
Patrick Neff, Data Scientist, Baader
Abstract:
Data prediction using Machine Learning models on real-time data: a lot of fancy buzzwords, but today we implement such a pipeline with Apache Kafka and R on our local machine.
In more detail, a Kafka Producer generates fake data and writes it to Kafka topics. A Kafka Streams application consumes one topic, predicts future values by communicating with R via a REST API, and produces the results to a sink topic. Finally, in ksqlDB, predicted and real values are compared, and retraining is triggered once the prediction error exceeds a certain threshold.
In this talk, we go over the code of all building blocks, talk about testing crucial parts, and analyze how the pipeline performs. Also, limitations and drawbacks are highlighted.
Note: The Kafka Producer and Streams application are implemented in Kotlin; in R we work mainly with the plumber package.
Bio:
Patrick Neff is a Data Scientist at BAADER, a global provider of fish and poultry processing machines, in Hamburg, Germany. He is part of the digitalization department, which develops new solutions for the food processing industry. Before that, Patrick completed a master's degree in Applied Statistics, focusing on data science and machine learning. He began working with Apache Kafka in 2019. Since then, he has developed several microservices with Kafka Streams, used Kafka Connect for data analytics projects, and spoken at Kafka Summit Europe 2021.
-----
Online Meetup Etiquette:
•Please unmute yourself when you have a question.
•Please hold your questions until the end of the presentation or use the Zoom chat!
•Please arrive on time, as Zoom meetings can become locked for many reasons (if you get locked out, a recording will be available, though you may have to wait a little while for it!)
-----
If you would like to speak at or host our next event, please let us know! community@confluent.io
