Modern Data Processing With Streaming Data Pipelines


Details
Please register on the event website to receive your personalized Zoom link: https://www.aicamp.ai/event/eventdetails/W2022072612
(Our partner AICamp provides free Zoom service for our members)
Agenda:
12:00 - 12:05 pm: members join online
12:05 - 1:00 pm: talk + Q&A
1:00 pm: closing
Summary: In this talk, we will share best practices we have discovered over the last 7 years of building data streaming applications, including IoT, CDC, logs, and data feeds.
In our modern data processing approach, we combine several highly scalable open-source frameworks to make the most of each one's strengths. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Pulsar.
From there we build streaming ETL with Apache Spark, and enhance events with Pulsar Functions for ML and enrichment.
We build continuous queries against our topics with Flink SQL for aggregations, real-time alerts, and Delta Lake population.
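As a rough illustration of the enrichment step, here is a minimal sketch written in the plain-function style that Pulsar Functions accepts for Python. The event fields (sensor_id, temperature_c), the alert flag, and the threshold are hypothetical and not taken from the talk; a real function would be deployed against actual input and output topics.

```python
import json

# Hypothetical threshold for flagging a reading; not from the talk.
HIGH_TEMP_C = 30.0


def process(input):
    """Enrich one JSON-encoded sensor event and return it as JSON.

    Written as a plain function so it can be run locally without a
    Pulsar cluster. The field names are assumptions for illustration.
    """
    event = json.loads(input)
    temp = event.get("temperature_c")
    # Add a derived field; events without a temperature are not flagged.
    event["alert"] = temp is not None and temp > HIGH_TEMP_C
    return json.dumps(event)


if __name__ == "__main__":
    print(process('{"sensor_id": "s1", "temperature_c": 35.5}'))
```

Keeping the logic in a single function with string input and output makes it easy to unit test before wiring it to any topic.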
With Slides, Demos, Q&A
Speakers: Timothy Spann and David Kjerrumgaard
Timothy Spann
Developer Advocate, StreamNative
Former Principal DataFlow Field Engineer at Cloudera
Former Senior Solutions Engineer at Hortonworks
Former Senior Field Engineer at Pivotal
DZone MVB Blogger
David Kjerrumgaard
Developer Advocate
Apache Pulsar Committer | Author of Pulsar In Action
Former Principal Software Engineer on Splunk’s messaging team, responsible for Splunk’s internal Pulsar-as-a-Service platform
Former Director of Solution Architecture at Streamlio
