
Modern Data Processing With Streaming Data Pipelines

Hosted By
AnyaB

Details

Please register on the event website to receive your personalized Zoom joining link: https://www.aicamp.ai/event/eventdetails/W2022072612
(Our partner AICamp provides free Zoom service for our members)

Agenda:
12:00 - 12:05 pm: members join online

12:05 - 1:00 pm: talk + Q&A

1:00 pm: closing

Summary: In this meetup talk, we will share best practices we have discovered over the last 7 years of building data streaming applications, including IoT, CDC, logs, and data feeds.

In our modern data processing approach, we combine several highly scalable open-source frameworks to take advantage of the best features of each. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Pulsar.

From there, we build streaming ETL with Apache Spark and enhance events with Pulsar Functions for ML and enrichment.

We build continuous queries against our topics with Flink SQL for aggregations, real-time alerts, and Delta Lake population.

With Slides, Demos, Q&A

Speakers: Timothy Spann and David Kjerrumgaard

Timothy Spann

Developer Advocate, StreamNative
Former Principal DataFlow Field Engineer at Cloudera
Former Senior Solutions Engineer at Hortonworks
Former Senior Field Engineer at Pivotal
DZone MVB Blogger

David Kjerrumgaard

Developer Advocate
Apache Pulsar Committer | Author of Pulsar In Action
Former Principal Software Engineer on Splunk’s messaging team, responsible for Splunk’s internal Pulsar-as-a-Service platform
Former Director of Solution Architecture at Streamlio

SF Big Analytics
Online event
This event has passed