BDAM: Distributed Rules Engine & Exactly-once processing with Apache Kafka!

Hosted by Priyanka Nambiar

Details

Shoutout to Cask ( http://cask.co/ ) for kindly sponsoring and hosting this meetup!

Cask will also be giving away an Amazon Dot! Enter the raffle on the day of the event for a chance to win.

AGENDA

6:00 - 6:30 - Socialize over food and beverages

6:30 - 8:00 - Talks

TALKS

Talk #1: Introducing a horizontally scalable, inference-based business Rules Engine for Big Data processing, by Nitin Motgi from Cask

Talk #2: Building Stream Processing Applications with Apache Kafka's Exactly-Once processing guarantees, by Matthias Sax from Confluent

Unfortunately, we had a last-minute cancellation of tonight's planned talk, "Advanced Data Engineering Patterns with Apache Airflow". It will be rescheduled for a future meetup.

ABSTRACTS

Talk #1: Introducing a horizontally scalable, inference-based business Rules Engine for Big Data processing, by Nitin Motgi from Cask

Business rules are statements that describe the business policies or procedures used to process data. Rules engines, or inference engines, execute business rules in a runtime production environment and have become commonplace in many IT applications. The world of big data is an exception: it has lacked a horizontally scalable, lightweight, inference-based business rules engine.

In this session, you will learn about Cask’s new business rules engine, built on top of CDAP. The engine is a sophisticated if-then-else statement interpreter that runs natively on big data systems such as Spark, Hadoop, Amazon EMR, Azure HDInsight and GCE. It provides an alternative computational model for transforming your data while empowering business users to specify and manage the transformations and policy enforcements.

In his talk, Nitin Motgi, Cask co-founder and CTO, will demonstrate this new, distributed rule engine and explain how business users in big data environments can make decisions on their data, enforce policies, and be an integral part of the data ingestion and ETL process. He will also show how business users can write, manage, deploy, execute and monitor business data transformation and policy enforcements.
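
To make the "if-then-else statement interpreter" idea concrete, here is a minimal Python sketch of a rule evaluator. It is not Cask's engine and does not use its rule syntax; the function run_rules, the rule names, and the record fields are all hypothetical. Each rule pairs a condition with an action, and the loop keeps evaluating until no rule fires, so a fact set by one rule can trigger another.

```python
def run_rules(record, rules, max_passes=10):
    """Apply (name, when, then) rules to a copy of record until nothing new fires.

    Hypothetical illustration only; not the CDAP or Cask rules engine API.
    """
    record = dict(record)          # work on a copy, leave the input untouched
    fired = set()                  # each rule fires at most once per record
    for _ in range(max_passes):
        changed = False
        for name, when, then in rules:
            if name not in fired and when(record):
                then(record)       # the action mutates the working copy
                fired.add(name)
                changed = True
        if not changed:            # fixpoint reached: no rule fired this pass
            break
    return record


# Hypothetical rules: the second one depends on a fact derived by the first.
rules = [
    ("flag-large", lambda r: r["amount"] > 1000, lambda r: r.update(large=True)),
    ("review-large", lambda r: r.get("large"), lambda r: r.update(status="review")),
    ("default-ok", lambda r: "status" not in r, lambda r: r.update(status="ok")),
]

big = run_rules({"amount": 5000}, rules)
small = run_rules({"amount": 10}, rules)
```

Here big ends up flagged as large with status "review", while small only receives the default status "ok".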

Talk #2: Building Stream Processing Applications with Apache Kafka's Exactly-Once processing guarantees, by Matthias Sax from Confluent

Kafka 0.11 added a new feature called "exactly-once guarantees". In this talk, we will explain what "exactly-once" means in the context of Kafka and data stream processing, and how it affects application development. The talk will go into some detail about the building blocks of exactly-once, namely the new idempotent producer and transactions, and how both can be exploited to simplify application code. For example, you don't need complex deduplication code in your input path, because you can rely on Kafka to deduplicate messages when the data is produced by an upstream application. Transactions can be used to write multiple messages to different topics and/or partitions and commit all writes atomically (or abort all writes so that none will be read by a downstream consumer in read-committed mode). Thus, transactions allow for applications with strong consistency guarantees, as needed in the financial sector (e.g., send both a withdrawal and a deposit message to transfer money, or neither of them). Finally, we will talk about Kafka's Streams API, which makes exactly-once stream processing as simple as it can get.
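
The two mechanisms in this abstract can be illustrated with a toy in-memory model in Python. This is a sketch under simplifying assumptions, not the Kafka client API: the classes ToyPartition and ToyBroker, the topic names, and the IDs are hypothetical. It models an idempotent producer as per-partition sequence numbers that let a retried write be dropped, and a transaction as a set of writes that a read-committed reader sees only after commit.

```python
class ToyPartition:
    """Toy stand-in for one partition; NOT the Kafka client API."""

    def __init__(self):
        self.records = []     # list of (txn_id, value) in append order
        self.last_seq = {}    # producer_id -> highest sequence number accepted

    def append(self, producer_id, seq, txn_id, value):
        # Idempotence: a retry reuses its sequence number and is dropped.
        if seq <= self.last_seq.get(producer_id, -1):
            return False
        self.last_seq[producer_id] = seq
        self.records.append((txn_id, value))
        return True


class ToyBroker:
    """Toy stand-in for a cluster with one partition per topic."""

    def __init__(self, topics):
        self.partitions = {name: ToyPartition() for name in topics}
        self.committed = set()    # IDs of committed transactions

    def commit(self, txn_id):
        self.committed.add(txn_id)

    def read_committed(self, topic):
        # Uncommitted or aborted records stay in the log but are never returned.
        return [v for txn, v in self.partitions[topic].records
                if txn in self.committed]


broker = ToyBroker(["withdrawals", "deposits"])

# Transaction t1: withdrawal and deposit are written, then committed together.
broker.partitions["withdrawals"].append("p1", 0, "t1", 100)
# A network retry resends the same write; the sequence number exposes it.
duplicate_accepted = broker.partitions["withdrawals"].append("p1", 0, "t1", 100)
broker.partitions["deposits"].append("p1", 0, "t1", 100)
broker.commit("t1")

# Transaction t2: only the withdrawal is written and it is never committed.
broker.partitions["withdrawals"].append("p1", 1, "t2", 50)

withdrawals = broker.read_committed("withdrawals")
deposits = broker.read_committed("deposits")
```

In this model the retried write is rejected, and a read-committed reader sees the withdrawal and deposit of t1 but nothing from t2, so the money transfer is visible either in full or not at all.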

SPEAKER BIOS

• Nitin Motgi is Co-Founder and CTO of Cask, where he is responsible for developing the company’s long-term technology and for driving company engineering initiatives and collaboration. Prior to Cask, Nitin was at Yahoo! working on a large-scale content optimization system externally known as C.O.R.E. Prior to Yahoo!, Nitin led the development of a large-scale fabrication analysis system at Altera, and he previously held senior engineering roles at FedEx. Nitin holds a Master’s degree in computer science from the University of Central Florida (UCF).

• Matthias Sax is a Software Engineer at Confluent working mainly on Kafka's Streams API (aka Kafka Streams) and was involved in the exactly-once development efforts. Before Confluent he was a PhD student at Humboldt-University of Berlin, Germany, focusing on distributed stream processing systems. Matthias is also a committer at Apache Flink and Apache Storm.

ARRIVAL AND PARKING

Cask HQ is a few minutes' walk from the California Avenue Caltrain Station.

Also, Cask HQ has its own parking lot, but it will certainly not accommodate all guests. Please use parking lots available nearby:

https://secure.meetupstatic.com/photos/event/5/b/2/f/600_438983343.jpeg

Big Data Application Meetup
150 Grant Ave, Suite C · Palo Alto, CA