Spam detection with Kafka + Samza & “Your Data Isn't That Big”

130 people went

Klarna Office

Yigal Alon 98, Tel Aviv (Electra building), floor 13 · Tel Aviv-Yafo

How to find us

https://www.facebook.com/groups/bigthingshere/

Details

18:00 - 18:30 - Mingling
18:30 - 19:30 - Near real time stream processing with Apache Kafka and Apache Samza – Spam detection use case - Michael Sklyar (https://www.linkedin.com/in/sklyarmichael), Infrastructure Team Lead @ Cyren
19:30 - 20:15 - Your Data Isn't That Big - Big data processing with bash scripting via command line - Boaz Menuhin (https://www.linkedin.com/in/boaz-menuhin-3481b413), Sr. Software Engineer @ Crosswise (Oracle)

“Near real time stream processing with Apache Kafka and Apache Samza – Spam detection use case”

Abstract:

At Cyren we deal with serious amounts of data. Our team's mission was to rewrite the legacy near-real-time (NRT) stream processing layer of our anti-spam detection system. The system processes billions of transactions per day, and every second counts when protecting our (your!) mailboxes.

In this session I would like to present our use case, the technology decisions, the development experience and the results (solid numbers!).

I aim to cover general stream processing concepts such as back-pressure, at-least-once and exactly-once processing, state management, windowing and partitioning.

I will present how Apache Samza handles these concepts and, where appropriate, compare it with another stream processing framework, Apache Storm.

Bio:
Michael Sklyar (https://www.linkedin.com/in/sklyarmichael), Infrastructure Team Lead @ Cyren.

I have over 15 years of experience in software. After a few years of project management in the telecom industry, I am happy to be back building systems in R&D.

I am passionate about design and architecture, large-scale systems and massive amounts of data.

“Your Data Isn't That Big - Big data processing with bash scripting via command line”

Abstract:

Bash scripting and command-line utilities can be powerful tools for many big-data tasks. In some cases the command line can run faster and more efficiently than a MapReduce job. In this talk I will cover the scenarios in which one should consider using the command line instead of Hadoop, along with the available tools and their recommended usage.

Bio:
Boaz (https://www.linkedin.com/in/boaz-menuhin-3481b413) is a software engineer with 10+ years of experience. He enjoys prototyping, cost reduction and solving Big Data problems, but most of all he enjoys solving problems that require theoretical computer science knowledge.

He speaks fluent Python, and bash scripting is a friend of his. He was one of the first employees at Crosswise (acquired by Oracle) and has worked for several cyber security companies.