Spam detection with Kafka + Samza & “Your Data Isn't That Big”


Details
18:00 - 18:30 - Mingling
18:30 - 19:30 - Near real time stream processing with Apache Kafka and Apache Samza – Spam detection use case - Michael Sklyar (https://www.linkedin.com/in/sklyarmichael), Infrastructure Team Lead @Cyren
19:30 - 20:15 - Your Data Isn't That Big - Big data processing with bash scripting via command line - Boaz Menuhin (https://www.linkedin.com/in/boaz-menuhin-3481b413) - Sr. Software Engineer @ Crosswise (Oracle)
“Near real time stream processing with Apache Kafka and Apache Samza – Spam detection use case”
Abstract:
At Cyren we deal with serious amounts of data. Our team's mission was to rewrite the stream processing layer of our legacy near-real-time (NRT) anti-spam detection system. The system processes billions of transactions per day, and every second counts when protecting our (your!) mailboxes.
In this session I would like to present our use case, the technology decisions, development experience and the results (solid numbers!).
I aim to cover general stream processing concepts such as back-pressure, at-least-once and exactly-once processing, state management, windowing, and partitioning.
I will present how Apache Samza handles these concepts and, where appropriate, compare it with another stream processing framework, Apache Storm.
Bio:
Michael Sklyar (https://www.linkedin.com/in/sklyarmichael), Infrastructure Team Lead @ Cyren.
I have over 15 years of experience in software. After a few years of project management in the telecom industry, I am happy to be back building systems in R&D.
I am passionate about design and architecture, large-scale systems, and massive amounts of data.
“Your Data Isn't That Big - Big data processing with bash scripting via command line”
Abstract:
Bash scripting and command-line utilities can be powerful tools for many big-data tasks. In some cases, a command-line pipeline can run faster and more efficiently than a MapReduce job. In this talk I will cover the scenarios in which one should consider using the command line instead of Hadoop, along with the available tools and their recommended usage.
Bio:
Boaz (https://www.linkedin.com/in/boaz-menuhin-3481b413) is a software engineer with 10+ years of experience. He enjoys prototyping, cost reduction, and solving Big Data problems, but most of all he enjoys solving problems that require theoretical computer science knowledge.
He speaks fluent Python and counts bash scripting as a friend. He was one of the first employees at Crosswise (acquired by Oracle) and has also worked for some cyber security companies.
