Details

18:00 - 18:30 - Mingling
18:30 - 19:30 - Near real time stream processing with Apache Kafka and Apache Samza – Spam detection use case - Michael Sklyar (https://www.linkedin.com/in/sklyarmichael), Infrastructure Team Lead @ Cyren
19:30 - 20:15 - Your Data Isn't That Big - Big data processing with bash scripting via command line - Boaz Menuhin (https://www.linkedin.com/in/boaz-menuhin-3481b413), Sr. Software Engineer @ Crosswise (Oracle)

“Near real time stream processing with Apache Kafka and Apache Samza – Spam detection use case”

Abstract:

At Cyren we deal with serious amounts of data. Our team's mission was to rewrite the legacy near-real-time (NRT) detection stream processing layer of our anti-spam system. The system processes billions of transactions per day, and every second counts in protecting our (your!) mailboxes.

In this session I would like to present our use case, the technology decisions, development experience and the results (solid numbers!).

I aim to cover general stream processing concepts such as back-pressure, at-least-once and exactly-once processing, state management, windowing and partitioning.

I will present how Apache Samza handles these concepts and, where appropriate, compare it to another stream processing framework, Apache Storm.

Bio:
Michael Sklyar (https://www.linkedin.com/in/sklyarmichael), Infrastructure Team Lead @ Cyren.

I have over 15 years of experience in software. After a few years of project management in the telecom industry, I am happy to be back building systems in R&D.

I am passionate about design and architecture, large-scale systems and massive amounts of data.

“Your Data Isn't That Big - Big data processing with bash scripting via command line”

Abstract:

Bash scripting and command-line utilities can be powerful tools for many big-data tasks. In some cases a command-line pipeline runs faster and more efficiently than a MapReduce job. In this talk I will cover the scenarios in which one should consider using the command line instead of Hadoop, along with the available tools and their recommended usage.

Bio:
Boaz (https://www.linkedin.com/in/boaz-menuhin-3481b413) is a software engineer with over 10 years of experience. He enjoys prototyping, cost reduction and solving Big Data problems, but most of all he enjoys solving problems that require theoretical computer science knowledge.

He speaks fluent Python and counts bash scripting as a friend. He was one of the first employees of Crosswise (acquired by Oracle) and has also worked for cyber security companies.
