Talk 1: Real-time Anomaly Detection on 19 Billion Events. Talk 2: Analysing Meetup Data

This is a past event

53 people went


Join us for a great lineup of talks showcasing use cases for some of the best scalable, open source technologies used in modern data platforms: Kafka, Cassandra, Kubernetes (k8s) and Elasticsearch.

First talk: Kafka, Cassandra & K8s @ Scale: Real-time Anomaly Detection on 19 Billion Events a Day

Abstract: Apache Kafka, Apache Cassandra and Kubernetes are open source big data technologies that enable applications and business operations to scale massively and rapidly. Kafka and Cassandra underpin the data layer of the stack, providing the capability to stream, disseminate, store and retrieve data at very low latency, while Kubernetes is a container orchestration technology that automates application deployment and the scaling of application clusters. In this presentation, we will reveal how we architected a massive-scale deployment of a streaming data pipeline with Kafka and Cassandra to support an example anomaly detection application that runs on a Kubernetes cluster and generates and processes a massive number of events. Anomaly detection is a method for detecting unusual events in an event stream. It is widely used in applications such as financial fraud detection, security and threat detection, website user analytics, sensors and IoT, and system health monitoring. When such applications operate at massive scale, generating millions or billions of events, they impose significant computational, performance and scalability challenges on anomaly detection algorithms and data layer technologies. We will demonstrate the scalability, performance and cost-effectiveness of Apache Kafka, Cassandra and Kubernetes, with results from experiments in which the anomaly detection application scaled to 19 billion anomaly checks per day.

Bio: This talk will be presented by Paul Brebner. Paul is the Technology Evangelist at Instaclustr. He’s been learning new scalable big data technologies, solving realistic problems and building applications, and blogging about Apache Cassandra, Spark, Zeppelin, and Kafka. Paul has extensive R&D and industry experience in distributed systems, technology innovation, software architecture and engineering, software performance and scalability, grid and cloud computing, and data analytics and machine learning (UNSW, CSIRO, UCL/UK, NICTA, and founder/CTO of a NICTA tech startup).

Second talk: Analysing Meetups at Big Data Meetup Using the Elastic Stack

Abstract: In this talk, we'll cover why the Elastic Stack is becoming a very popular choice for tackling big data challenges. We'll discuss how to design and implement a data ingest layer that pulls data from the Meetup service via Logstash, how to efficiently store the data we're interested in within Elasticsearch, and how to explore the Meetup data in Kibana to uncover all sorts of interesting things about meetups. We'll sprinkle in some machine learning as well, since we never know what we're going to find.

Bio: This talk will be presented by Hrvoje (H) Pejcinovic, Solutions Architect @ Elastic.

5:30 - Welcome, food, drinks, networking
6:00 - First talk
6:45 - Second talk

Food, drinks, and giveaways will be provided!

Please also register via Eventbrite to confirm your spot.