[Online] Scalable Real-time Geospatial Data Processing with Kafka and Cassandra

Melbourne Distributed



Join us on April 30th for a talk by Paul Brebner of Instaclustr.

Due to the COVID-19 quarantine, this meetup will be held via Zoom. A big thanks to Instaclustr for organising. Please register here: https://zoom.us/webinar/register/WN_MLU66lXjTRunOwGVGGIUBw


Paul will share his experience of building scalable, real-time geospatial data-processing systems. His abstract follows the Zoom guidelines below.

When joining the Zoom session:

• Please mute your microphone unless you have a question.
• Please use Zoom chat rather than voice while Paul is presenting.
• Please hold your questions until the end of the presentation.


This presentation will explore how we added location data to a scalable, real-time anomaly detection application built around Apache Kafka and Cassandra.

Kafka and Cassandra are designed for time-series data. However, it is less obvious how they can process geospatial data. To find location-specific anomalies, we need a way to represent, index, and query locations.

We explore alternative geospatial representations, including latitude/longitude points, bounding boxes, and geohashes, and then go vertical with 3D representations, including 3D geohashes.
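For readers unfamiliar with geohashes, the following is a minimal Python sketch of the standard 2D geohash encoding. It is an illustration only, not code from the talk: the function name `geohash_encode` and the `precision` parameter are assumptions made for this example. The encoding repeatedly halves the longitude and latitude ranges, interleaves the resulting bits, and emits one base-32 character per five bits.

```python
# Standard geohash base-32 alphabet (digits plus letters, omitting a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a latitude/longitude pair as a geohash string (illustrative sketch)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # accumulates the current 5-bit group
    bit_count = 0     # number of bits accumulated so far in the group
    use_lon = True    # bits alternate, starting with longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


# Example: a point in central Melbourne (coordinates chosen for illustration).
coarse = geohash_encode(-37.8136, 144.9631, precision=4)
fine = geohash_encode(-37.8136, 144.9631, precision=8)
print(coarse, fine)
```

A shorter geohash is always a prefix of a longer one for the same point, so a prefix match on a string column approximates a "nearby" query. That property is one reason geohashes are attractive for key-based stores, at the cost of accuracy near cell boundaries.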

For each representation, we also explore possible Cassandra implementations, including clustering columns, secondary indexes, denormalized tables, and the Cassandra Lucene Index Plugin.

To conclude, we measure and compare the query throughput of some of the solutions, and summarise the results in terms of accuracy vs. performance to answer the question: “Which geospatial data representation and Cassandra implementation is best?”


Talk: 18:30
Location: Online // Zoom.
Register: https://zoom.us/webinar/register/WN_MLU66lXjTRunOwGVGGIUBw