  • Data Processing and SecOps at Scale

    Hi everyone! We're back with our first meetup of the year, and the CTO of Humio is joining us to discuss data processing and SecOps. As always, the venue and beer/pizza are kindly provided by OpenCredo!

    Plan:
    6.30pm - Arrive - Beers, Pizza and Socialising
    7pm - Talk - Kresten Thorup - CTO @ Humio - "Data Processing and SecOps at Scale (A Humio Case Study)"

    Humio is a log aggregation and data processing system designed for SecOps and DevOps users. Fundamentally designed as a streaming timeseries-text data engine, it is well suited to high-volume log processing. In this talk, we will walk through how one of our customers uses Humio as a central component of their security and incident response infrastructure, processing live logs from some[masked] desktop PCs to identify malware and bad actors in a changing environment. This case study explores the impact of processing data when the vast majority of stored data is never directly retrieved, and most of the data is instead processed in flight on arrival. This makes possible a data processing architecture that combines stream processing of live data with aggressive compression and time-only indexing of stored data. We explore how this trade-off lets it vastly outperform indexing-heavy solutions in both CPU and disk capacity load.

    8pm - More beer, pizza, etc., and then to the pub!

    Hope to see you all there.

  • Spark and Hazelcast Jet

    Hi everyone! We're back with our last meetup before Christmas and have a really interesting speaker lined up! Ben Evans, co-founder of jClarity, will be joining us to present some of the work he has been doing with Spark and Hazelcast Jet!

    Plan:
    6.30pm - Arrive - Beers, Pizza and Socialising
    7pm - Talk - Ben Evans - "Gambling With Leopards"

    This is a fairly light-hearted talk in which I give an experience report on a real application, BetLeopard, a reference implementation of an open-source horse racing engine. I use it to showcase different ways of approaching a calculation problem: first using Java 8 lambdas, then using Hazelcast IMDG with Apache Spark for processing, and finally using Hazelcast Jet.

    8pm - More beer, pizza, etc., and then to the pub!

    Hope to see you all there.

  • Cockroach Labs: The Hows & Whys of a Distributed SQL Database

    Hi everyone! Very excited to announce our October meetup! The creators of CockroachDB (https://www.cockroachlabs.com) are in town to tell us all about "The Hows & Whys of a Distributed SQL Database".

    The Plan:
    6.30pm - Arrive - Beer, Pizza and Socialising
    7pm - Talk - Raphael Poss - Software Engineer @ Cockroach Labs

    Title: The Hows & Whys of a Distributed SQL Database

    Abstract: Until recently, developers have had to make serious tradeoffs when picking a database technology. One could pick a SQL database and deal with its eventual scaling problems, or pick a NoSQL database and work around its lack of transactions, strong consistency and/or secondary indexes. However, a new class of distributed database engines is emerging that combines the transactional consistency guarantees of traditional relational databases with the horizontal scalability and high availability of popular NoSQL databases. In this talk, we'll take a deep dive into the key design choices that enable one open source distributed SQL database, CockroachDB, to offer these properties, and compare them to past SQL and NoSQL designs. We will look specifically at how to achieve easy deployment and management of a scalable, self-healing, strongly consistent database using techniques such as dynamic sharding and rebalancing, consensus protocols, lock-free transactions, and more.

    8pm - Finish - To the pub!

    Looking forward to seeing you all there!

  • SixFifty's Story: Why Are UK Elections So Hard To Predict?

    Hi everyone! We're back with our next event, this time reflecting on recent political events and how data and machine learning can influence them. Thanks to John from SixFifty (https://sixfifty.org.uk/) for giving the talk!

    The Plan:
    6.30pm - Arrive - Beer, Pizza and Socialising
    7pm - Talk - John Sandall - "SixFifty's Story: Why Are UK Elections So Hard To Predict?"

    When Theresa May announced plans on April 18th for the UK to hold a general election, the news was met with much cynicism. However, as self-confessed psephologists (and huge fans of Nate Silver's FiveThirtyEight data blog), we were instead thrilled at the opportunity. SixFifty (https://sixfifty.org.uk/) is a collaboration of data scientists, software engineers, data journalists and political operatives brought together within hours of the snap general election being announced.

    Our goals:
      • Understand why forecasting elections in the UK using open data is notoriously difficult, and see how far good statistical practice and modern machine learning methods can take us.
      • Make political and demographic data more open and accessible by showcasing and releasing cleaned versions of the datasets we're using.
      • Contribute to improving statistical literacy, especially around concepts fundamental to elections, polling and open data, by communicating our methodology at a non-technical level.

    In this talk we will cover our approach to creating an open polling data pipeline, the challenges we faced (especially around data provenance), the infrastructure design decisions we made to stay lean under strict resource and time limitations, and the technologies we used to transform PDF polling tables into an election forecast more accurate than any other published prediction using open data.

    8pm - More beer, pizza and socialising
    8.30pm - To a local watering hole

    Hope to see you all there!

  • August Lightning Talks!

    Hi everyone! We're back with our second meetup. Given it's August and a lot of people are away, we have decided to keep it light with a few lightning talks followed by discussion over beer and pizza! If you are keen to do a lightning talk, please reach out!

    Plan:
    6.30pm - Arrive - Beers/Pizza/Socialising
    7pm - Lightning Talks
      Talk 1 - Tareq Abedrabbo - Designing APIs for Data-Driven Systems
      Talk 2 - David Dawson - Practical Event Systems - Microservices for the Data Architect
      Talk 3 - Alison Wells - Orchestration of Data Flows using Airflow
    8pm - More beer, pizza and discussion! We can either move to the pub or stay at the office!

    Hope to see you all there!

  • Applied Data Engineering #1

    Hi everyone! We're really pleased to announce our first meetup and get the ball rolling! Thanks to the guys at OpenCredo for hosting and providing the beer, pizza AND speaker!

    The plan is as follows:
    6.30pm - Arrive - Beer, Pizza and Socialising
    7pm - Talk - David Borsos - Lead Consultant @ OpenCredo - "Detecting stolen AWS credentials with Spark Structured Streaming"

    In this talk I will present the idea of finding anomalies in an event stream generated by Amazon Web Services. The goal is to detect leaked access credentials that result in the creation of extra infrastructure and extra cost. I will explore the details of what such an architecture looks like and how it can be composed of modern stream processing tools such as AWS Kinesis Streams and Spark Structured Streaming.

    8pm - More beer, pizza and socialising
    8.30pm - To a local watering hole

    Hope to see you all there!