Scaling data pipelines at Connected Home / Master BigQuery & Redshift by Blendo


Details
Hello wonderful big data developers and enthusiasts. We hope this email finds everyone well!
We are happy to announce our second event for 2017! This time, we welcome Angelos Petheriotis (https://gr.linkedin.com/in/apetheriotis), Big Data Engineer at Connected Home (https://www.hivehome.com/), part of Centrica (https://www.centrica.com/) / British Gas (http://www.britishgas.co.uk/), who will present the lessons learned while designing pipelines that handle billions of messages a day using Kafka Connect and Kafka Streams. Our second speaker is Kostas Pardalis (https://gr.linkedin.com/in/kostaspardalis), Co-Founder and Software Engineer at Blendo.co (https://www.blendo.co/), who will talk about the pros and cons of Google BigQuery and Amazon Redshift, two very popular analytical data warehousing technologies.
Our venue is the auditorium on the ground floor of ALBA Graduate Business School. The venue has around 110 seats, but there is space for people to stand as well. Please RSVP early, and remember to keep your RSVP up to date: if your plans change, releasing your spot gives other people who would like to attend a chance to come.
We are really looking forward to seeing you there!
Adrianos (https://www.linkedin.com/in/adrianosdadis) | Euangelos (https://www.linkedin.com/in/eualin) | Stavros (https://www.linkedin.com/in/stavroskontopoulos)
Agenda:
1st Talk: Processing 4 Billion Messages a Day: Lessons Learned
Designing a pipeline that handles billions of messages from IoT devices offers exciting challenges to engineers. The system needs to operate at scale and recover from failures seamlessly in order to reliably deliver content to the rest of the company and the customers.
In this talk we analyze how the Connected Home data back-end has been designed as an event-based system running on top of Kafka. We will also describe why we are replacing our Spark pipelines with Kafka Connect and Kafka Streams, and the tools we use around this new ecosystem.
We will conclude by describing how we collaborate with our data science team, how their models get into production pipelines, and what lessons we learned from implementing and operating the system.
Angelos Petheriotis is a Scala enthusiast who enjoys working on fast-data projects, particularly with the Spark and Kafka ecosystem. His passion is back-end development, mainly for high-performance, distributed and scalable systems. He has significant experience writing multithreaded applications and has been involved in the analysis and redesign of systems to improve performance.
https://secure.meetupstatic.com/photos/event/9/6/f/7/600_459878647.jpeg
2nd Talk: Amazon Redshift Vs Google BigQuery
Some of the most interesting innovation in cloud computing is taking place in the space of analytical data warehousing. Amazon and Google are leading the race with Redshift and BigQuery respectively. In this presentation we will go through the pros and cons of both technologies, point out their similarities and differences, and see how these affect the life of both data engineers and data analysts.
Blendo is a new breed of integration-as-a-service platform that enables companies to extract many kinds of data (sales, marketing, product, customer support, etc.) from different cloud services, integrate it, and load it into their own cloud-based data warehouses for analysis.
https://secure.meetupstatic.com/photos/event/9/6/1/e/600_459878430.jpeg
Schedule:
7:00-7:15 - Socialising
7:15-7:20 - Welcome
7:20-8:05 - 1st Talk
8:10-8:55 - 2nd Talk
9:00++ - Drinks and Pizzas
Sponsors:
A massive thank you to our sponsors:
https://secure.meetupstatic.com/photos/event/9/7/8/0/600_459878784.jpeg
Wanna Join?
We are always looking for speakers for our meetups. If you would like to give a talk this year, please contact Adrianos, Euangelos or Stavros.
