• August Edition, Sydney Data Engineering meetup

    Deputy have kindly invited us to their Ultimo offices.

    🏠 Host sponsor: Deputy
    🍕 Food and Drink sponsor: Deputy

    Speakers include:

    🎤 Claire Carroll - Community Manager, Fishtown Analytics (dbt)
    Data engineers shouldn’t write DDL
    Writing DDL to transform data in a warehouse is usually considered within the domain of a data engineer. In this talk I’ll talk through why this task is always more complex than it seems, and how you can avoid it altogether. I’ll also discuss the other changes in the data stack that are contributing to the changing role of the data engineer, and where I see this role heading in the future.

    🎤 Syed Jaffry - presenting the ingredients for a successful data lake, followed by a demo of AWS Lake Formation!

    🎤 Ingrid Anzola - Principal Data Engineer, Deputy
    Data Engineering is evolving at such a fast pace and, with the increasing demand for the role across industries, I want to share my experience during the last 24 months of hiring and being hired for the right position. I'll cover why so many Data Engineer candidates don't succeed during interviews and how to assess the right balance between mindset and experience. In this talk, I want to start the conversation about the do's and don'ts of the Data Engineering hiring process and share some tips for setting up a long-term relationship once you have found the right match! I have had the privilege of seeing the evolution of data management within companies since the 90s and I think Data Engineers are here to stay for a long time.

    Stay tuned for further details.

  • July Edition, Sydney Data Engineering meetup

    MongoDB Australia

    MongoDB have kindly invited us to their Haymarket offices.

    Speaker 1: Jessica Flanagan - Chief Data Architect, Deckard Technologies
    Data pipelines need to be flexible, modular and easily monitored. They are not just set-and-forget. The team that monitors a pipeline might not have developed it and may not be experts on the dataset. End users must have confidence in the output. This talk is a practical walkthrough of a suggested pipeline architecture on AWS using Step Functions, Spot Instances, AWS Batch, Glue, Lambda and Datadog. I'll be covering techniques using AWS and Datadog, but many of the approaches are applicable in an Apache Airflow/Kibana environment.
    I have 15 years of software development experience, with the last ten years spent working with medium to large data. I have a passion for exploration, optimisation and best practices. I enjoy developing and mentoring teams and experimenting with new technology. I prefer to be technology agnostic and am focused on finding the right solution for a problem.

    Speaker 2: Sam Harley - Senior Solutions Architect, MongoDB
    What’s New in MongoDB v4.2
    Take a closer look at the key capabilities introduced at MongoDB World 2019 and included as part of the MongoDB version 4.2 release, including a deeper discussion around distributed transactions, the Atlas Data Lake, Atlas Full Text Search capability, field level encryption, native K8s integration, materialised views, wildcard indexes and much, much more.

    Speaker 3: Masudur Rahaman Sayem - Solutions Architect, AWS
    Use case of a ledger database: an introduction to Amazon QLDB
    In this session, we talk about the kinds of problems that Amazon Quantum Ledger Database (QLDB) can solve and answer your questions about when and why you would use a ledger database. Join expert guest Masudur Rahaman Sayem, Solutions Architect at AWS, to learn about benefits and use cases for Amazon QLDB.

    Stay tuned for further details on the speaker lineup!

  • June edition, Sydney Data Engineering meetup

    Microsoft Reactor

    Microsoft Reactor have kindly offered to host us this month.

    Speaker 1: Mohamed Kabiruddin
    Data Wrangling for ETL enthusiasts
    Data wrangling is a significant problem when working with big data, especially if you haven’t been exposed to the challenges it brings, or you don’t have the right tools to clean and validate data in an effective and efficient way. In this session, we'll explore data wrangling concepts and Azure data services to achieve the desired data transformations. This session will take you through data ingestion and transformation techniques and the tools available to build your modern data platform on Azure. We'll dive deep into using Azure Data Factory (dataflows) and Azure Databricks to transform data to meet your enterprise business needs.

    Speaker 2: Mick Badran, handled.com.au
    Learn how to go about creating, extending and calling Azure ML Web Services, as well as the various forms of integration you can use to include them in your solutions.
    Mick is currently a Microsoft Azure MVP with a passion for MS integration technologies such as BizTalk, Azure & ServiceFabric, having an equal balance between the consulting and training worlds. Prior to this, Mick was a BizTalk MVP. Mick has been a Technical Integration Specialist for the last 15 years with hands-on project/solution/architect experience using middleware. He's been there in the trenches and seen the good, the bad and the ugly!

    Remember to bring some awesome questions!

  • May Edition, Sydney Data Engineering meetup

    WeWork George St

    Hydrogen Group have kindly offered to host us at their WeWork offices for the May meetup.

    Fast Data Engineering - Peter Hanssens
    Serverless has been associated with low cost, but low latency? Peter will take you through event driven processing and a hypothetical live scoring scenario for an e-commerce site to show you what gotchas you can find when doing the data engineering of integrating data science in an application.
    Peter Hanssens is 5x AWS certified, including CSA Pro and Big Data, and is the founder of the Sydney Data Engineering meetup.

    Why is Data Engineering so hard? - David Tout, Machine Beam
    A quick review of some of the complexities involved in seemingly simple tasks, and a reminder of the basic data principles Data Engineers often forget.
    David has well over a decade of experience managing data pipelines & teams for sports data analysis. He is now sharing his knowledge with the world through a newly founded data consultancy: https://www.machinebeam.com.

    From newbie to Data Engineer - Nikita Sharma
    My journey and learnings through data engineering. A getting started talk for beginner data engineers.
    Nikita is pursuing a Masters of Data Science at UTS.

    Doors open at 6pm - see you there!

  • Sydney Data Engineering Meetup, April Edition

    IAG / Darling Park

    IAG have kindly offered to host us this month at their Darling Park offices.

    1st Presenters: Simon Aubury & Kieran Clulow
    Turning the wheel - Streaming 4.4 billion events with Apache Kafka
    Learnings from building a single view of the customer for IAG. Realtime customer, policy and vehicle information using Apache Kafka and MongoDB.

    2nd Speaker: Nikita Sharma
    From newbie to Data Engineer
    My journey and learnings through data engineering. A getting started talk for beginner data engineers.
    Nikita is pursuing Masters of Data Science at UTS.

    3rd Speaker: Grant Ingersoll, CTO and co-founder of Lucidworks
    Search as the new Data Warehouse
    Examine the mapping of search engine capabilities onto modern data warehousing and data lake needs and weigh the pros and cons of such an approach by looking at use cases and capabilities of open source engines like Apache Solr and Elasticsearch.
    Grant Ingersoll is the CTO and co-founder of Lucidworks as well as an active member of the Lucene community – a Lucene and Solr committer, co-founder of the Apache Mahout machine learning project, and a long standing member of the Apache Software Foundation. Grant is also the co-author of “Taming Text” from Manning Publications.

    Doors will open at 6pm - remember to bring along some great questions!

  • Sydney Data Engineering Meetup - March Edition

    Google Australia

    Google have kindly invited us to their Pyrmont offices for the March edition of the Sydney Data Engineering meetup. We've got 3 exciting talks on for the night so it'll be one that you won't want to miss out on.

    Talk 1: Data Engineering to predict the next 5 mins of a cricket game
    Speakers: Drew Jarrett & Shu Lu (Customer Solutions Engineers at Google)

    Talk 2: Data Engineering, 2019 and Beyond
    Speaker: Itzik Feldman, Data Engineering Product Manager

    Talk 3: Fallacies of Distributed computing and its impact on cloud based solutions
    Presenter: Amo Abeyaratne, Practice Lead - Data & Analytics, Google Cloud Professional Services

    Quick note on the night: Pick up your name badge from the reception on the ground floor and proceed to level 2 for the Data Engineering meetup. There will be refreshments at the venue.

  • January Data Engineering Meetup @ Atlassian


    Atlassian have been kind enough to host us again in January to get us going again in 2019.

    1st Speaker - Yash Sharma, Data Engineer at Atlassian & Apache Committer
    Yoda: Data Quality at Atlassian
    Atlassian's data quality journey and challenges. Ensuring quality of data as Atlassian grows and moves towards self serve.

    2nd Speaker - Simon Aubury, Data Engineering Architect at IAG
    Using KSQL, Apache Kafka, a Raspberry Pi plus a software defined radio to find the plane that wakes my cat
    Simon Aubury is a Data Engineering Architect at IAG. The rest of the time he’s playing with IoT and random project hacking.

    3rd Speaker - Brendan Haire, Head of Search and ML at Atlassian
    Journey to ML at Atlassian
    Atlassian’s journey and challenges in applying ML techniques to improve their products, starting with a smarter @mentions service. Everything from defining and measuring success through to data wrangling, battling organisational and architectural complexity to finally getting something that customers can use.

    Doors will open at 6pm and talks will commence shortly after some refreshments. See you there!

  • December Sydney Data Engineering Meetup

    Canva have kindly invited us to host the meetup at their wonderful offices again - this time across the road from the previous location. We will have some great speakers for this meetup so stay tuned for further details!

    1st Speaker - Greg Roodt from Canva
    Introduction to Argo: Kubernetes native workflows and pipelines

    2nd Speaker - Louis Rankin from Rezdy
    Louis will share some of the key learnings from getting a data engineering practice up and running while the business tries to scale and will cover off the platforms used and the thinking behind those decisions.
    Louis' role at Rezdy is to help shape how Insights and Data can be used to drive growth.

    3rd Speaker - Guy Needham from Servian
    Guy will introduce some key concepts for stream processing and give an overview of using Apache Flink for handling large streams at Fairfax Media.
    Guy is a data engineer with experience in distributed processing and currently works as a consultant at Servian.

    Please note the earlier start time - doors will open at 5.30pm and the new location (which is where we held it last time at 2 Lacey St).

  • November Sydney Data Engineering Meetup

    Campaign Monitor

    Campaign Monitor have been gracious enough to host us in their offices this month.

    1st Speaker - Pranavi Chandramohan, Campaign Monitor
    Navigating the Stream
    With increased demand from businesses and customers to provide real time analytics and reporting, the need for stream processing has become inevitable. At Campaign Monitor, we have started to move from batch processing towards a stream processing model to provide real time reports to our customers. There are many ways to implement stream processing out there; in this talk we will take a look at Kafka Streams, Spark Structured Streaming and Spark Streaming.

    2nd Speaker - Deependra Shekhawat, AWS
    Apache Airflow On AWS - Lessons Learned
    In this talk we will share key best practices that we have seen data engineers adopt while deploying Apache Airflow on AWS. We will specifically discuss how we used AWS ECS (Fargate) to deploy a production Airflow setup using the CeleryExecutor and the lessons we learnt while operating the environment, and share some key best practices for deploying and operating an Airflow installation.
    Deependra Shekhawat is a Technical Account Manager (TAM) at AWS Sydney. In his role as TAM he has been working very closely with AWS Enterprise Support Customers in Australia, and specifically with data engineering teams, to help build production grade data processing pipelines using open source software and AWS services.

    3rd Speaker - Mike Seddon, AGL
    Introducing Arc (https://aglenergy.github.io/arc/), an opinionated framework for defining predictable, repeatable and manageable data transformation pipelines.
    Mike is a Senior Data Engineer at AGL Energy and has co-developed the Arc framework.

  • Rescheduled October Data Engineering Meetup, Sydney

    Airtasker have kindly offered to host us this month. We have 3 awesome speakers:
    - Dan Gooden
    - Claire Carroll
    - Nick Wienholt

    ******************************

    1st Talk - Dan Gooden: Testing Patterns in Code Driven SQL Data Pipelines
    Consistent and automated testing builds confidence in datasets, catches change in upstream systems, and ensures reliability so you can build more complex models safely. In this talk I'll cover ideas I've developed over the past few years about useful testing patterns in fast moving, small data teams writing code driven SQL pipelines.
    Dan Gooden is the Data Lead at Airtasker, where he is responsible for ensuring the company leverages data internally to discover valuable insights, and externally for the benefit of the users of its platform. He has a keen interest in ensuring data has a meaningful relationship to the activities that companies undertake in the world. Before Airtasker, Dan worked for the Domain Group as the Data Engineering Platform Lead, where he was responsible for creating and managing a team that built the data warehouse. Prior to that he contracted for many years in the DW & BI space.

    ******************************

    2nd Talk - Claire Carroll: Sharing beautiful data documentation
    One of the hardest parts of building a data-driven culture is making sure everyone is speaking the same language – in essence, answering the question “what does this number mean, and where does it come from?” Attempts to share this knowledge usually come in the form of a “databook”, either built as a bespoke solution or by using off the shelf products like Confluence. In this talk, I’m going to demonstrate how the open source tool dbt has solved this problem.
    Claire is a Data Analyst at Airtasker, and Community Manager for dbt.

    ******************************

    3rd Talk - Nick Wienholt
    Designing and implementing an automated trading system based on many disparate data sources, using multiple machine learning models and executing across multiple exchanges is an interesting engineering challenge, and one in which reference architectures are very much at the embryonic stage. In this presentation, Nick will present a complete architecture based on a number of open-source tools including Redis, Kafka and Spark, and examine a number of the possible design approaches.
    Nick is a consulting data and quantitative engineer based in Sydney. With a focus on high volume trading systems based on machine learning and alternate data, Nick enjoys working with a variety of clients on both the buy- and sell-side in the financial market and gaming industry.

    ******************************

    We have our own Slack group and website, which you can find out more about here: https://sydneydataengineers.github.io/