

About us
A knowledge-sharing and networking group for data engineers in the Greater Toronto Area. We host expert-led webinars twice a month covering the skills and knowledge that data engineers, AI engineers, and BI engineers should have.
Website: https://www.dataengineersintoronto.org/
Fabric Community page with occasional benefits: https://community.fabric.microsoft.com/t5/Data-Engineers-In-Toronto/gh-p/DataEngineersInToronto
LinkedIn Page: https://www.linkedin.com/company/data-engineers-in-toronto/
LinkedIn Group: https://www.linkedin.com/groups/18627021/
Most meetings are held virtually on Microsoft Teams; the joining link is https://vip.dataengineersintoronto.org/webinar
If you are interested in presenting, submit your session details on Sessionize: https://sessionize.com/data-engineers-in-toronto
Upcoming events
6

Building Scalable SCD Type 2 Pipelines in MS Fabric DW Using T-SQL
Online · Data Engineers in Toronto June 2026 Semimonthly Meeting
Topic: Building Scalable SCD Type 2 Pipelines in MS Fabric DW Using T-SQL
Abstract:
Implementing Slowly Changing Dimension (Type 2) at scale is critical for maintaining historical accuracy in analytics, but doing so efficiently across billions of rows in Microsoft Fabric Data Warehouse requires leveraging its modern ingestion and optimization capabilities.
In this session, we’ll build a Fabric-optimized SCD Type 2 pipeline using pure T-SQL patterns. We’ll start by comparing ingestion strategies such as OPENROWSET (for schema-on-read exploration) and COPY INTO (for high-throughput, parallelized loading), then explain why COPY INTO is the preferred method for large-scale ingestion in Fabric Warehouses.
Next, we’ll implement incremental load logic without MERGE (since Fabric does not currently support the MERGE statement) by using UPDATE + INSERT patterns combined with hash-based change detection and filtered indexes for current-row lookups. We’ll also cover performance accelerators like batching, minimal logging, and distribution strategies to maximize query performance.
Finally, we’ll demonstrate a full end-to-end pipeline:
- Discover external Parquet/CSV files with OPENROWSET
- Ingest into Fabric Warehouse using COPY INTO
- Apply SCD Type 2 logic using merge-like T-SQL patterns for historical tracking
You’ll leave with a production-ready template and a Fabric-specific performance playbook for handling incremental loads at scale with minimal friction.
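As a rough illustration of the "UPDATE + INSERT with hash-based change detection" idea mentioned in the abstract, here is a minimal sketch in plain Python rather than T-SQL. It is not the speaker's template: the column names (key, name, city, valid_from, valid_to, is_current) and the helper functions are hypothetical, and an in-memory list stands in for the dimension table.

```python
import hashlib
from datetime import date

# Hypothetical tracked attributes; not taken from the session material.
TRACKED = ("name", "city")


def row_hash(row):
    """Hash the tracked attributes so one comparison detects any change."""
    payload = "|".join(str(row[c]) for c in TRACKED)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def apply_scd2(dimension, incoming, load_date):
    """Apply one incremental load to `dimension` (a list of dicts) in place.

    Step 1 (the "UPDATE"): close the current row of every changed key.
    Step 2 (the "INSERT"): add a new current row for new or changed keys.
    """
    current = {r["key"]: r for r in dimension if r["is_current"]}
    to_insert = []
    for src in incoming:
        h = row_hash(src)
        existing = current.get(src["key"])
        if existing is not None and existing["hash"] == h:
            continue  # unchanged: keep the current row as it is
        if existing is not None:
            # Changed: end-date the old version instead of overwriting it.
            existing["valid_to"] = load_date
            existing["is_current"] = False
        to_insert.append({
            "key": src["key"],
            **{c: src[c] for c in TRACKED},
            "hash": h,
            "valid_from": load_date,
            "valid_to": None,
            "is_current": True,
        })
    dimension.extend(to_insert)
    return dimension


# Two loads for the same key: the second one changes the city.
dim = []
apply_scd2(dim, [{"key": 1, "name": "Ana", "city": "Toronto"}], date(2026, 1, 1))
apply_scd2(dim, [{"key": 1, "name": "Ana", "city": "Ottawa"}], date(2026, 2, 1))
for r in dim:
    print(r["key"], r["city"], r["valid_from"], r["valid_to"], r["is_current"])
```

After the second load the dimension holds two rows for key 1: the Toronto row is closed with valid_to set to the second load date, and the Ottawa row is the current one. In a warehouse the same two steps would typically be one set-based UPDATE followed by one set-based INSERT rather than a row-by-row loop.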
Speaker: Jean Joseph, Principal Data & AI Engineer @Tech-Insight-Group LLC
Speaker Profile:
Jean Joseph is a seasoned consultant and senior technical trainer specializing in data engineering and artificial intelligence. With a strong background in database design, administration, and cutting-edge data technologies including machine learning and generative AI, he helps organizations build secure, scalable solutions across both legacy systems and modern cloud platforms. Formerly recognized as a Microsoft MVP and senior technical trainer at Microsoft, Jean brings deep technical insight and a passion for teaching.
He’s also a dynamic speaker, mentor, and the founder of the Cloud Data Driven User Group and the Future Data Driven Summit, where he champions innovation and promotes responsible use of emerging tech within the data community.
The meeting is over Microsoft Teams, and the joining link is https://teams.microsoft.com/l/meetup-join/19%3ameeting_NzZkYWIyOTAtODk1MC00MjVmLWJlNjUtNTRiODZmODA2Zjdh%40thread.v2/0?context=%7b%22Tid%22%3a%22bd9727e8-f539-4c76-983c-6c30130c0bee%22%2c%22Oid%22%3a%229e8d5a64-e773-4ca2-90f6-9a266129171e%22%7d
See you at the meeting!
36 attendees
Data Mesh as the Foundation for AI/ML in Financial Services
Online · Data Engineers in Toronto June 2026 Semimonthly Meeting
Topic: Data Mesh as the Foundation for AI/ML in Financial Services
Abstract:
Financial institutions want AI/ML at scale, but brittle data pipelines, silos, and compliance demands slow progress. This talk shows how a Data Mesh (domain-oriented ownership, data-as-product, self-serve platforms, and federated governance) becomes the foundation for reliable, reusable ML features and trustworthy models. We’ll map mesh principles to financial services use cases such as fraud detection, risk, and personalization, and show patterns for feature stores, lineage, quality, and access controls that satisfy regulators while accelerating delivery. Attendees will get a pragmatic blueprint: where to start, how to sequence capabilities, metrics that prove value, and pitfalls to avoid on the road from pilots to production.
Speaker: Santosh Durgam, Data Engineering & Analytics Leader
Speaker Profile:
Santosh Durgam is a data engineering & analytics leader with 20+ years building governed, high-scale data platforms across retirement/401(k), broader financial services, and healthcare. He leads cross-functional teams that deliver production-grade data lakes, lineage-aware pipelines, and ML-enabled analytics on cloud, translating governance into measurable business outcomes.
Recent speaking includes SQL Saturday Minnesota 2025, where he presented “From Ingestion to Insights: Building Robust Data Pipelines in AWS” to an in-person community audience. He has also contributed to international research forums and science conferences, and is invited to speak at ICDPN-2025 (International Conference on Data Processing & Networking), engaging practitioners and scholars on data engineering, governance, and analytics at scale. Santosh actively publishes and curates work via Google Scholar and shares practical playbooks for data quality, metadata/lineage, and operating models that connect data platforms to financial decisioning.
Beyond delivery, Santosh serves the community as a peer reviewer of scholarly work on data/ML methodologies and as a judge and mentor for select industry and academic competitions. He champions modern data culture by mentoring engineers and product leaders and by advocating automation (including AI agents) to improve reliability, speed, and auditability in regulated environments. Santosh recently completed his Executive MBA, sharpening his focus on strategy and value creation at the intersection of data, risk, and growth.
The meeting is over Microsoft Teams, and the joining link is https://teams.microsoft.com/l/meetup-join/19%3ameeting_NzZkYWIyOTAtODk1MC00MjVmLWJlNjUtNTRiODZmODA2Zjdh%40thread.v2/0?context=%7b%22Tid%22%3a%22bd9727e8-f539-4c76-983c-6c30130c0bee%22%2c%22Oid%22%3a%229e8d5a64-e773-4ca2-90f6-9a266129171e%22%7d
See you at the meeting!
8 attendees
Introduction to PySpark in Microsoft Fabric
Online · Data Engineers in Toronto July 2026 Semimonthly Meeting
Topic: Introduction to PySpark in Microsoft Fabric
Abstract:
With all of the engineering features in Microsoft Fabric, which medium should you use to move and transform data? Low-code data flows and pipelines? Good old relational SQL? What about this newfangled PySpark everyone is buzzing about?
If the last option piques your curiosity and you haven't tried it, this is the session for you. I'll cover basic Python principles that will make even complicated Python easy to read. Then I will explain important topics to understand when managing your Spark environment. Finally, I'll showcase Fabric features and community content that can support your next steps in learning to implement PySpark in Fabric.
Speaker: Jared Kuehn, Data Engineer
Speaker Profile:
For over a decade, Jared has been a data engineering consultant implementing Microsoft products. For more than three decades, he has been honing his skills in theater and other performing arts. As a speaker, Jared marries these two disciplines, creating dynamic presentations that add entertainment to education. This combination can improve presentation engagement, support attendees in knowledge retention, and foster a culture of passion for the data industry.
Jared has spoken at events such as:
- Fabcon Vegas (Microsoft Fabric Community Conference)
- DataCon Seattle (Microsoft Data Conference)
- PASS Summit
- SQL Saturdays, Multiple
- Microsoft Fabric Global Online Conference
- Future Data Driven Summit
- GroupBy Conference
- and more!
He also runs the YouTube channel DataBard, focused on making Data fun and teaching techniques along the way.
Technically speaking, Jared has multiple Microsoft certifications in both on-prem and cloud technologies. He has extensive knowledge in:
- Microsoft Fabric
- SQL Query Development and Performance tuning
- Kimball Data Modeling
- ETL design patterns
- Azure SQL DB and Azure SQL MI
In his spare time, he continues to perform in community theater productions, as well as musically for his church. Upon request, he would be happy to assist with more theatrical portions of events, such as performing musical numbers.
The meeting is over Microsoft Teams, and the joining link is https://teams.microsoft.com/l/meetup-join/19%3ameeting_NzZkYWIyOTAtODk1MC00MjVmLWJlNjUtNTRiODZmODA2Zjdh%40thread.v2/0?context=%7b%22Tid%22%3a%22bd9727e8-f539-4c76-983c-6c30130c0bee%22%2c%22Oid%22%3a%229e8d5a64-e773-4ca2-90f6-9a266129171e%22%7d
See you at the meeting!
7 attendees
Running GitHub Actions Offline: Debug CI/CD workflows Locally
Online · Data Engineers in Toronto August 2026 Semimonthly Meeting
Topic: Running GitHub Actions Offline: Debug CI/CD workflows right from your local machine
Abstract:
Ever wanted to try out your GitHub Actions without waiting for the cloud to spin up the runners? Well, you can! Testing GitHub Actions locally lets you build, test, and debug your CI/CD workflows right from your machine. With a tool such as act, you can simulate GitHub’s environment for your workflows locally using Docker. This means no more pushing dozens of commits just to find out whether the syntax in your YAML file is wrong. You can make changes, execute, and validate your automation as many times as you want, even on a plane where there's no Wi-Fi. Just install Docker, install act, load your secrets locally, and run commands like act push to test the events in your workflows. The result is a faster, more lightweight feedback loop and a real productivity boost. Taking GitHub Actions offline is not about isolation; it's about freedom and control over your automation pipeline.
Speaker: Steve Yonkeu, Software Engineer & Microsoft MVP
Speaker Profile:
Steve Yonkeu is a backend software engineer dedicated to delivering excellence in architecture, performance, and modularity. For over 5 years he has led teams, projects, and open source communities worldwide. Steve is also the founder of the Django Cameroon and Python Cameroon communities, an organizer at PyCon Africa, and a speaker at PyCon US.
The meeting is over Microsoft Teams, and the joining link is https://teams.microsoft.com/l/meetup-join/19%3ameeting_NzZkYWIyOTAtODk1MC00MjVmLWJlNjUtNTRiODZmODA2Zjdh%40thread.v2/0?context=%7b%22Tid%22%3a%22bd9727e8-f539-4c76-983c-6c30130c0bee%22%2c%22Oid%22%3a%229e8d5a64-e773-4ca2-90f6-9a266129171e%22%7d
See you at the meeting!
3 attendees
Past events
25

