Big Data in Motion


Details
1 Night. 2 Powerful Data Events.
Join us this May 31 at IT Step Academy for a jam-packed session featuring:
Splink in Practice
A meetup to explore how organizations tackle one of the toughest challenges in data engineering: resolving messy, duplicate, and fragmented records across massive datasets. This session will walk you through the real-world implementation of scalable entity resolution using Splink, a powerful Python library built for probabilistic record linkage. Learn key techniques such as blocking, comparison strategies, and inference generation, all based on the Fellegi-Sunter model.
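For a small taste of the ideas behind the session, here is a minimal plain-Python sketch of blocking followed by Fellegi-Sunter style scoring. It does not use Splink's API. The sample records, the choice of surname as the blocking key, and the m and u probabilities are all made up for illustration. In practice these probabilities would be estimated from data.

```python
import math
from itertools import combinations

# Hypothetical probabilities, chosen for illustration only.
# m = P(field agrees | records are a match)
# u = P(field agrees | records are not a match)
PARAMS = {
    "first_name": {"m": 0.9, "u": 0.01},
    "city": {"m": 0.8, "u": 0.1},
}

# Made-up sample records.
records = [
    {"id": 1, "first_name": "ana", "surname": "cruz", "city": "davao"},
    {"id": 2, "first_name": "ana", "surname": "cruz", "city": "davao"},
    {"id": 3, "first_name": "ben", "surname": "cruz", "city": "cebu"},
    {"id": 4, "first_name": "ana", "surname": "reyes", "city": "davao"},
]


def candidate_pairs(rows, key):
    """Blocking: only pair up records that share the same value for `key`."""
    blocks = {}
    for row in rows:
        blocks.setdefault(row[key], []).append(row)
    for block in blocks.values():
        yield from combinations(block, 2)


def match_weight(a, b):
    """Sum log2 Bayes factors: m/u on agreement, (1-m)/(1-u) on disagreement."""
    total = 0.0
    for field, p in PARAMS.items():
        if a[field] == b[field]:
            total += math.log2(p["m"] / p["u"])
        else:
            total += math.log2((1 - p["m"]) / (1 - p["u"]))
    return total


# Score only the pairs that survive blocking on surname.
scores = {
    (a["id"], b["id"]): match_weight(a, b)
    for a, b in candidate_pairs(records, "surname")
}

for pair, weight in sorted(scores.items()):
    print(pair, round(weight, 2))
```

Blocking on surname means record 4 is never compared with records 1 to 3, which is the trade-off blocking makes: far fewer comparisons, at the risk of missing matches whose blocking key differs.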
Accelerating Data Processing with PySpark
It is the 2nd session of the DurianPy Data Cohort Series. Get hands-on with distributed data processing using PySpark, the Python API for Apache Spark. This workshop is perfect for Python users who want to scale their data workflows beyond single-machine environments. You'll explore Spark's architecture, run transformations on large datasets, and gain practical experience with Spark DataFrames, aggregations, joins, and performance optimizations.
Perfect for students, analysts, Python users, and data professionals!
Date: May 31, 2025
Time: 5:30 PM to 9:00 PM
Venue: IT Step Academy Philippines
Registration:
FREE for the meetup
PHP 100 for the workshop
Sign up now: https://forms.gle/kUs55tq1mBvPEkG7A
Come for the knowledge, stay for the community!

Sponsors
Big Data in Motion