
Distributed Computing Workflows on Big Data with FugueSQL and Python

Hosted by Michael duPont

Details

Welcome Pythonistas!

In recent years, the role of data analysts working on big data has been expanding. Hive allows analysts to manipulate big datasets. DBT allows data analysts to take ownership of some data engineering processes. More and more tools are coming out that extend the capabilities of SQL and allow it to be applied in other parts of the data ecosystem. In this interactive workshop, we'll introduce participants to FugueSQL, a language that allows analysts to work on distributed computing problems. FugueSQL lets users express computation workflows in a SQL-like language, so they can operate on Pandas, Spark, and Dask DataFrames with a language they are already familiar with. We'll demo in Jupyter Notebook how to use FugueSQL along with native Python for end-to-end Extract, Transform, Load (ETL) pipelines.

Project demo: https://www.kaggle.com/kvnkho/thinkful-workshop-data-analytics

Fugue: https://github.com/fugue-project/fugue
Fugue Docs: https://fugue-tutorials.readthedocs.io/en/latest/README.html

Kevin Kho is an Open Source Community Engineer at Prefect, a workflow orchestration startup. Previously, he was a data scientist at Paylocity, where he worked on adding machine learning features to their Human Capital Management (HCM) Suite. He was also a mentor at Thinkful, and he organizes the Orlando Machine Learning and Data Science Meetup.

-----

We meet monthly for good discussion and Python shenanigans. You can show off a project you're working on or any problems that we can help solve. We're always looking for people to give lightning, beginner, and skill-based talks. Message us if you're interested in speaking!

You can watch our past meetings at watch.pyorl.org

See you all there 😃

The Orlando Python User Group