In recent years, the role of data analysts working on big data has been expanding. Hive allows analysts to manipulate big datasets. DBT allows data analysts to take ownership of some data engineering processes. More and more tools are coming out that extend the capabilities of SQL and allow it to be applied in other parts of the data ecosystem. In this interactive workshop, we'll introduce participants to FugueSQL, a language that allows analysts to work on distributed computing problems. FugueSQL lets users express computation workflows in a SQL-like language, so they can operate on Pandas, Spark, and Dask DataFrames with a language they are already familiar with. We'll demo in a Jupyter Notebook how to use FugueSQL along with native Python for end-to-end Extract, Transform, Load (ETL) pipelines.
Kevin Kho is an Open Source Community Engineer at Prefect, a workflow orchestration startup. Previously, he was a data scientist at Paylocity, where he worked on adding machine learning features to their Human Capital Management (HCM) Suite. He was also a mentor at Thinkful, and he organizes the Orlando Machine Learning and Data Science Meetup.