Big data with AWS Glue and Athena for understanding NYC taxi data


Details
In this workshop, we will learn to catalog datasets using AWS Glue crawlers. We will interactively author ETL scripts in a SageMaker notebook connected to an AWS Glue development endpoint. We will then deploy these ETL scripts to production by turning them into managed AWS Glue jobs and adding appropriate AWS Glue scheduling and triggering conditions. Finally, we will query the new datasets from Amazon Athena using the AWS Glue Data Catalog.
Athena cost: roughly 6 cents for 10 queries (https://aws.amazon.com/athena/pricing/)
AWS Glue cost: roughly 50 cents for one hour of execution (https://aws.amazon.com/glue/pricing/)
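
As a rough preview of the crawl-then-query flow described above, here is a minimal Python sketch using boto3 client calls (`create_crawler`, `start_crawler`, `get_crawler`, `start_query_execution`, `get_query_execution`). It is not the workshop's actual code: the crawler name, role ARN, database, table, and S3 paths in `demo()` are hypothetical placeholders, and the polling loops are deliberately simple.

```python
import time


def crawler_config(name, role_arn, database, s3_path):
    """Build keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def run_crawler(glue, config, poll_seconds=15):
    """Create and start a crawler, then wait until it is READY again."""
    glue.create_crawler(**config)
    glue.start_crawler(Name=config["Name"])
    while glue.get_crawler(Name=config["Name"])["Crawler"]["State"] != "READY":
        time.sleep(poll_seconds)


def run_query(athena, sql, database, output_s3, poll_seconds=2):
    """Submit a query to Athena and return (query id, final state)."""
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)


def demo():
    """Example wiring; every name below is a hypothetical placeholder."""
    import boto3  # imported here so the helpers above work without AWS access

    glue = boto3.client("glue")
    athena = boto3.client("athena")
    config = crawler_config(
        name="nyc-taxi-crawler",                               # hypothetical
        role_arn="arn:aws:iam::123456789012:role/GlueRole",    # hypothetical
        database="nyctaxi",                                    # hypothetical
        s3_path="s3://example-bucket/nyc-taxi/",               # hypothetical
    )
    run_crawler(glue, config)
    return run_query(
        athena,
        sql="SELECT COUNT(*) FROM yellow_tripdata",            # hypothetical table
        database="nyctaxi",
        output_s3="s3://example-bucket/athena-results/",       # hypothetical
    )
```

The helpers take the clients as arguments so they can be tried with stand-in objects before pointing them at a real account. Calling `demo()` requires AWS credentials and will incur the charges noted above.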
# Prerequisites
We will use AWS Cloud9. Alternatively, you can set up your local machine. This is a longer process and is explained at https://blog.programming-tools-meetup.cloud/dev-machine-setup/
We would like to thank Vanguard for the venue and the catering.
This workshop is a modified version of the re:Invent 2018 talk https://www.slideshare.net/AmazonWebServices/serverless-data-prep-with-aws-glue-ant313-aws-reinvent-2018
