Workshop | Introduction to Data Analysis for Aspiring Data Scientists: Part 4

Hosted By
Karen B.

Details

Join us for Part 4 of our online learning series: Introduction to Data Analysis for Aspiring Data Scientists. This is the final online workshop in this series for anyone and everyone interested in learning about data analysis.

Part 4: Introduction to Apache Spark

Abstract: This workshop covers the fundamentals of Apache Spark, one of the most widely used big data processing engines. You will learn how to ingest data with Spark, read the Spark UI, and gain a better understanding of distributed computing. We will be using the COVID-19 data released by the NY Times (https://github.com/nytimes/covid-19-data). No prior knowledge of Spark is required, but Python experience is highly recommended.
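
To give a feel for the distributed computing idea before the session, here is a minimal sketch in plain Python, not Spark and not the workshop's own notebook. It assumes the column layout of the NY Times us-counties.csv file (date, county, state, fips, cases, deaths); the sample rows, county names, and function names are made up for illustration. The data is split into partitions, each partition is summarized independently, and the partial results are then combined, which is roughly the pattern an engine like Spark applies across many machines.

```python
import csv
import io
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Made-up rows in the assumed us-counties.csv layout. Values are illustrative only.
SAMPLE_CSV = """date,county,state,fips,cases,deaths
2020-04-01,Alpha,California,00001,10,1
2020-04-01,Beta,California,00002,5,0
2020-04-01,Gamma,Washington,00003,7,2
2020-04-01,Delta,Washington,00004,3,0
2020-04-01,Epsilon,Oregon,00005,4,1
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def split_into_partitions(rows, num_partitions):
    """Deal rows round-robin into num_partitions lists."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def cases_by_state(partition):
    """Sum the cases column per state within a single partition."""
    totals = Counter()
    for row in partition:
        totals[row["state"]] += int(row["cases"])
    return totals


def distributed_cases_by_state(rows, num_partitions=2):
    """Summarize each partition in its own worker, then merge the partial sums."""
    partitions = split_into_partitions(rows, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(cases_by_state, partitions))
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds counts together
    return dict(combined)


totals = distributed_cases_by_state(read_rows(SAMPLE_CSV))
print(sorted(totals.items()))
```

The answer does not depend on how the rows are partitioned, because addition can be applied to partial sums in any order. That property is what lets this kind of aggregation be spread across workers.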

Who should attend this workshop: Anyone and everyone. CS students and even non-technical folks are welcome to join.

What you need: Although no prep work is required, we recommend basic Python knowledge and signing up for Databricks Community Edition before joining. Watch Part 1 to learn about Python: https://youtu.be/HBVQAlv8MRQ and sign up for Community Edition here: https://databricks.com/try-databricks

LINK TO JOIN: https://databricks.zoom.us/j/98945262498

Agenda: 10AM - 11AM PDT (GMT-7)

10:00AM - 10:50AM - Workshop led by Kelly
10:50AM - 11:00AM - Q&A

Instructor: Kelly O’Malley is a Solutions Engineer at Databricks, where she helps startups architect and implement big data pipelines. Prior to joining Databricks, she worked as a Software Engineer in the defense industry writing network code. She completed her BS in Computer Science at UCLA. Outside of the tech world, Kelly enjoys cooking, DIY projects, and spending time at the beach.

TA: Denny Lee is a Staff Developer Advocate at Databricks. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premises and cloud environments. He also has a Master's in Biomedical Informatics from Oregon Health & Science University and has architected and implemented powerful data solutions for enterprise healthcare customers. His current technical focuses include Distributed Systems, Apache Spark, Deep Learning, Machine Learning, and Genomics.

TA: Brooke Wenig is the Machine Learning Practice Lead at Databricks. She guides and assists customers in implementing machine learning pipelines, and teaches Distributed Machine Learning & Deep Learning courses. She received an MS in Computer Science from UCLA with a focus on distributed machine learning. She speaks Mandarin Chinese fluently and enjoys cycling.

Data + AI Online Meetup