Data Engineers Meetup: PySpark 101 & Using LLMs in Data Pipelines


Details
Join us at Prefect to talk about Data Processing with Apache Spark!
Data Engineers DC is a professional group that meets monthly to discuss all things data engineering, including open data, data gathering, data munging, and the creation, storage, and maintenance of datasets. We combine presentations with hands-on workshops, always seeking to make our data munging lives easier.
---
Location:
Prefect - 2112 Pennsylvania Avenue NW
Washington, DC 20037
Note: Bring a photo ID; building security requires one.
Agenda:
5:30-6:15pm: Food & Networking
6:15-6:30pm: Introductions
6:30-7:00pm: PySpark 101 - Mike Jadoo
7:00-7:30pm: Using LLMs in Data Pipelines - Rahul Singh
Talk: PySpark 101 - Mike Jadoo
Unlock PySpark for big data. This beginner-friendly presentation introduces Apache Spark, a fast and scalable distributed computing framework, and covers the fundamentals of PySpark, including:
• Apache Spark Overview – Understand the core concepts and benefits of Spark for big data processing.
• PySpark Essentials – Learn about RDDs (Resilient Distributed Datasets) for distributed computation, DataFrames for optimized, structured data handling, and querying data with SQL.
• Machine Learning with MLlib – Explore the basics of MLlib, Spark’s scalable machine learning library for analytics and predictive modeling.
Perfect for beginners in data engineering and analytics, this talk will equip you with the foundational skills to process and analyze large datasets efficiently using PySpark.
Talk: Using LLMs in Data Pipelines - Rahul Singh
---
Data Engineers DC is a program of DC2. Learn more at www.dc2.org
---
Sign up to give a 5 minute micro-presentation! https://forms.gle/8YSBJr5LGsfr3qKMA
Fun meetup syntax bug:
Link: https://forms.gle/8YSBJr5LGsfr3qKMA
Text formatted as a link: link
Link formatted as a link: https://forms.gle/8YSBJr5LGsfr3qKMA
(Data processing is hard. If you've found bugs like this in your pipeline, sign up using the above link to talk about it for five minutes!)
