Scalable Data Harvesting for AI

Name: Scalable Data Harvesting for AI
Start: 2026-05-28T18:00:00+02:00
End: 2026-05-28T21:00:00+02:00

Hosted by Giulia and Una G.

PyLadies Amsterdam

Details

Every AI pipeline starts with one question: where does the data come from?

In this hands-on workshop, we'll answer that question using Scrapy, one of Python's most powerful web scraping frameworks. You'll go from a blank project to a working spider that harvests structured data from the web, ready to feed into your next AI or data science project.

We'll cover:
• Scrapy vs. lighter tools (Requests & Beautiful Soup), and when each makes sense
• Extracting data from HTML using CSS selectors
• Following links and handling pagination at scale
• Writing clean, structured output to a file
• The ethical and legal side of web scraping
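
To give a flavour of the crawl loop behind the topics above (extract items, follow the "next" link, write structured output), here is a minimal sketch. It deliberately uses only the Python standard library instead of Scrapy, so treat it as an illustration of the idea and not as the workshop's material. The page contents, the example.test URLs, and the names `QuoteParser`, `crawl`, and `to_jsonl` are all hypothetical.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "site" (URL -> HTML) so the sketch runs offline.
PAGES = {
    "https://example.test/page/1": (
        '<div class="quote"><span class="text">First quote</span></div>'
        '<div class="quote"><span class="text">Second quote</span></div>'
        '<li class="next"><a href="/page/2">Next</a></li>'
    ),
    "https://example.test/page/2": (
        '<div class="quote"><span class="text">Third quote</span></div>'
    ),
}


class QuoteParser(HTMLParser):
    """Collects the text of <span class="text"> elements and the href
    of the first link inside <li class="next">."""

    def __init__(self):
        super().__init__()
        self.quotes = []
        self.next_href = None
        self._in_text = False
        self._in_next = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "text" in classes:
            self._in_text = True
            self.quotes.append("")
        elif tag == "li" and "next" in classes:
            self._in_next = True
        elif tag == "a" and self._in_next and self.next_href is None:
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_text = False
        elif tag == "li":
            self._in_next = False

    def handle_data(self, data):
        if self._in_text:
            self.quotes[-1] += data


def crawl(start_url, fetch):
    """Yield one dict per quote, following "next" links until none remain."""
    url, seen = start_url, set()
    while url and url not in seen:
        seen.add(url)  # guard against pagination loops
        parser = QuoteParser()
        parser.feed(fetch(url))
        for text in parser.quotes:
            yield {"text": text, "source": url}
        # Resolve the relative "next" link against the current page URL.
        url = urljoin(url, parser.next_href) if parser.next_href else None


def to_jsonl(items):
    """Serialize items as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(item) for item in items)


items = list(crawl("https://example.test/page/1", PAGES.__getitem__))
print(to_jsonl(items))
```

In Scrapy, the same steps are typically expressed with `response.css(...)` for extraction and `response.follow(...)` for pagination inside a spider's `parse` method, with the framework handling scheduling, retries, and export for you.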

Bonus (Gold Star): a chapter on scraping JavaScript-rendered pages for those who want to go further.

🐍 Python basics assumed — no prior scraping experience needed

Agenda
- 18:00 Doors Open
- 18:30 Start of the Workshop
- 20:15 Workshop Closing & Announcements
- 20:30 Networking
- 21:00 Event Closing

GitHub Repo
Scalable Data Harvesting for AI

Stream
YouTube Stream

📧 Contact
Are you interested in speaking at one of our events? Have a good idea for a Meetup? Get in touch with us at [amsterdam@pyladies.com](mailto:amsterdam@pyladies.com)

💬 Find us on the PyLadies Global workspace:

- Go to https://slackin.pyladies.com and enter your email address
- Accept the email invitation
- Go to the workspace at https://pyladies.slack.com
- Join the channel #city-amsterdam