Details

Every AI pipeline starts with one question: where does the data come from?

In this hands-on workshop, we'll answer that question using Scrapy, one of Python's most powerful web scraping frameworks. You'll go from a blank project to a working spider that harvests structured data from the web, ready to feed into your next AI or data science project.

We'll cover:
• Scrapy vs. lighter tools (Requests & Beautiful Soup), and when each makes sense
• Extracting data from HTML using CSS selectors
• Following links and handling pagination at scale
• Writing clean, structured output to a file
• The ethical and legal side of web scraping

Bonus (Gold Star): a chapter on scraping JavaScript-rendered pages for those who want to go further.

🐍 Python basics assumed — no prior scraping experience needed

Agenda
- 18:00 Doors Open
- 18:30 Start of the Workshop
- 20:15 Workshop Closing & Announcements
- 20:30 Networking
- 21:00 Event Closing

GitHub Repo
Scalable Data Harvesting for AI

Stream
YouTube Stream

📧 Contact
Are you interested in speaking at one of our events? Have a good idea for a Meetup? Get in touch with us at [amsterdam@pyladies.com](mailto:amsterdam@pyladies.com)

​💬 Find us on the PyLadies Global workspace:

  1. Go to https://slackin.pyladies.com, enter your email address, and accept the email invitation
  2. Go to the workspace at https://pyladies.slack.com
  3. Join the channel #city-amsterdam
