Web Scraping Part I: In-Browser Scraping and Working with XPath


This two-part workshop introduces attendees to Web Scraping (https://en.wikipedia.org/wiki/Web_scraping), a technique for automatically extracting data from websites.

Part one introduces the concepts and uses browser extensions to get started with scraping quickly. No programming experience is required.

What you'll learn:

• What web scraping is and why it is useful

• Use browser extensions and web tools to quickly scrape data off a web page

• Use XPath/XQuery to select elements on a page

• Export extracted data to a file for processing in OpenRefine, Excel, or other software
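
For a rough idea of what selecting elements with XPath and exporting the result looks like outside the browser, here is a minimal Python sketch. It is not the workshop's own material: the sample page, the "staff" table, and the function names extract_rows and rows_to_csv are all made up for illustration. It uses only Python's standard library, whose ElementTree module supports a limited subset of XPath and expects well-formed markup, so real-world pages would usually need a more forgiving parser.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
SAMPLE_PAGE = """
<html>
  <body>
    <table id="staff">
      <tr><td class="name">Ada</td><td class="role">Librarian</td></tr>
      <tr><td class="name">Grace</td><td class="role">Archivist</td></tr>
    </table>
  </body>
</html>
"""


def extract_rows(markup):
    """Select each row of the 'staff' table and return (name, role) tuples."""
    root = ET.fromstring(markup)
    rows = []
    # XPath: every <tr> directly under the table whose id is "staff".
    for tr in root.findall(".//table[@id='staff']/tr"):
        # XPath relative to the row: the cell with the matching class.
        name = tr.findtext("td[@class='name']")
        role = tr.findtext("td[@class='role']")
        rows.append((name, role))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text, ready to save and open elsewhere."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "role"])  # header row
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(extract_rows(SAMPLE_PAGE)))
```

Writing the returned text to a .csv file gives something that OpenRefine or Excel can open directly. The same XPath expressions can also be tried out in a browser-based scraping tool before moving them into a script.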

Part two, on November 10 (https://www.meetup.com/code4libtoronto/events/234501201/), will be a deeper dive into web scraping with the Python programming language.

If you attended our Library Carpentry workshop in July 2016 (https://code4libtoronto.github.io/2016-07-28-librarycarpentry/), this will be a recap of the Web Scraping lesson.

Everyone is invited, although some familiarity with HTML/XML will be helpful.

Required software

• Bring your own laptop to the event

• Make sure you have the Chrome browser installed on it

• Download and enable the Scraper extension (https://chrome.google.com/webstore/detail/scraper/mbigbapnjcgaffohmbkdlecaccepngjd)

Note: Registration is capped at 20 attendees. You can indicate your interest by RSVP'ing through Meetup, but this will *not* guarantee your spot in the workshop.

To confirm your registration and guarantee your spot at the workshop, please email Kim Pham at [masked] with the following details:

1. Full Name

2. Contact email

3. Title (Student, Librarian, Archivist, N/A, etc.)

4. Organization (if applicable)

5. Why are you interested in this workshop?

6. What is your previous experience with web scraping, HTML, XML, and Python?