Scraping data from the web with R and Rvest


Details
This month we'll be learning how to scrape the web using the R programming language and Rvest, a library for harvesting data from the web.
Web scraping is the process of extracting information from a website that might otherwise not be downloadable -- if you can view it on a website, you can harvest it. R is a programming language popular amongst data journalists. To do the scraping we'll be using its Rvest package, which is part of the Tidyverse -- a set of packages for working with data in R that share common approaches.
We'll cover the basics of HTML, and how to select specific elements on a page, and then how to use Rvest to get the data out.
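For a taste of what that looks like, here is a minimal sketch rather than the workshop's actual material. The HTML snippet, the `event` class and the variable names are made up for illustration, and the page is built from a string so nothing needs to be downloaded.

```r
library(rvest)

# A tiny made-up page, parsed from a string so no network access is needed
page <- minimal_html('
  <h1>Events</h1>
  <ul>
    <li class="event">Scraping with R</li>
    <li class="event">Mapping with QGIS</li>
  </ul>
  <table>
    <tr><th>time</th><th>item</th></tr>
    <tr><td>7:00</td><td>Doors open</td></tr>
    <tr><td>7:30</td><td>Show and tell</td></tr>
  </table>
')

# Select every <li> with class "event" using a CSS selector,
# then pull out the text inside each one
events <- page %>%
  html_elements("li.event") %>%
  html_text2()

# Convert the first <table> on the page into a data frame
schedule <- page %>%
  html_element("table") %>%
  html_table()

print(events)
print(schedule)
```

For a live site, the usual starting point would be `read_html()` with the page's URL in place of `minimal_html()`; the selecting and extracting steps stay the same.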
All of our events are suitable for beginners, and no programming experience is required. Bring a laptop along as this is a practical, hands-on workshop. Please also download and install the R programming language and RStudio Desktop before you arrive. Alternatively, you can use RStudio on the web without installing anything by signing up for Posit Cloud.
Before the event, check out the shared doc we'll be using, and sign up for a Dropbox account if you don't already have one so you can edit it. Then please add links to the show and tell section! These can be great data stories you've seen, new tools, jobs you’re hiring for, announcements -- anything others might be interested in. We'll check out what everyone has shared at the start of the event.
Schedule
7:00 🚪 Doors open
7:30 🗣 Show and tell
7:40 💻 Tutorial
9:00 🍺 Drinks at the Bricklayer's Arms
