How To Scrape Web Pages with Python's Beautiful Soup
Details
The web provides us with more data than any of us can read, so we often want to work with that information programmatically in order to make sense of it. Sometimes, that data is provided to us by website creators via a .csv file or through an API. Other times, we need to collect text from the web ourselves.
In this workshop, we'll go over how to use the Requests and Beautiful Soup Python packages to work with data from web pages. The Requests module lets you integrate your Python programs with web services, while the Beautiful Soup module is designed to get screen-scraping done quickly. With Python and these two libraries, we'll go through how to collect a web page and work with the textual information available there. We'll also cover some basic etiquette and ethical best practices around web scraping.
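As a rough preview of that workflow, here is a minimal sketch rather than the workshop's actual material. It assumes the requests and beautifulsoup4 packages are installed. The function names, the User-Agent string, the sample HTML, and the sample robots.txt are all hypothetical, made up for illustration. The robots.txt check uses the standard library's urllib.robotparser as one simple example of scraping etiquette.

```python
import requests
from bs4 import BeautifulSoup
from urllib.robotparser import RobotFileParser

# Hypothetical identifier for this sketch; a real scraper should name itself honestly.
USER_AGENT = "workshop-example"


def allowed_by_robots(robots_txt, url, user_agent=USER_AGENT):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def fetch_html(url, timeout=10):
    """Download a page with Requests and return its HTML as text."""
    response = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def extract_paragraphs(html):
    """Parse HTML with Beautiful Soup and return the text of each <p> element."""
    soup = BeautifulSoup(html, "html.parser")
    # Collapse runs of whitespace so each paragraph comes back as one clean line.
    return [" ".join(p.get_text().split()) for p in soup.find_all("p")]


# Made-up inputs so the parsing steps can be tried without touching the network.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
SAMPLE_HTML = """
<html><body>
<h1>Example</h1>
<p>First paragraph.</p>
<p>Second <b>paragraph</b>.</p>
</body></html>
"""

if __name__ == "__main__":
    print(allowed_by_robots(SAMPLE_ROBOTS, "https://example.com/private/page"))
    print(extract_paragraphs(SAMPLE_HTML))
```

To scrape a live page, you would pass the result of fetch_html(url) to extract_paragraphs, ideally after confirming that the site's robots.txt and terms of use allow it.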
Bio
Lisa Tagliaferri is Director of Developer Education at Sourcegraph. She is the author of “How To Code in Python,” and maintains popular educational open source projects. She was previously a researcher in the digital humanities at the City University of New York, MIT, and Harvard.
