Name: Web Scraping with Trafilatura and SpaCy
Start: 2023-02-21T18:30:00-05:00
End: 2023-02-21T20:00:00-05:00
Location: North Bellmore Public Library

You've invested a lot of time and effort in learning how to extract just the good bits of text from a target site using BeautifulSoup. What happens when you want to add data from another target? Will any of your hard work transfer over? Do you really want to find out how many different class names someone can put on a div tag? Will all your work go down the drain when your target changes its tech stack? Is there a better way? Yes! We'll play with the Trafilatura library, which was designed to make just-the-text-you-want extraction fast, simple, and usable across sites without per-site selectors. We'll use it to pull text from some local Long Island sites, and then use the incredible SpaCy library to find named entities (people, places, organizations) in that text.
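
For a rough idea of what that workflow can look like, here is a minimal sketch, not the code from the talk. The function names (`extract_main_text`, `find_entities`, `count_labels`, `demo`) are hypothetical, and the choice of spaCy's small English model `en_core_web_sm` is an assumption; it has to be installed separately with `python -m spacy download en_core_web_sm`. Third-party imports sit inside the functions so the helper at the bottom can be used without them.

```python
from collections import Counter


def extract_main_text(url):
    """Download a page and return its main text, or None if nothing usable came back."""
    import trafilatura  # third-party: pip install trafilatura

    downloaded = trafilatura.fetch_url(url)  # returns None if the download fails
    if downloaded is None:
        return None
    # extract() drops navigation, ads, and other boilerplate; may also return None
    return trafilatura.extract(downloaded)


def find_entities(text, model="en_core_web_sm"):
    """Return (entity text, entity label) pairs found by a spaCy pipeline."""
    import spacy  # third-party: pip install spacy

    nlp = spacy.load(model)  # the model name is an assumption, see the note above
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def count_labels(entities):
    """Count how often each entity label (PERSON, GPE, ORG, ...) appears."""
    return Counter(label for _, label in entities)


def demo(url):
    """Run the whole pipeline on one URL and print a label summary."""
    text = extract_main_text(url)
    if text is None:
        print("No text could be extracted from", url)
        return
    entities = find_entities(text)
    for label, count in count_labels(entities).most_common():
        print(label, count)
```

Nothing in the sketch depends on the target site's tag or class names, which is the point of the comparison with BeautifulSoup: the same `demo` call should work unchanged when pointed at a different site.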

David Gallagher

PyData Long Island

PyData

Technology

Open Source

Python

Software Development

Web Development

Computer Programming

Data Science

Geospatial

Data Science using Python

Open Source Python

Data Journalism

Citizen Journalism

Data Visualization

Data Analytics

Data Mining
