Details

This will be an in-person event at The Graduate (next to The Dirty Duck), University of Warwick. Shared pizzas and snacks will be provided. Drinks and any additional food are available from The Dirty Duck at your own expense.

Our speaker, Nicole Schwitter, will introduce how R can be used to collect data from web pages. See full details below.

Speaker:
Nicole Schwitter, PhD student, Department of Sociology, University of Warwick

Abstract:
The rise of the internet and mass digitalisation have produced vast amounts of digital data in recent years. These novel digital sources are used to gain new insights into old and new questions in the data-driven sciences: from election results to press releases, social media posts or user reviews, research now often draws on data that is available online. Many modern commercial websites wait for user input and/or display dynamic content that is generated on the fly with JavaScript. Standard web-scraping techniques, which are well suited to collecting information from static pages, fail in these cases. Instead, the browser itself must be automated to visit websites, click buttons and fill in forms, a task that the tool Selenium fulfils. This talk will give a brief overview of approaches to web scraping and an introduction to the R package RSelenium. The presentation will highlight use cases of RSelenium, show its potential, and give a starting point to those who have never used it.

Related topics

Big Data
Data Science
Open Source
User Group
Statistical Computing

Sponsors

R Consortium

Funding