R data analysis workshop on a massive grammaticality judgements database


Details
Dear all,
The Adventures in R team will be running an R workshop on a recent data set produced by Josh Hartshorne, Josh Tenenbaum and Steven Pinker. It consists of grammaticality judgements of English sentences by over 600,000 English speakers. Most of the participants speak English as a second language, and there is relatively detailed information regarding their linguistic backgrounds (e.g. first language). The data set therefore constitutes an extraordinary resource with which to investigate how the structure / typology of an individual’s first language impacts on their learning of a second. The workshop, hosted by Nick Riches, will take place on Tuesday 4th December (week 10), and will run from 12.00 till 2.00 in Room 1.71B in the King George VI building.
A large part of the workshop will focus on how to transform the data into a format which makes it easy to analyse. This is a process informally referred to as ‘data-munging’ or ‘data-wrangling’ (https://en.wikipedia.org/wiki/Data_wrangling). R is exceptionally useful for dealing with large data sets and getting them into the right shape. Towards the end of the workshop we will use generalised linear models to investigate how the participants’ first language impacts on their grammaticality judgements of specific English constructions. The overall aim of the workshop is to equip attendees with sufficient knowledge and skills to analyse the data by themselves and explore their own hypotheses. With this in mind, if there is sufficient interest, the Adventures in R team will organise a further workshop in Semester 2, in which attendees will present and discuss their analyses.
Attendance presupposes basic knowledge of how to use R. We are assuming that participants will know how to install R and RStudio, install packages, and create and run simple R scripts (e.g. reading in data and creating new variables). If you need to brush up in this area, Lauren has created a series of excellent tutorials which you can work through (https://verbingnouns.github.io/notebooks/rfficehours/tutorial-1.html). We also aim to run some more introductory sessions earlier in the semester (keep a look out for emails). The room does not contain any computers, and participants are expected to bring a laptop with R and RStudio installed and working correctly. Generally we find that R works better on standalone computers than on campus networked PCs, though occasionally there can be installation headaches and hiccups.
If you wish to attend, please signal your interest via the meetup group. If you are not a member of this group, please join.
Places will be capped at 26.
Further instructions on how to prepare for the workshop will be sent out closer to the date.
See you soon
Nick
