Loops, Lists, Parallelism and Visualization (with "biggish" data)


Details
Hi everyone, our next meetup is all set for April 2. Here are some notes from the presenter, Peter Li, to help you prepare for the presentation. Thank you, and looking forward to it. Seema
In this tutorial, we'll walk through code that computes and visualizes the similarity of countries' voting records in the UN General Assembly (UNGA). The motivation is to make the case for Base R and Base Graphics, and for not becoming too reliant on third-party libraries/packages.
To that end, we'll consider three examples. First, we'll discuss "for loops" in R and consider solutions to two common problems: 1) performance problems associated with "growing" rather than pre-allocating a storage structure for the output of a "for loop"; and 2) the need for a way to deal with loops that generate a "ragged" or variable number of observations. Second, we'll discuss lapply() and how it can be used as a substitute for "for loops". Understanding lapply() is what will allow us to simply and easily move from serial to parallel computation. Third, we'll discuss Base Graphics and its drawing-by-hand idiom. We'll do so by visualizing the temporal changes of who votes with whom in the UNGA.
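For a rough idea of the first two examples before the session, here is a minimal sketch in Base R. It is not the presenter's code: the variable names (grown, filled, ragged, combined, ragged2) and the toy computations are made up purely for illustration.

```r
n <- 1e4

# "Growing": the result vector is extended (and copied) on every iteration.
grown <- c()
for (i in seq_len(n)) grown <- c(grown, i^2)

# Pre-allocating: storage is created once, then filled in place.
filled <- numeric(n)
for (i in seq_len(n)) filled[i] <- i^2

# "Ragged" output: iteration i yields i values, so the total is not a
# fixed number per iteration. Store each result in its own list slot
# and combine the pieces once at the end.
ragged <- vector("list", 5)
for (i in seq_len(5)) ragged[[i]] <- seq_len(i)
combined <- unlist(ragged)

# The same ragged computation with lapply(), which always returns a list.
ragged2 <- lapply(seq_len(5), function(i) seq_len(i))
```

Both loops produce the same values; the difference is only in how storage is handled. Wrapping each loop in system.time() is one way to compare them on your own machine.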
If you're interested in doing some actual coding, please have R already installed on your computer. Feel free to use whatever R environment you're comfortable with (e.g., R GUI, R GUI + text editor, RStudio, Emacs, etc.).
There is no need to install any packages for this session. All the code we'll discuss is part of the base distribution of R. This includes the "parallel" package (loaded with library(parallel)), which is maintained by members of the R Core team and has been part of the base distribution of R since version 2.14.0 (the current version is 3.0.3).
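To check that the parallel package is available on your machine, something like the following sketch should run without installing anything. The function slow_square and the choice of two cores are assumptions made for illustration, not part of the presenter's material.

```r
library(parallel)  # part of the base distribution; nothing to install

# Hypothetical stand-in for a more expensive per-item computation.
slow_square <- function(i) i^2

# Serial version.
serial <- lapply(1:8, slow_square)

# Parallel version: mclapply() takes the same arguments as lapply().
# It relies on forking, which is not available on Windows, so fall
# back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
forked <- mclapply(1:8, slow_square, mc.cores = cores)
```

The two calls differ only in the function name and the mc.cores argument, which is why being comfortable with lapply() makes the move to parallel computation straightforward.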
While not necessary for this session, I recommend, as a reference, taking a look at Hadley Wickham's list of core R vocabulary (http://adv-r.had.co.nz/Vocabulary.html).
