R workshop XX: Parallel Computing with R


Details
Contributed by:
Yuan Huang, PM Intern at SupStat Inc.
Tong He, Data Scientist at SupStat Inc.
Vivian Zhang, CTO of SupStat Inc., will deliver this workshop.
Content:
Our slides can be found at:
http://nycdatascience.com/slides/parallel_R/index.html#1
http://nycdatascience.com/slides/parallel_R/examples_general.html
http://nycdatascience.com/slides/parallel_R/example_crossvalidataion.html
http://nycdatascience.com/slides/parallel_R/example_web.html
We will go over the steps toward parallel computing:
1. Is the problem parallelizable?
2. Tips for improving the efficiency of parallel computing.
3. Implementation in R.
We will discuss how to balance the load across workers, how to reduce parallel overhead, how to make sure each node uses different random numbers, and a few statistical models that can be parallelized.
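As a rough sketch of two of these points, the snippet below uses the base parallel package to give each worker its own random number stream and to hand out unevenly sized tasks dynamically. The worker count of 2, the seed, and the task sizes are illustrative assumptions, not values taken from the slides.

```r
library(parallel)

# Start a small socket cluster; 2 workers is an arbitrary choice for illustration.
cl <- makeCluster(2)

# Give every worker its own L'Ecuyer-CMRG stream so the workers
# do not all draw the same "random" numbers.
clusterSetRNGStream(cl, iseed = 123)
draws <- parSapply(cl, 1:2, function(i) runif(1))

# Tasks of uneven size: clusterApplyLB hands the next task to whichever
# worker becomes free first, instead of splitting the tasks up front.
task_sizes <- c(2e5, 1e3, 1e3, 1e3)
task_means <- clusterApplyLB(cl, task_sizes, function(n) mean(rnorm(n)))

stopCluster(cl)
```

Dynamic scheduling adds some communication per task, so it tends to pay off mainly when task run times differ a lot.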
We will also give an overview of:
1. Rmpi (R interface to MPI; flexible and powerful, but more complex)
2. snow (will be used as the backend for the foreach package today)
3. multicore (works only on a single node and on Unix-like machines such as Linux)
4. parallel (hybrid package containing snow and multicore)
5. foreach (parallel backends: doSNOW / doMPI / doMC)
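To illustrate the "hybrid" nature of the parallel package, here is a minimal sketch that runs the same toy computation through its snow-style interface (an explicit cluster) and its multicore-style interface (forked processes). The function `square` and the use of 2 workers are assumptions made for the example.

```r
library(parallel)

# Toy task, chosen only for illustration.
square <- function(x) x^2

# snow-style: an explicit cluster of worker processes (works on all platforms).
cl <- makeCluster(2)
res_snow <- parLapply(cl, 1:4, square)
stopCluster(cl)

# multicore-style: forked workers, available only on Unix-like systems,
# so fall back to a single core elsewhere.
n_cores <- if (.Platform$OS.type == "unix") 2 else 1
res_fork <- mclapply(1:4, square, mc.cores = n_cores)
```

Both calls return a list with one element per input, so code written against one interface can usually be switched to the other with little change.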
Finally, we will work through examples using the foreach package:
1. Bootstrapping: calculating a confidence interval for the median
2. Random forest
3. Calculating pairwise distances
4. Cross-validation
5. Web scraper
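As a taste of the first example, the following is a minimal sketch of a bootstrap confidence interval for the median using foreach with a doSNOW backend. It is not the code from the slides: the simulated data `x`, the 2 workers, and the 1000 replicates are assumptions, and it requires the foreach and doSNOW packages to be installed.

```r
library(foreach)
library(doSNOW)  # also attaches snow, which provides makeCluster/stopCluster

# Hypothetical data: 200 draws from an exponential distribution.
set.seed(1)
x <- rexp(200)

# Register a 2-worker socket cluster as the foreach backend.
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

# Each iteration resamples x with replacement and returns the median;
# .combine = c collects the results into a numeric vector.
B <- 1000
boot_medians <- foreach(i = seq_len(B), .combine = c) %dopar% {
  median(sample(x, replace = TRUE))
}

stopCluster(cl)

# Percentile bootstrap 95% confidence interval for the median.
ci <- quantile(boot_medians, c(0.025, 0.975))
```

Note that `set.seed` here only fixes the simulated data; the resampling happens on the workers, so the interval will vary slightly from run to run unless the worker random number streams are also controlled.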
