Yuan Huang, PM Intern at SupStat Inc,
Tong He, Data Scientist at SupStat Inc, and
Vivian Zhang, CTO of SupStat Inc, will deliver this workshop.
Our slides can be found:
We will go over the steps toward parallel computing.
1. Is the problem parallelizable?
2. Tips for improving the efficiency of parallel computing.
3. Implementation in R.
We will discuss how to balance the load across workers, how to reduce parallel overhead, how to make sure each node draws from a different random number stream, and a few statistical models that can be parallelized.
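As a rough illustration of two of these points (separate random number streams and load balancing), here is a minimal sketch using the base `parallel` package. The worker count, the seed, and the toy `task_sizes` vector are assumptions made for the example, not material from the workshop.

```r
library(parallel)

# Start a small socket cluster (2 workers is an arbitrary choice).
cl <- makeCluster(2)

# Give each worker its own L'Ecuyer-CMRG stream derived from one seed,
# so the workers do not draw the same random numbers.
clusterSetRNGStream(cl, iseed = 123)
draws <- clusterEvalQ(cl, runif(1))

# Tasks of uneven size (hypothetical workload): clusterApplyLB hands the
# next task to whichever worker becomes free, instead of pre-assigning
# tasks to workers in a fixed order.
task_sizes <- c(5000, 100, 100, 100, 100, 5000)
sums <- clusterApplyLB(cl, task_sizes, function(n) sum(sqrt(seq_len(n))))

stopCluster(cl)
```

Sending a few larger tasks rather than many tiny ones is one common way to keep communication overhead down, since each task costs a round trip to a worker.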
We will also give an overview of:
1. Rmpi (R interface to MPI; flexible and powerful, but more complex)
2. snow (used today as a backend for the foreach package)
3. multicore (works only on a single node, on Unix-like machines)
4. parallel (hybrid package containing snow and multicore)
5. foreach (parallel backends: doSNOW / doMPI / doMC)
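To show how foreach pairs with a backend, the following is a minimal sketch that registers a snow cluster through doSNOW. It assumes the `foreach` and `doSNOW` packages are installed; the two-worker socket cluster is an arbitrary choice for illustration.

```r
library(foreach)
library(doSNOW)   # also attaches snow, which provides makeCluster()

# Create a 2-worker socket cluster and register it as the foreach backend.
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

# %dopar% sends iterations to the registered backend; .combine = c
# collects the per-iteration results into a single vector.
squares <- foreach(i = 1:4, .combine = c) %dopar% i^2

stopCluster(cl)
```

Swapping the backend (for example to doMC or doMPI) should only change the registration lines; the `foreach` loop itself stays the same.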
Finally, we will give examples using the foreach package:
1. Bootstrapping: calculating a confidence interval for the median.
3. Calculating pairwise distances.
5. Web scraper.
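The bootstrapping example could look roughly like the sketch below. This is not the workshop's own code: the exponential toy data, the 1000 replicates, and the percentile interval are all assumptions chosen for illustration, and the doSNOW backend is assumed to be installed.

```r
library(foreach)
library(doSNOW)

cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

set.seed(1)
x <- rexp(200)      # toy data, purely illustrative
n_boot <- 1000      # number of bootstrap replicates (arbitrary)

# Each iteration resamples x with replacement on a worker and returns
# the median of that resample.
boot_medians <- foreach(b = seq_len(n_boot), .combine = c) %dopar% {
  median(sample(x, replace = TRUE))
}

stopCluster(cl)

# Percentile interval: 2.5% and 97.5% quantiles of the bootstrap medians.
ci <- quantile(boot_medians, probs = c(0.025, 0.975))
```

The resampling on the workers is not seeded here, so the interval will vary slightly between runs; a reproducible version would need per-worker random number streams.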