Simulation studies in R - Using all cores and other tips


After working more seriously with simulations, I noticed that my previous setup needed some updates. The most notable are the following three:

  • It is very handy to list the different scenarios explicitly instead of using nested loops.

  • Storing intermediate results in separate files (one per scenario) removes the need to rerun an almost finished analysis that crashed, and it clearly separates data generation from the analysis.

  • Using all available cores can speed up processing, but may make the simulation non-reproducible.
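To illustrate the first point, here is a minimal sketch that builds the same set of scenarios twice: once implicitly through nested loops and once as an explicit table with `expand.grid()`. It uses the parameter ranges from the scenario description in the script below; the names `a_values`, `b_values`, `c_values` and `nested` are made up for this example.

```r
# Parameter ranges (same values as in the scenario description of the script)
a_values <- seq(10, 100, 20)
b_values <- seq(20, 100, 30)
c_values <- seq(30, 200, 40)

# Nested loops: the scenarios exist only implicitly in the loop structure
nested <- list()
for (a_val in a_values) {
  for (b_val in b_values) {
    for (c_val in c_values) {
      nested[[length(nested) + 1]] <- c(a = a_val, b = b_val, c = c_val)
    }
  }
}
nested <- as.data.frame(do.call(rbind, nested))

# Explicit scenario table: one row per scenario
scenarios <- expand.grid(a = a_values, b = b_values, c = c_values)
nrow(scenarios)  # 75 scenarios (5 x 3 x 5)
```

Both contain the same 75 combinations (only the row order differs), but the explicit table can be printed, subset (e.g. `scenarios[scenarios$a == 10, ]`) or passed row by row to a single function, which is what makes it easy to rerun individual scenarios or spread them over several cores.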

So here is my new simulation-study skeleton, which consists of five parts:

  1. Preamble: Load all the packages and functions that are required.

  2. Simulation-function: This is the part that will most likely be much more complicated in your case. Define the steps that will be repeated for each scenario. The parameters of this function are filled in from the scenarios.

  3. Scenario-description: Explicitly list the range of values that should be passed to the simulation-function.

  4. Run the analysis: Here you pass all the scenario descriptions to your simulation-function, on either one or all available cores. In either case, set a random seed to make the simulation reproducible.

  5. Analyze the outputs: Not shown here, but you probably …
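Because every scenario writes its own result file, a crashed run can be resumed by only running the scenarios whose file does not exist yet. The following is a minimal sketch of that idea, not part of the original setup; `sim_to_file()`, `result_file()` and `run_missing()` are hypothetical helpers, and the output directory is a temporary one created just for the example.

```r
# Hypothetical variant of the simulation-function that writes into `out_dir`
sim_to_file <- function(a, b, c, out_dir) {
  results <- matrix(NA, 1000, 4)
  results[, 1] <- a
  results[, 2] <- b
  results[, 3] <- c
  for (i in 1:1000) {
    results[i, 4] <- mean(rnorm(100))
  }
  # The file is only written after all repetitions of the scenario have finished
  write.csv(results, result_file(a, b, c, out_dir), row.names = FALSE)
}

# File name for one scenario, e.g. "10_20_30_res.csv"
result_file <- function(a, b, c, out_dir) {
  file.path(out_dir, paste0(a, "_", b, "_", c, "_res.csv"))
}

# Run only the scenarios whose result file is still missing
run_missing <- function(scenarios, out_dir) {
  files <- result_file(scenarios$a, scenarios$b, scenarios$c, out_dir)
  todo <- which(!file.exists(files))
  for (i in todo) {
    sim_to_file(scenarios$a[i], scenarios$b[i], scenarios$c[i], out_dir)
  }
  invisible(todo)  # indices of the scenarios that were run in this call
}

# Usage: the second call finds all files and has nothing left to do
out_dir <- tempfile("sim_")
dir.create(out_dir)
scenarios <- expand.grid(a = c(10, 30), b = c(20, 50), c = 30)
set.seed(29012001)
first <- run_missing(scenarios, out_dir)   # runs scenarios 1 to 4
second <- run_missing(scenarios, out_dir)  # integer(0)
```

Note that a resumed run draws different random numbers than an uninterrupted one would have, unless the seed is tied to the scenario rather than to the order of execution.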

Here is the complete script:

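# 1. Preamble
library(foreach)     # foreach() and %dopar%
library(doParallel)  # registerDoParallel(); also attaches parallel (makeCluster, detectCores, clusterSetRNGStream)

# 2. Simulation-function
sim_fun <- function(a, b, c) {
  # for debugging: a <- 1; b <- 2; c <- 3; i <- 1
  results <- matrix(NA, 1000, 4)
  for (i in 1:1000) {
    results[i, 1:3] <- cbind(a, b, c)
    results[i, 4] <- mean(rnorm(100))  # THIS MAY BE MORE COMPLEX FOR YOU HEHE!
  }
  # one file per scenario, e.g. "10_20_30_res.csv"
  write.csv(results, file = paste0(a, "_", b, "_", c, "_res.csv"), row.names = FALSE)
}

# 3. Scenario-Description
a <- seq(10, 100, 20)
b <- seq(20, 100, 30)
c <- seq(30, 200, 40)
scenarios <- expand.grid(a = a, b = b, c = c)

# 4.a Run the analysis on one core
set.seed(29012001)
for (i in seq_len(nrow(scenarios))) {
  sim_fun(scenarios$a[i], scenarios$b[i], scenarios$c[i])
}

# 4.b Run the analysis on all available cores
cluster <- makeCluster(detectCores())  # or a fixed number, e.g. makeCluster(4)
registerDoParallel(cluster)
# like snow's clusterSetupRNG: one L'Ecuyer RNG stream per worker
# (results can still depend on how scenarios are assigned to workers)
clusterSetRNGStream(cluster, iseed = 29012001)
foreach(i = seq_len(nrow(scenarios))) %dopar% {
  sim_fun(scenarios$a[i], scenarios$b[i], scenarios$c[i])
}

# compare the time
system.time(
  for (i in seq_len(nrow(scenarios))) {
    sim_fun(scenarios$a[i], scenarios$b[i], scenarios$c[i])
  }
)
system.time(
  foreach(i = seq_len(nrow(scenarios))) %dopar% {
    sim_fun(scenarios$a[i], scenarios$b[i], scenarios$c[i])
  }
)
stopCluster(cluster)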
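One way to make the parallel run independent of the number of cores is to tie the seed to the scenario instead of to the worker. Below is a minimal sketch of that idea using only the base `parallel` package; `sim_one()` is a hypothetical simulation for a single scenario, and deriving seeds as `base_seed + i` is an assumption made for simplicity, not part of the original setup.

```r
library(parallel)

# Hypothetical simulation for one scenario; the seed depends only on the scenario index
sim_one <- function(i, scenarios, base_seed) {
  set.seed(base_seed + i)  # same scenario -> same random numbers, on any worker
  mean(rnorm(100, mean = scenarios$a[i]))
}

scenarios <- expand.grid(a = seq(10, 100, 20), b = seq(20, 100, 30))

# Sequential run
seq_res <- sapply(seq_len(nrow(scenarios)), sim_one,
                  scenarios = scenarios, base_seed = 29012001)

# Parallel run on 2 workers
cl <- makeCluster(2)
par_res <- parSapply(cl, seq_len(nrow(scenarios)), sim_one,
                     scenarios = scenarios, base_seed = 29012001)
stopCluster(cl)

identical(seq_res, par_res)  # TRUE
```

The trade-off is that consecutive seeds are not guaranteed to give statistically independent streams; if that matters, the L'Ecuyer-CMRG generator with `parallel::nextRNGStream()` is the more principled way to derive one stream per scenario.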

There are also other tutorials on how to run simulations in R. The one I liked most was Roger Koenker's "A simple protocol for simulations in R", which relies more heavily on R's built-in features to solve some of these problems.
