Parallel processing: rfsrc() command in the randomForestSRC R package not using multi-core functionality
I am using R (on Windows 7, 32-bit) for text classification with random forests. Because the dataset is large, I looked on the internet for ways to speed up model building and came across the randomForestSRC package.
I have followed the steps in the installation manual for the package, yet during execution of the rfsrc() command only one of the logical cores is used by R (the same as with randomForest()), with maximum CPU utilization being 25%. I used the following command, as per the manual:
    options(mc.cores = detectCores() - 1, rf.cores = detectCores() - 1)
I am using Windows 7 Professional 32-bit, Service Pack 1, on an Intel i3 2120 CPU with 4 logical cores. Can anyone throw light on what I am missing? Any other efficient way to use randomForest with multicore utilization would be helpful!
The problem is that randomForestSRC uses the mclapply function for parallel execution, and mclapply doesn't support parallel execution on Windows. randomForestSRC can also use OpenMP for multithreaded parallel execution, but that isn't built into the binary distribution on CRAN, so you would have to build the package from source with OpenMP support enabled.
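To illustrate the platform difference, here is a minimal sketch using only the base parallel package. The helper name par_apply and its workers argument are made up for this illustration and are not part of randomForestSRC or parallel. It uses forking via mclapply where that is available and falls back to a PSOCK cluster on Windows, where forking is not available.

```r
library(parallel)

# Hypothetical helper (name and arguments are assumptions for illustration):
# apply FUN to each element of X using `workers` processes on any platform.
par_apply <- function(X, FUN, workers = 2L) {
  if (.Platform$OS.type == "windows") {
    # No fork() on Windows: start separate R worker processes instead.
    cl <- makePSOCKcluster(workers)
    on.exit(stopCluster(cl), add = TRUE)
    parLapply(cl, X, FUN)
  } else {
    # Unix-alikes (Linux, Mac OS X): fork the current R process.
    mclapply(X, FUN, mc.cores = workers)
  }
}

squares <- par_apply(1:4, function(i) i^2)
```

Note that PSOCK workers start as fresh R sessions, so any packages or data the function needs have to be loaded or exported on the workers explicitly.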
I think your two options are:
- build randomForestSRC with OpenMP support on your Windows machine;
- call the random forest function in parallel yourself.
Here's a simple parallel example using the randomForest package with foreach and doParallel, derived from an example in the foreach vignette:
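    library(randomForest)
    library(doParallel)   # also attaches foreach and parallel

    # Start one worker process per logical core and register it with foreach
    workers <- detectCores()
    cl <- makePSOCKcluster(workers)
    registerDoParallel(cl)

    # Toy data: 100 observations, 5 predictors, 2 classes
    x <- matrix(runif(500), 100)
    y <- gl(2, 50)
    ntree <- 1000

    # Grow a share of the trees on each worker, then merge the forests
    rf <- foreach(n = rep(ceiling(ntree / workers), workers),
                  .combine = combine, .multicombine = TRUE,
                  .packages = 'randomForest') %dopar% {
      randomForest(x, y, ntree = n)
    }

    # Shut down the workers when finished
    stopCluster(cl)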
This example should work on Windows, Mac OS X, and Linux. See the foreach vignette for more information.
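One small detail: rep(ceiling(ntree/workers), workers) can grow slightly more than ntree trees when ntree is not a multiple of the number of workers (for example, 3 workers would each grow 334 trees, 1002 in total). If the exact count matters, a sketch like the following could be used instead. The helper split_trees is a hypothetical name introduced here, not something provided by foreach or randomForest, and it assumes ntree is at least as large as workers.

```r
# Hypothetical helper (an assumption for illustration): split `ntree` trees
# across `workers` workers so that the per-worker counts sum to exactly ntree.
split_trees <- function(ntree, workers) {
  base  <- ntree %/% workers   # trees that every worker grows
  extra <- ntree %% workers    # leftover trees, one each for the first workers
  base + as.integer(seq_len(workers) <= extra)
}

counts <- split_trees(1000, 3)
```

The result could then be passed to foreach as n = split_trees(ntree, workers) in place of the rep(...) expression.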