dplyr - R: Merge tables and fill empty cells with factor information
I have a relatively complex table merge/expansion problem. Below I have included example data and the desired result tables. I have 4 factors (site, date, sample, taxa) and 3 numeric columns (1, 2, 3). I need each site, date, and sample combination to have taxa 1, 2, 100, and 150. Through this process I need to fill in the empty factor cells with the appropriate info and fill in the numeric columns with 0’s.
I apologize for the large "example" data sets, but they capture the complexity of my data set. The complete data set is large, including 4 sites, 15 dates, 12 samples, and 167 taxa. Solutions using dplyr are preferred, but I am open to other options. Doing this in Excel takes a coon’s age! Thanks in advance.
Data:

site  date       sample  taxa   1   2   3
nsv   8-jul-13   pool    1     10  10  10
nsv   8-jul-13   pool    2     10  10  10
nsv   8-jul-13   riffle  1     10  10  10
nsv   8-jul-13   riffle  2     10  10  10
nsv   23-oct-13  pool    1     10  10  10
nsv   23-oct-13  pool    2     10  10  10
nsv   23-oct-13  riffle  1     10  10  10
nsv   23-oct-13  riffle  2     10  10  10
sfp   4-jul-13   pool    1     10  10  10
sfp   4-jul-13   pool    2     10  10  10
sfp   4-jul-13   riffle  1     10  10  10
sfp   4-jul-13   riffle  2     10  10  10
sfp   27-oct-13  pool    1     10  10  10
sfp   27-oct-13  pool    2     10  10  10
sfp   27-oct-13  pool    100   10  10  10
sfp   27-oct-13  pool    150   10  10  10
sfp   27-oct-13  riffle  1     10  10  10
sfp   27-oct-13  riffle  2     10  10  10
sfp   27-oct-13  riffle  100   10  10  10
sfp   27-oct-13  riffle  150   10  10  10

Result:

site  date       sample  taxa   1   2   3
nsv   8-jul-13   pool    1     10  10  10
nsv   8-jul-13   pool    2     10  10  10
nsv   8-jul-13   pool    100    0   0   0
nsv   8-jul-13   pool    150    0   0   0
nsv   8-jul-13   riffle  1     10  10  10
nsv   8-jul-13   riffle  2     10  10  10
nsv   8-jul-13   riffle  100    0   0   0
nsv   8-jul-13   riffle  150    0   0   0
nsv   23-oct-13  pool    1     10  10  10
nsv   23-oct-13  pool    2     10  10  10
nsv   23-oct-13  pool    100    0   0   0
nsv   23-oct-13  pool    150    0   0   0
nsv   23-oct-13  riffle  1     10  10  10
nsv   23-oct-13  riffle  2     10  10  10
nsv   23-oct-13  riffle  100    0   0   0
nsv   23-oct-13  riffle  150    0   0   0
sfp   4-jul-13   pool    1     10  10  10
sfp   4-jul-13   pool    2     10  10  10
sfp   4-jul-13   pool    100    0   0   0
sfp   4-jul-13   pool    150    0   0   0
sfp   4-jul-13   riffle  1     10  10  10
sfp   4-jul-13   riffle  2     10  10  10
sfp   4-jul-13   riffle  100    0   0   0
sfp   4-jul-13   riffle  150    0   0   0
sfp   27-oct-13  pool    1     10  10  10
sfp   27-oct-13  pool    2     10  10  10
sfp   27-oct-13  pool    100   10  10  10
sfp   27-oct-13  pool    150   10  10  10
sfp   27-oct-13  riffle  1     10  10  10
sfp   27-oct-13  riffle  2     10  10  10
sfp   27-oct-13  riffle  100   10  10  10
sfp   27-oct-13  riffle  150   10  10  10
Here is a non-dplyr solution. I'm sure there are more elegant ways out there, but here is a base R approach. I called the input data.frame d:
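    # Build every combination of observed (site, date) pair, sample, and taxa.
    # Site and date are pasted into one key so only existing pairs are expanded.
    d2 <- expand.grid(apply(unique(d[, c("site", "date")]), 1, paste, collapse = " "),
                      unique(d$sample), unique(d$taxa))
    # Split the pasted key back into separate site and date columns.
    d2 <- cbind(matrix(unlist(strsplit(as.character(d2$Var1), " ")), ncol = 2, byrow = TRUE),
                d2[, 2:3])
    names(d2) <- names(d)[1:4]
    # Left join the original data, then set the unmatched numeric cells to 0.
    d2 <- merge(d2, d, all.x = TRUE)
    d2[which(is.na(d2), arr.ind = TRUE)] <- 0

The output: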
    site  date       sample  taxa  x1  x2  x3
1   nsv   23-oct-13  pool    1     10  10  10
2   nsv   23-oct-13  pool    2     10  10  10
3   nsv   23-oct-13  pool    100    0   0   0
4   nsv   23-oct-13  pool    150    0   0   0
5   nsv   23-oct-13  riffle  1     10  10  10
6   nsv   23-oct-13  riffle  2     10  10  10
7   nsv   23-oct-13  riffle  100    0   0   0
8   nsv   23-oct-13  riffle  150    0   0   0
9   nsv   8-jul-13   pool    1     10  10  10
10  nsv   8-jul-13   pool    2     10  10  10
11  nsv   8-jul-13   pool    100    0   0   0
12  nsv   8-jul-13   pool    150    0   0   0
13  nsv   8-jul-13   riffle  1     10  10  10
14  nsv   8-jul-13   riffle  2     10  10  10
15  nsv   8-jul-13   riffle  100    0   0   0
16  nsv   8-jul-13   riffle  150    0   0   0
17  sfp   27-oct-13  pool    1     10  10  10
18  sfp   27-oct-13  pool    2     10  10  10
19  sfp   27-oct-13  pool    100   10  10  10
20  sfp   27-oct-13  pool    150   10  10  10
21  sfp   27-oct-13  riffle  1     10  10  10
22  sfp   27-oct-13  riffle  2     10  10  10
23  sfp   27-oct-13  riffle  100   10  10  10
24  sfp   27-oct-13  riffle  150   10  10  10
25  sfp   4-jul-13   pool    1     10  10  10
26  sfp   4-jul-13   pool    2     10  10  10
27  sfp   4-jul-13   pool    100    0   0   0
28  sfp   4-jul-13   pool    150    0   0   0
29  sfp   4-jul-13   riffle  1     10  10  10
30  sfp   4-jul-13   riffle  2     10  10  10
31  sfp   4-jul-13   riffle  100    0   0   0
32  sfp   4-jul-13   riffle  150    0   0   0
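
Since the question mentions a preference for dplyr, here is a minimal sketch of the same expansion using tidyr::complete(), a tidyverse companion to dplyr. This is not part of the base R answer above, just one possible alternative. It assumes the tidyr package is installed and that the numeric columns are named X1, X2, X3 (an assumed syntactic stand-in for 1, 2, 3). The data frame d is rebuilt from the example table so the snippet runs on its own.

```r
library(tidyr)

# Rebuild the example data. expand.grid varies the first argument fastest,
# so the row order matches the "Data" table in the question.
nsv <- expand.grid(taxa = c(1, 2), sample = c("pool", "riffle"),
                   date = c("8-jul-13", "23-oct-13"), site = "nsv",
                   stringsAsFactors = FALSE)
sfp_jul <- expand.grid(taxa = c(1, 2), sample = c("pool", "riffle"),
                       date = "4-jul-13", site = "sfp",
                       stringsAsFactors = FALSE)
sfp_oct <- expand.grid(taxa = c(1, 2, 100, 150), sample = c("pool", "riffle"),
                       date = "27-oct-13", site = "sfp",
                       stringsAsFactors = FALSE)
d <- rbind(nsv, sfp_jul, sfp_oct)[, c("site", "date", "sample", "taxa")]

# Assumed column names for the three numeric columns.
d$X1 <- 10
d$X2 <- 10
d$X3 <- 10

# Cross every observed (site, date) pair with every sample and taxa value,
# filling the numeric columns of the newly created rows with 0.
result <- complete(d, nesting(site, date), sample, taxa,
                   fill = list(X1 = 0, X2 = 0, X3 = 0))
```

Here nesting(site, date) limits the expansion to site/date pairs that actually occur in the data, so no rows are invented for, say, nsv on 4-jul-13; sample and taxa are then crossed with those pairs.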