dplyr - R: Merge tables and fill empty cells with factor information
I have a relatively complex table merge/expansion problem. Below I have included example data and desired result tables. I have 4 factors (site, date, sample, taxa) and 3 numeric columns (1, 2, 3). I need each site, date, and sample to have taxa 1, 2, 100, and 150. Through this process I need to fill in the empty factor cells with the appropriate info and fill in the numeric columns with 0's.

I apologize for the large "example" data sets, but they capture the complexity of my data set. The complete data set is large, including 4 site, 15 date, 12 sample, and 167 taxa. Solutions using dplyr are preferred, but I am open to other options. Doing this in Excel takes a coon's age! Thanks in advance.
Data:

    site  date       sample  taxa  1   2   3
    nsv   8-jul-13   pool    1     10  10  10
    nsv   8-jul-13   pool    2     10  10  10
    nsv   8-jul-13   riffle  1     10  10  10
    nsv   8-jul-13   riffle  2     10  10  10
    nsv   23-oct-13  pool    1     10  10  10
    nsv   23-oct-13  pool    2     10  10  10
    nsv   23-oct-13  riffle  1     10  10  10
    nsv   23-oct-13  riffle  2     10  10  10
    sfp   4-jul-13   pool    1     10  10  10
    sfp   4-jul-13   pool    2     10  10  10
    sfp   4-jul-13   riffle  1     10  10  10
    sfp   4-jul-13   riffle  2     10  10  10
    sfp   27-oct-13  pool    1     10  10  10
    sfp   27-oct-13  pool    2     10  10  10
    sfp   27-oct-13  pool    100   10  10  10
    sfp   27-oct-13  pool    150   10  10  10
    sfp   27-oct-13  riffle  1     10  10  10
    sfp   27-oct-13  riffle  2     10  10  10
    sfp   27-oct-13  riffle  100   10  10  10
    sfp   27-oct-13  riffle  150   10  10  10

Result:

    site  date       sample  taxa  1   2   3
    nsv   8-jul-13   pool    1     10  10  10
    nsv   8-jul-13   pool    2     10  10  10
    nsv   8-jul-13   pool    100   0   0   0
    nsv   8-jul-13   pool    150   0   0   0
    nsv   8-jul-13   riffle  1     10  10  10
    nsv   8-jul-13   riffle  2     10  10  10
    nsv   8-jul-13   riffle  100   0   0   0
    nsv   8-jul-13   riffle  150   0   0   0
    nsv   23-oct-13  pool    1     10  10  10
    nsv   23-oct-13  pool    2     10  10  10
    nsv   23-oct-13  pool    100   0   0   0
    nsv   23-oct-13  pool    150   0   0   0
    nsv   23-oct-13  riffle  1     10  10  10
    nsv   23-oct-13  riffle  2     10  10  10
    nsv   23-oct-13  riffle  100   0   0   0
    nsv   23-oct-13  riffle  150   0   0   0
    sfp   4-jul-13   pool    1     10  10  10
    sfp   4-jul-13   pool    2     10  10  10
    sfp   4-jul-13   pool    100   0   0   0
    sfp   4-jul-13   pool    150   0   0   0
    sfp   4-jul-13   riffle  1     10  10  10
    sfp   4-jul-13   riffle  2     10  10  10
    sfp   4-jul-13   riffle  100   0   0   0
    sfp   4-jul-13   riffle  150   0   0   0
    sfp   27-oct-13  pool    1     10  10  10
    sfp   27-oct-13  pool    2     10  10  10
    sfp   27-oct-13  pool    100   10  10  10
    sfp   27-oct-13  pool    150   10  10  10
    sfp   27-oct-13  riffle  1     10  10  10
    sfp   27-oct-13  riffle  2     10  10  10
    sfp   27-oct-13  riffle  100   10  10  10
    sfp   27-oct-13  riffle  150   10  10  10
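To make the example easier to reproduce, the input table above can be rebuilt in R roughly as follows. This is only a sketch: the object name `d` matches the name used in the answer below, the helper names `visits`, `common`, and `extra` are made up for illustration, and the column names `X1`, `X2`, `X3` assume R's usual renaming of numeric headers when such a table is read in.

```r
# Site/date combinations that were actually sampled
visits <- data.frame(
  site = c("nsv", "nsv", "sfp", "sfp"),
  date = c("8-jul-13", "23-oct-13", "4-jul-13", "27-oct-13"),
  stringsAsFactors = FALSE
)

# Taxa 1 and 2 occur in both samples at every visit.
# merge(..., by = NULL) returns the Cartesian product of its two inputs.
common <- merge(
  visits,
  expand.grid(sample = c("pool", "riffle"), taxa = c(1, 2),
              stringsAsFactors = FALSE),
  by = NULL
)

# Taxa 100 and 150 occur only at sfp on 27-oct-13
extra <- merge(
  visits[visits$site == "sfp" & visits$date == "27-oct-13", ],
  expand.grid(sample = c("pool", "riffle"), taxa = c(100, 150),
              stringsAsFactors = FALSE),
  by = NULL
)

d <- rbind(common, extra)

# Every recorded count in the example is 10
d$X1 <- 10
d$X2 <- 10
d$X3 <- 10
```

The row order differs from the table above, but the 20 rows are the same.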
Here is a non-dplyr solution. I'm sure there are more elegant ways out there, but here is a base R approach. I called the input data.frame d:
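    # Cross every observed site/date pair (pasted into one string) with all samples and taxa
    d2 <- expand.grid(apply(unique(d[,c("site","date")]), 1, paste, collapse=" "),
                      unique(d$sample), unique(d$taxa))
    # Split the pasted "site date" string back into two columns
    d2 <- cbind(matrix(unlist(strsplit(as.character(d2$Var1), " ")), ncol=2, byrow=TRUE),
                d2[,2:3])
    names(d2) <- names(d)[1:4]
    # Left join onto the original data; combinations that were missing get NA
    d2 <- merge(d2, d, all.x=TRUE)
    # Replace those NAs with 0
    d2[which(is.na(d2), arr.ind=TRUE)] <- 0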
The output:

       site  date       sample  taxa  X1  X2  X3
    1  nsv   23-oct-13  pool    1     10  10  10
    2  nsv   23-oct-13  pool    2     10  10  10
    3  nsv   23-oct-13  pool    100   0   0   0
    4  nsv   23-oct-13  pool    150   0   0   0
    5  nsv   23-oct-13  riffle  1     10  10  10
    6  nsv   23-oct-13  riffle  2     10  10  10
    7  nsv   23-oct-13  riffle  100   0   0   0
    8  nsv   23-oct-13  riffle  150   0   0   0
    9  nsv   8-jul-13   pool    1     10  10  10
    10 nsv   8-jul-13   pool    2     10  10  10
    11 nsv   8-jul-13   pool    100   0   0   0
    12 nsv   8-jul-13   pool    150   0   0   0
    13 nsv   8-jul-13   riffle  1     10  10  10
    14 nsv   8-jul-13   riffle  2     10  10  10
    15 nsv   8-jul-13   riffle  100   0   0   0
    16 nsv   8-jul-13   riffle  150   0   0   0
    17 sfp   27-oct-13  pool    1     10  10  10
    18 sfp   27-oct-13  pool    2     10  10  10
    19 sfp   27-oct-13  pool    100   10  10  10
    20 sfp   27-oct-13  pool    150   10  10  10
    21 sfp   27-oct-13  riffle  1     10  10  10
    22 sfp   27-oct-13  riffle  2     10  10  10
    23 sfp   27-oct-13  riffle  100   10  10  10
    24 sfp   27-oct-13  riffle  150   10  10  10
    25 sfp   4-jul-13   pool    1     10  10  10
    26 sfp   4-jul-13   pool    2     10  10  10
    27 sfp   4-jul-13   pool    100   0   0   0
    28 sfp   4-jul-13   pool    150   0   0   0
    29 sfp   4-jul-13   riffle  1     10  10  10
    30 sfp   4-jul-13   riffle  2     10  10  10
    31 sfp   4-jul-13   riffle  100   0   0   0
    32 sfp   4-jul-13   riffle  150   0   0   0
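Since the question prefers a tidyverse route, the same expansion can probably be done in one step with `complete()` and `nesting()` from the tidyr package (a companion to dplyr). This is a sketch, not part of the answer above: it uses a cut-down toy version of `d`, assumes the numeric columns are named `X1`, `X2`, `X3`, and the result name `d_full` is made up for illustration. `nesting(site, date)` keeps only the site/date pairs that occur in the data, while `sample` and `taxa` are fully crossed with them.

```r
library(tidyr)

# Toy input: taxa 100 is missing at nsv but present at sfp
d <- data.frame(
  site   = c("nsv", "nsv", "sfp", "sfp", "sfp", "sfp"),
  date   = c("8-jul-13", "8-jul-13", "27-oct-13", "27-oct-13",
             "27-oct-13", "27-oct-13"),
  sample = c("pool", "riffle", "pool", "pool", "riffle", "riffle"),
  taxa   = c(1, 1, 1, 100, 1, 100),
  X1 = 10, X2 = 10, X3 = 10,
  stringsAsFactors = FALSE
)

# Expand to every observed (site, date) pair x every sample x every taxa,
# filling the numeric columns of the newly created rows with 0
d_full <- complete(
  d,
  nesting(site, date), sample, taxa,
  fill = list(X1 = 0, X2 = 0, X3 = 0)
)
```

Compared with the base R version, this avoids pasting site and date together and splitting them again, at the cost of an extra package dependency.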