dplyr - R: Merge tables and fill empty cells with factor information


I have a relatively complex table merge/expansion problem. Below I have included example data and the desired result tables. I have 4 factors (site, date, sample, taxa) and 3 numeric columns (1, 2, 3). I need each site, date, and sample to have taxa 1, 2, 100, and 150. Through this process I need to fill in the empty factor cells with the appropriate info and fill in the numeric columns with 0's.

I apologize for the large "example" data sets, but they capture the complexity of my data set. The complete data set is large, including 4 sites, 15 dates, 12 samples, and 167 taxa. Solutions using dplyr are preferred, but I am open to other options. Doing this in Excel takes a coon's age! Thanks in advance.

Data

    site  date       sample  taxa  1   2   3
    nsv   8-jul-13   pool    1     10  10  10
    nsv   8-jul-13   pool    2     10  10  10
    nsv   8-jul-13   riffle  1     10  10  10
    nsv   8-jul-13   riffle  2     10  10  10
    nsv   23-oct-13  pool    1     10  10  10
    nsv   23-oct-13  pool    2     10  10  10
    nsv   23-oct-13  riffle  1     10  10  10
    nsv   23-oct-13  riffle  2     10  10  10
    sfp   4-jul-13   pool    1     10  10  10
    sfp   4-jul-13   pool    2     10  10  10
    sfp   4-jul-13   riffle  1     10  10  10
    sfp   4-jul-13   riffle  2     10  10  10
    sfp   27-oct-13  pool    1     10  10  10
    sfp   27-oct-13  pool    2     10  10  10
    sfp   27-oct-13  pool    100   10  10  10
    sfp   27-oct-13  pool    150   10  10  10
    sfp   27-oct-13  riffle  1     10  10  10
    sfp   27-oct-13  riffle  2     10  10  10
    sfp   27-oct-13  riffle  100   10  10  10
    sfp   27-oct-13  riffle  150   10  10  10

Result

    site  date       sample  taxa  1   2   3
    nsv   8-jul-13   pool    1     10  10  10
    nsv   8-jul-13   pool    2     10  10  10
    nsv   8-jul-13   pool    100   0   0   0
    nsv   8-jul-13   pool    150   0   0   0
    nsv   8-jul-13   riffle  1     10  10  10
    nsv   8-jul-13   riffle  2     10  10  10
    nsv   8-jul-13   riffle  100   0   0   0
    nsv   8-jul-13   riffle  150   0   0   0
    nsv   23-oct-13  pool    1     10  10  10
    nsv   23-oct-13  pool    2     10  10  10
    nsv   23-oct-13  pool    100   0   0   0
    nsv   23-oct-13  pool    150   0   0   0
    nsv   23-oct-13  riffle  1     10  10  10
    nsv   23-oct-13  riffle  2     10  10  10
    nsv   23-oct-13  riffle  100   0   0   0
    nsv   23-oct-13  riffle  150   0   0   0
    sfp   4-jul-13   pool    1     10  10  10
    sfp   4-jul-13   pool    2     10  10  10
    sfp   4-jul-13   pool    100   0   0   0
    sfp   4-jul-13   pool    150   0   0   0
    sfp   4-jul-13   riffle  1     10  10  10
    sfp   4-jul-13   riffle  2     10  10  10
    sfp   4-jul-13   riffle  100   0   0   0
    sfp   4-jul-13   riffle  150   0   0   0
    sfp   27-oct-13  pool    1     10  10  10
    sfp   27-oct-13  pool    2     10  10  10
    sfp   27-oct-13  pool    100   10  10  10
    sfp   27-oct-13  pool    150   10  10  10
    sfp   27-oct-13  riffle  1     10  10  10
    sfp   27-oct-13  riffle  2     10  10  10
    sfp   27-oct-13  riffle  100   10  10  10
    sfp   27-oct-13  riffle  150   10  10  10
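To make the example reproducible, the input table above could be rebuilt in R roughly as follows. This is only a sketch: the data frame name `d`, the helper `build_rows`, and the column names `x1`, `x2`, `x3` (standing in for the numeric columns 1, 2, 3) are assumptions, not part of the original data.

```r
# Rebuild the example input as a data frame named `d` (name assumed).
# x1, x2, x3 are assumed stand-ins for the numeric columns 1, 2, 3.
keys <- data.frame(
  site = c("nsv", "nsv", "sfp", "sfp"),
  date = c("8-jul-13", "23-oct-13", "4-jul-13", "27-oct-13"),
  stringsAsFactors = FALSE
)
samples <- c("pool", "riffle")

# Hypothetical helper: taxa 1 and 2 are recorded for every site/date,
# while taxa 100 and 150 only appear on 27-oct-13 (site sfp).
build_rows <- function(i) {
  taxa <- if (keys$date[i] == "27-oct-13") c(1, 2, 100, 150) else c(1, 2)
  # taxa varies fastest, so rows come out grouped by sample
  g <- expand.grid(taxa = taxa, sample = samples, stringsAsFactors = FALSE)
  data.frame(site = keys$site[i], date = keys$date[i],
             sample = g$sample, taxa = g$taxa,
             x1 = 10, x2 = 10, x3 = 10, stringsAsFactors = FALSE)
}

d <- do.call(rbind, lapply(seq_len(nrow(keys)), build_rows))
```

In practice the table would more likely be read from a file, in which case R may rename the numeric columns on import.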

Here is a non-dplyr solution. I'm sure there are more elegant ways out there, but here is a base R approach. I called the input data.frame d:

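    # All combinations of each observed site/date pair with every sample and taxa
    d2 <- expand.grid(apply(unique(d[,c("site","date")]), 1, paste, collapse=" "),
                      unique(d$sample), unique(d$taxa))
    # Split the pasted "site date" key back into two columns
    d2 <- cbind(matrix(unlist(strsplit(as.character(d2$Var1), " ")), ncol=2, byrow=TRUE),
                d2[,2:3])
    names(d2) <- names(d)[1:4]

    # Left join the original data onto the full grid; missing rows become NA
    d2 <- merge(d2, d, all.x=TRUE)

    # Replace the NAs in the numeric columns with 0
    d2[which(is.na(d2), arr.ind=TRUE)] <- 0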

The output:

       site      date sample taxa x1 x2 x3
    1   nsv 23-oct-13   pool    1 10 10 10
    2   nsv 23-oct-13   pool    2 10 10 10
    3   nsv 23-oct-13   pool  100  0  0  0
    4   nsv 23-oct-13   pool  150  0  0  0
    5   nsv 23-oct-13 riffle    1 10 10 10
    6   nsv 23-oct-13 riffle    2 10 10 10
    7   nsv 23-oct-13 riffle  100  0  0  0
    8   nsv 23-oct-13 riffle  150  0  0  0
    9   nsv  8-jul-13   pool    1 10 10 10
    10  nsv  8-jul-13   pool    2 10 10 10
    11  nsv  8-jul-13   pool  100  0  0  0
    12  nsv  8-jul-13   pool  150  0  0  0
    13  nsv  8-jul-13 riffle    1 10 10 10
    14  nsv  8-jul-13 riffle    2 10 10 10
    15  nsv  8-jul-13 riffle  100  0  0  0
    16  nsv  8-jul-13 riffle  150  0  0  0
    17  sfp 27-oct-13   pool    1 10 10 10
    18  sfp 27-oct-13   pool    2 10 10 10
    19  sfp 27-oct-13   pool  100 10 10 10
    20  sfp 27-oct-13   pool  150 10 10 10
    21  sfp 27-oct-13 riffle    1 10 10 10
    22  sfp 27-oct-13 riffle    2 10 10 10
    23  sfp 27-oct-13 riffle  100 10 10 10
    24  sfp 27-oct-13 riffle  150 10 10 10
    25  sfp  4-jul-13   pool    1 10 10 10
    26  sfp  4-jul-13   pool    2 10 10 10
    27  sfp  4-jul-13   pool  100  0  0  0
    28  sfp  4-jul-13   pool  150  0  0  0
    29  sfp  4-jul-13 riffle    1 10 10 10
    30  sfp  4-jul-13 riffle    2 10 10 10
    31  sfp  4-jul-13 riffle  100  0  0  0
    32  sfp  4-jul-13 riffle  150  0  0  0
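Since the question asked for something closer to dplyr, here is a sketch of a different technique, not the base R approach above: `complete()` from the tidyr package, with `nesting()` so that only site/date pairs that actually occur are kept. It assumes tidyr is installed and that the numeric columns are named `x1`, `x2`, `x3`; the small data frame below is a made-up subset for illustration, not the full example.

```r
library(tidyr)

# Made-up toy input: taxa 100 is only recorded for sfp on 27-oct-13.
# Column names x1, x2, x3 are assumptions.
d <- data.frame(
  site   = c("nsv", "nsv", "sfp", "sfp", "sfp", "sfp"),
  date   = c("8-jul-13", "8-jul-13", "27-oct-13", "27-oct-13",
             "27-oct-13", "27-oct-13"),
  sample = c("pool", "riffle", "pool", "pool", "riffle", "riffle"),
  taxa   = c(1, 1, 1, 100, 1, 100),
  x1 = 10, x2 = 10, x3 = 10,
  stringsAsFactors = FALSE
)

# nesting(site, date) keeps only observed site/date pairs; these are then
# crossed with every sample and every taxa. Rows that were missing get 0
# in the numeric columns instead of NA.
filled <- complete(
  d,
  nesting(site, date),
  sample,
  taxa,
  fill = list(x1 = 0, x2 = 0, x3 = 0)
)
```

Compared with pasting and splitting the site/date key, this avoids problems if a site name or date ever contains a space, at the cost of an extra package dependency.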
