r - How to efficiently find pattern in one column and assign a corresponding value to another column in a list of data frames? -
i have list of 15 data frames each 13 columns (time + 6 stations each 3 layers) , 172 rows. want collapse columns (observations @ stations) in 2 columns (one station , 1 observation) applying function on whole list. here use gather tidyr
. in addition, want find pattern (upper, middle or lower layer) in 1 of columns , assign new value (depth) in new column. use ddply
plyr
, grep. problem is veryyyy slow. guess created bottleneck limited r knowledge. bottleneck , how improve it?
an example:
data <- list(a = data.frame(time = (1:180), alpha.upper = sample(1:180), beta.middle = sample(1:180), gamma.lower = sample(1:180)), b = data.frame(time(1:180), alpha.upper = sample(1:180), beta.middle = sample(1:180), gamma.lower = sample(1:180))) > data $a time alpha.upper beta.middle gamma.lower 1 1 133 179 99 2 2 175 147 56 3 3 169 9 24 4 4 116 129 75 5 5 92 65 65 6 6 141 73 49 $b time alpha.upper beta.middle gamma.lower 1 1 111 2 89 2 2 84 81 159 3 3 93 82 84 4 4 44 58 125 5 5 31 33 131 6 6 1 120 63
my code is:
> data2<-lapply(data, function(x) { x<-gather(x,stn,value,-time) x<-arrange(x,time) x<-ddply(x,c("time","stn","value"), function(x) { if (grepl(".upper",x$stn) == true) { x$depth<-1 return(x) } if (grepl(".lower",x$stn) == true) { x$depth<-3 return(x) } if (grepl(".middle",x$stn) == true) { x$depth<-2 return(x) } }) return(x) })
the result should be:
> data2 $a time stn value depth 1 1 alpha.upper 111 1 2 1 beta.middle 2 2 3 1 gamma.lower 89 3 4 2 alpha.upper 84 1 5 2 beta.middle 81 2 6 2 gamma.lower 159 3 $b 1 1 alpha.upper 38 1 2 1 beta.middle 151 2 3 1 gamma.lower 93 3 4 2 alpha.upper 61 1 5 2 beta.middle 56 2 6 2 gamma.lower 66 3
first of let's reproduce data.
dataa <- read.table(text = "time alpha.upper beta.middle gamma.lower 1 133 179 99 2 175 147 56 3 169 9 24 4 116 129 75 5 92 65 65 6 141 73 49", header = t, sep = " ") datab <- read.table(text = "time alpha.upper beta.middle gamma.lower 1 1 111 2 89 2 2 84 81 159 3 3 93 82 84 4 4 44 58 125 5 5 31 33 131 6 6 1 120 63", header = t, sep = " ") mydata <- list(a = dataa, b = datab) # $a # time alpha.upper beta.middle gamma.lower # 1 1 133 179 99 # 2 2 175 147 56 # 3 3 169 9 24 # 4 4 116 129 75 # 5 5 92 65 65 # 6 6 141 73 49 # $b # time alpha.upper beta.middle gamma.lower # 1 1 111 2 89 # 2 2 84 81 159 # 3 3 93 82 84 # 4 4 44 58 125 # 5 5 31 33 131 # 6 6 1 120 63
here named variable mydata
because there fuction data
in standart package utils
, it's better not use name variable.
as far i've got it, need make every data.frame
of list "wide" "long" form. can use gather
tidyr
package , in opinion it's clever choise, situation show how can same result basic r tools.
rebuilddf <- function(df) { # first of see difference between rep(1:3, each = 3) , rep(1:3, times = 3) res_df <- data.frame( time = rep(df$time, each = 3),# first column of new data.frame - # repeat each time mark 3 times # know there 3 # observations single time: upper, middle, lower stn = rep(colnames(df)[-1], times = nrow(df)), # second column # fill words "alpha.upper", # "beta.middle", "gamma.lower" colnames(df)[-1] # repeated nrow(df) times value = as.vector(t(as.matrix(df[,-1]))) ) # # numbers of 2:4 columns of our data.frame # transposed , arranged in vector # result reading row row # understand what's happening matrix can try code # m <- matrix(1:20, nrow = 4) # [,1] [,2] [,3] [,4] [,5] # [1,] 1 5 9 13 17 # [2,] 2 6 10 14 18 # [3,] 3 7 11 15 19 # [4,] 4 8 12 16 20 # as.vector(t(m)) # 1 5 9 13 17 2 6 10 14 18 3 7 11 15 19 4 8 12 16 20 # after add column "depth" # got it, need 1 "upper", 2 "middle" , 3 "lower" # make of 2 nested ifelse functions res_df <- transform(res_df, depth = ifelse(stn == "alpha.upper", 1, ifelse(stn == "beta.middle", 2, 3)) ) return(res_df) }
if names of columns not same, , end of name invariant can modify condition depth
follows:
res_df <- transform(res_df, depth = ifelse(rev(strsplit(stn, "[.]")[[1]])[1] == "upper", 1, ifelse(rev(strsplit(stn, "[.]")[[1]])[1] == "middle", 2, 3) ) ) # work # rev(strsplit(stn, "[.]")[[1]])[1] # may "upper", "middle" or "lower" # here split character string of form "some.name1.upper" or # "some.other.colname.lower" every dot in text, take # first end part of string (rev reversing order)
you may modify condition , use grepl
, believe faster strsplit
.
when we've finished our rebuilddf
function let's watch does.
lapply(mydata, rebuilddf) # $a # time stn value depth # 1 1 alpha.upper 133 1 # 2 1 beta.middle 179 2 # 3 1 gamma.lower 99 3 # 4 2 alpha.upper 175 1 # 5 2 beta.middle 147 2 # 6 2 gamma.lower 56 3 # 7 3 alpha.upper 169 1 # 8 3 beta.middle 9 2 # 9 3 gamma.lower 24 3 # 10 4 alpha.upper 116 1 # 11 4 beta.middle 129 2 # 12 4 gamma.lower 75 3 # 13 5 alpha.upper 92 1 # 14 5 beta.middle 65 2 # 15 5 gamma.lower 65 3 # 16 6 alpha.upper 141 1 # 17 6 beta.middle 73 2 # 18 6 gamma.lower 49 3 # # $b # time stn value depth # 1 1 alpha.upper 111 1 # 2 1 beta.middle 2 2 # 3 1 gamma.lower 89 3 # 4 2 alpha.upper 84 1 # 5 2 beta.middle 81 2 # 6 2 gamma.lower 159 3 # 7 3 alpha.upper 93 1 # 8 3 beta.middle 82 2 # 9 3 gamma.lower 84 3 # 10 4 alpha.upper 44 1 # 11 4 beta.middle 58 2 # 12 4 gamma.lower 125 3 # 13 5 alpha.upper 31 1 # 14 5 beta.middle 33 2 # 15 5 gamma.lower 131 3 # 16 6 alpha.upper 1 1 # 17 6 beta.middle 120 2 # 18 6 gamma.lower 63 3
i want believe it's desired output, though in question show in a
data.frame numbers b
, vice versa.