r - How to efficiently find pattern in one column and assign a corresponding value to another column in a list of data frames? -


i have list of 15 data frames each 13 columns (time + 6 stations each 3 layers) , 172 rows. want collapse columns (observations @ stations) in 2 columns (one station , 1 observation) applying function on whole list. here use gather tidyr. in addition, want find pattern (upper, middle or lower layer) in 1 of columns , assign new value (depth) in new column. use ddply plyr , grep. problem is veryyyy slow. guess created bottleneck limited r knowledge. bottleneck , how improve it?

an example:

  data <- list(a = data.frame(time = (1:180), alpha.upper = sample(1:180),                beta.middle = sample(1:180), gamma.lower = sample(1:180)),                 b = data.frame(time(1:180), alpha.upper = sample(1:180),                 beta.middle = sample(1:180), gamma.lower = sample(1:180)))  > data $a     time alpha.upper beta.middle gamma.lower  1      1         133         179          99  2      2         175         147          56  3      3         169           9          24  4      4         116         129          75  5      5          92          65          65  6      6         141          73          49 $b     time alpha.upper beta.middle gamma.lower 1      1         111           2          89 2      2          84          81         159 3      3          93          82          84 4      4          44          58         125 5      5          31          33         131 6      6           1         120          63 

my code is:

> data2<-lapply(data, function(x) {                x<-gather(x,stn,value,-time)                x<-arrange(x,time)                x<-ddply(x,c("time","stn","value"), function(x) {                  if (grepl(".upper",x$stn) == true)                  {                    x$depth<-1                    return(x)                  }                  if (grepl(".lower",x$stn) == true)                  {                    x$depth<-3                    return(x)                  }                  if (grepl(".middle",x$stn) == true)                  {                    x$depth<-2                    return(x)                  }                })                return(x)              }) 

the result should be:

> data2 $a     time         stn value depth 1      1 alpha.upper   111     1 2      1 beta.middle     2     2 3      1 gamma.lower    89     3 4      2 alpha.upper    84     1 5      2 beta.middle    81     2 6      2 gamma.lower   159     3 $b 1      1 alpha.upper    38     1 2      1 beta.middle   151     2 3      1 gamma.lower    93     3 4      2 alpha.upper    61     1 5      2 beta.middle    56     2 6      2 gamma.lower    66     3   

first of let's reproduce data.

dataa <- read.table(text = "time alpha.upper beta.middle gamma.lower 1 133 179 99 2 175 147 56 3 169 9 24 4 116 129 75 5 92 65 65 6 141 73 49", header = t, sep = " ")  datab <- read.table(text = "time alpha.upper beta.middle gamma.lower 1 1 111 2 89 2 2 84 81 159 3 3 93 82 84 4 4 44 58 125 5 5 31 33 131 6 6 1 120 63", header = t, sep = " ")  mydata <- list(a = dataa, b = datab) # $a #   time alpha.upper beta.middle gamma.lower # 1    1         133         179          99 # 2    2         175         147          56 # 3    3         169           9          24 # 4    4         116         129          75 # 5    5          92          65          65 # 6    6         141          73          49 # $b #   time alpha.upper beta.middle gamma.lower # 1    1         111           2          89 # 2    2          84          81         159 # 3    3          93          82          84 # 4    4          44          58         125 # 5    5          31          33         131 # 6    6           1         120          63 

here named variable mydata because there fuction data in standart package utils , it's better not use name variable.

as far i've got it, need make every data.frame of list "wide" "long" form. can use gather tidyr package , in opinion it's clever choise, situation show how can same result basic r tools.

rebuilddf <- function(df) { # first of see difference between rep(1:3, each = 3) , rep(1:3, times = 3)    res_df <- data.frame(     time = rep(df$time, each = 3),# first column of new data.frame -                                   # repeat each time mark 3 times                                   # know there 3                                   # observations single time: upper, middle, lower      stn = rep(colnames(df)[-1], times = nrow(df)), # second column                                   # fill words "alpha.upper",                               # "beta.middle", "gamma.lower" colnames(df)[-1]                                   # repeated nrow(df) times       value = as.vector(t(as.matrix(df[,-1]))) ) #                                    # numbers of 2:4 columns of our data.frame                                   # transposed , arranged in vector                                   # result reading row row   # understand what's happening matrix can try code   # m <- matrix(1:20, nrow = 4)   #      [,1] [,2] [,3] [,4] [,5]   # [1,]    1    5    9   13   17   # [2,]    2    6   10   14   18   # [3,]    3    7   11   15   19   # [4,]    4    8   12   16   20   # as.vector(t(m))   # 1  5  9 13 17  2  6 10 14 18  3  7 11 15 19  4  8 12 16 20    # after add column "depth"   # got it, need 1 "upper", 2 "middle" , 3 "lower"   # make of 2 nested ifelse functions   res_df <- transform(res_df, depth = ifelse(stn == "alpha.upper", 1,                                               ifelse(stn == "beta.middle", 2, 3)) )   return(res_df) } 

if names of columns not same, , end of name invariant can modify condition depth follows:

res_df <-    transform(res_df,              depth = ifelse(rev(strsplit(stn, "[.]")[[1]])[1] == "upper",                             1,                             ifelse(rev(strsplit(stn, "[.]")[[1]])[1] == "middle", 2, 3)                            ) ) # work  # rev(strsplit(stn, "[.]")[[1]])[1] # may "upper", "middle" or "lower" # here split character string of form "some.name1.upper" or # "some.other.colname.lower" every dot in text, take  # first end part of string (rev reversing order) 

you may modify condition , use grepl, believe faster strsplit.

when we've finished our rebuilddf function let's watch does.

lapply(mydata, rebuilddf)  # $a #    time         stn value depth # 1     1 alpha.upper   133     1 # 2     1 beta.middle   179     2 # 3     1 gamma.lower    99     3 # 4     2 alpha.upper   175     1 # 5     2 beta.middle   147     2 # 6     2 gamma.lower    56     3 # 7     3 alpha.upper   169     1 # 8     3 beta.middle     9     2 # 9     3 gamma.lower    24     3 # 10    4 alpha.upper   116     1 # 11    4 beta.middle   129     2 # 12    4 gamma.lower    75     3 # 13    5 alpha.upper    92     1 # 14    5 beta.middle    65     2 # 15    5 gamma.lower    65     3 # 16    6 alpha.upper   141     1 # 17    6 beta.middle    73     2 # 18    6 gamma.lower    49     3 #  # $b #    time         stn value depth # 1     1 alpha.upper   111     1 # 2     1 beta.middle     2     2 # 3     1 gamma.lower    89     3 # 4     2 alpha.upper    84     1 # 5     2 beta.middle    81     2 # 6     2 gamma.lower   159     3 # 7     3 alpha.upper    93     1 # 8     3 beta.middle    82     2 # 9     3 gamma.lower    84     3 # 10    4 alpha.upper    44     1 # 11    4 beta.middle    58     2 # 12    4 gamma.lower   125     3 # 13    5 alpha.upper    31     1 # 14    5 beta.middle    33     2 # 15    5 gamma.lower   131     3 # 16    6 alpha.upper     1     1 # 17    6 beta.middle   120     2 # 18    6 gamma.lower    63     3 

i want believe it's desired output, though in question show in a data.frame numbers b , vice versa.


Popular posts from this blog

c# - ODP.NET Oracle.ManagedDataAccess causes ORA-12537 network session end of file -

utf 8 - split utf-8 string into bytes in python -

matlab - Compression and Decompression of ECG Signal using HUFFMAN ALGORITHM -