r - WEIRD: sapply not working on dplyr::tbl_df() local data.frame


I haven't seen a post about this error, which results from using dplyr in conjunction with sapply, so I thought I'd ask if anyone knows why it occurs. It's not a major issue per se, as the work-around is easy, but it may spare someone the hassle of wondering what on earth is going on. The sample data is taken from here, and I've made a variation of the code given by jbaums in the same post.

Sample data:

mydata <- data.frame(matrix(rlnorm(30*10, meanlog=0, sdlog=1), nrow=30))
colnames(mydata) <- c("categ", "var1", "var2", "var3", "var4", "var5", "var6", "var7", "var8", "var9")
mydata$var2 <- mydata$var2*5
mydata$categ <- sample(1:2)
mydata

Looped function for making multiple boxplots:

sapply(seq_along(mydata)[-1], function(i) {
    y <- mydata[, i]
    names <- colnames(mydata)[i]
    plot(factor(mydata$categ), log(y + 1), main=names, ylab="foo", outpch=NA, las=1)
})

That works perfectly.

Now the error occurs after using tbl_df():

require(dplyr)
mydata2 <- tbl_df(mydata)
sapply(seq_along(mydata2)[-1], function(i) {
    y <- mydata2[, i]
    names <- colnames(mydata2)[i]
    plot(factor(mydata2$categ), log(y + 1), main=names, ylab="foo", outpch=NA, las=1)
})

Error in xy.coords(x, y, xlabel, ylabel, log) :
  'x' and 'y' lengths differ

The workaround is simple, it's just:

mydata2 <- data.frame(mydata2) ## or lapply(...) 

and the code runs smoothly again.

Any idea why this might be happening? I gather it's more an issue of sapply vs. lapply, but I found it quite intriguing.

cheers,

o.

The difference is in how [ works for each:

> mydata[,2]
 [1] 5.0044042 0.8456266 1.6407979 0.3850787 6.0767393 1.8768533 1.0071454 0.3674155 0.6573932 0.3614813
[11] 1.8037157 1.1420720 0.5842170 0.4632418 1.1114478 1.1753951 0.1077499 0.9043782 3.0877567 0.9421167
[21] 1.2429474 1.8952458 0.4592660 0.3842183 1.1274421 2.2946488 2.0904511 0.4132986 0.3421766 0.7592236
> mydata2[,2]
Source: local data frame [30 x 1]
        var1
1  5.0044042
2  0.8456266
3  1.6407979
4  0.3850787
5  6.0767393
6  1.8768533
7  1.0071454
8  0.3674155
9  0.6573932
10 0.3614813
..       ...

So inside the loop, y is a one-column data frame rather than a numeric vector of length 30, which is why plot() complains that the lengths of 'x' and 'y' differ.

Therefore, if you want to reproduce the intended behaviour (simplifying instead of preserving), use [[ instead:

mydata2[[2]]
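Applied to the loop from the question, the change could look like the sketch below. It is only an illustration under a few assumptions: the small table `toy` and the result name `plotted` are made up for the example, and it uses `tibble::as_tibble()`, which is what current releases offer in place of the older `tbl_df()`.

```r
library(tibble)  # assumption: the tibble package is installed

set.seed(1)
# Hypothetical stand-in for mydata2: one grouping column, two numeric columns
toy <- as_tibble(data.frame(categ = rep(1:2, 15),
                            var1  = rlnorm(30),
                            var2  = rlnorm(30) * 5))

plotted <- sapply(seq_along(toy)[-1], function(i) {
    y  <- toy[[i]]             # [[ extracts the column as a plain numeric vector
    nm <- colnames(toy)[i]
    plot(factor(toy$categ), log(y + 1), main = nm, ylab = "foo", outpch = NA, las = 1)
    nm                         # return the column name so sapply gives a character vector
})
```

Because `[[` behaves the same way on a plain data.frame, this version of the loop should not depend on which of the two classes it is given.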

Note that the class changes:

> class(mydata)
[1] "data.frame"
> class(mydata2)
[1] "tbl_df"     "tbl"        "data.frame"
> class(mydata2[,2])
[1] "tbl_df"     "data.frame"

From ?tbl_df:

Methods

tbl_df implements two important base methods:

print: Only prints the first 10 rows, and the columns that fit on screen.

[: Never simplifies (drops), so always returns data.frame.
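A quick way to check the "never simplifies" rule yourself is sketched below. It assumes the tibble package (`as_tibble()` being the current replacement for `tbl_df()`); the names `df` and `tb` and the values in them are made up for the illustration.

```r
library(tibble)  # assumption: the tibble package is installed

# Hypothetical toy data: the same columns as a data.frame and as a tibble
df <- data.frame(categ = rep(1:2, 5), var1 = seq(0.5, 5, by = 0.5))
tb <- as_tibble(df)

df_col <- df[, 2]    # data.frame: a single column is dropped to a vector
tb_col <- tb[, 2]    # tibble: stays a one-column table
tb_vec <- tb[[2]]    # [[ extracts the column itself in both cases

is.numeric(df_col)
is.data.frame(tb_col)
identical(tb_vec, df_col)
```

All three checks should come back TRUE, which matches the printed output above: `[` on the tibble keeps the table structure, while `[[` gives the same vector that `[` gives on a plain data.frame.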
