WEIRD: sapply not working on dplyr::tbl_df() local data.frame
I haven't seen a post about an error resulting from using dplyr in conjunction with sapply before, so I thought I'd ask if anyone knows why it occurs. It's not a major issue per se, since the work-around is easy, but it may spare some of you the hassle of wondering what on earth is going on. The sample data is taken from here, and I've made a variation of the code given by jbaums in the same post.
Sample data:
    mydata <- data.frame(matrix(rlnorm(30*10, meanlog=0, sdlog=1), nrow=30))
    colnames(mydata) <- c("categ", "var1", "var2", "var3", "var4", "var5", "var6", "var7", "var8", "var9")
    mydata$var2 <- mydata$var2*5
    mydata$categ <- sample(1:2)
    mydata
Looped function for making multiple boxplots:
    sapply(seq_along(mydata)[-1], function(i) {
      y <- mydata[, i]
      names <- colnames(mydata)[i]
      plot(factor(mydata$categ), log(y + 1), main=names, ylab="foo", outpch=NA, las=1)
    })
That works perfectly.
Now, the error occurs after using tbl_df():
    require(dplyr)
    mydata2 <- tbl_df(mydata)
    sapply(seq_along(mydata2)[-1], function(i) {
      y <- mydata2[, i]
      names <- colnames(mydata2)[i]
      plot(factor(mydata2$categ), log(y + 1), main=names, ylab="foo", outpch=NA, las=1)
    })
    Error in xy.coords(x, y, xlabel, ylabel, log) : 
      'x' and 'y' lengths differ
The workaround is simple; it's just:
    mydata2 <- data.frame(mydata2) ## or lapply(...)
and the code runs smoothly again.
Any idea why this might be happening? I gather it's more of an issue of sapply vs. lapply, but I found it quite intriguing.
Cheers,
O.
The difference is in how [ works for each:
    > mydata[,2]
     [1] 5.0044042 0.8456266 1.6407979 0.3850787 6.0767393 1.8768533 1.0071454 0.3674155 0.6573932 0.3614813
    [11] 1.8037157 1.1420720 0.5842170 0.4632418 1.1114478 1.1753951 0.1077499 0.9043782 3.0877567 0.9421167
    [21] 1.2429474 1.8952458 0.4592660 0.3842183 1.1274421 2.2946488 2.0904511 0.4132986 0.3421766 0.7592236
    > mydata2[,2]
    Source: local data frame [30 x 1]

            var1
    1  5.0044042
    2  0.8456266
    3  1.6407979
    4  0.3850787
    5  6.0767393
    6  1.8768533
    7  1.0071454
    8  0.3674155
    9  0.6573932
    10 0.3614813
    .. ...
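The same simplify-versus-preserve distinction can be reproduced in base R, without dplyr, by passing drop = FALSE to [. This is a minimal sketch that assumes tbl_df's [ behaves like [.data.frame with drop = FALSE when a single column is selected; df is a small made-up stand-in for mydata.

```r
# df is a small made-up stand-in for mydata (values are arbitrary)
df <- data.frame(categ = rep(1:2, 3),
                 var1  = c(5, 0.8, 1.6, 0.4, 6, 1.9))

simplified <- df[, 2]                # default drop = TRUE: plain numeric vector
preserved  <- df[, 2, drop = FALSE]  # assumed to mimic tbl_df's [: a 6 x 1 data frame

print(class(simplified))   # "numeric"
print(class(preserved))    # "data.frame"
print(length(simplified))  # 6, one element per row
print(length(preserved))   # 1, a data frame's length is its number of columns
```

The last line suggests a plausible reason for the error in the question: factor(mydata2$categ) has 30 elements, while the one-column data frame returned by mydata2[, i] has length 1, so plot() ends up comparing two objects of different lengths.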
Therefore, to reproduce the intended behaviour (simplifying instead of preserving), you want:
    mydata2[[2]]
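Applied to the question's loop, that means replacing mydata2[, i] with mydata2[[i]]. Below is a sketch of the loop written that way, under a few assumptions: make_boxplots is a hypothetical helper name, the sample data is a trimmed version of the question's with an arbitrary set.seed(1) added, and drawing goes to a null pdf device so it can run non-interactively. It uses only base R, so tbl_df itself is not exercised here; [[ should behave the same way on a tbl_df, but treat that as an assumption to check against your dplyr version.

```r
# make_boxplots is a hypothetical helper, not part of any package.
# [[ always extracts a single column as a plain vector, so the same code
# should work whether d is a data.frame or (assumed) a tbl_df.
make_boxplots <- function(d) {
  sapply(seq_along(d)[-1], function(i) {
    y <- d[[i]]                    # simplifies: vector of length nrow(d)
    main_title <- colnames(d)[i]   # avoids shadowing base::names
    plot(factor(d$categ), log(y + 1), main = main_title,
         ylab = "foo", outpch = NA, las = 1)
    length(y)                      # return something checkable instead of NULL
  })
}

# Trimmed version of the question's sample data; the seed is an assumption
set.seed(1)
mydata <- data.frame(matrix(rlnorm(30 * 10, meanlog = 0, sdlog = 1), nrow = 30))
colnames(mydata) <- c("categ", paste0("var", 1:9))
mydata$categ <- sample(1:2)        # length 2, recycled over the 30 rows

pdf(NULL)                          # null device: nothing is written or shown
res <- make_boxplots(mydata)
invisible(dev.off())
print(res)
```

Each call returns the length of the extracted column, so the result is one value of 30 per plotted variable.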
Note how the class changes:
    > class(mydata)
    [1] "data.frame"
    > class(mydata2)
    [1] "tbl_df" "tbl" "data.frame"
    > class(mydata2[,2])
    [1] "tbl_df" "data.frame"
From ?tbl_df:
    Methods
    tbl_df implements two important base methods:
    print: Only prints the first 10 rows, and the columns that fit on screen
    [: Never simplifies (drops), so always returns data.frame