Selecting rows or columns with data.table in R?
Imagine I have a data.table, for example:
    library(data.table)
    rrr <- data.table(1:15, runif(15), rgeom(15, 0.5), rbinom(15, 2, 0.5))

        V1         V2 V3 V4
     1:  1 0.33577273  0  0
     2:  2 0.66739739  2  1
     3:  3 0.07501655  0  0
     4:  4 0.43195663  2  1
     5:  5 0.39525841  3  2
     6:  6 0.15189738  1  1
     7:  7 0.02637279  0  1
     8:  8 0.44165623  0  1
     9:  9 0.98710570  2  0
    10: 10 0.62402805  1  0
    11: 11 0.84829465  3  2
    12: 12 0.02170976  0  1
    13: 13 0.74608925  0  2
    14: 14 0.29102296  2  0
    15: 15 0.83820646  1  1
How can I get a data.table from it with just the rows that contain a "0" in any column? (or any other value)
If I had a single column I would use:
    rrr[V4==0,]

       V1         V2 V3 V4
    1:  1 0.33577273  0  0
    2:  3 0.07501655  0  0
    3:  9 0.98710570  2  0
    4: 10 0.62402805  1  0
    5: 14 0.29102296  2  0
But what if I want to test all the columns at once because I have many?
This doesn't do what I need:
    rrr[,sapply(rrr,function(xx)(xx==0)), with=TRUE]

             V1    V2    V3    V4
     [1,] FALSE FALSE  TRUE  TRUE
     [2,] FALSE FALSE FALSE FALSE
     [3,] FALSE FALSE  TRUE  TRUE
     [4,] FALSE FALSE FALSE FALSE
     [5,] FALSE FALSE FALSE FALSE
     [6,] FALSE FALSE FALSE FALSE
     [7,] FALSE FALSE  TRUE FALSE
     [8,] FALSE FALSE  TRUE FALSE
     [9,] FALSE FALSE FALSE  TRUE
    [10,] FALSE FALSE FALSE  TRUE
    [11,] FALSE FALSE FALSE FALSE
    [12,] FALSE FALSE  TRUE FALSE
    [13,] FALSE FALSE  TRUE FALSE
    [14,] FALSE FALSE FALSE  TRUE
    [15,] FALSE FALSE FALSE FALSE
Maybe with a loop and a complicated paste? Though I'd prefer to use simple data.table syntax.
Similarly, how can I get a data.table with just the columns that contain a '0' in any row?
I know how to select columns that (as a whole) fulfill a condition, such as being numeric,
    rrr[,sapply(rrr,function(xx)is.numeric(xx)),with=FALSE]
but that method doesn't work if I want to test a condition elementwise.
In case you are interested, here are system.time() results on a bigger random data.table for the different solutions provided so far, with slight modifications.
    set.seed(1)
    n <- 1000000
    rrr <- data.table(matrix(rgeom(100*n,0.5), ncol=100))

Getting rows:

    > rrr[rrr[,rowSums(rrr==0)>0]]
       user  system elapsed
       2.72    0.55    3.27
    > rrr[rowSums(rrr==0)>0]
       user  system elapsed
       2.58    0.70    3.28
    > rrr[apply(rrr,MARGIN=1,function(xx)any(xx==0))]
       user  system elapsed
      10.81    0.19   11.00
    > rrr[apply(rrr[,paste0('V',1:ncol(rrr)),with=FALSE],function(xx)any(xx==0),MARGIN=1)]
       user  system elapsed
      10.49    0.30   10.83

Getting columns:

    > rrr[,sapply(rrr,function(xx)any(xx==0)), with=FALSE]
       user  system elapsed
       0.81    0.31    1.12
    > `[.listof`(rrr,colSums(rrr==0)>0)
       user  system elapsed
       2.14    0.27    2.41
    > rrr[,colSums(rrr==0)>0, with=FALSE]
       user  system elapsed
       2.26    0.48    2.75
    > rrr[, .SD, .SDcols=sapply(rrr, function(x) any(x==0))]  # only in version 1.9.5; seems the same as the first solution
       user  system elapsed
       0.78    0.36    1.14
    > rrr[, .SD, .SDcols=sapply(rrr, function(x) any(!as.logical(x)))]
       user  system elapsed
       0.41    0.25    0.66
    > rrr[Reduce('|',lapply(rrr,function(xx)(xx==0)))]
       user  system elapsed
       3.11    0.33    3.44
    > rrr[,apply(rrr[,paste0('V',1:ncol(rrr)),with=FALSE],function(xx)any(xx==0),MARGIN=2),with=FALSE]
       user  system elapsed
       3.48    0.80    4.28
I haven't included this one yet:
    rrr[, i := any(unlist(lapply(.SD, function(x) x==0))), seq_len(nrow(rrr))][i==TRUE][,i:=NULL]
It took several minutes and I stopped it; besides, it "tags" the rows instead of extracting them, and it's a complex solution.
I'll wait for faster or simpler solutions, and to hear your comments and likings.
sapply is supposed to be slower, but it isn't here. The results may change if the data.table contains other kinds of data.
It would speed things up if we could stop the test (==0) as soon as the first occurrence happens within every row or column. I guess we can't without loops, low-level access, or bitwise operations.
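One partial way to approximate that early stop for rows, sketched here under assumptions: loop over the columns and test only the rows that have not matched yet, so later columns do less work once many rows are already flagged. The names `dt`, `hit` and `todo` are illustrative and not from the post, and this has not been timed against the solutions above.

```r
library(data.table)

set.seed(1)
# Small illustrative table (hypothetical data, not the benchmark table above)
dt <- data.table(matrix(rgeom(5 * 1000, 0.5), ncol = 5))

hit <- logical(nrow(dt))          # becomes TRUE once a zero is seen in that row
for (col in names(dt)) {
  todo <- which(!hit)             # rows that still need testing
  if (length(todo) == 0L) break   # every row already matched: stop early
  hit[todo] <- dt[[col]][todo] == 0
}

rows_with_zero <- dt[hit]
```

Whether this beats rowSums depends on how quickly rows get flagged; with few zeros, almost every row is tested in every column and the loop only adds overhead.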
I've thought of a new method:
- a) sapply(rrr,function(xx)which(xx==0))
- b) I need to combine the results of a) as a union of lists, but I don't know how to do that for an arbitrary number of columns.
- c) and then select the rows with rrr["a)"]
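The combining step could be sketched with base R's `Reduce`, which folds `union` over a list of any length, so the number of columns does not need to be known in advance. This is a minimal sketch on made-up data (`dt`, `idx_per_col` and `rows` are illustrative names); `lapply` is used instead of `sapply` so the result is always a list.

```r
library(data.table)

# Hypothetical example table
dt <- data.table(V1 = c(1, 0, 3, 2),
                 V2 = c(2, 5, 0, 1),
                 V3 = c(0, 4, 7, 1))

# Row indices of the zeros, one integer vector per column
idx_per_col <- lapply(dt, function(xx) which(xx == 0))

# Union across all columns, sorted to keep the original row order
rows <- sort(Reduce(union, idx_per_col))

res <- dt[rows]
```

Here rows 1, 2 and 3 each contain a zero, so `res` keeps those three rows and drops the fourth.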
I guess it's going to be slower if the number of zeroes is big.
Maybe try:
    rrr[unique(unlist(sapply(rrr,function(xx)which(xx==0))))]
It's slow.
An option is to do the opposite:
    rrr[(rrr==0)] <- NA; na.omit(rrr)
The rowSums function can be used here:
    rrr[rowSums(!rrr)>0]
How it works: !rrr is a matrix that is TRUE wherever there is a zero. In the general case, you can replace !rrr with whatever logical condition you want to check. For example, to see if any element is equal to 3, take the rowSums of rrr==3.
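As a small illustration of swapping in a different condition, here is a sketch on a made-up table (`dt` and its values are assumptions for the example, not data from the question):

```r
library(data.table)

# Hypothetical example table
dt <- data.table(V1 = c(1, 0, 3),
                 V2 = c(2, 5, 0),
                 V3 = c(3, 4, 7))

# dt == 0 is a logical matrix; rowSums counts the TRUEs in each row
rows_with_zero  <- dt[rowSums(dt == 0) > 0]

# Same pattern with another condition: any element equal to 3
rows_with_three <- dt[rowSums(dt == 3) > 0]
```

With this table, `rows_with_zero` keeps rows 2 and 3, while `rows_with_three` keeps rows 1 and 3.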
I think rowSums(test(rrr))>0 is the same as apply(rrr,1,function(x)any(test(x))); both coerce the object to a matrix. I find the rowSums version easier to read, and I think I've heard people praise its efficiency.
For columns, similarly:
    rrr[, colSums(!rrr)>0, with=FALSE]
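For comparison, a minimal sketch of the column case on a made-up table, showing the colSums form next to the per-column sapply form that appears in the question's timings. The table and the names `keep`, `cols_a` and `cols_b` are illustrative assumptions.

```r
library(data.table)

# Hypothetical example table: only V2 and V3 contain a zero
dt <- data.table(V1 = 1:3,
                 V2 = c(0, 2, 5),
                 V3 = c(4, 4, 0),
                 V4 = c(7, 8, 9))

# Matrix-based test: coerces the whole table to a logical matrix
cols_a <- dt[, colSums(dt == 0) > 0, with = FALSE]

# Column-by-column test: no matrix coercion, one logical flag per column
keep <- sapply(dt, function(x) any(x == 0))
cols_b <- dt[, keep, with = FALSE]
```

Both forms should select the same columns; the sapply form avoids building the full matrix, which may explain why it came out faster in the timings above.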