select - Selecting rows or columns with data.table in R?

Imagine I have a data.table, for example:

    library(data.table)
    rrr <- data.table(1:15, runif(15), rgeom(15,0.5), rbinom(15,2,0.5))

        V1         V2 V3 V4
     1:  1 0.33577273  0  0
     2:  2 0.66739739  2  1
     3:  3 0.07501655  0  0
     4:  4 0.43195663  2  1
     5:  5 0.39525841  3  2
     6:  6 0.15189738  1  1
     7:  7 0.02637279  0  1
     8:  8 0.44165623  0  1
     9:  9 0.98710570  2  0
    10: 10 0.62402805  1  0
    11: 11 0.84829465  3  2
    12: 12 0.02170976  0  1
    13: 13 0.74608925  0  2
    14: 14 0.29102296  2  0
    15: 15 0.83820646  1  1

How can I subset it to get the rows that contain a 0 in any column? (Or any other value.)
If I had a single column to check, I would use:

    rrr[V4==0,]

       V1         V2 V3 V4
    1:  1 0.33577273  0  0
    2:  3 0.07501655  0  0
    3:  9 0.98710570  2  0
    4: 10 0.62402805  1  0
    5: 14 0.29102296  2  0

But what if I want to check all columns at once, because I have many?

This doesn't do what I need:

    rrr[,sapply(rrr,function(xx)(xx==0)), with=TRUE]

             V1    V2    V3    V4
     [1,] FALSE FALSE  TRUE  TRUE
     [2,] FALSE FALSE FALSE FALSE
     [3,] FALSE FALSE  TRUE  TRUE
     [4,] FALSE FALSE FALSE FALSE
     [5,] FALSE FALSE FALSE FALSE
     [6,] FALSE FALSE FALSE FALSE
     [7,] FALSE FALSE  TRUE FALSE
     [8,] FALSE FALSE  TRUE FALSE
     [9,] FALSE FALSE FALSE  TRUE
    [10,] FALSE FALSE FALSE  TRUE
    [11,] FALSE FALSE FALSE FALSE
    [12,] FALSE FALSE  TRUE FALSE
    [13,] FALSE FALSE  TRUE FALSE
    [14,] FALSE FALSE FALSE  TRUE
    [15,] FALSE FALSE FALSE FALSE

Maybe with a loop and a complicated paste? Though I'd prefer to use simple data.table syntax.

Similarly, how can I get the columns that contain a 0 in any row?

I know how to select the columns that, as a whole, fulfil a condition, such as being numeric,

but that method doesn't work if I want to test a condition elementwise.
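To make the distinction concrete, here is a minimal sketch on a small made-up table (the names `dt`, `num_cols` and `has_zero` are only for illustration and are not from the post). A whole-column test such as `is.numeric` yields one value per column, while an elementwise test such as `x == 0` yields one value per cell and has to be collapsed, for instance with `any()`, before it can pick columns.

```r
library(data.table)

# Small illustrative table (hypothetical data, not from the post)
dt <- data.table(a = 1:3, b = c("x", "y", "z"), c = c(0, 2.5, 1))

# Whole-column condition: sapply() returns one TRUE/FALSE per column
num_cols <- sapply(dt, is.numeric)
dt_numeric <- dt[, names(dt)[num_cols], with = FALSE]

# Elementwise condition: x == 0 gives one value per cell, so it is
# collapsed with any() to get back to one TRUE/FALSE per column
has_zero <- sapply(dt, function(x) any(x == 0))
dt_zero <- dt[, names(dt)[has_zero], with = FALSE]
```

Here `dt_numeric` keeps columns `a` and `c`, and `dt_zero` keeps only `c`.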

In case anyone is interested, here is system.time() on a bigger random data.table for the different solutions provided so far, with slight modifications.

    set.seed(1)
    n <- 1000000
    rrr <- data.table(matrix(rgeom(100*n,0.5), ncol=100))

Getting rows:

    > rrr[rrr[,rowSums(rrr==0)>0]]
       user  system elapsed
       2.72    0.55    3.27

    > rrr[rowSums(rrr==0)>0]
       user  system elapsed
       2.58    0.70    3.28

    > rrr[apply(rrr,MARGIN=1,function(xx)any(xx==0))]
       user  system elapsed
      10.81    0.19   11.00

    > rrr[apply(rrr[,paste0('V',1:ncol(rrr)),with=FALSE],function(xx)any(xx==0),MARGIN=1)]
       user  system elapsed
      10.49    0.30   10.83

Getting columns:

    > rrr[,sapply(rrr,function(xx)any(xx==0)), with=FALSE]
       user  system elapsed
       0.81    0.31    1.12

    > `[.listof`(rrr,colSums(rrr==0)>0)
       user  system elapsed
       2.14    0.27    2.41

    > rrr[,colSums(rrr==0)>0, with=FALSE]
       user  system elapsed
       2.26    0.48    2.75

    > rrr[, .SD, .SDcols=sapply(rrr, function(x) any(x==0))]   # only in version 1.9.5; seems to be the same solution as the first one
       user  system elapsed
       0.78    0.36    1.14

    > rrr[, .SD, .SDcols=sapply(rrr, function(x) any(!as.logical(x)))]
       user  system elapsed
       0.41    0.25    0.66

    > rrr[Reduce('|',lapply(rrr,function(xx)(xx==0)))]   # returns rows, not columns
       user  system elapsed
       3.11    0.33    3.44

    > rrr[,apply(rrr[,paste0('V',1:ncol(rrr)),with=FALSE],function(xx)any(xx==0),MARGIN=2),with=FALSE]
       user  system elapsed
       3.48    0.80    4.28

I haven't included this one yet:

    rrr[, i := any(unlist(lapply(.SD, function(x) x==0))), by=seq_len(nrow(rrr))][i==TRUE][, i:=NULL]

It took several minutes and I stopped it. Besides, it "tags" the rows instead of extracting them, and it's a complex solution.

I'll wait for faster or simpler solutions, and I'd like to hear your comments and preferences.

sapply is supposed to be slower, but here it isn't. The results may change if the data.table contains other kinds of data.

It would speed things up if the test (==0) could stop at the first occurrence within every row or column. I guess that can't be done without loops, low-level access, or bitwise operations.
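One way to approximate that early stop with a plain loop is sketched below. It walks the columns one at a time, only re-tests rows that have not matched yet, and leaves the loop once every row has matched. The helper `rows_with_value` is hypothetical (it is not part of data.table), it assumes the table has no NA values, and it has not been benchmarked against the solutions above, so it may or may not be faster in practice.

```r
library(data.table)

# Hypothetical helper, not part of data.table: flags the rows that contain
# `value`. Rows already flagged are skipped in later columns, and the loop
# stops as soon as every row has been flagged. Assumes there are no NAs.
rows_with_value <- function(dt, value = 0) {
  hit <- logical(nrow(dt))
  for (col in dt) {                 # a data.table is a list of columns
    pending <- which(!hit)          # rows still without a match
    if (length(pending) == 0L) break
    hit[pending] <- col[pending] == value
  }
  hit
}

set.seed(1)
small <- data.table(matrix(rgeom(5 * 20, 0.5), ncol = 5))
res <- small[rows_with_value(small)]
```

The result should select the same rows as `small[rowSums(small == 0) > 0]`; the only difference is how much work is done per column.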

I've thought of a new method.

  1. sapply(rrr,function(xx)which(xx==0))
  2. I need to combine the results of step 1 into a union of the lists, but I don't know how to do that for an arbitrary number of columns.
  3. Then get the rows with rrr[result of step 2].

I guess it's going to be slower if the number of zeroes is big.

Maybe try rrr[unique(unlist(sapply(rrr,function(xx)which(xx==0))))], but it's slow.

Another option is to do the opposite: rrr[(rrr==0)] <- NA; na.omit(rrr)
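The three steps above can be sketched with base R's `Reduce` and `union`, which fold the union over however many columns there are. This is only an illustration on a small made-up table (the names `dt`, `idx_list` and `idx` are not from the post), and `lapply` is used instead of `sapply` so that step 1 always returns a list.

```r
library(data.table)

# Small illustrative table (hypothetical data, not from the post)
dt <- data.table(V1 = c(1, 0, 3, 4), V2 = c(0, 0, 5, 6), V3 = c(7, 8, 9, 0))

# Step 1: for each column, the row indices that hold a zero
idx_list <- lapply(dt, function(xx) which(xx == 0))

# Step 2: union of the index vectors, for any number of columns
idx <- sort(Reduce(union, idx_list))

# Step 3: subset the rows
res <- dt[idx]
```

With this table, rows 1, 2 and 4 contain a zero, so `res` has three rows.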

The rowSums function can be used here:

    rrr[rowSums(!rrr)>0]

How it works: !rrr is a matrix that is TRUE wherever rrr is zero. In the general case, you can replace !rrr with whatever logical condition you want to check. For example, to see whether any element is equal to 3, take the rowSums of rrr==3.

I think rowSums(test(x))>0 is the same as apply(rrr,1,function(x)any(test(x))); both coerce the object to a matrix. I find the rowSums version easier to read, and I think I've heard people praise its efficiency.

For columns, similarly:

    rrr[, colSums(!rrr)>0, with=FALSE]
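As a quick check of that equivalence, here is a small sketch using the `== 3` condition on a made-up table (the names `dt`, `by_rowsums` and `by_apply` are only for illustration).

```r
library(data.table)

# Small illustrative table (hypothetical data, not from the post)
dt <- data.table(V1 = c(1, 3, 0), V2 = c(2, 2, 5), V3 = c(3, 4, 6))

# Rows with at least one element equal to 3, computed two ways
by_rowsums <- rowSums(dt == 3) > 0
by_apply <- apply(dt, 1, function(x) any(x == 3))

# Either logical vector can be used to subset the rows
rows3 <- dt[by_rowsums]
```

Both vectors flag the first two rows and not the third, so `rows3` has two rows.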
