select - Selecting rows or columns with data.table in R? -


Imagine I have a data.table, for example:

library(data.table)
rrr <- data.table(1:15, runif(15), rgeom(15,0.5), rbinom(15,2,0.5))

    V1         V2 V3 V4
 1:  1 0.33577273  0  0
 2:  2 0.66739739  2  1
 3:  3 0.07501655  0  0
 4:  4 0.43195663  2  1
 5:  5 0.39525841  3  2
 6:  6 0.15189738  1  1
 7:  7 0.02637279  0  1
 8:  8 0.44165623  0  1
 9:  9 0.98710570  2  0
10: 10 0.62402805  1  0
11: 11 0.84829465  3  2
12: 12 0.02170976  0  1
13: 13 0.74608925  0  2
14: 14 0.29102296  2  0
15: 15 0.83820646  1  1

How can I get from it a data.table with only the rows that contain a "0" in any column? (Or any other value.)
If I had a single column I would use:

rrr[V4==0,]

   V1         V2 V3 V4
1:  1 0.33577273  0  0
2:  3 0.07501655  0  0
3:  9 0.98710570  2  0
4: 10 0.62402805  1  0
5: 14 0.29102296  2  0

But what if I want to check all columns at once, because I have many?

This doesn't do what I need:

rrr[,sapply(rrr,function(xx)(xx==0)), with=TRUE]

         V1    V2    V3    V4
 [1,] FALSE FALSE  TRUE  TRUE
 [2,] FALSE FALSE FALSE FALSE
 [3,] FALSE FALSE  TRUE  TRUE
 [4,] FALSE FALSE FALSE FALSE
 [5,] FALSE FALSE FALSE FALSE
 [6,] FALSE FALSE FALSE FALSE
 [7,] FALSE FALSE  TRUE FALSE
 [8,] FALSE FALSE  TRUE FALSE
 [9,] FALSE FALSE FALSE  TRUE
[10,] FALSE FALSE FALSE  TRUE
[11,] FALSE FALSE FALSE FALSE
[12,] FALSE FALSE  TRUE FALSE
[13,] FALSE FALSE  TRUE FALSE
[14,] FALSE FALSE FALSE  TRUE
[15,] FALSE FALSE FALSE FALSE

Maybe with a loop and a complicated paste? Though I would prefer to use simple data.table syntax.

Similarly, how can I get a data.table with only the columns that contain a '0' in any row?

I know how to get the columns that (as a whole) fulfill a condition, such as being numeric:

rrr[,sapply(rrr,function(xx)is.numeric(xx)),with=FALSE]

But that method doesn't work if I want to test a condition elementwise.


In case you are interested, here are system.time() results on a bigger random data.table for the different solutions provided so far, with slight modifications.

set.seed(1)
n <- 1000000
rrr <- data.table(matrix(rgeom(100*n,0.5), ncol=100))

Getting rows:

> rrr[rrr[,rowSums(rrr==0)>0]]
   user  system elapsed
   2.72    0.55    3.27
> rrr[rowSums(rrr==0)>0]
   user  system elapsed
   2.58    0.70    3.28
> rrr[apply(rrr,MARGIN=1,function(xx)any(xx==0))]
   user  system elapsed
  10.81    0.19   11.00
> rrr[apply(rrr[,paste0('V',1:ncol(rrr)),with=FALSE],function(xx)any(xx==0),MARGIN=1)]
   user  system elapsed
  10.49    0.30   10.83

Getting columns:

> rrr[,sapply(rrr,function(xx)any(xx==0)), with=FALSE]
   user  system elapsed
   0.81    0.31    1.12
> `[.listof`(rrr,colSums(rrr==0)>0)
   user  system elapsed
   2.14    0.27    2.41
> rrr[,colSums(rrr==0)>0, with=FALSE]
   user  system elapsed
   2.26    0.48    2.75
> rrr[, .SD, .SDcols=sapply(rrr, function(x) any(x==0))]   # only in version 1.9.5; seems to be the same solution as the first one
   user  system elapsed
   0.78    0.36    1.14
> rrr[, .SD, .SDcols=sapply(rrr, function(x) any(!as.logical(x)))]
   user  system elapsed
   0.41    0.25    0.66
> rrr[Reduce('|',lapply(rrr,function(xx)(xx==0)))]
   user  system elapsed
   3.11    0.33    3.44
> rrr[,apply(rrr[,paste0('V',1:ncol(rrr)),with=FALSE],function(xx)any(xx==0),MARGIN=2),with=FALSE]
   user  system elapsed
   3.48    0.80    4.28

I haven't included this one yet:

rrr[, i := any(unlist(lapply(.SD, function(x) x==0))), seq_len(nrow(rrr))][i==TRUE][,i:=NULL]

It took several minutes and I stopped it. Besides, it "tags" the rows instead of extracting them, and it's a complex solution.

I'll wait for faster or simpler solutions, and to hear your comments and preferences.

sapply is supposed to be slower, but here it isn't. The results may change if the data.table contains other kinds of data.


Speed would improve if the test (==0) could stop as soon as the first occurrence is found within every row or column. I guess that can't be done without loops, low-level access, or bitwise operations.

I've thought of a new method.
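To make the early-stopping idea concrete, here is a minimal sketch with a plain loop over columns that only re-tests the rows not yet flagged and stops once every row has been flagged. The table `dt` and the variable names are made up for illustration, and the code assumes there are no NA values; it has not been benchmarked against the solutions above.

```r
library(data.table)

# Hypothetical toy data, not the rrr table from the question
dt <- data.table(a = c(1, 3, 0, 2), b = c(0, 1, 2, 2), c = c(5, 3, 1, 2))

hit <- logical(nrow(dt))            # TRUE once a zero has been seen in that row
for (col in names(dt)) {
  todo <- which(!hit)               # rows that still need testing
  if (length(todo) == 0L) break     # every row already flagged: stop early
  hit[todo] <- dt[[col]][todo] == 0 # test only the remaining rows
}

rows_with_zero <- dt[hit]
```

Whether this beats rowSums depends on how early the zeroes appear; with few zeroes it still visits every column.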

  1. sapply(rrr,function(xx)which(xx==0))
  2. I need to combine the results of step 1 as a union of the lists, but I don't know how to do that for an arbitrary number of columns.
  3. Then get the rows with rrr[<result of step 2>].

I guess it's going to be slower if the number of zeroes is big.

Maybe try rrr[unique(unlist(sapply(rrr,function(xx)which(xx==0))))], but it's slow.

An option for the opposite (the rows without any zero): rrr[(rrr==0)] <- NA; na.omit(rrr)
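One possible way to take the union of the per-column index lists for any number of columns is `Reduce(union, ...)`. The sketch below uses a small made-up table `dt` rather than `rrr`, and `lapply` instead of `sapply` so the result is always a list.

```r
library(data.table)

# Hypothetical toy data for illustration
dt <- data.table(a = c(1, 3, 0, 2), b = c(0, 1, 2, 2), c = c(5, 3, 1, 2))

# Step 1: per-column row indices where the value is zero
idx_by_col <- lapply(dt, function(xx) which(xx == 0))

# Step 2: union across however many columns there are
idx <- sort(Reduce(union, idx_by_col))

# Step 3: select those rows
rows_with_zero <- dt[idx]
```

This gives the same rows as the unique(unlist(...)) one-liner; it only shows that the union step does not need to know the number of columns in advance.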

The rowSums function can be used here:

rrr[rowSums(!rrr)>0]

How it works: !rrr is a matrix that is TRUE wherever rrr is zero. In the general case, you can replace !rrr with whatever logical condition you want to check. For example, to see if any element is equal to 3, take the rowSums of rrr==3.
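As a small illustration of swapping in a different condition, the sketch below builds a made-up table `dt` (not the `rrr` from the question) and selects rows containing a 0 and rows containing a 3.

```r
library(data.table)

# Hypothetical toy data for illustration
dt <- data.table(a = c(1, 3, 0, 2), b = c(0, 1, 2, 2), c = c(5, 3, 1, 2))

# dt == 0 is a logical matrix; rowSums counts the TRUEs in each row
has_zero  <- rowSums(dt == 0) > 0
has_three <- rowSums(dt == 3) > 0

rows_with_zero  <- dt[has_zero]   # rows 1 and 3
rows_with_three <- dt[has_three]  # row 2
```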

I think rowSums(test(x))>0 is about the same as apply(rrr,1,function(x)any(test(x))); both coerce the object to a matrix. I find the rowSums version easier to read, and I think I've heard people praise its efficiency.
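A quick way to check that the two forms agree is to run both on random data and compare. In this sketch `test` is a stand-in name for whatever elementwise condition is being checked, and the table `dt` is made up.

```r
library(data.table)

set.seed(42)
# Hypothetical random data: 20 rows, 5 columns
dt <- data.table(matrix(rgeom(5 * 20, 0.5), ncol = 5))

# Stand-in for the elementwise condition
test <- function(x) x == 0

via_rowsums <- rowSums(test(dt)) > 0
via_apply   <- apply(dt, 1, function(x) any(test(x)))

same <- identical(unname(via_rowsums), unname(via_apply))
```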


For columns, similarly:

rrr[, colSums(!rrr)>0, with=FALSE]
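For the column case, a minimal sketch on a made-up table `dt`: it turns the logical result of colSums into column names first, which is a conservative variant that should not depend on how a given data.table version treats a logical j with with=FALSE.

```r
library(data.table)

# Hypothetical toy data for illustration
dt <- data.table(a = c(1, 3, 0, 2), b = c(0, 1, 2, 2), c = c(5, 3, 1, 2))

# TRUE for each column that contains at least one zero
keep <- colSums(dt == 0) > 0

# Select by name rather than by logical vector
cols <- names(dt)[keep]
cols_with_zero <- dt[, cols, with = FALSE]
```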
