Is there a Python equivalent to the mahalanobis() function in R? If not, how can I implement it?


I have the following code in R that calculates the Mahalanobis distance on the Iris dataset and returns a numeric vector with 150 values, one for every observation in the dataset.

    x = read.csv("iris data.csv")
    mean <- colMeans(x)
    sx <- cov(x)
    d2 <- mahalanobis(x, mean, sx)

I tried to implement the same thing in Python using the 'scipy.spatial.distance.mahalanobis(u, v, VI)' function, but it seems this function only takes one-dimensional arrays as parameters.

I used the Iris dataset from R; I suppose it is the same one you are using.

First, here is the R benchmark, for comparison:

    x <- read.csv("irisdata.csv")
    x <- x[, c(2, 3, 4, 5)]
    mean <- colMeans(x)
    sx <- cov(x)
    d2 <- mahalanobis(x, mean, sx)

Then, in Python you can use:

    import pandas as pd
    import scipy.linalg
    from scipy.spatial.distance import mahalanobis

    x = pd.read_csv('irisdata.csv')
    x = x.iloc[:, 1:]  # drop the first column, keep the measurements

    sx = x.cov().values
    sx = scipy.linalg.inv(sx)  # scipy expects the inverse covariance matrix

    mean = x.mean().values

    def mahalanobisr(x, meancol, ic):
        m = []
        for i in range(x.shape[0]):
            # square the result to match R's mahalanobis()
            m.append(mahalanobis(x.iloc[i, :], meancol, ic) ** 2)
        return m

    mr = mahalanobisr(x, mean, sx)

I defined a function so you can use it on other datasets (note that it expects a pandas DataFrame as input, plus the column means and the inverse covariance matrix).
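The row-by-row loop can also be written without an explicit Python loop. The following is only a sketch under stated assumptions: the helper name `mahalanobis_sq` is hypothetical, the data is synthetic rather than the iris values, and it solves a linear system instead of inverting the covariance matrix, which is a swapped-in technique and not what the function above does. Like R's mahalanobis(), it returns the squared distance.

```python
import numpy as np

def mahalanobis_sq(data, center=None, cov=None):
    """Squared Mahalanobis distance of every row of `data` (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    if center is None:
        center = data.mean(axis=0)           # column means
    if cov is None:
        cov = np.cov(data, rowvar=False)     # sample covariance (n - 1 denominator)
    diff = data - center
    # Solve cov @ y = diff.T instead of forming the inverse explicitly
    solved = np.linalg.solve(cov, diff.T).T
    # Row-wise dot product of diff and solved
    return np.einsum("ij,ij->i", diff, solved)

# Synthetic stand-in data: 50 observations, 4 variables
rng = np.random.default_rng(0)
sample = rng.normal(size=(50, 4))
d2 = mahalanobis_sq(sample)
```

A NumPy array or a DataFrame of numeric columns can be passed as `data`; when `center` and `cov` are omitted they are estimated from the data itself.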

Comparing the results:

In R:

    > d2[c(1,2,3,4,5)]
    [1] 2.134468 2.849119 2.081339 2.452382 2.462155

In Python:

    In [43]: mr[0:5]
    Out[45]:
    [2.1344679233248431,
     2.8491186861585733,
     2.0813386639577991,
     2.4523816316796712,
     2.4621545347140477]

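Just be careful: R's mahalanobis() returns the squared Mahalanobis distance, whereas scipy's function returns the distance itself, which is why the result is squared above.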
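To make the squared versus unsquared point concrete, here is a minimal sketch that gets all distances in one call with `scipy.spatial.distance.cdist` and then squares them. It is an alternative to the loop, not the method used above; the data is synthetic and the variable names are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Synthetic stand-in data (not the iris values): 30 observations, 3 variables
rng = np.random.default_rng(1)
data = rng.normal(size=(30, 3))
center = data.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(data, rowvar=False))

# cdist needs 2-D inputs, so the centre is passed as a 1 x p matrix
dist = cdist(data, center[np.newaxis, :], metric="mahalanobis", VI=inv_cov).ravel()

# scipy returns the distance; square it to compare with R's mahalanobis()
dist_sq = dist ** 2
```

Here `dist` corresponds to what scipy's mahalanobis(u, v, VI) gives per row, and `dist_sq` is the quantity to compare against the R output.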

