Is there a Python equivalent to the mahalanobis() function in R? If not, how can I implement it?
I have the following code in R. It calculates the Mahalanobis distance on the Iris dataset and returns a numeric vector of 150 values, one for each observation in the dataset:

```r
x <- read.csv("iris data.csv")
mean <- colMeans(x)
sx <- cov(x)
d2 <- mahalanobis(x, mean, sx)
```

I tried to implement the same thing in Python using the `scipy.spatial.distance.mahalanobis(u, v, VI)` function, but that function seems to accept only one-dimensional arrays as parameters.
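One way to get R's behaviour for a whole matrix at once is to vectorize the quadratic form yourself with NumPy. The sketch below is an illustration, not a SciPy API. The name `mahalanobis_sq` is hypothetical. Like R's `mahalanobis()` with its default `inverted = FALSE`, it takes the covariance matrix itself (not its inverse) and returns squared distances. It solves a linear system instead of forming the explicit inverse.

```python
import numpy as np


def mahalanobis_sq(x, center, cov):
    """Squared Mahalanobis distance of each row of x from center.

    Hypothetical helper mirroring R's mahalanobis(x, center, cov):
    `cov` is the covariance matrix (not its inverse), and the result
    is D^2, not D.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))   # (n, p) observations
    diff = x - np.asarray(center, dtype=float)      # deviations from the center
    # Solve cov @ z = diff.T rather than computing inv(cov) explicitly
    z = np.linalg.solve(np.asarray(cov, dtype=float), diff.T)  # (p, n)
    # Row-wise quadratic form: diff_i^T cov^{-1} diff_i
    return np.einsum("ij,ji->i", diff, z)


# Usage on synthetic data (a stand-in for the Iris measurements)
rng = np.random.default_rng(0)
data = rng.normal(size=(150, 4))
d2 = mahalanobis_sq(data, data.mean(axis=0), np.cov(data, rowvar=False))
print(d2.shape)  # (150,)
```

`np.cov(..., rowvar=False)` uses the same n − 1 denominator as R's `cov()`. With the sample mean and sample covariance, the squared distances always sum to (n − 1)·p, which makes a handy sanity check.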
I used the Iris dataset from R, and I suppose it is the same one you are using.

First, here is my R benchmark for comparison:
```r
x <- read.csv("irisdata.csv")
x <- x[, c(2, 3, 4, 5)]
mean <- colMeans(x)
sx <- cov(x)
d2 <- mahalanobis(x, mean, sx)
```
Then, in Python, you can use:
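```python
import pandas as pd
import scipy.linalg
from scipy.spatial.distance import mahalanobis

x = pd.read_csv('irisdata.csv')
x = x.iloc[:, 1:]             # keep every column except the first
sx = x.cov().values           # sample covariance (n - 1 denominator, like R's cov())
sx = scipy.linalg.inv(sx)     # SciPy's mahalanobis() expects the INVERSE covariance
mean = x.mean().values        # column means, like R's colMeans()

def mahalanobisr(x, meancol, ic):
    m = []
    for i in range(x.shape[0]):
        # SciPy returns the plain distance; square it to match R
        m.append(mahalanobis(x.iloc[i, :], meancol, ic) ** 2)
    return m

mr = mahalanobisr(x, mean, sx)
```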
I defined it as a function so you can reuse it with other datasets (note that it takes pandas DataFrames as input).
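If you would rather avoid the Python-level loop, SciPy's `cdist` also accepts `metric="mahalanobis"` together with the inverse covariance through its `VI` keyword. Measuring every row against a single row (the column means) gives an (n, 1) array of unsquared distances. The following is a minimal sketch, assuming the DataFrame holds only numeric columns. The name `mahalanobis_sq_cdist` is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist


def mahalanobis_sq_cdist(df):
    """Squared Mahalanobis distance of each row of df from its column means."""
    values = df.to_numpy(dtype=float)                       # (n, p) numeric data
    center = values.mean(axis=0, keepdims=True)             # (1, p), like colMeans()
    vi = np.linalg.inv(np.cov(values, rowvar=False))        # inverse sample covariance
    d = cdist(values, center, metric="mahalanobis", VI=vi)  # (n, 1) plain distances
    return d[:, 0] ** 2                                     # square to match R


# Usage with a small synthetic DataFrame
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(30, 3)), columns=["a", "b", "c"])
print(mahalanobis_sq_cdist(df).shape)  # (30,)
```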
Comparing the results:

In R:

```
> d2[c(1,2,3,4,5)]
[1] 2.134468 2.849119 2.081339 2.452382 2.462155
```

In Python:

```
In [43]: mr[0:5]
Out[45]: [2.1344679233248431, 2.8491186861585733, 2.0813386639577991, 2.4523816316796712, 2.4621545347140477]
```
Just be careful: in R, `mahalanobis()` returns the *squared* Mahalanobis distance. That is why the Python function squares SciPy's result.
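The two conventions differ only by a square root. R reports D²(x) = (x − μ)ᵀ Σ⁻¹ (x − μ), while `scipy.spatial.distance.mahalanobis` returns D = √D². Here is a tiny check using an identity inverse covariance, where the Mahalanobis distance reduces to the Euclidean one (the numbers are illustrative, not taken from the Iris data):

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

u = np.array([3.0, 4.0])
center = np.zeros(2)
vi = np.eye(2)  # identity inverse covariance: Mahalanobis reduces to Euclidean

d = mahalanobis(u, center, vi)  # SciPy returns the plain distance D
print(d)       # 5.0
print(d ** 2)  # 25.0, the squared distance R's mahalanobis() would report
```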