python - How to calculate distance between 2D matrices


Hello community,

I'm new (as a member) to this site, so if you think it might be better to post this on http://datascience.stackexchange.com, let me know.

I am tackling a machine learning problem that requires calculating the distance between nxm-dimensional elements in order to implement certain classification algorithms.

The element's attribute is a 2D matrix (matr), so I'm searching for the best algorithm to calculate the distance between 2D matrices. As you will see below, the "easy" solution is to convert the 2D matrix into a 1D vector and then apply any distance algorithm, but I'm searching for something more convenient (if it exists).

So far I have used the following approaches:

  1. Euclidean distance between each element.

    import numpy as np

    def dist_euclidean(elem1, elem2):
        # Sum the squared element-wise differences, then take the square root
        t_sum = 0
        for i in range(len(elem1.matr)):
            for j in range(len(elem1.matr[0])):
                t_sum += np.square(elem1.matr[i][j] - elem2.matr[i][j])
        return np.sqrt(t_sum)
  2. Cosine similarity, for which I had to convert the (nxm) 2D matrix into a (1xnm) vector.

    from scipy.spatial import distance

    def dist_cosine(elem1, elem2):
        # Flatten both matrices row by row into plain lists
        temp1 = []
        temp2 = []
        for i in range(len(elem1.matr)):
            temp1.extend(elem1.matr[i])
            temp2.extend(elem2.matr[i])
        return distance.cosine(temp1, temp2)
  3. KL divergence (wiki). I only found an implementation for a 1D matrix (vector), so I did the following conversions:

    • Found the entropy between each pair of corresponding rows and averaged them.

      import numpy as np
      from scipy.stats import entropy

      def dist_kl_row_avg(elem1, elem2):
          # entropy(p, q) with two arguments returns the KL divergence of p from q
          y = []
          for i in range(len(elem1.matr)):
              y.append(entropy(elem1.matr[i], elem2.matr[i]))
          return np.average(y)
    • Converted the (nxm) 2D matrix into a (1xnm) vector by appending the rows, then calculated the total entropy.

      import numpy as np
      from scipy.stats import entropy

      def dist_kl_1d_total(elem1, elem2):
          # Flatten both matrices row by row, then compare them as single vectors
          temp1 = []
          temp2 = []
          for i in range(len(elem1.matr)):
              temp1.extend(elem1.matr[i])
              temp2.extend(elem2.matr[i])
          return entropy(temp1, temp2)
  4. KS test (wiki). Again I only found an implementation for a 1D matrix (vector), so I did the same conversions as in the KL implementation:

    • Ran the KS test between each pair of corresponding rows and averaged the results.

      import numpy as np
      from scipy.stats import ks_2samp

      def dist_ks_row_avg(elem1, elem2):
          y = []
          z = []
          for i in range(len(elem1.matr)):
              y.append(ks_2samp(elem1.matr[i], elem2.matr[i]))
          # x[0] is the KS statistic, x[1] is the p-value
          z = [x[0] / x[1] for x in y]
          return np.average(z)
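    • Converted the (nxm) 2D matrix into a (1xnm) vector by appending the rows, then ran a single KS test on the result.

      import numpy as np
      from scipy.stats import ks_2samp

      def dist_ks_1d_total(elem1, elem2):
          # Flatten both matrices row by row, then compare them as single samples
          temp1 = []
          temp2 = []
          for i in range(len(elem1.matr)):
              temp1.extend(elem1.matr[i])
              temp2.extend(elem2.matr[i])
          y = ks_2samp(temp1, temp2)
          # y[0] is the KS statistic, y[1] is the p-value
          return y[0] / y[1]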
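
If `matr` is a NumPy array (or can be converted to one), the explicit loops in the Euclidean, cosine and row-averaged KL approaches can be replaced by array operations. The following is only a sketch under that assumption: the `*_vec` function names are made up for illustration, the functions take the matrices directly rather than the `elem` objects, and both matrices are assumed to have the same shape.

```python
import numpy as np
from scipy.spatial import distance
from scipy.stats import entropy


def dist_euclidean_vec(m1, m2):
    """Euclidean (Frobenius) distance between two equally shaped matrices."""
    m1, m2 = np.asarray(m1, dtype=float), np.asarray(m2, dtype=float)
    # For a 2D array, the default norm is the Frobenius norm
    return np.linalg.norm(m1 - m2)


def dist_cosine_vec(m1, m2):
    """Cosine distance between the row-major flattened matrices."""
    m1, m2 = np.asarray(m1, dtype=float), np.asarray(m2, dtype=float)
    return distance.cosine(m1.ravel(), m2.ravel())


def dist_kl_row_avg_vec(m1, m2):
    """Average KL divergence between corresponding rows."""
    m1, m2 = np.asarray(m1, dtype=float), np.asarray(m2, dtype=float)
    # entropy works along axis 0, so transpose to compare row i with row i
    return np.mean(entropy(m1.T, m2.T))
```

Note that the KL variant becomes infinite whenever a row of the second matrix has a zero where the first matrix does not, which is the same behaviour as the loop-based version.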

All of the above work for my problem, but I got curious since I couldn't find anything more specific that satisfied me.


Edit 1. As pltrdy suggested, here is some more info regarding the problem.

The initial data of each element is a series of codes, e.g. (c->b->d->b->a), which is converted into a transition matrix that is normalized per row. Each cell in the matrix represents the probability of a transition from code [i] to code [j]. For example:

    In:  a->c->b->b->a->c->c->a
    Out:
           a     b     c
        a  0     0     1
        b  0.5   0.5   0
        c  0.33  0.33  0.33
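
The conversion from a code series to a row-normalized transition matrix could look roughly like this. It is a sketch, not the code from the original post: the `transition_matrix` name and the `codes` argument are assumptions, and so is the choice to leave a row all-zero when its code never appears as a transition source.

```python
import numpy as np


def transition_matrix(series, codes):
    """Row-normalized transition matrix for a sequence of codes."""
    index = {code: k for k, code in enumerate(codes)}
    counts = np.zeros((len(codes), len(codes)))
    # Count each observed transition src -> dst
    for src, dst in zip(series, series[1:]):
        counts[index[src], index[dst]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Assumption: rows with no outgoing transitions are left as zeros
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)


series = "a->c->b->b->a->c->c->a".split("->")
print(transition_matrix(series, codes=["a", "b", "c"]).round(2))
```

For the series in the example, this yields the same rows as the matrix shown: (0, 0, 1) for a, (0.5, 0.5, 0) for b and (0.33, 0.33, 0.33) for c.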

With that in mind, the final goal is to classify the different code series. The series do not have the same length, but they are made of the same codes, so the transition probability matrix has the same dimensions in every case. I asked the initial question in order to find the most suitable distance algorithm, the one that is going to produce the best classification results.
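
To compare how different distance functions affect classification, one simple option is a nearest-neighbour rule that takes the distance as a parameter. This is only an illustrative sketch: the post does not say which classifier is used, and `nearest_neighbor_label`, `frobenius` and the toy data below are all hypothetical.

```python
import numpy as np


def nearest_neighbor_label(query, train_matrices, train_labels, dist):
    """Return the label of the training matrix closest to `query` under `dist`."""
    distances = [dist(query, m) for m in train_matrices]
    return train_labels[int(np.argmin(distances))]


def frobenius(m1, m2):
    """Euclidean (Frobenius) distance between two equally shaped matrices."""
    diff = np.asarray(m1, dtype=float) - np.asarray(m2, dtype=float)
    return np.linalg.norm(diff)


# Hypothetical 2x2 transition matrices with made-up class labels
train = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
labels = ["stay", "swap"]
query = [[0.9, 0.1], [0.2, 0.8]]
print(nearest_neighbor_label(query, train, labels, frobenius))
```

Swapping `frobenius` for another distance function makes it possible to compare classification results across distances on the same data.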

Given two different transition matrices A and B and a probability distribution x as a row vector, the distribution after one step according to A is xA, and the distribution after one step according to B is xB. You can take (twice) the maximum statistical distance over all x between these with

numpy.linalg.norm(A - B, numpy.inf)
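
For a 2D array, `numpy.linalg.norm` with `ord=numpy.inf` returns the largest absolute row sum, i.e. the largest L1 distance between corresponding rows of the two matrices. Because the L1 distance between the two one-step distributions is convex in x, its maximum is reached when x puts all its mass on a single state, which is why one row suffices. A small sketch checking this on made-up matrices (the `max_tv_distance` helper name is an assumption, not part of NumPy):

```python
import numpy as np


def max_tv_distance(a, b):
    """Maximum total variation distance after one step, over all start distributions."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    # The infinity norm is twice the total variation distance, hence the 0.5
    return 0.5 * np.linalg.norm(a - b, np.inf)


# Hypothetical row-stochastic matrices that differ only in the first row
a = np.array([[0.0, 0.0, 1.0], [0.5, 0.5, 0.0], [0.2, 0.3, 0.5]])
b = np.array([[0.0, 1.0, 0.0], [0.5, 0.5, 0.0], [0.2, 0.3, 0.5]])

# Explicit version: L1 distance per row, then the maximum
row_l1 = np.abs(a - b).sum(axis=1)
assert np.isclose(np.linalg.norm(a - b, np.inf), row_l1.max())
print(max_tv_distance(a, b))
```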
