python - Merging dataframes using a table -


I have a similarity matrix (which I built as a DataFrame):

    mat = pd.DataFrame(index=df.a.values, columns=df.a.values)
    mat[:] = [[1, 0.2, 0.3], [0.7, 1, 0.6], [0, 0.4, 1]]

         id1  id2  id3
    id1  1.0  0.2  0.3
    id2  0.7  1.0  0.6
    id3  0.0  0.4  1.0

and I want to create a DataFrame that contains the same index and a single column containing the closest id:

        id   closest
    0   id1  id3
    1   id2  id1
    2   id3  id2

The idea is, for every row in the similarity matrix, to take the second highest value (the highest being the 1 on the diagonal) and retrieve the name of the corresponding column.

I know I could set the diagonal to zero and then use this:

    def closest(x):
        return np.where(x == x.max())

    temp = mat.apply(lambda x: closest(x))
    df['closest'] = df.index[[w[0][0] for w in temp.values]].tolist()

But I can't find how to filter out the diagonal without reassigning it.

Note: the values in the matrix are between 0 and 1, with 1 on the diagonal.

Subtract the identity matrix so that the diagonal becomes 0, then use DataFrame.idxmax(axis=1) to find the column label of the largest value in each row.

    import numpy as np
    import pandas as pd

    index = ['id1', 'id2', 'id3']
    mat = pd.DataFrame([[1, 0.2, 0.3], [0.7, 1, 0.6], [0, 0.4, 1]],
                       index=index, columns=index)

    # Subtracting the identity turns the 1s on the diagonal into 0s
    (mat - np.identity(3)).idxmax(axis=1)

Output:

    id1    id3
    id2    id1
    id3    id2
    dtype: object
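
If the diagonal is not guaranteed to be exactly 1, one possible variation is to hide it with a boolean mask instead of subtracting. The sketch below is an alternative to the answer above, not part of it: it uses DataFrame.mask() with np.eye() to blank out the diagonal on a copy, so mat is never reassigned, and then builds the two-column table from the question. The names off_diag and result are made up for illustration.

```python
import numpy as np
import pandas as pd

index = ['id1', 'id2', 'id3']
mat = pd.DataFrame([[1, 0.2, 0.3], [0.7, 1, 0.6], [0, 0.4, 1]],
                   index=index, columns=index)

# np.eye(..., dtype=bool) is True on the diagonal only; mask() returns a
# copy with those cells replaced by NaN, so mat itself is left untouched.
off_diag = mat.mask(np.eye(len(mat), dtype=bool))

# idxmax skips NaN by default, so the diagonal can never be picked.
closest = off_diag.idxmax(axis=1)

# Assemble the layout asked for in the question: one row per id,
# with a default integer index.
result = pd.DataFrame({'id': mat.index, 'closest': closest.to_numpy()})
print(result)
```

Using len(mat) rather than a hard-coded 3 keeps the sketch usable for a similarity matrix of any size, as long as the rows and columns are in the same order.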
