python - Merging dataframes using a table
I have a similarity matrix, which I built as a DataFrame:
    mat = pd.DataFrame(index=df.a.values, columns=df.a.values)
    mat[:] = [[1, 0.2, 0.3], [0.7, 1, 0.6], [0, 0.4, 1]]

         id1  id2  id3
    id1  1.0  0.2  0.3
    id2  0.7  1.0  0.6
    id3  0.0  0.4  1.0
I would like to create a DataFrame that has the same index and a single column containing the closest id:
        id closest
    0  id1     id3
    1  id2     id1
    2  id3     id2
The idea is, for every row in the similarity matrix, to find the second highest value (the highest being the 1 on the diagonal) and retrieve the name of the corresponding column.
I know I could set the diagonal to zero and then use this:
    def closest(x):
        # positions at which x reaches its maximum
        return np.where(x == x.max())

    temp = mat.apply(lambda x: closest(x))
    df['closest'] = df.index[[w[0][0] for w in temp.values]].tolist()
But I can't find a way to filter out the diagonal without reassigning it.
Note: all values in the matrix are between 0 and 1, with 1 on the diagonal.
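Subtract the identity matrix, then use DataFrame.idxmax() with axis=1 to find the column label of the largest value in each row. Because the diagonal is 1 and every other value lies between 0 and 1, the subtraction turns the diagonal into 0 in a temporary result and leaves mat itself untouched.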
    import numpy as np
    import pandas as pd

    index = ['id1', 'id2', 'id3']
    mat = pd.DataFrame([[1, 0.2, 0.3], [0.7, 1, 0.6], [0, 0.4, 1]],
                       index=index, columns=index)

    # zero out the diagonal in a temporary copy, then take the row-wise argmax
    (mat - np.identity(3)).idxmax(axis=1)
Output:

    id1    id3
    id2    id1
    id3    id2
    dtype: object
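If the two-column layout from the question is needed, or if a row could contain only zeros off the diagonal (in which case subtracting the identity would leave a tie with the diagonal), a variant that masks the diagonal with NaN may be a bit more robust. This is only a sketch: the names `diag`, `closest` and `result` are mine, not from the question, and it assumes the row and column labels of `mat` are in the same order.

```python
import numpy as np
import pandas as pd

index = ['id1', 'id2', 'id3']
mat = pd.DataFrame([[1, 0.2, 0.3], [0.7, 1, 0.6], [0, 0.4, 1]],
                   index=index, columns=index)

# Boolean array that is True on the diagonal (hypothetical helper name).
diag = np.eye(len(mat), dtype=bool)

# mask() returns a new frame with the diagonal replaced by NaN;
# idxmax skips NaN by default, and mat itself is not modified.
closest = mat.mask(diag).idxmax(axis=1)

# Assemble the id / closest layout asked for in the question.
result = pd.DataFrame({'id': closest.index, 'closest': closest.values})
print(result)
```

Since the diagonal is hidden rather than zeroed, this does not depend on the values being bounded by 1.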