This repository has been archived by the owner on Aug 31, 2021. It is now read-only.
This repository doesn't look maintained, but you can add the class below to `matrix_distance.py` to implement Jaccard distance. I've tested it and it works on my own dataset, but please don't use it without a sanity check first: test a couple of points and compare the results against a brute-force similarity search to make sure the distances align.
# _np and _sparse are assumed to be the numpy and scipy.sparse aliases
# already imported at the top of matrix_distance.py.
class JaccardDistance(MatrixMetricSearch):
    """A matrix that implements Jaccard distance search against it.

    jaccard_distance = 1 - jaccard_similarity

    Note: We want items that are more similar to be closer to zero, so we
    return 1 - jaccard_similarity. We do this so similarity
    and distance metrics can be treated the same way.
    """

    def __init__(self, features, records_data):
        super(JaccardDistance, self).__init__(features, records_data)

    @staticmethod
    def features_to_matrix(features):
        """
        Args:
            features: A list of features to be formatted.
        Returns:
            The transformed matrix.
        """
        return _sparse.csr_matrix(features)

    @staticmethod
    def vstack(matrix_list):
        """
        Args:
            matrix_list: A list of sparse matrices to stack vertically.
        Returns:
            The stacked matrix.
        """
        return _sparse.vstack(matrix_list)

    def _transform_value(self, v):
        return v

    def _distance(self, a_matrix):
        """Vectorised sparse Jaccard distance."""
        # Features are cast to int and treated as binary (0/1) indicators.
        X = _sparse.csr_matrix(a_matrix).astype(int)
        Y = _sparse.csr_matrix(self.matrix).astype(int)
        # Pairwise intersection sizes: entry (i, j) is |X_i & Y_j|.
        intersect = X.dot(Y.T)
        # Number of set features in each row.
        x_sum = X.sum(axis=1).A1
        y_sum = Y.sum(axis=1).A1
        xx, yy = _np.meshgrid(x_sum, y_sum)
        # |A | B| = |A| + |B| - |A & B|. Two all-zero rows give a zero
        # union, so the division below is undefined for that pair.
        union = ((xx + yy).T - intersect)
        return (1 - intersect / union).A
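A minimal sketch of the sanity check suggested above. It assumes binary (0/1) feature rows with at least one feature set in every row. The function names here are hypothetical, and the vectorised function is a standalone restatement of the same formula rather than a call into the library class:

```python
import numpy as np
from scipy import sparse


def jaccard_distance_vectorised(x, y):
    """Pairwise Jaccard distance between the rows of two binary matrices."""
    X = sparse.csr_matrix(x).astype(int)
    Y = sparse.csr_matrix(y).astype(int)
    # Entry (i, j) counts the features shared by row i of X and row j of Y.
    intersect = X.dot(Y.T).toarray()
    x_sum = np.asarray(X.sum(axis=1)).ravel()
    y_sum = np.asarray(Y.sum(axis=1)).ravel()
    # |A | B| = |A| + |B| - |A & B|, broadcast over all row pairs.
    union = x_sum[:, None] + y_sum[None, :] - intersect
    return 1 - intersect / union


def jaccard_distance_brute_force(x, y):
    """Reference implementation: one pair of rows at a time."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    out = np.empty((len(x), len(y)))
    for i, row_x in enumerate(x):
        for j, row_y in enumerate(y):
            shared = np.logical_and(row_x, row_y).sum()
            either = np.logical_or(row_x, row_y).sum()
            out[i, j] = 1 - shared / either
    return out


rng = np.random.default_rng(0)
queries = rng.integers(0, 2, size=(3, 20))
index = rng.integers(0, 2, size=(5, 20))
# Force one shared feature so no row is empty and no union is zero.
queries[:, 0] = 1
index[:, 0] = 1

fast = jaccard_distance_vectorised(queries, index)
slow = jaccard_distance_brute_force(queries, index)
agree = np.allclose(fast, slow)
print("vectorised matches brute force:", agree)
```

If the two disagree on your own data, check for non-binary feature values or all-zero rows first, since neither is handled here.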