baserec.base.similarity package
Submodules
baserec.base.similarity.compute_similarity module
@author: Maurizio Ferrari Dacrema & Ceshine Lee
baserec.base.similarity.compute_similarity_cython module
@author: Maurizio Ferrari Dacrema
baserec.base.similarity.compute_similarity_euclidean module
@author: Maurizio Ferrari Dacrema & Ceshine Lee
class baserec.base.similarity.compute_similarity_euclidean.ComputeSimilarityEuclidean(dataMatrix, topK=100, shrink=0, normalize=False, normalize_avg_row=False, similarity_from_distance_mode='lin', row_weights=None, **args)
Bases: object
- compute_similarity(start_col=None, end_col=None, block_size=100)
Compute the similarity for the given dataset.
- Parameters
start_col – column to begin with
end_col – column to stop before; end_col is excluded
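To illustrate how a distance can be turned into a similarity with shrinkage, here is a minimal plain-Python sketch on the columns of a small dense matrix. The mapping `1 / (distance + shrink + eps)` is an assumption about what a linear ('lin') mode might do; this page does not confirm it. The helper name `euclidean_column_similarity` and the `eps` term are hypothetical.

```python
import math


def euclidean_column_similarity(rows, shrink=0.0, eps=1e-9):
    """Return {(i, j): similarity} for every pair of columns with i < j.

    ASSUMPTION: similarity = 1 / (distance + shrink + eps). This is an
    illustrative distance-to-similarity mapping, not the library's code.
    """
    n_cols = len(rows[0])
    # Work column-wise, since similarities are computed between columns.
    columns = [[row[c] for row in rows] for c in range(n_cols)]
    sims = {}
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            distance = math.dist(columns[i], columns[j])  # Python 3.8+
            sims[(i, j)] = 1.0 / (distance + shrink + eps)
    return sims


data = [
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 4.0],
    [2.0, 2.0, 0.0],
]
sims = euclidean_column_similarity(data, shrink=1.0)
```

Columns 0 and 1 are identical, so their distance is zero and the shrink term alone bounds the similarity. Column 2 is far from both and receives a smaller value.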
baserec.base.similarity.compute_similarity_euclidean_test module
@author: Maurizio Ferrari Dacrema & Ceshine Lee
class baserec.base.similarity.compute_similarity_euclidean_test.MyTestCase(methodName='runTest')
Bases: unittest.case.TestCase
- test_euclidean_similarity_float()
- test_euclidean_similarity_integer()
baserec.base.similarity.compute_similarity_euclidean_test.areSparseEquals(Sparse1, Sparse2)
baserec.base.similarity.compute_similarity_python module
@author: Maurizio Ferrari Dacrema & Ceshine Lee
class baserec.base.similarity.compute_similarity_python.ComputeSimilarityPython(dataMatrix, topK=100, shrink=0, normalize=True, asymmetric_alpha=0.5, tversky_alpha=1.0, tversky_beta=1.0, similarity='cosine', row_weights=None)
Bases: object
Computes the cosine similarity on the columns of dataMatrix.
If it is computed on URM=|users|x|items|, pass the URM as is.
If it is computed on ICM=|items|x|features|, pass the ICM transposed.
- Available similarity measures (the similarity parameter):
“cosine” computes Cosine similarity (this is the default)
“adjusted” computes Adjusted Cosine, removing the average of the users
“asymmetric” computes Asymmetric Cosine
“pearson” computes Pearson Correlation, removing the average of the items
“jaccard” computes Jaccard similarity for binary interactions using Tanimoto
“dice” computes Dice similarity for binary interactions
“tversky” computes Tversky similarity for binary interactions
“tanimoto” computes Tanimoto coefficient for binary interactions
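The set-based measures in the list above are closely related. As a minimal sketch, assuming each column is reduced to the set of users who interacted with that item, the standard Tversky index yields Jaccard (alpha = beta = 1) and Dice (alpha = beta = 0.5) as special cases. The function names are hypothetical and shrinkage is left out; the library's own computation may differ.

```python
def tversky(a, b, alpha=1.0, beta=1.0):
    """Standard Tversky index between two sets of interaction ids."""
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 0.0


def jaccard(a, b):
    # Tversky with alpha = beta = 1 is |A & B| / |A | B|.
    return tversky(a, b, alpha=1.0, beta=1.0)


def dice(a, b):
    # Tversky with alpha = beta = 0.5 is 2|A & B| / (|A| + |B|).
    return tversky(a, b, alpha=0.5, beta=0.5)


# Hypothetical data: users who interacted with two items.
item_a = {"u1", "u2", "u3"}
item_b = {"u2", "u3", "u4", "u5"}

jaccard_ab = jaccard(item_a, item_b)
dice_ab = dice(item_a, item_b)
```

On binary data the Tanimoto coefficient coincides with Jaccard, which is consistent with the "jaccard" entry mentioning Tanimoto.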
- Asymmetric Cosine as described in:
Aiolli, F. (2013, October). Efficient top-n recommendation for very large scale binary rated datasets. In Proceedings of the 7th ACM conference on Recommender systems (pp. 273-280). ACM.
(Note from Ceshine: since the similarities are calculated between columns, the asymmetric cosine measure doesn’t seem to make sense here?)
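For reference, a minimal sketch of the asymmetric cosine between two vectors, assuming the common generalisation in which the two norms are raised to 2·alpha and 2·(1 − alpha). With alpha = 0.5 it reduces to the ordinary cosine. The function name is hypothetical and shrinkage is omitted.

```python
import math


def asymmetric_cosine(x, y, alpha=0.5):
    """dot(x, y) / (||x||^(2*alpha) * ||y||^(2*(1 - alpha)))."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    denominator = norm_x ** (2 * alpha) * norm_y ** (2 * (1 - alpha))
    return dot / denominator if denominator > 0 else 0.0


x = [1, 1, 1, 1]
y = [1, 1, 0, 0]

symmetric = asymmetric_cosine(x, y, alpha=0.5)  # ordinary cosine
x_to_y = asymmetric_cosine(x, y, alpha=1.0)     # normalised by x only
y_to_x = asymmetric_cosine(y, x, alpha=1.0)     # normalised by y only
```

With alpha different from 0.5 the result depends on argument order, so the direction in which the measure is applied matters.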
- Parameters
dataMatrix – Numpy matrix
topK (int, optional) – Keep only the Top K entries, by default 100
shrink (int, optional) – The shrinkage parameter helps to avoid overfitting when only a few ratings are available, by default 0
normalize (bool, optional) – If True divide the dot product by the product of the norms, by default True
asymmetric_alpha (float, optional) – Coefficient alpha for the asymmetric cosine, by default 0.5
tversky_alpha (float, optional) – Coefficient alpha for the Tversky similarity, by default 1.0
tversky_beta (float, optional) – Coefficient beta for the Tversky similarity, by default 1.0
similarity (str, optional) – type of similarity measure to use, by default “cosine”
row_weights (Sequence, optional) – Multiply the values in each row by a specified value, by default None
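To make the roles of topK, shrink and normalize concrete, the following plain-Python sketch computes a column-wise cosine on a small dense matrix and keeps the best neighbours for each column. It is an illustration under stated assumptions, not the library's implementation: the name `column_cosine_topk`, the `eps` term and the choice to add shrink to the denominator are all hypothetical.

```python
import math


def column_cosine_topk(rows, top_k=100, shrink=0.0, normalize=True, eps=1e-6):
    """Return {column: [(neighbour, similarity), ...]} with at most top_k entries."""
    n_cols = len(rows[0])
    columns = [[row[c] for row in rows] for c in range(n_cols)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in columns]
    neighbours = {}
    for i in range(n_cols):
        scores = []
        for j in range(n_cols):
            if i == j:
                continue  # skip self-similarity
            dot = sum(a * b for a, b in zip(columns[i], columns[j]))
            if normalize:
                # ASSUMPTION: shrink is added to the product of the norms.
                dot /= norms[i] * norms[j] + shrink + eps
            scores.append((j, dot))
        # Keep only the top_k most similar columns.
        scores.sort(key=lambda pair: pair[1], reverse=True)
        neighbours[i] = scores[:top_k]
    return neighbours


# Hypothetical URM: 3 users x 3 items, passed as is.
urm = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
]
neighbours = column_cosine_topk(urm, top_k=1, shrink=0.0)
shrunk = column_cosine_topk(urm, top_k=1, shrink=10.0)
```

A larger shrink lowers every similarity, and it penalises pairs with small norms (few interactions) the most.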
- applyAdjustedCosine()
Remove from every data point the average for the corresponding row.
- applyPearsonCorrelation()
Remove from every data point the average for the corresponding column.
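The two centring steps can be sketched as follows on a small dense matrix. This assumes that "data point" means a stored (nonzero) entry, so zeros are treated as missing and left untouched. That reading is an assumption, and the helper names `center_rows` and `center_columns` are hypothetical.

```python
def center_rows(rows):
    """Adjusted-cosine style: subtract each row's mean from its nonzero entries."""
    centered = []
    for row in rows:
        nonzero = [v for v in row if v != 0]
        mean = sum(nonzero) / len(nonzero) if nonzero else 0.0
        # ASSUMPTION: zeros mean "no interaction" and stay at zero.
        centered.append([v - mean if v != 0 else 0.0 for v in row])
    return centered


def center_columns(rows):
    """Pearson style: the same operation applied per column."""
    transposed = [list(col) for col in zip(*rows)]
    return [list(row) for row in zip(*center_rows(transposed))]


# Hypothetical ratings: 2 users x 3 items.
ratings = [
    [5.0, 3.0, 0.0],
    [4.0, 0.0, 2.0],
]
row_centered = center_rows(ratings)
column_centered = center_columns(ratings)
```

A column with a single stored value is centred to zero, which removes that item's contribution to any later dot product.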
- compute_similarity(start_col=None, end_col=None, block_size=100)
Compute the similarity for the given dataset.
- Parameters
start_col – column to begin with
end_col – column to stop before; end_col is excluded
- useOnlyBooleanInteractions()