baserec.base.similarity package

Submodules

baserec.base.similarity.compute_similarity module

@author: Maurizio Ferrari Dacrema & Ceshine Lee

class baserec.base.similarity.compute_similarity.ComputeSimilarity(dataMatrix, use_implementation='density', similarity=None, **args)

Bases: object

compute_similarity(**args)

class baserec.base.similarity.compute_similarity.SimilarityFunction(value)

Bases: enum.Enum

An enumeration.

ADJUSTED_COSINE = 'adjusted'
COSINE = 'cosine'
EUCLIDEAN = 'euclidean'
JACCARD = 'jaccard'
PEARSON = 'pearson'
TANIMOTO = 'tanimoto'
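As an illustration of how string values map to enum members, here is a minimal sketch. It uses a stand-in enum that copies the members listed above, so that it runs without the package installed. The helper name parse_similarity is hypothetical and is not part of baserec.

```python
from enum import Enum


class SimilarityFunction(Enum):
    """Stand-in mirroring the members listed above (not the library class)."""
    ADJUSTED_COSINE = "adjusted"
    COSINE = "cosine"
    EUCLIDEAN = "euclidean"
    JACCARD = "jaccard"
    PEARSON = "pearson"
    TANIMOTO = "tanimoto"


def parse_similarity(name):
    """Map a string such as 'cosine' to its enum member (hypothetical helper)."""
    try:
        # Calling an Enum class with a value looks the member up by value.
        return SimilarityFunction(name)
    except ValueError:
        valid = ", ".join(member.value for member in SimilarityFunction)
        raise ValueError(
            f"unknown similarity {name!r}; expected one of: {valid}"
        ) from None


selected = parse_similarity("adjusted")
```

Note that the member ADJUSTED_COSINE carries the value 'adjusted', so lookups by value use the short string rather than the member name.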

baserec.base.similarity.compute_similarity_cython module

@author: Maurizio Ferrari Dacrema

class baserec.base.similarity.compute_similarity_cython.ComputeSimilarityCython

Bases: object

compute_similarity()

Compute the similarity for the given dataset.

Parameters
  • start_col – column to begin with

  • end_col – column to stop before; end_col is excluded

baserec.base.similarity.compute_similarity_euclidean module

@author: Maurizio Ferrari Dacrema & Ceshine Lee

class baserec.base.similarity.compute_similarity_euclidean.ComputeSimilarityEuclidean(dataMatrix, topK=100, shrink=0, normalize=False, normalize_avg_row=False, similarity_from_distance_mode='lin', row_weights=None, **args)

Bases: object

compute_similarity(start_col=None, end_col=None, block_size=100)

Compute the similarity for the given dataset.

Parameters
  • start_col – column to begin with

  • end_col – column to stop before; end_col is excluded
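To make the idea of a Euclidean similarity between columns concrete, the following sketch computes the distance between two columns and turns it into a similarity. The conversion 1 / (1 + distance + shrink) is an assumption chosen for illustration; the conversion actually used by similarity_from_distance_mode='lin' is not documented here and may differ. The function name and the list-of-lists matrix are also illustrative.

```python
import math


def euclidean_column_similarity(matrix, col_a, col_b, shrink=0.0):
    """Similarity between two columns of a row-major list-of-lists matrix.

    Assumption: similarity = 1 / (1 + distance + shrink). The library's
    'lin' mode may use a different conversion.
    """
    distance = math.sqrt(sum((row[col_a] - row[col_b]) ** 2 for row in matrix))
    return 1.0 / (1.0 + distance + shrink)


matrix = [
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 3.0],
    [2.0, 2.0, 4.0],
]
same = euclidean_column_similarity(matrix, 0, 1)   # identical columns, distance 0
apart = euclidean_column_similarity(matrix, 0, 2)  # distance sqrt(1 + 9 + 4)
```

Under this conversion identical columns score 1.0, and the score decreases as the columns move apart.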

baserec.base.similarity.compute_similarity_euclidean_test module

@author: Maurizio Ferrari Dacrema & Ceshine Lee

class baserec.base.similarity.compute_similarity_euclidean_test.MyTestCase(methodName='runTest')

Bases: unittest.case.TestCase

test_euclidean_similarity_float()

test_euclidean_similarity_integer()

baserec.base.similarity.compute_similarity_euclidean_test.areSparseEquals(Sparse1, Sparse2)

baserec.base.similarity.compute_similarity_python module

@author: Maurizio Ferrari Dacrema & Ceshine Lee

class baserec.base.similarity.compute_similarity_python.ComputeSimilarityPython(dataMatrix, topK=100, shrink=0, normalize=True, asymmetric_alpha=0.5, tversky_alpha=1.0, tversky_beta=1.0, similarity='cosine', row_weights=None)

Bases: object

Computes the similarity between the columns of dataMatrix (cosine similarity by default).

  • If it is computed on URM=|users|x|items|, pass the URM as is.

  • If it is computed on ICM=|items|x|features|, pass the ICM transposed.

Available similarity measures (the similarity parameter):
  • “cosine” computes Cosine similarity (this is the default)

  • “adjusted” computes Adjusted Cosine, removing the average of the users

  • “asymmetric” computes Asymmetric Cosine

  • “pearson” computes Pearson Correlation, removing the average of the items

  • “jaccard” computes Jaccard similarity for binary interactions using Tanimoto

  • “dice” computes Dice similarity for binary interactions

  • “tversky” computes Tversky similarity for binary interactions

  • “tanimoto” computes Tanimoto coefficient for binary interactions
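For binary interactions, the set-based measures in this list can be written in terms of the sets of users who interacted with each item. The sketch below uses the textbook definitions, not the library's implementation; the function names and sample data are illustrative.

```python
def tversky(a, b, alpha=1.0, beta=1.0):
    """Tversky index between two sets of user ids."""
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 0.0


def jaccard(a, b):
    """Jaccard index: shared users over all users seen for either item."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a, b):
    """Dice coefficient: twice the shared users over the sum of set sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Users who interacted with two items (made-up data).
item_x = {"u1", "u2", "u3"}
item_y = {"u2", "u3", "u4", "u5"}

jaccard_xy = jaccard(item_x, item_y)            # 2 shared users out of 5
dice_xy = dice(item_x, item_y)                  # 2 * 2 / (3 + 4)
tversky_default = tversky(item_x, item_y)       # alpha = beta = 1.0
tversky_half = tversky(item_x, item_y, 0.5, 0.5)
```

With alpha = beta = 1.0 the Tversky index coincides with Jaccard, and with alpha = beta = 0.5 it coincides with Dice. For binary data the Tanimoto coefficient also reduces to Jaccard, which is consistent with the "jaccard" entry above.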

Asymmetric Cosine as described in:

Aiolli, F. (2013, October). Efficient top-n recommendation for very large scale binary rated datasets. In Proceedings of the 7th ACM conference on Recommender systems (pp. 273-280). ACM.

(Note from Ceshine: since the similarities are calculated between columns, the asymmetric cosine measure doesn’t seem to make sense here?)
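A minimal sketch of the asymmetric cosine may help when reading the note above. It assumes the usual form in which the squared norms of the two vectors are raised to alpha and 1 - alpha respectively; whether the library applies the exponents in exactly this way is an assumption, and the function name is illustrative.

```python
def asymmetric_cosine(a, b, alpha=0.5):
    """Asymmetric cosine between two equal-length vectors.

    Assumed form: dot(a, b) / (|a|^(2*alpha) * |b|^(2*(1 - alpha))).
    """
    dot = sum(x * y for x, y in zip(a, b))
    sq_a = sum(x * x for x in a)
    sq_b = sum(y * y for y in b)
    if sq_a == 0 or sq_b == 0:
        return 0.0
    return dot / (sq_a ** alpha * sq_b ** (1.0 - alpha))


a = [1, 1, 0, 1]
b = [1, 0, 0, 0]

symmetric = asymmetric_cosine(a, b, alpha=0.5)  # plain cosine
a_to_b = asymmetric_cosine(a, b, alpha=1.0)     # normalised by |a|^2 only
b_to_a = asymmetric_cosine(b, a, alpha=1.0)     # normalised by |b|^2 only
```

With alpha = 0.5 the measure reduces to the ordinary cosine. For any other alpha, swapping the arguments generally changes the result, which is the asymmetry the note refers to.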

Parameters
  • dataMatrix – Numpy matrix

  • topK (int, optional) – Keep only the Top K entries, by default 100

  • shrink (int, optional) – The shrinkage parameter helps to avoid overfitting when only a few ratings are available, by default 0

  • normalize (bool, optional) – If True divide the dot product by the product of the norms, by default True

  • asymmetric_alpha (float, optional) – Coefficient alpha for the asymmetric cosine, by default 0.5

  • tversky_alpha (float, optional) – Coefficient alpha for the Tversky similarity, by default 1.0

  • tversky_beta (float, optional) – Coefficient beta for the Tversky similarity, by default 1.0

  • similarity (str, optional) – type of similarity measure to use, by default “cosine”

  • row_weights (Sequence, optional) – Multiply the values in each row by a specified value, by default None
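The interaction between shrink, normalize and topK can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: shrink is added to the product of the norms, normalize=False returns the raw dot product, and the function names and sample matrix are made up.

```python
import math


def column_similarity(matrix, col_a, col_b, shrink=0, normalize=True):
    """Similarity between two columns of a row-major list-of-lists matrix."""
    a = [row[col_a] for row in matrix]
    b = [row[col_b] for row in matrix]
    dot = sum(x * y for x, y in zip(a, b))
    if not normalize:
        return float(dot)  # assumption: raw dot product when not normalising
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    denominator = norm_a * norm_b + shrink  # assumption: additive shrinkage
    return dot / denominator if denominator else 0.0


def top_k_neighbours(matrix, col, k, **kwargs):
    """Return the k most similar other columns as (index, score) pairs."""
    n_cols = len(matrix[0])
    scores = [
        (other, column_similarity(matrix, col, other, **kwargs))
        for other in range(n_cols)
        if other != col
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Rows are users, columns are items (made-up ratings).
urm = [
    [5, 4, 0],
    [3, 0, 1],
    [0, 2, 4],
]
plain = column_similarity(urm, 0, 1)
shrunk = column_similarity(urm, 0, 1, shrink=10)
best = top_k_neighbours(urm, 0, k=1)
```

Because the shrink term enlarges the denominator, pairs supported by few or small ratings are pulled toward zero more strongly than pairs with large norms.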

applyAdjustedCosine()

Remove from every data point the average for the corresponding row.

applyPearsonCorrelation()

Remove from every data point the average for the corresponding column.
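The two centering steps, by row for adjusted cosine and by column for Pearson correlation, can be sketched as follows. The sketch treats zeros as missing interactions and computes each average over the non-zero entries only; that choice is an assumption, and the function names are illustrative.

```python
def center_rows(matrix):
    """Subtract each row's mean from its non-zero entries."""
    centered = []
    for row in matrix:
        observed = [value for value in row if value != 0]
        mean = sum(observed) / len(observed) if observed else 0.0
        # Zeros are treated as missing and left untouched.
        centered.append([value - mean if value != 0 else 0.0 for value in row])
    return centered


def center_columns(matrix):
    """Subtract each column's mean from its non-zero entries."""
    transposed = [list(column) for column in zip(*matrix)]
    return [list(row) for row in zip(*center_rows(transposed))]


# Rows are users, columns are items (made-up ratings).
urm = [
    [5, 3, 0],
    [1, 0, 4],
]
by_row = center_rows(urm)        # [[1.0, -1.0, 0.0], [-1.5, 0.0, 1.5]]
by_column = center_columns(urm)  # [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0]]
```

A column with a single rating ends up at zero after column centering, as the second and third columns do here.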

compute_similarity(start_col=None, end_col=None, block_size=100)

Compute the similarity for the given dataset.

Parameters
  • start_col – column to begin with

  • end_col – column to stop before; end_col is excluded
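The half-open column range and the block_size argument suggest that columns are processed in chunks. The generator below is a sketch of that idea, not the library's code. It assumes that None means "from the first column" and "to the last column" respectively, and the name column_blocks is hypothetical.

```python
def column_blocks(n_columns, start_col=None, end_col=None, block_size=100):
    """Yield (block_start, block_end) pairs; block_end is excluded."""
    start = 0 if start_col is None else start_col
    end = n_columns if end_col is None else min(end_col, n_columns)
    for block_start in range(start, end, block_size):
        yield block_start, min(block_start + block_size, end)


whole = list(column_blocks(250))
# [(0, 100), (100, 200), (200, 250)]
partial = list(column_blocks(250, start_col=50, end_col=120, block_size=50))
# [(50, 100), (100, 120)]
```

Since end_col is excluded, consecutive calls can split the columns across workers without overlap: one call ending at 120 and the next starting at 120 cover each column exactly once.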

useOnlyBooleanInteractions()

Module contents