
effcossim's Introduction

Efficient Pairwise Cosine Similarity Computation

The (i, j)-entry of the output matrix is the cosine similarity between the i-th row of A and the j-th row of B. The function pairwise_cosine_similarity is only a wrapper: it uses the implementation of cosine_similarity from scikit-learn and the implementation of awesome_cossim_topn from sparse_dot_topn. For more details, see the documentation of those two packages.
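
To make the definition of the output matrix concrete, here is a minimal NumPy sketch of the same quantity. It is not the package's code: the helper name cosine_similarity_matrix is hypothetical, and it assumes dense inputs with no all-zero rows.

```python
import numpy as np

def cosine_similarity_matrix(A, B):
    """Return M with M[i, j] = cosine of the angle between A[i] and B[j]."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Scale every row to unit length; the dot product of two unit
    # vectors equals their cosine similarity.
    # Assumption: no row is all zeros (that would divide by zero).
    A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_unit = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A_unit @ B_unit.T

A = np.array([[1, 2, 3], [0, 1, 2], [5, 1, 1]])
B = np.array([[1, 1, 2], [0, 1, 2], [5, 0, 1], [0, 0, 4]])

M = cosine_similarity_matrix(A, B)
print(M.shape)  # one row per row of A, one column per row of B
```

Row 1 of A and row 1 of B are identical, so M[1, 1] is 1 up to floating-point error.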

To install this package:

pip install effcossim

Sample code:

from numpy import array
from effcossim.pcs import pairwise_cosine_similarity, pp_pcs

A = array([
    [1, 2, 3], 
    [0, 1, 2],
    [5, 1, 1]
])

B = array([
    [1, 1, 2], 
    [0, 1, 2],
    [5, 0, 1], 
    [0, 0, 4]
])

# scikit-learn implementation
M1 = pairwise_cosine_similarity(
    A=A, B=B, 
    efficient=False, 
    dense_output=True
)

# sparse_dot_topn implementation
M2 = pairwise_cosine_similarity(
    A=A, B=B, 
    efficient=True, 
    n_top=4, 
    lower_bound=0.5, 
    n_jobs=2, 
    dense_output=True
)

When efficient=True, each row of the output matrix retains only the top n_top entries above lower_bound, which reduces memory usage. Furthermore, if n_jobs is larger than 1, the computation runs in parallel, which increases speed.
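
The effect of n_top and lower_bound can be illustrated on a small dense matrix. The sketch below only mimics the described behaviour; it is not how sparse_dot_topn is implemented. The helper keep_top_n is hypothetical, and treating "above" as strictly greater than lower_bound is an assumption.

```python
import numpy as np

def keep_top_n(M, n_top, lower_bound):
    """Zero out everything except the n_top largest entries per row
    that also exceed lower_bound."""
    M = np.asarray(M, dtype=float)
    out = np.zeros_like(M)
    for i, row in enumerate(M):
        # Column indices of the n_top largest values in this row.
        top = np.argsort(row)[::-1][:n_top]
        # Keep only those strictly above the threshold (assumed strict).
        keep = top[row[top] > lower_bound]
        out[i, keep] = row[keep]
    return out

M = np.array([
    [0.9, 0.2, 0.6, 0.7],
    [0.1, 0.4, 0.3, 0.8],
])

filtered = keep_top_n(M, n_top=2, lower_bound=0.5)
print(filtered)
```

In the second row only 0.8 survives: 0.4 is among the top two entries but falls below the bound. With dense_output=False the discarded entries would simply not be stored, which is where the memory saving comes from.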

If multiple comparisons are required, the parallel implementation pp_pcs can be used.

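from scipy.sparse import random

# Two lists of six random sparse matrices (10000 x 1000, 30% non-zero entries)
l1 = [random(m=10000, n=1000, density=0.3) for _ in range(6)]
l2 = [random(m=10000, n=1000, density=0.3) for _ in range(6)]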

L = pp_pcs(
    l1=l1, 
    l2=l2, 
    n_workers=2, 
    efficient=True, 
    n_top=10, 
    lower_bound=0.3, 
    n_jobs=2, 
    dense_output=False
)

The output is a list where the k-th element is the output of

pairwise_cosine_similarity(l1[k], l2[k])
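
The element-by-element pairing can be sketched as follows. This is an illustration only: cosine_pairs is a hypothetical dense stand-in for pairwise_cosine_similarity, map_pairs is a hypothetical helper, and the use of a thread pool is an assumption, since the text above does not say how pp_pcs distributes its work.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def cosine_pairs(A, B):
    # Hypothetical stand-in for pairwise_cosine_similarity
    # (dense inputs, no all-zero rows).
    A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_unit = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A_unit @ B_unit.T

def map_pairs(l1, l2, n_workers=2):
    """Apply cosine_pairs to (l1[k], l2[k]) for every k, preserving order."""
    if len(l1) != len(l2):
        raise ValueError("l1 and l2 must have the same length")
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map returns results in input order,
        # so result k belongs to pair k.
        return list(pool.map(cosine_pairs, l1, l2))

rng = np.random.default_rng(0)
l1 = [rng.random((5, 3)) for _ in range(4)]
l2 = [rng.random((7, 3)) for _ in range(4)]

L = map_pairs(l1, l2)
print(len(L), L[0].shape)
```

Each result has one row per row of l1[k] and one column per row of l2[k], so matrices within a pair need the same number of columns but not the same number of rows.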

For further examples, check the notebook.

effcossim's People

Contributors

ngshya
