

fast-nmtf

Fast optimization of non-negative matrix tri-factorization.

Installation

This project relies on the numpy and scipy libraries. For best results, we recommend installing it inside an Anaconda environment, which simplifies setup by providing libraries optimized for matrix operations (such as Intel MKL).

    git clone https://github.com/acopar/fast-nmtf
    cd fast-nmtf
    conda env create -f environment.yml
    conda activate fast-nmtf
    pip install -e .

Data

To download preprocessed benchmark datasets, use the provided get_datasets.sh script.

    scripts/get_datasets.sh

This script downloads datasets that have already been preprocessed and converted into the npz (numpy compressed) format.
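The internal layout of these npz files is not described here. As a minimal sketch, assuming an archive that holds a single dense non-negative matrix under a hypothetical key "X", you could write your own input in the same container format with numpy and inspect what an archive contains. Whether factorize.py accepts a file built this way is an assumption, not something this README confirms.

```python
import os
import tempfile

import numpy as np

# Hypothetical key name: the keys used by the downloaded datasets are not
# documented here, so this sketch uses "X".
KEY = "X"


def save_matrix(path, X):
    """Store a dense non-negative matrix in compressed .npz format."""
    if np.any(X < 0):
        raise ValueError("NMTF expects a non-negative input matrix")
    np.savez_compressed(path, **{KEY: X})


def load_matrix(path):
    """Print the array names in the archive and return the matrix under KEY."""
    with np.load(path) as archive:
        print("arrays in archive:", archive.files)
        return archive[KEY]


rng = np.random.default_rng(0)
X = rng.random((50, 30))  # toy non-negative data in [0, 1)
path = os.path.join(tempfile.mkdtemp(), "toy.npz")
save_matrix(path, X)
X_loaded = load_matrix(path)
print(X_loaded.shape)  # (50, 30)
```

Opening one of the downloaded datasets with np.load and printing archive.files is a quick way to see which keys it actually uses.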

Example

    python fnmtf/factorize.py -t cod -k 20 data/aldigs.npz

The optimization technique is selected with the -t option:

  • mu: multiplicative updates
  • als: alternating least squares
  • pg: projected gradient
  • cod: coordinate descent
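For orientation, here is a minimal numpy sketch of the first technique, multiplicative updates, for the plain Frobenius objective ||X - U S V^T||_F^2 with U, S, V >= 0. It uses the standard Lee-Seung style update rules and is not taken from the fast-nmtf source; the function name nmtf_mu and its parameters are hypothetical. Since the command line takes a single rank (-k), the sketch uses the same rank k on both sides of S.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero in the update ratios


def nmtf_mu(X, k, n_iter=200, seed=0):
    """Hypothetical helper: factorize X (m x n) ~= U @ S @ V.T with
    non-negative U (m x k), S (k x k), V (n x k) via multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    S = rng.random((k, k))
    V = rng.random((n, k))
    errors = []
    for _ in range(n_iter):
        # U <- U * (X V S^T) / (U S V^T V S^T)
        VSt = V @ S.T
        U *= (X @ VSt) / (U @ (VSt.T @ VSt) + EPS)
        # V <- V * (X^T U S) / (V S^T U^T U S)
        US = U @ S
        V *= (X.T @ US) / (V @ (US.T @ US) + EPS)
        # S <- S * (U^T X V) / (U^T U S V^T V)
        S *= (U.T @ X @ V) / (U.T @ U @ S @ (V.T @ V) + EPS)
        # Track the Frobenius reconstruction error after each sweep
        errors.append(np.linalg.norm(X - U @ S @ V.T))
    return U, S, V, errors


rng = np.random.default_rng(1)
X = rng.random((60, 40))
U, S, V, errors = nmtf_mu(X, k=5)
print(errors[0], errors[-1])
```

Each update multiplies by a ratio of non-negative terms, so the factors stay non-negative without an explicit projection; the other three techniques trade this simplicity for faster convergence per iteration.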

Reproduce results

To exactly reproduce the experiments, in which each dataset is run ten times with each optimization technique, run the following command. Depending on your configuration, this may take days.

    bash scripts/full.sh

The long test evaluates convergence using the same factorization rank (rank=20). It takes hours to complete (less than 10 times faster than the full test).

    bash scripts/long.sh

A shorter version of the experiments uses a less strict convergence threshold (epsilon=10^-5) and caps the number of iterations at 2000. It completes in a few hours.

    bash scripts/short.sh
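The short run stops either when the convergence threshold is met or after 2000 iterations. The exact criterion fast-nmtf uses is not stated here; the sketch below assumes a common choice, stopping when the relative change of the error between consecutive iterations falls below epsilon, with a minimum iteration count in the spirit of the -m option. The function run_until_converged and the toy step are hypothetical.

```python
def run_until_converged(step, epsilon=1e-5, max_iter=2000, min_iter=1):
    """Hypothetical driver: call step() (which performs one update and returns
    the current error) until the relative change of the error drops below
    epsilon after at least min_iter iterations, or max_iter is reached.
    Returns (iterations, final_error, converged)."""
    prev = None
    err = None
    for it in range(1, max_iter + 1):
        err = step()
        if prev is not None and it >= min_iter:
            rel_change = abs(prev - err) / max(abs(prev), 1e-300)
            if rel_change < epsilon:
                return it, err, True
        prev = err
    return max_iter, err, False


def make_toy_step():
    """Toy error sequence 1 + 0.5**t that levels off at 1."""
    t = 0

    def step():
        nonlocal t
        t += 1
        return 1.0 + 0.5 ** t

    return step


iters, err, converged = run_until_converged(make_toy_step())
print(iters, converged)  # 17 True
```

If -e sets epsilon = 10^-e (consistent with "higher means more iterations", but an assumption here), the default of 6 would be stricter than the short run's 10^-5.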

After the experiments are done, you can visualize the output using the following command:

    python fnmtf/visualize.py

Command line arguments

  • -t [arg]: optimization technique [mu, als, pg, cod]
  • -s: use sparse matrices
  • -k [arg]: factorization rank, positive integer
  • -p [arg]: number of parallel workers
  • -S [arg]: random seed
  • -e [arg]: stopping criterion threshold (higher means more iterations), default=6
  • -m [arg]: minimum number of iterations
  • data: path to the dataset, given as the last argument (required)
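When scripting many runs (for example, several seeds on one dataset), it can help to assemble the command from the options listed above. This sketch uses only the documented flags; the helper build_command is hypothetical and assumes the script is run from the repository root, as in the example earlier.

```python
import shlex

TECHNIQUES = ("mu", "als", "pg", "cod")


def build_command(data, technique="cod", k=20, sparse=False, workers=None,
                  seed=None, e=None, min_iter=None):
    """Hypothetical helper: assemble a factorize.py call from documented flags."""
    if technique not in TECHNIQUES:
        raise ValueError(f"unknown technique: {technique}")
    if k <= 0:
        raise ValueError("rank k must be a positive integer")
    cmd = ["python", "fnmtf/factorize.py", "-t", technique, "-k", str(k)]
    if sparse:
        cmd.append("-s")
    if workers is not None:
        cmd += ["-p", str(workers)]
    if seed is not None:
        cmd += ["-S", str(seed)]
    if e is not None:
        cmd += ["-e", str(e)]
    if min_iter is not None:
        cmd += ["-m", str(min_iter)]
    cmd.append(data)  # the dataset path must be the last argument
    return cmd


# Print one command per seed; pass the list to subprocess.run(cmd, check=True)
# to actually execute it.
for seed in range(3):
    print(shlex.join(build_command("data/aldigs.npz", seed=seed)))
```

Keeping the command as a list (rather than one string) avoids shell-quoting problems when it is handed to subprocess.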

Retrieve factors

After the factorization finishes, the U, S, and V factors are stored in results/<dataset>/<technique>/<factor>.csv, where U is the left factor, S is the middle factor, and V is the right factor. For example, if you selected the cod technique, the results can be viewed using the following commands:

    cat results/aldigs/cod/U.csv
    cat results/aldigs/cod/S.csv
    cat results/aldigs/cod/V.csv
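To work with the factors in Python rather than viewing them with cat, a minimal sketch could load the three CSV files with numpy and rebuild the approximation. It assumes comma-separated values without a header row, and that V is stored with one row per column of the input so that X ≈ U S V^T; if the stored V is already transposed, drop the .T. The helpers load_factors and reconstruct are hypothetical.

```python
import os

import numpy as np


def load_factors(result_dir):
    """Hypothetical helper: load U, S and V from <result_dir>/{U,S,V}.csv.
    Assumes comma-separated values with no header row."""
    return tuple(
        np.loadtxt(os.path.join(result_dir, f"{name}.csv"), delimiter=",", ndmin=2)
        for name in ("U", "S", "V")
    )


def reconstruct(U, S, V):
    """Approximate the input matrix as U @ S @ V.T (assumed orientation of V)."""
    return U @ S @ V.T


result_dir = "results/aldigs/cod"
if os.path.isdir(result_dir):  # only runs after a cod factorization of aldigs
    U, S, V = load_factors(result_dir)
    X_hat = reconstruct(U, S, V)
    print(U.shape, S.shape, V.shape, X_hat.shape)
```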

For convenience, all three factors are also saved in results/aldigs/cod.pkl as a tuple of numpy matrices, which can be loaded with the load_file function provided in loader.py.
