tanpeng1995 / dimensionality-reduction

This project forked from ashwinipokle/dimensionality-reduction

Contains MPI, OpenMP and hybrid (MPI + OpenMP) implementations of PCA and SVD using QR decomposition.

Makefile 2.18% C++ 97.82%

dimensionality-reduction's Introduction

This repository contains multi-threaded, multi-node and hybrid (multi-core + multi-node) implementations of SVD and PCA based on the QR decomposition of tall-and-skinny matrices (TSQR).
This work was inspired by Benson, Austin R., David F. Gleich, and James Demmel, "Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures," and extends their work by introducing task parallelism.
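For reference, the general one-level TSQR reduction and the way an SVD follows from its R factor can be written as below. This summarizes the standard technique from the Benson, Gleich and Demmel paper; it is not necessarily the exact reduction tree used in this code. Here A is m x n with m >> n, split into p row blocks A_i that each have at least n rows, so every Q_i is m_i x n and every R_i is n x n.

```latex
A = \begin{bmatrix} A_1 \\ \vdots \\ A_p \end{bmatrix},\qquad
A_i = Q_i R_i,\qquad
\begin{bmatrix} R_1 \\ \vdots \\ R_p \end{bmatrix} = \tilde{Q} R
\quad\Longrightarrow\quad
A = \underbrace{\operatorname{diag}(Q_1,\dots,Q_p)\,\tilde{Q}}_{Q}\,R

R = U_R \Sigma V^{T}
\quad\Longrightarrow\quad
A = (Q U_R)\,\Sigma V^{T},\qquad U = Q U_R
```

The p local factorizations are independent, which is where the thread and process parallelism comes from, and only the small n x n factor R needs a dense SVD. One common way to obtain PCA is to apply the same steps to the column-centered matrix (each column minus its mean); the principal directions are then the columns of V and the explained variances are sigma_k^2 / (m - 1).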

This method of calculating PCA / SVD works only if rows >> cols (a tall-and-skinny matrix), and this must hold even at the lowest granularity, i.e. for the block of rows assigned to each individual core / worker thread. Otherwise the algorithm produces indeterminate results.
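The sketch below illustrates the one-level reduction and the per-block requirement in plain C++. It is an illustration under stated assumptions, not the repository's code: it uses a hypothetical row-major `Mat` struct and a hand-written Householder QR instead of Eigen so that it compiles without dependencies, splits the rows into `blocks` contiguous pieces (one per worker), and returns only R. It rejects any split in which a block would have fewer rows than columns, which is the situation warned about above.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Minimal row-major dense matrix; a stand-in for Eigen::MatrixXd in this sketch.
struct Mat {
    std::size_t rows = 0, cols = 0;
    std::vector<double> a;
    Mat(std::size_t r, std::size_t c) : rows(r), cols(c), a(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return a[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return a[i * cols + j]; }
};

// Householder QR of an m x n matrix with m >= n; returns only the n x n factor R.
Mat householder_r(Mat A) {
    const std::size_t m = A.rows, n = A.cols;
    if (m < n) throw std::invalid_argument("block has fewer rows than columns");
    for (std::size_t k = 0; k < n; ++k) {
        double norm = 0.0;
        for (std::size_t i = k; i < m; ++i) norm += A(i, k) * A(i, k);
        norm = std::sqrt(norm);
        if (norm == 0.0) continue;                           // nothing to eliminate
        const double alpha = A(k, k) > 0.0 ? -norm : norm;   // sign avoids cancellation
        std::vector<double> v(m - k);
        for (std::size_t i = k; i < m; ++i) v[i - k] = A(i, k);
        v[0] -= alpha;                                       // v = x - alpha * e1
        double vv = 0.0;
        for (double x : v) vv += x * x;
        // Apply H = I - 2 v v^T / (v^T v) to columns k..n-1.
        for (std::size_t j = k; j < n; ++j) {
            double dot = 0.0;
            for (std::size_t i = k; i < m; ++i) dot += v[i - k] * A(i, j);
            const double s = 2.0 * dot / vv;
            for (std::size_t i = k; i < m; ++i) A(i, j) -= s * v[i - k];
        }
    }
    Mat R(n, n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i; j < n; ++j) R(i, j) = A(i, j);
    return R;
}

// One-level TSQR: factor each row block independently, stack the n x n R factors,
// then factor the stacked (blocks * n) x n matrix to obtain the final R.
Mat tsqr_r(const Mat& A, std::size_t blocks) {
    const std::size_t n = A.cols;
    if (blocks == 0) throw std::invalid_argument("need at least one block");
    const std::size_t base = A.rows / blocks;               // size of the smallest block
    if (base < n) throw std::invalid_argument("each block needs rows >= cols");
    Mat stacked(blocks * n, n);
    std::size_t start = 0;
    for (std::size_t b = 0; b < blocks; ++b) {
        const std::size_t len = base + (b < A.rows % blocks ? 1 : 0);  // spread remainder
        Mat block(len, n);
        std::copy(A.a.begin() + start * n, A.a.begin() + (start + len) * n, block.a.begin());
        const Mat Rb = householder_r(block);   // independent: one thread or MPI rank each
        std::copy(Rb.a.begin(), Rb.a.end(), stacked.a.begin() + b * n * n);
        start += len;
    }
    return householder_r(stacked);
}
```

Since A = QR with orthonormal Q implies A^T A = R^T R, comparing the two is a cheap check of the reduction. As a worked example of the size condition: for the last case in the test script, if each of the 8 x 4 = 32 hybrid workers received one block, each block would hold 150,000,000 / 32 = 4,687,500 rows against 100 columns, comfortably satisfying rows >> cols.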

This code uses the Eigen C++ template library for linear algebra operations.

The Makefile in each folder works only if the Eigen library is on the include path. Otherwise, compile the files as follows, passing the path to the Eigen folder explicitly:

g++ -I /path/to/eigen/library serial.cc -o serial
g++ -I /path/to/eigen/library -fopenmp omp.cc -o omp
mpicxx -I /path/to/eigen/library mpi.cc -o mpi
mpicxx -I /path/to/eigen/library -fopenmp hybrid.cc -o hybrid

Command-line arguments
When executing, the number of rows and columns of the input matrix must be specified, along with the number of threads where applicable. The path to the input data file must also be given as a command-line argument. Individual numbers in the input file should be separated by whitespace; CSV input is not currently supported.
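A minimal sketch of a reader for the whitespace-separated format described above, assuming the values are stored in row-major order (one matrix row after another). `read_matrix` and `read_matrix_file` are hypothetical names, not functions from this repository. Because `operator>>` stops at the first token that is not a number, a CSV file is reported as having too few values instead of being silently misread.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads rows * cols whitespace-separated values (spaces, tabs or newlines)
// from a stream, assuming row-major order. Hypothetical helper for illustration.
std::vector<double> read_matrix(std::istream& in, std::size_t rows, std::size_t cols) {
    const std::size_t count = rows * cols;
    std::vector<double> data;
    data.reserve(count);
    double x = 0.0;
    // operator>> skips any whitespace and fails at the first non-numeric token.
    while (data.size() < count && in >> x) data.push_back(x);
    if (data.size() != count)
        throw std::runtime_error("expected rows * cols whitespace-separated numbers");
    return data;
}

// Opens the file named on the command line and delegates to read_matrix.
std::vector<double> read_matrix_file(const std::string& path, std::size_t rows, std::size_t cols) {
    std::ifstream in(path);
    if (!in) throw std::runtime_error("cannot open " + path);
    return read_matrix(in, rows, cols);
}
```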

Important:
Currently, the input matrix is read in a naive way, which may be very slow for large matrices because of the large number of disk reads. If the I/O time is very high, I will write another file-reading method that uses buffering and supply it soon.
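One way to reduce the I/O cost mentioned above is sketched below: read the whole file into memory with a single buffered copy and parse it with `std::strtod`, instead of issuing one stream extraction per value. This is an illustration, not the method the author plans to supply; `read_matrix_buffered` is a hypothetical name, row-major layout is assumed, and the file must fit in memory alongside the parsed matrix.

```cpp
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads the whole file into memory in one bulk copy, then parses it with strtod.
// Hypothetical helper; assumes row-major, whitespace-separated values.
std::vector<double> read_matrix_buffered(const std::string& path, std::size_t rows, std::size_t cols) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::ostringstream buffer;
    buffer << in.rdbuf();                    // one buffered transfer of the entire file
    const std::string text = buffer.str();

    const std::size_t count = rows * cols;
    std::vector<double> data;
    data.reserve(count);
    const char* p = text.c_str();
    while (data.size() < count) {
        char* end = nullptr;
        const double x = std::strtod(p, &end);   // skips leading whitespace itself
        if (end == p) break;                     // nothing more parses as a number
        data.push_back(x);
        p = end;
    }
    if (data.size() != count)
        throw std::runtime_error("expected rows * cols whitespace-separated numbers");
    return data;
}
```

For matrices as large as those in the test script, a chunked read or a binary input format would avoid holding the full text and the parsed values in memory at the same time.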

For testing, a simple bash script similar to the one below can be used:

#!/bin/bash

make

# For executing OpenMP code, write ./<name of executable> <number of rows> <number of columns> <number of threads>

echo "omp output"
./omp 4000000000 4 8
./omp 2500000000 10 8
./omp 500000000 50 8
./omp 150000000 100 8

# For executing MPI code, write mpirun -np <number of processes> <name of executable> <number of rows> <number of columns>

echo "mpi output"
mpirun -np 8 mpi 4000000000 4
mpirun -np 8 mpi 2500000000 10
mpirun -np 8 mpi 500000000 50 
mpirun -np 8 mpi 150000000 100

# For executing Hybrid code, write mpirun -np <number of processes> <name of executable> <number of rows> <number of columns> <number of threads>

echo "hybrid output"
mpirun -np 8 hybrid 4000000000 4 4
mpirun -np 8 hybrid 2500000000 10 4
mpirun -np 8 hybrid 500000000 50 4
mpirun -np 8 hybrid 150000000 100 4

The outputs (the time taken to compute PCA / SVD) are written to files in the current working directory.

The reported execution time excludes the disk I/O performed at the start while reading the input; it accounts only for the time spent computing the SVD/PCA.
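A minimal sketch of timing only the computation with `std::chrono::steady_clock`, assuming the matrix has already been loaded so that disk I/O falls outside the timed region, and appending the result to a file in the current working directory. The helper names, file name, and line format are hypothetical; in the OpenMP and MPI variants, `omp_get_wtime()` or `MPI_Wtime()` would serve the same purpose.

```cpp
#include <chrono>
#include <fstream>
#include <string>
#include <utility>

// Runs `compute` and returns its wall-clock duration in seconds.
// Input is assumed to be loaded already, so file I/O is not timed.
template <class F>
double time_compute_seconds(F&& compute) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(compute)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Appends "rows cols seconds" to a results file in the current working directory.
// The file name and line format are hypothetical.
void append_timing(const std::string& file, long long rows, long long cols, double seconds) {
    std::ofstream out(file, std::ios::app);
    out << rows << ' ' << cols << ' ' << seconds << '\n';
}
```

Wrapping only the factorization in the lambda keeps the reported number consistent with the statement above that reading the input is not included.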

Also, as a sanity test, orthogonality should be checked, i.e. compute the L2 norm of the difference between U^T U and the identity matrix, and likewise for V:

	||U^T U - I||_2        ||V^T V - I||_2

This value measures how far the matrices produced by the algorithm are from being orthogonal; a value close to zero helps validate the correctness of the implementation.
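A dependency-free sketch of this check for a row-major m x n matrix `Q` (U or V). It forms E = Q^T Q - I and estimates ||E||_2 by power iteration on E^T E = E^2, so the estimate approaches the largest singular value of E from below. `orthogonality_error` is a hypothetical helper; with Eigen, `(U.transpose() * U - Eigen::MatrixXd::Identity(n, n)).norm()` gives the Frobenius norm instead, which is an upper bound on the 2-norm.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Estimates ||Q^T Q - I||_2 for a row-major m x n matrix Q via power iteration on E^2.
double orthogonality_error(const std::vector<double>& Q, std::size_t m, std::size_t n,
                           int iters = 100) {
    // E = Q^T Q - I (n x n, symmetric).
    std::vector<double> E(n * n, 0.0);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < n; ++p)
            for (std::size_t q = 0; q < n; ++q)
                E[p * n + q] += Q[i * n + p] * Q[i * n + q];
    for (std::size_t p = 0; p < n; ++p) E[p * n + p] -= 1.0;

    auto mul = [&](const std::vector<double>& x) {   // y = E x
        std::vector<double> y(n, 0.0);
        for (std::size_t p = 0; p < n; ++p)
            for (std::size_t q = 0; q < n; ++q) y[p] += E[p * n + q] * x[q];
        return y;
    };
    auto norm2 = [](const std::vector<double>& x) {
        double s = 0.0;
        for (double v : x) s += v * v;
        return std::sqrt(s);
    };

    std::vector<double> v(n);
    for (std::size_t p = 0; p < n; ++p) v[p] = 1.0 / static_cast<double>(p + 1);  // arbitrary start
    double estimate = 0.0;
    for (int it = 0; it < iters; ++it) {
        const double nv = norm2(v);
        if (nv == 0.0) return 0.0;            // E annihilated the iterate: E is (numerically) zero
        for (double& x : v) x /= nv;
        const std::vector<double> w = mul(v);
        estimate = norm2(w);                  // ||E v|| with ||v|| = 1 (a lower bound on ||E||_2)
        v = mul(w);                           // one power-iteration step on E^2
    }
    return estimate;
}
```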

Contributors

ashwinipokle