swsachith / harp

This project forked from dsc-spidal/harp

A collective communication library plugged into Hadoop

Home Page: https://dsc-spidal.github.io/harp/

Java 96.02% Shell 1.63% Python 0.21% CSS 0.40% JavaScript 0.28% HTML 1.45%

HARP

Copyright 2013-2018 Indiana University, Apache License 2.0

WHAT IS HARP?

Harp is an HPC-ABDS (High Performance Computing Enhanced Apache Big Data Stack) framework that aims to support distributed machine learning and other data-intensive applications.

Highlights

  1. Plugs into the Hadoop ecosystem.
  2. Rich computation models for different machine learning and data-intensive applications.
  3. MPI-like collective communication operations.
  4. High-performance native kernels supporting many-core processors (e.g., Intel Xeon and Xeon Phi).

Online Documentation

Please find the full documentation of Harp at https://dsc-spidal.github.io/harp/ including quick start, programming guide, and examples.

Open Channels

  1. Google group - [email protected]
  2. Slack - https://apache-harp.slack.com

Installation of Harp

Install from Binaries

Please download the binaries of Harp from https://github.com/DSC-SPIDAL/harp/releases.

Copy the jar files to $HADOOP_HOME

## the core modules 
cp core/harp-hadoop-0.1.0.jar $HADOOP_HOME/share/hadoop/mapreduce/
cp core/harp-collective-0.1.0.jar $HADOOP_HOME/share/hadoop/mapreduce/
cp core/harp-daal-interface-0.1.0.jar $HADOOP_HOME/share/hadoop/mapreduce/
## the application modules 
cp ml/harp-java-0.1.0.jar $HADOOP_HOME/
cp ml/harp-daal-0.1.0.jar $HADOOP_HOME/
cp contrib-0.1.0.jar $HADOOP_HOME/ 
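
After copying, it can help to confirm that every jar landed where the commands above put it. The following is a minimal sketch; `check_harp_jars` is a hypothetical helper that is not part of Harp, and it assumes the 0.1.0 file names listed above.

```shell
# Sketch: report which Harp jars are missing from a Hadoop installation.
# check_harp_jars is a hypothetical helper, not something Harp ships.
# Usage: check_harp_jars "$HADOOP_HOME"
check_harp_jars() {
  local hadoop_home="$1"
  local missing=0
  local jar
  for jar in \
    share/hadoop/mapreduce/harp-hadoop-0.1.0.jar \
    share/hadoop/mapreduce/harp-collective-0.1.0.jar \
    share/hadoop/mapreduce/harp-daal-interface-0.1.0.jar \
    harp-java-0.1.0.jar \
    harp-daal-0.1.0.jar \
    contrib-0.1.0.jar
  do
    # Print one line per jar that is not a regular file at the expected path.
    if [ ! -f "$hadoop_home/$jar" ]; then
      echo "missing: $hadoop_home/$jar"
      missing=1
    fi
  done
  # Exit status 0 means all six jars were found, 1 means at least one is missing.
  return "$missing"
}
```

The function prints nothing and returns 0 when all six jars are present, so it can guard later steps with `check_harp_jars "$HADOOP_HOME" && ...`.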

Install from Source Code

  1. Install Maven by following the official Maven instructions
  2. Compile Harp with Maven, selecting the profile for your Hadoop version
## x.x.x can be 2.6.0, 2.7.5, or 2.9.0
mvn clean package -Phadoop-x.x.x
  3. Copy the compiled module jar files to $HADOOP_HOME
cd harp/
## the core modules 
cp core/harp-hadoop/target/harp-hadoop-0.1.0.jar $HADOOP_HOME/share/hadoop/mapreduce/
cp core/harp-collective/target/harp-collective-0.1.0.jar $HADOOP_HOME/share/hadoop/mapreduce/
cp core/harp-daal-interface/target/harp-daal-interface-0.1.0.jar $HADOOP_HOME/share/hadoop/mapreduce/
## the application modules 
cp ml/java/target/harp-java-0.1.0.jar $HADOOP_HOME/
cp ml/daal/target/harp-daal-0.1.0.jar $HADOOP_HOME/
cp contrib/target/contrib-0.1.0.jar $HADOOP_HOME/ 
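
The `x.x.x` placeholder in the Maven command has to be replaced by one of the listed Hadoop versions. As a sketch, a small hypothetical helper (`harp_maven_profile`, not part of Harp) can turn a version into the profile name and reject anything outside the three versions named above. Other profiles may exist in the project's pom.xml; this sketch only knows the ones this README mentions.

```shell
# Sketch: map a Hadoop version to the Maven profile name used with -P.
# harp_maven_profile is a hypothetical helper; the accepted versions are
# only the three listed in this README.
harp_maven_profile() {
  case "$1" in
    2.6.0|2.7.5|2.9.0)
      echo "hadoop-$1"
      ;;
    *)
      echo "unsupported Hadoop version: $1" >&2
      return 1
      ;;
  esac
}

# Example (not executed here):
#   mvn clean package -P"$(harp_maven_profile 2.7.5)"
```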

Add third party dependencies

Harp depends on a group of third-party libraries. Make sure to install them before launching the applications.

cd third_party/
## JAR files
cp *.jar $HADOOP_HOME/share/hadoop/mapreduce/
## DAAL 2018
## copy daal java API lib
cp daal-2018/lib/daal.jar $HADOOP_HOME/share/hadoop/mapreduce/
## copy native libs to HDFS
hdfs dfs -mkdir -p /Hadoop
hdfs dfs -mkdir -p /Hadoop/Libraries
hdfs dfs -put daal-2018/lib/intel64_lin/libJavaAPI.so /Hadoop/Libraries
hdfs dfs -put tbb/lib/intel64_lin/gcc4.4/libtbb* /Hadoop/Libraries
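
To check that a native library actually reached HDFS, `hdfs dfs -test -e` can be used; it exits with status 0 when the path exists. The sketch below wraps that call. `hdfs_lib_present` and the `HDFS_BIN` override are hypothetical names introduced here for illustration, and the `/Hadoop/Libraries` directory is the one created above.

```shell
# Sketch: succeed if a file exists under /Hadoop/Libraries in HDFS.
# hdfs_lib_present and HDFS_BIN are hypothetical; HDFS_BIN only lets the
# hdfs executable be swapped out (for example in a dry run).
hdfs_lib_present() {
  "${HDFS_BIN:-hdfs}" dfs -test -e "/Hadoop/Libraries/$1"
}

# Example (not executed here):
#   hdfs_lib_present libJavaAPI.so || echo "libJavaAPI.so has not been uploaded"
```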

Installation of Harp-DAAL-Experimental (from source code)

Harp-DAAL-Experimental can only be installed from source code for now. Please follow these steps:

  1. Clone the daal_2018 branch, which contains the DAAL source code
git clone -b daal_2018 [email protected]:DSC-SPIDAL/harp.git
mv harp harp-daal-exp
cd harp-daal-exp

Alternatively, initialize the submodule under third_party/daal-exp/:

cd harp/
git submodule update --init --recursive
cd third_party/daal-exp/
  2. Compile the native library with either icc or the GNU compiler
## use COMPILER=gnu if icc is not available
make daal PLAT=lnx32e COMPILER=icc
  3. Set up the DAALROOT environment variable by sourcing the script from the DAAL release directory.
source ../__release_lnx/daal/bin/daalvars.sh intel64
  4. Compile the harp-daal-experimental modules in Harp. Make sure that line 17 of the harp/pom.xml file is uncommented and that DAALROOT has been set up in step 3.
### check DAALROOT
echo $DAALROOT
### re-run maven to compile
mvn clean package -Phadoop-x.x.x
  5. Install the compiled libraries.
## copy Java API to Hadoop folder
cp ../__release_lnx/daal/lib/daal.jar $HADOOP_HOME/share/hadoop/mapreduce/
## copy harp-daal-exp libs
cp experimental/target/experimental-0.1.0.jar $HADOOP_HOME/
## copy native libs to HDFS 
hdfs dfs -mkdir -p /Hadoop
hdfs dfs -mkdir -p /Hadoop/Libraries
hdfs dfs -put ../__release_lnx/daal/lib/intel64_lin/libJavaAPI.so /Hadoop/Libraries
hdfs dfs -put ../__release_lnx/tbb/lib/intel64_lin/gcc4.4/libtbb* /Hadoop/Libraries
hdfs dfs -put harp/third_party/omp/libiomp5.so /Hadoop/Libraries/
hdfs dfs -put harp/third_party/hdfs/libhdfs.so* /Hadoop/Libraries/

The experimental code has only been tested on 64-bit Linux with the Intel icc compiler and the GNU compiler.
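
Because the Maven build of the experimental modules depends on DAALROOT, it may be worth failing early when the variable is missing. A minimal sketch, assuming only that DAALROOT should name an existing directory; `require_daalroot` is a hypothetical helper, not part of Harp or DAAL.

```shell
# Sketch: fail early if DAALROOT is unset or does not point at a directory.
# require_daalroot is a hypothetical helper introduced for illustration.
require_daalroot() {
  if [ -z "${DAALROOT:-}" ]; then
    echo "DAALROOT is not set; source daalvars.sh first" >&2
    return 1
  fi
  if [ ! -d "$DAALROOT" ]; then
    echo "DAALROOT is not a directory: $DAALROOT" >&2
    return 1
  fi
}

# Example (not executed here):
#   require_daalroot && mvn clean package -Phadoop-x.x.x
```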

Run example of K-means

Make sure that harp-java-0.1.0.jar has been copied to $HADOOP_HOME, then start the Hadoop services:

cd $HADOOP_HOME
sbin/start-dfs.sh
sbin/start-yarn.sh

The usage of K-means is

hadoop jar harp-java-0.1.0.jar edu.iu.kmeans.regroupallgather.KMeansLauncher
  <num of points> <num of centroids> <vector size> <num of point files per worker>
  <number of map tasks> <num threads> <number of iterations> <work dir> <local points dir>

For example:

hadoop jar harp-java-0.1.0.jar edu.iu.kmeans.regroupallgather.KMeansLauncher 1000 10 100 5 2 2 10 /kmeans /tmp/kmeans
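
The launcher takes nine positional arguments, so it is easy to drop or misorder one. As a sketch, a hypothetical wrapper (`kmeans_command`, not part of Harp) can check the argument count and print the command line instead of running it, which allows a dry run before submitting the job.

```shell
# Sketch: build the K-means launch command from the nine positional
# arguments described in the usage above. kmeans_command is a hypothetical
# helper; it only prints the command and does not run Hadoop.
kmeans_command() {
  if [ "$#" -ne 9 ]; then
    echo "expected 9 arguments: <num of points> <num of centroids> <vector size> <num of point files per worker> <number of map tasks> <num threads> <number of iterations> <work dir> <local points dir>" >&2
    return 1
  fi
  echo "hadoop jar harp-java-0.1.0.jar edu.iu.kmeans.regroupallgather.KMeansLauncher $*"
}
```

With the values from the example above, `kmeans_command 1000 10 100 5 2 2 10 /kmeans /tmp/kmeans` prints the same command line shown there.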
