Parameter Server

Parameter server is a distributed machine learning framework. It scales machine learning applications to industry-level problems: 10 to 100 billion samples and features, running on 100 to 1000 machines. It is a joint project by CMU SML-Lab, Baidu IDL, and Google.

Features

  • Efficient communication. All communications are asynchronous. It is optimized for machine learning tasks to reduce network traffic and overhead.
  • Flexible consistency models. The system provides flexible consistency models to allow the algorithm designer to balance algorithmic convergence rate and system efficiency, where the best trade-off depends on data, algorithm, and hardware.
  • Elastic Scalability. New nodes can be added without restarting the running framework.
  • Fault Tolerance and Durability. Non-catastrophic machine failures are recovered from and repaired within several seconds, without interrupting computation.
  • Ease of Use. The globally shared parameters are represented as (potentially sparse) vectors and matrices to facilitate development of machine learning applications. The linear algebra data types come with high-performance multi-threaded linear algebra libraries.

How to Build

Requirements:

  • compiler: gcc >= 4.8 (prefer 4.9 with better support for ) or llvm >= 3.4 (tested on Apple LLVM version 5.1).
  • system: should work on both Linux and Mac OS (tested on Ubuntu 12.10, 13.10, RHEL 4U3, Mac OS X 10.9).
  • dependencies: zeromq, gflags, glog, gtest, protobuf, zlib, snappy, eigen3, and optionally mpi. We provide install.sh to build them from source automatically.

Build parameter server:

cd src && make -j8

The -j option specifies how many jobs make runs in parallel while building the project. Adjust it to suit your machine, for example to the number of CPU cores.

To statically link all libraries, use:

make -j8 args=static

Input Data

The system can read both raw binary data and protobuf data. It also supports several text formats. In a text format, each example is represented as one line of plain text.

parameter server format

The format of one instance:

label;group_id feature[:value] feature[:value] ...;group_id ...;...;
  • label: +1/-1 for binary classification, 0,1,2,... for multiclass classification, or a float value for regression. It may also be empty.
  • group_id: an integer identifying a feature group; each instance should contain at least one feature group.
  • feature: for sparse data, a 64-bit integer feature id; for dense data, a float feature value.
  • value: only valid for non-binary sparse data; a float feature value.
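
The sparse variant of this format can be read with a few string splits. The following is a minimal sketch, not the project's own reader: the names Instance and parse_ps_line are hypothetical, it assumes feature ids fit in a Python int, and it does not handle the dense case where each token is a float value rather than an id.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Instance:
    # None when the label field is empty.
    label: Optional[float]
    # group_id -> list of (feature_id, value); value is None for binary features.
    groups: Dict[int, List[Tuple[int, Optional[float]]]] = field(default_factory=dict)


def parse_ps_line(line: str) -> Instance:
    """Parse one sparse instance: label;group_id feature[:value] ...;group_id ...;"""
    fields = line.strip().split(";")
    label_text = fields[0].strip()
    inst = Instance(label=float(label_text) if label_text else None)
    for chunk in fields[1:]:
        tokens = chunk.split()
        if not tokens:  # the trailing ';' leaves an empty chunk
            continue
        group_id = int(tokens[0])
        feats = inst.groups.setdefault(group_id, [])
        for tok in tokens[1:]:
            fid, sep, val = tok.partition(":")
            # A token without ':' is a binary feature with no explicit value.
            feats.append((int(fid), float(val) if sep else None))
    if not inst.groups:
        raise ValueError("each instance needs at least one feature group")
    return inst
```

For example, parse_ps_line("1;0 3:0.5 7;2 11:2;") yields label 1.0 with two groups: group 0 holding features 3 (value 0.5) and 7 (binary), and group 2 holding feature 11 (value 2.0).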

More in progress

libsvm format

The libsvm format is sparse. Each instance is written as:

label feature_id:value feature_id:value ...
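
A line in this layout can be parsed as sketched below. This is an illustration only: parse_libsvm_line is a hypothetical helper, and it assumes every feature carries an explicit value and that tokens are whitespace-separated.

```python
from typing import Dict, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse 'label feature_id:value feature_id:value ...' into (label, features)."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = float(tokens[0])
    features: Dict[int, float] = {}
    for tok in tokens[1:]:
        fid, _, val = tok.partition(":")
        features[int(fid)] = float(val)
    return label, features
```

Only the non-zero features appear on the line, so the result is kept as a dictionary from feature id to value rather than a dense array.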

vowpal wabbit format

binary format

Check examples on RCV1.

How to start

One way to start the system is with mpirun. To install MPI, run sudo apt-get install mpich on Ubuntu, or sudo port install mpich on Mac OS. The system can also be started via ssh or a resource manager such as YARN, without MPI (in progress).

mpirun -np 4 ./ps_mpi -num_servers 1 -num_workers 2 -app ../config/rcv1_l1lr.config

The arguments:

  • -np: the number of processes created by mpirun. It should be >= num_workers + num_servers + 1 (the extra process is the scheduler). Use -hostfile hosts to specify the machines.
  • -interface: the network interface; run ifconfig to list the available network interfaces.
  • -num_workers: the number of worker nodes. Each one will get a part of the training data.
  • -num_servers: the number of server nodes. Each one will get a part of the model.
  • -app: the application configuration

Use ./ps_mpi --help to see more arguments.
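
The constraint on -np is easy to get wrong when changing the number of workers or servers. As a rough sketch, a small launcher helper could compute the minimum process count and assemble the command line; mpirun_command is a hypothetical name, and only the flags listed above are used.

```python
from typing import List, Optional


def mpirun_command(num_workers: int, num_servers: int, app_config: str,
                   np: Optional[int] = None) -> List[str]:
    """Build the mpirun argument list, checking np >= workers + servers + 1."""
    min_np = num_workers + num_servers + 1  # +1 for the scheduler
    if np is None:
        np = min_np
    if np < min_np:
        raise ValueError(f"-np must be at least {min_np}, got {np}")
    return [
        "mpirun", "-np", str(np), "./ps_mpi",
        "-num_servers", str(num_servers),
        "-num_workers", str(num_workers),
        "-app", app_config,
    ]
```

With 2 workers and 1 server the helper picks -np 4, matching the command shown earlier. The returned list can be passed to subprocess.run from the src directory.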

Wrap up

To run sparse logistic regression on RCV1 from scratch:

mkdir your_working_dir
cd your_working_dir
git clone git@github.com:mli/parameter_server.git .
git clone git@github.com:mli/parameter_server_third_party.git third_party
git clone git@github.com:mli/parameter_server_data.git data
third_party/install.sh
cd src && make -j8
mpirun -np 4 ./ps_mpi -num_servers 1 -num_workers 2 -app ../config/rcv1_l1lr.config
