douban / paracel

Distributed training framework with parameter server

Home Page: http://paracel.io

License: Other

Languages: C++ 84.48%, Python 9.07%, Shell 3.36%, CMake 2.94%, Makefile 0.15%
Topics: machine-learning, distributed-computing, graph, c-plus-plus

paracel's Introduction

Paracel Overview

Paracel is a distributed computational framework designed for many machine learning problems: Logistic Regression, SVD, Matrix Factorization (BFGS, SGD, ALS, CG), LDA, Lasso, and more.

First, Paracel splits both the massive dataset and the massive parameter space. Unlike MapReduce-like systems, Paracel offers a simple communication model that lets you work with a global, distributed key-value store called the parameter server.

With Paracel, you build algorithms by following one rule: 'pull parameters before learning & push local updates after learning'. This model is much simpler than MPI, and it makes moving from a serial to a parallel implementation almost painless.
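The pull/push rule can be illustrated with a minimal sketch. The `ToyParameterServer` class and `worker_step` function below are hypothetical stand-ins written for this example; they are not Paracel's API, and the "server" is just an in-memory dictionary in a single process.

```python
# Hypothetical in-memory stand-in for a parameter server (not Paracel's API).
class ToyParameterServer:
    def __init__(self):
        self._store = {}

    def pull(self, key, default=0.0):
        """Read the current global value of a parameter."""
        return self._store.get(key, default)

    def push(self, key, delta):
        """Apply an additive update to a parameter."""
        self._store[key] = self._store.get(key, 0.0) + delta


def worker_step(ps, samples, lr=0.1):
    """One pull -> learn -> push round, fitting y = w * x by gradient descent."""
    w = ps.pull("w")                       # pull parameters before learning
    grad = sum((w * x - y) * x for x, y in samples) / len(samples)
    ps.push("w", -lr * grad)               # push the local update after learning


ps = ToyParameterServer()
# The dataset is split into two shards, one per (simulated) worker.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
for _ in range(200):
    for shard in shards:
        worker_step(ps, shard)
```

Each worker only sees its own shard, yet the shared value of `w` approaches 2, the slope that fits all of the data.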

Second, Paracel tries to solve the 'last-reducer' problem of iterative tasks, where each iteration is held up by the slowest worker. We use bounded staleness to find a sweet spot between the 'improve-iter' curve and the 'iter-sec' curve. A global scheduler takes charge of the asynchronous execution. CMU has already shown this method to be a generalization of BSP/Pregel.
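As a rough illustration of bounded staleness, the sketch below shows the kind of check a scheduler could make before letting a worker start its next iteration. The function name and the clock dictionary are assumptions made for this example and do not come from Paracel's scheduler.

```python
def may_advance(worker_clocks, worker_id, staleness):
    """Return True if worker_id may start its next iteration.

    worker_clocks maps each worker to the number of iterations it has finished.
    A worker may run ahead of the slowest worker by at most `staleness`
    iterations; staleness=0 degenerates to lockstep (BSP-style) execution.
    """
    slowest = min(worker_clocks.values())
    return worker_clocks[worker_id] - slowest <= staleness


# Hypothetical snapshot of three workers' progress.
clocks = {"w0": 5, "w1": 3, "w2": 4}
fast_worker_blocked = not may_advance(clocks, "w0", staleness=1)
```

With a staleness bound of 1, worker `w0` is two iterations ahead of `w1` and has to wait, while `w2` may continue. Raising the bound lets fast workers do more iterations per second at the cost of working on older parameters.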

Another advantage of Paracel is fault tolerance, which MPI does not provide.

Paracel can also be used for scientific computing and for building graph algorithms. You can load your input from a distributed file system and construct a graph or a sparse/dense matrix.

Paracel was originally motivated by Jeff Dean's talk at Stanford in 2013. You can find more details in his paper: "Large Scale Distributed Deep Networks".

More documentation can be found below:

Project Homepage

Quick Install

20-Minutes' Tutorial

API Reference Page

paracel's People

Contributors

gitter-badger, lembacon, mckelvin, windreamer, xunzhang, zzl0

paracel's Issues

Sparse data

Hi, I am very interested in this project, and I think the API is very friendly. I am wondering how I can use very sparse data in Paracel, because in your LR code the data format is just "feature_1,feature_2,...,feature_k". Thanks a lot.
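To make the question concrete: one common text encoding for sparse rows is a label followed by index:value pairs (LIBSVM-style). Whether Paracel's loaders accept such a format is not stated here, so the parser below is only a hypothetical sketch of the kind of input meant.

```python
def parse_sparse_line(line):
    """Parse 'label idx:val idx:val ...' into (label, {idx: val}).

    Only non-zero features are listed, so a row with millions of possible
    features but a handful of non-zeros stays short.
    """
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        idx, val = item.split(":", 1)
        features[int(idx)] = float(val)
    return label, features


label, features = parse_sparse_line("1 3:0.5 10:2")
```

By contrast, the dense "feature_1,feature_2,...,feature_k" format needs one field per feature, including every zero.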

Create a new release: 0.3.0.

Recently, I received several emails from Paracel users asking about build issues with the latest commits. Considering that some companies have deployed Paracel in their production environments, I think we should create a new release, say 0.3.0, for relatively stable support.

Before doing that, we should update the documents on paracel.io. We also need to import the quick_tutorial.html file into the Paracel codebase for future updates.

List of bugs that need to be fixed before the 0.3.0 release:

Running with Docker

Could we add Docker files for Paracel? I think the main difficulty is Mesos mode. Thanks.

Sync document with latest version

Starting from commit c81b4ff, the deployment section of the documentation lags behind v0.3.0, and some users have complained about that. To fix this, we need to update the documentation and also maintain separate documentation versions for different Paracel releases.

Ensure using Pymesos < 0.2.0

pymesos is planning to release a 0.2.x series that is not compatible with the previous 0.1.x series.
We need to make sure that pymesos < 0.2.0 is used when running the Python scripts.
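At install time, this constraint is usually expressed as a requirement specifier such as `pymesos<0.2.0`. For a runtime guard, a check along the following lines could work. It is a sketch: the helper name is hypothetical, and how the installed version string is obtained is left out on purpose.

```python
def is_supported_pymesos(version):
    """Return True for versions below 0.2.0, e.g. the 0.1.x series."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits of each component ("0rc1" -> 0).
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad to three components so that "0.2" compares like "0.2.0".
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3]) < (0, 2, 0)
```

The scripts could call this once at startup and exit with a clear message instead of failing later on an incompatible API.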

Support More I/O Data Formats

Allow more file formats, such as gzip/bzip2 (pigz/pbzip2) and an internal sequence file format. Besides, could Paracel add support for loading data from other external storage, such as a key-value database, a relational database, or a specific low-level distributed file system?
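For the compressed-file part of this request, the idea can be sketched with the Python standard library: pick a reader from the file extension so that the rest of the loading code stays unchanged. The helper name `open_text` is made up for this example and is not part of Paracel.

```python
import bz2
import gzip


def open_text(path):
    """Open a plain, gzip, or bzip2 text file for line-by-line reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

A loader would then iterate with `for line in open_text(path)` regardless of the compression used.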

When I make Paracel, I get an error: Could NOT find MsgpackC (missing: MsgpackC_CHECK_FINE)

-- Find Mespack-C include path: /usr/local/include
-- Performing Test MsgpackC_CHECK_FINE
-- Performing Test MsgpackC_CHECK_FINE - Failed
-- MsgpackC check: 
CMake Error at /usr/local/share/cmake-3.7/Modules/FindPackageHandleStandardArgs.cmake:138 (message):
  Could NOT find MsgpackC (missing: MsgpackC_CHECK_FINE)
Call Stack (most recent call first):
  /usr/local/share/cmake-3.7/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  cmakes/FindMsgpackC.cmake:39 (find_package_handle_standard_args)
  CMakeLists.txt:39 (find_package)


-- Configuring incomplete, errors occurred!
See also "/home/tuhangdi/paracel-master/build/CMakeFiles/CMakeOutput.log".
See also "/home/tuhangdi/paracel-master/build/CMakeFiles/CMakeError.log".

Fix defects from coverity scan.

I cannot create sub-task issues on GitHub, so this is a collecting issue that lists all related sub-issues. Thanks to Coverity, all of these issues were generated from a Coverity scan of Paracel v0.3.0.

  • PARACEL-33 - Fix the Resource leak bug for Comm.
  • PARACEL-34 - Fix all Uninitialized Scalar Variable defects.
  • PARACEL-35 - Fix all Uncaught Exception defects.
  • PARACEL-36 - Fix all Uninitialized Scalar Field defects.
  • PARACEL-37 - Fix the Argument Cannot be Negative defect.
  • PARACEL-38 - Fix all Big Parameter Passed by Value defects.

Incorrect hostname using `socket.gethostname()` under macOS Sierra.

The socket library in Python returns the wrong hostname under macOS Sierra.
For example, print socket.gethostname() returns a, but the actual hostname is a.local.
It seems that the old system call used in the Python source code gets an incorrect name, while the actual name ends with the .local suffix.

Semantics of str_split

In Python, split works as follows:

In [1]: s = "\tasd"
In [2]: s.split('\t')
Out[2]: ['', 'asd']

In Paracel, str_split is implemented as follows: https://github.com/douban/paracel/blob/master/include/utils/ext_utility.hpp#L36

string_lst str_split(const paracel::str_type & str, 
                     const char sep) {
  string_lst result;
  size_t st = 0, en = 0;
  while(1) {
    en = str.find(sep, st);
    auto s = str.substr(st, en - st);
    if(s.size()) result.push_back(std::move(s));
    if(en == paracel::str_type::npos) break;
    st = en + 1;
  }
  return result;
}

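The if (s.size()) check makes str_split behave differently from Python's split, because empty fields are dropped. As a result, the length of the list obtained by splitting a string like "a\tb" may vary.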
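The difference can be reproduced with a small Python mirror of the C++ loop above. The function name `str_split_skip_empty` is made up for this comparison. It follows the same find/substr steps and the same empty-field check.

```python
def str_split_skip_empty(s, sep):
    """Python mirror of the C++ str_split loop: empty fields are dropped."""
    result = []
    st = 0
    while True:
        en = s.find(sep, st)
        piece = s[st:] if en == -1 else s[st:en]
        if piece:                      # corresponds to `if(s.size())`
            result.append(piece)
        if en == -1:                   # corresponds to `en == npos`
            break
        st = en + 1
    return result


print("\tasd".split("\t"))                    # ['', 'asd']
print(str_split_skip_empty("\tasd", "\t"))    # ['asd']
print("a\t\tb".split("\t"))                   # ['a', '', 'b']
print(str_split_skip_empty("a\t\tb", "\t"))   # ['a', 'b']
```

For tab-separated records with empty columns, the second behaviour shifts every later column to the left, so fixed column positions can no longer be relied on.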

Should this be changed?

I want to run the matrix_factorization example, but it returns errors

[tuhangdi@localhost paracel-master]$ /usr/local/prun.py -p 1 -w 1 -c cfg.json -m local /usr/local/bin/wc
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::property_tree::ptree_bad_path> >'
what(): No such node (topk)
[localhost:11886] *** Process received signal ***
[localhost:11886] Signal: Aborted (6)
[localhost:11886] Signal code: (-6)
[localhost:11886] [ 0] /lib64/libpthread.so.0(+0xf370)[0x7fcc180ce370]
[localhost:11886] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x7fcc17d331d7]
[localhost:11886] [ 2] /lib64/libc.so.6(abort+0x148)[0x7fcc17d348c8]
[localhost:11886] [ 3] /lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x165)[0x7fcc188539d5]
[localhost:11886] [ 4] /lib64/libstdc++.so.6(+0x5e946)[0x7fcc18851946]
[localhost:11886] [ 5] /lib64/libstdc++.so.6(+0x5e973)[0x7fcc18851973]
[localhost:11886] [ 6] /lib64/libstdc++.so.6(+0x5eb93)[0x7fcc18851b93]
[localhost:11886] [ 7] /usr/local/bin/wc(_ZN5boost15throw_exceptionINS_16exception_detail19error_info_injectorINS_13property_tree14ptree_bad_pathEEEEEvRKT_+0xcd)[0x42f76d]
[localhost:11886] [ 8] /usr/local/bin/wc(_ZN5boost16exception_detail16throw_exception_INS_13property_tree14ptree_bad_pathEEEvRKT_PKcS8_i+0x77)[0x42f847]
[localhost:11886] [ 9] /usr/local/bin/wc(_ZN5boost13property_tree11basic_ptreeISsSsSt4lessISsEE9get_childERKNS0_11string_pathISsNS0_13id_translatorISsEEEE+0xba)[0x42fd9a]
[localhost:11886] [10] /usr/local/bin/wc(_ZN7paracel11json_parser5parseIiEET_RKSs+0x45)[0x430735]
[localhost:11886] [11] /usr/local/bin/wc(main+0x26e)[0x41cb7e]
[localhost:11886] [12] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fcc17d1fb35]
[localhost:11886] [13] /usr/local/bin/wc[0x41d78e]
[localhost:11886] *** End of error message ***

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 11886 RUNNING AT localhost.localdomain
= EXIT CODE: 134
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
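The trace above shows paracel::json_parser::parse<int> raising ptree_bad_path with "No such node (topk)", which suggests that cfg.json has no "topk" entry for the program being launched. A pre-flight check like the sketch below can catch this before calling prun.py. The helper name is hypothetical, and "topk" is the only key name taken from the log; the "input" key in the sample config is made up.

```python
import json


def missing_keys(cfg_text, required):
    """Return the required top-level keys that are absent from a JSON config."""
    cfg = json.loads(cfg_text)
    return [key for key in required if key not in cfg]


# Hypothetical config contents that lack the "topk" entry.
sample_cfg = '{"input": "data.txt"}'
print(missing_keys(sample_cfg, ["topk"]))   # ['topk']
```

An empty result means every key the program reads is present. It does not check that the values have the expected types.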

Tip for OSX building

Today I installed Paracel on a MacBook Pro and ran into a problem, but in the end I set it up successfully. Since tr1 may not exist under OSX, to build msgpack-c during installation you first need to check out the OSX branch before compiling it. Good luck!

Import document of v0.3.0.

Update the documentation for version 0.3.0. Import the documents quick_tutorial.html and api_reference.html into the Paracel codebase.

Improve CI for paracel

The current Paracel CI has some limitations:

  • It only checks the build and the unit tests. Could we add some feature test cases?
  • It only runs under Linux (Debian/Ubuntu). Could we add OSX environment support?
  • Add Coverity for static checking.

Fix unittest failure under osx

2/13 Test #2: test_utils .......................***Failed 5.54 sec
Start 3: test_partition
7/13 Test #7: test_ring ........................***Failed 0.00 sec
Start 8: test_f_traits
10/13 Test #10: test_paracel_types ...............***Failed 0.01 sec
Start 11: test_paste

Refactor python scripts according to Pylint

Scripts List:

./alg/graph_alg/pagerank/data_clean.py
./alg/misc/steady_state_inversion/pre_fmt.py
./alg/recommendation/als/generate_data.py
./alg/recommendation/als/split_validate.py
./mesos_executor.py
./mesos_scheduler.py
./prun.py
./tool/balltree/setup.py
./tool/balltree/usage.py
./tool/datagen.py

Incomplete Installation

After make install, some files are missing from the prefix folder, such as the balltree tool. They are useful tools for users and need to be installed.
