douban / paracel

Distributed training framework with parameter server

Home Page: http://paracel.io

License: Other

CMake 2.94% C++ 84.48% Python 9.07% Shell 3.36% Makefile 0.15%
machine-learning distributed-computing graph c-plus-plus

paracel's Issues

Fix unittest failure under osx

2/13 Test #2: test_utils .......................***Failed 5.54 sec
Start 3: test_partition
7/13 Test #7: test_ring ........................***Failed 0.00 sec
Start 8: test_f_traits
10/13 Test #10: test_paracel_types ...............***Failed 0.01 sec
Start 11: test_paste

Running with Docker

Could we add Dockerfiles for Paracel? I think the main difficulty is Mesos mode. Thanks.

Ensure using Pymesos < 0.2.0

pymesos is planning to release a 0.2.x series that is not compatible with the previous 0.1.x series.
We need to make sure pymesos < 0.2.0 is installed when running the Python scripts.
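
A minimal sketch of such a guard follows. The helper names `parse_version` and `pymesos_is_supported` are hypothetical and not part of Paracel or pymesos. The version string is passed in explicitly, so the sketch makes no assumption about how pymesos exposes its version.

```python
import re


def parse_version(version):
    """Turn '0.1.7' into (0, 1, 7); a non-numeric suffix such as 'rc1' is ignored."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def pymesos_is_supported(version, upper_bound="0.2.0"):
    """Return True when `version` is strictly below `upper_bound` (hypothetical helper)."""
    return parse_version(version) < parse_version(upper_bound)
```

The same constraint can also be stated at install time with the pip requirement specifier `pymesos<0.2.0`.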

Improve CI for paracel

The current Paracel CI has some limitations:

  • It only checks the build and unit tests. Could we add some feature test cases?
  • It only runs under Linux (Debian/Ubuntu). Could we add OS X environment support?
  • It has no static checking. Add Coverity.

Support More I/O Data Formats

Allow more file formats, such as gzip/bzip2 (pigz/pbzip2) and an internal sequence file format. Could Paracel also support loading data from external storage such as a key-value database, a relational database, or a specific low-level distributed file system?
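
As an illustration of the compressed-input part of this request, here is a small sketch that picks a reader by file extension. It uses only Python's standard `gzip` and `bz2` modules. The function names `open_text` and `read_lines` are hypothetical. Paracel's loader is C++ and is not shown here.

```python
import bz2
import gzip


def open_text(path):
    """Open a plain, gzip, or bzip2 file for text reading, chosen by extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def read_lines(path):
    """Return all lines of the file without trailing newlines."""
    with open_text(path) as handle:
        return [line.rstrip("\n") for line in handle]
```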

Incomplete Installation

After make install, some files are missing from the prefix folder, such as the balltree tool. These tools are useful to users and need to be installed.

When I build Paracel, I get an error: Could NOT find MsgpackC (missing: MsgpackC_CHECK_FINE)

-- Find Mespack-C include path: /usr/local/include
-- Performing Test MsgpackC_CHECK_FINE
-- Performing Test MsgpackC_CHECK_FINE - Failed
-- MsgpackC check: 
CMake Error at /usr/local/share/cmake-3.7/Modules/FindPackageHandleStandardArgs.cmake:138 (message):
  Could NOT find MsgpackC (missing: MsgpackC_CHECK_FINE)
Call Stack (most recent call first):
  /usr/local/share/cmake-3.7/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  cmakes/FindMsgpackC.cmake:39 (find_package_handle_standard_args)
  CMakeLists.txt:39 (find_package)


-- Configuring incomplete, errors occurred!
See also "/home/tuhangdi/paracel-master/build/CMakeFiles/CMakeOutput.log".
See also "/home/tuhangdi/paracel-master/build/CMakeFiles/CMakeError.log".

Sync document with latest version

Since commit c81b4ff, the deployment section of the documentation has lagged behind v0.3.0, and some users have complained about it. We need to update the documentation, and also keep separate documentation versions for different Paracel releases.

Sparse data

Hi, I am very interested in this project, and I think the API is very friendly. I am wondering how I can use very sparse data in Paracel, because in your lr code the data format is just "feature_1,feature_2,...,feature_k". Thanks a lot.
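
To make the question concrete, here is a sketch of one common sparse encoding, in which each line lists only the non-zero features as `index:value` pairs. This layout is an assumption for illustration. Nothing in this issue says Paracel's lr reader accepts it, and the function names are hypothetical.

```python
def parse_sparse_line(line, sep=","):
    """Parse 'index:value' pairs into {index: value}; absent indices are implicitly zero."""
    features = {}
    for token in line.strip().split(sep):
        if not token:
            continue
        index, value = token.split(":")
        features[int(index)] = float(value)
    return features


def sparse_dot(features, weights):
    """Dot product that touches only the non-zero entries of the sample."""
    return sum(value * weights[index] for index, value in features.items())
```

With this layout, a sample with two non-zero features out of many thousands costs two tokens per line instead of one token per feature.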

Import document of v0.3.0.

Update the documentation for version 0.3.0. Import quick_tutorial.html and api_reference.html into the Paracel codebase.

Refactor python scripts according to Pylint

Scripts List:

./alg/graph_alg/pagerank/data_clean.py
./alg/misc/steady_state_inversion/pre_fmt.py
./alg/recommendation/als/generate_data.py
./alg/recommendation/als/split_validate.py
./mesos_executor.py
./mesos_scheduler.py
./prun.py
./tool/balltree/setup.py
./tool/balltree/usage.py
./tool/datagen.py

Create a new release: 0.3.0.

Recently, I received several emails from Paracel users asking about build problems with the latest commits. Since some companies have deployed Paracel in their production environments, I think we should create a new, relatively stable release, say 0.3.0.

Before doing that, we should update the documentation on paracel.io. We also need to import the quick_tutorial.html file into the Paracel codebase for future updates.

Bugs that need to be fixed before the 0.3.0 release:

Tip for OSX building

Today I installed Paracel on a MacBook Pro and ran into a problem, but in the end I set it up successfully. Since tr1 may not exist under OS X, to build msgpack-c during installation you first need to check out the OSX branch before compiling it. Good luck!

str_split semantics issue

In Python, split behaves as follows:

In [1]: s = "\tasd"
In [2]: s.split('\t')
Out[2]: ['', 'asd']

In Paracel, str_split is implemented as follows: https://github.com/douban/paracel/blob/master/include/utils/ext_utility.hpp#L36

string_lst str_split(const paracel::str_type & str, 
                     const char sep) {
  string_lst result;
  size_t st = 0, en = 0;
  while(1) {
    en = str.find(sep, st);
    auto s = str.substr(st, en - st);
    if(s.size()) result.push_back(std::move(s));
    if(en == paracel::str_type::npos) break;
    st = en + 1;
  }
  return result;
}

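The if (s.size()) check makes str_split behave differently from Python's split, because empty fields are dropped. As a result, the length of the list produced by splitting a tab-separated string such as "a\tb" can vary from line to line when some fields are empty.

Should this be changed?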
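
The difference can be reproduced with a direct Python port of the C++ loop above. Both function names are hypothetical. The second variant only shows what removing the check would do and is not a patch the maintainers have endorsed.

```python
def str_split_skip_empty(text, sep):
    """Port of the C++ loop, including the `if (s.size())` check."""
    result = []
    start = 0
    while True:
        end = text.find(sep, start)
        piece = text[start:] if end == -1 else text[start:end]
        if piece:  # mirrors `if(s.size())`: empty fields are dropped
            result.append(piece)
        if end == -1:
            break
        start = end + 1
    return result


def str_split_keep_empty(text, sep):
    """Same loop without the check, so empty fields are preserved."""
    result = []
    start = 0
    while True:
        end = text.find(sep, start)
        result.append(text[start:] if end == -1 else text[start:end])
        if end == -1:
            break
        start = end + 1
    return result
```

For the input "\tasd", the first variant yields a one-element list while the second matches Python's `"\tasd".split("\t")`. With the first variant, a row with an empty middle column would silently shift its later columns left.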

Fix defects from coverity scan.

I cannot create sub-task issues on GitHub, so this is a collecting issue that lists all related sub-issues. Thanks to Coverity, all of these were generated by a Coverity scan of Paracel v0.3.0.

  • PARACEL-33 - Fix the Resource leak bug for Comm.
  • PARACEL-34 - Fix all Uninitialized Scalar Variable defects.
  • PARACEL-35 - Fix all Uncaught Exception defects.
  • PARACEL-36 - Fix all Uninitialized Scalar Field defects.
  • PARACEL-37 - Fix the Argument Cannot be Negative defect.
  • PARACEL-38 - Fix all Big Parameter Passed by Value defects.

I want to run the matrix_factorization example, but it returns errors

[tuhangdi@localhost paracel-master]$ /usr/local/prun.py -p 1 -w 1 -c cfg.json -m local /usr/local/bin/wc
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::property_tree::ptree_bad_path> >'
what(): No such node (topk)
[localhost:11886] *** Process received signal ***
[localhost:11886] Signal: Aborted (6)
[localhost:11886] Signal code: (-6)
[localhost:11886] [ 0] /lib64/libpthread.so.0(+0xf370)[0x7fcc180ce370]
[localhost:11886] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x7fcc17d331d7]
[localhost:11886] [ 2] /lib64/libc.so.6(abort+0x148)[0x7fcc17d348c8]
[localhost:11886] [ 3] /lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x165)[0x7fcc188539d5]
[localhost:11886] [ 4] /lib64/libstdc++.so.6(+0x5e946)[0x7fcc18851946]
[localhost:11886] [ 5] /lib64/libstdc++.so.6(+0x5e973)[0x7fcc18851973]
[localhost:11886] [ 6] /lib64/libstdc++.so.6(+0x5eb93)[0x7fcc18851b93]
[localhost:11886] [ 7] /usr/local/bin/wc(_ZN5boost15throw_exceptionINS_16exception_detail19error_info_injectorINS_13property_tree14ptree_bad_pathEEEEEvRKT_+0xcd)[0x42f76d]
[localhost:11886] [ 8] /usr/local/bin/wc(_ZN5boost16exception_detail16throw_exception_INS_13property_tree14ptree_bad_pathEEEvRKT_PKcS8_i+0x77)[0x42f847]
[localhost:11886] [ 9] /usr/local/bin/wc(_ZN5boost13property_tree11basic_ptreeISsSsSt4lessISsEE9get_childERKNS0_11string_pathISsNS0_13id_translatorISsEEEE+0xba)[0x42fd9a]
[localhost:11886] [10] /usr/local/bin/wc(_ZN7paracel11json_parser5parseIiEET_RKSs+0x45)[0x430735]
[localhost:11886] [11] /usr/local/bin/wc(main+0x26e)[0x41cb7e]
[localhost:11886] [12] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fcc17d1fb35]
[localhost:11886] [13] /usr/local/bin/wc[0x41d78e]
[localhost:11886] *** End of error message ***

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 11886 RUNNING AT localhost.localdomain
= EXIT CODE: 134
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
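
The trace above shows paracel::json_parser::parse failing in get_child with "No such node (topk)" inside /usr/local/bin/wc. This suggests that the wc program looks up a topk entry that cfg.json does not contain. A pre-flight check along the following lines could surface that before launching. The helper `missing_keys` is hypothetical, and `topk` is the only key name taken from the error message.

```python
import json


def missing_keys(config_text, required):
    """Return the required top-level keys that are absent from a JSON config."""
    config = json.loads(config_text)
    return [key for key in required if key not in config]
```

For example, `missing_keys(open("cfg.json").read(), ["topk"])` returns `["topk"]` when the entry is absent and an empty list otherwise.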

Incorrect hostname using `socket.gethostname()` under macOS Sierra.

The socket library in Python returns the wrong hostname under macOS Sierra.
For example, print socket.gethostname() returns a, but the actual hostname is a.local.
It seems that the old system call used by the Python source returns an incomplete name, while the actual name ends with the .local suffix.
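
One possible workaround, sketched under the assumption that appending `.local` to a bare name is enough on affected machines, is shown below. Both helper names are hypothetical and this is not a fix taken from the Paracel scripts.

```python
import socket
import sys


def with_local_suffix(hostname, suffix=".local"):
    """Append `suffix` to a bare hostname; leave names that already contain a dot untouched."""
    if "." in hostname:
        return hostname
    return hostname + suffix


def local_hostname():
    """Return socket.gethostname(), patched with '.local' on macOS only (assumption)."""
    name = socket.gethostname()
    if sys.platform == "darwin":
        return with_local_suffix(name)
    return name
```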
