habanero-rice / hclib

A C/C++ task-based programming model for shared memory and distributed parallel computing.

Home Page: http://habanero-rice.github.io/hclib/

License: BSD 3-Clause "New" or "Revised" License

Shell 0.27% C++ 37.56% Makefile 0.62% C 25.42% Assembly 0.15% M4 0.01% Cuda 1.51% Objective-C 3.99% HTML 1.12% MATLAB 0.02% Python 4.14% Roff 12.06% CMake 0.65% TeX 0.49% PostScript 10.86% CSS 0.05% xBase 0.01% Fortran 0.62% Perl 0.44% Raku 0.01%

hclib's People

Contributors

agrippa, ahayashi, daowen, naman, sbak5, srirajpaul, srki, vkaries

hclib's Issues

Finish Statement never finishes

I have tried to write and run a simple program that only prints "Hello World" on each place/worker, but the program doesn't terminate. I've tried several combinations of asyncs, but none has worked and the finish statement never finishes. Here is my sample program:

#define GASNET_PAR

#include "hcupc_spmd.h"
#include "hclib.h"
#include "hclib-place.h"
#include <iostream>

using namespace std;

int main(int argc, char **argv) {

    hupcpp::launch(&argc, &argv, [] () {

        hclib::finish([] () {

            int numWorkers = hclib::num_workers();
            cout << "Total Workers: " << numWorkers << endl;

            int numPlaces = hclib::get_num_places(place_type_t::CACHE_PLACE);
            place_t **cachePlaces = (place_t**) malloc(sizeof(place_t*) * numPlaces);
            hclib::get_places(cachePlaces, place_type_t::CACHE_PLACE);

            for (int i = 0; i < numPlaces; i++) {

                place_t *currentPlace = cachePlaces[i];
                if (currentPlace->nChildren != 0) {
                    cout << "CachePlace with children found, skipping" << endl;
                    continue;
                }

                hclib::async([=] () {
                    cout << "Hello I'm Worker " << hclib::current_worker() << " of " << numWorkers << "Workers" << endl;
                });
            }


            free(cachePlaces);

        });
    });

    return 0;
}

and here is my hpt.xml:

<?xml version="1.0"?>
<!DOCTYPE HPT SYSTEM "hpt.dtd">

<HPT version="0.1" info="an HPT for Testing, 2 dual core processors">
    <place num="1" type="mem" >
        <place num="1" type="cache" >
            <worker num="1" />
        </place>
        <place num="1" type="cache" >
            <worker num="1" />
        </place>
    </place>
</HPT>

Additionally, worker number 0 is the only worker that runs the asyncs.
It would be very nice if you could help me out with this problem :)

Deadlock when using finish and async_await

When using a nested sequence of async_await, finish, async_await, the program deadlocks intermittently.

Sample code that reproduces the problem:

  #include "hclib_cpp.h"

  int main (int argc, char *argv[])
  {
      int n_rounds = 8, n_tiles=5;
      const char *deps[] = { "system" };
      hclib::launch(deps, 1, [&]() {

          //create a 2D array of promises
          auto tile_array = new hclib::promise_t<void*>**[n_rounds+1];
          for (int tt=0; tt < n_rounds+1; ++tt) {
              tile_array[tt] = new hclib::promise_t<void*>*[n_tiles];
              for (int j=0; j < n_tiles; ++j) {
                  tile_array[tt][j] = new hclib::promise_t<void*>();
              }
          }

          for (int j=0; j < n_tiles; ++j)
              tile_array[0][j]->put(nullptr);

          for (int tt=0; tt < n_rounds; ++tt) {
              for (int j=0; j < n_tiles; ++j) {
                  int left_nbr = (n_tiles+j-1)%n_tiles;
                  int right_nbr = (j+1)%n_tiles;
                  hclib::async_await([=]{
                      hclib::finish([=]{
                          hclib::async_await([=]{
                              tile_array[tt+1][j]->put(nullptr);
                          }, tile_array[tt][left_nbr]->get_future());

                      });

                  }, tile_array[tt][left_nbr]->get_future(),
                     tile_array[tt][right_nbr]->get_future());
              }
          }

      });
  }

To run the program, use the following commands:

export HCLIB_WORKERS=2
for i in {1..1000}
do
    echo $i
    ./a.out
done

Future wait operation does not restore finish scope

The future wait operation currently does not maintain the thread-local state of the current task. Specifically, the current_finish property is not restored after a potential fiber migration in blocking future wait calls. This can result in the wrong finish scope counter being decremented, and often triggers a segfault later in the execution due to broken synchronization.

The following test program demonstrates this behavior:

https://github.com/habanero-rice/hclib/blob/8f649248e5a5d1a0e26dfa3097cce7fec31533db/test/cpp/promise/future4.cpp
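The save-and-restore pattern this issue asks for can be sketched without the runtime. This is a minimal, single-threaded simulation, not HClib code: FinishScope, blocking_wait, and the two task functions are hypothetical names, and the "migration" is faked by overwriting a thread-local pointer during the wait.

```cpp
#include <cassert>

// Hypothetical stand-in for a finish scope; not HClib's real type.
struct FinishScope { int pending = 0; };

// Per-worker "current task" state, kept in a thread-local pointer.
thread_local FinishScope *current_finish = nullptr;

// Simulates a blocking wait: while the fiber is suspended, the worker's
// thread-local state ends up pointing at some other task's finish scope.
void blocking_wait(FinishScope *other) {
    current_finish = other;
}

// Broken variant: trusts the thread-local state after the wait.
void task_without_restore(FinishScope *other) {
    blocking_wait(other);
    current_finish->pending--;  // decrements whichever scope was left behind
}

// Fixed variant: keep the scope in a local (which lives on the fiber's own
// stack) and write it back to the thread-local state after the wait.
void task_with_restore(FinishScope *other) {
    FinishScope *saved = current_finish;
    blocking_wait(other);
    current_finish = saved;
    current_finish->pending--;
}

// Runs one task inside scope `mine` and returns mine.pending afterwards.
// A result of 0 means the correct counter was decremented.
int run(bool restore) {
    FinishScope mine, other;
    mine.pending = 1;
    other.pending = 1;
    current_finish = &mine;
    if (restore) {
        task_with_restore(&other);
    } else {
        task_without_restore(&other);
    }
    current_finish = nullptr;
    return mine.pending;
}
```

Calling run(false) leaves the task's own counter untouched and decrements the other scope instead, which is the kind of miscount described above; run(true) decrements the right one.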

Remove HC_MALLOC and HC_FREE macros

These macros are an artifact from when we had a version of the Cilk allocator built in the HC runtime for the HC language. We've never had that code as part of HClib, since we now prefer directly linking (or using LD_PRELOAD) with third-party scalable allocators like TBB's.

This should be a simple matter of find/replace with malloc/free in the C code. The C++ code shouldn't be calling these C functions, so we'd replace with new/delete instead.

Use a different file extension for C++-only headers

Right now it's not possible to tell which headers are C-compatible without actually looking at the code. While this probably isn't much of a problem for the end user, I think it's a problem from a maintenance perspective.

For example, if I want to git grep for something, and I'm only interested in C++ results, it's much harder to sift through the output. (The same is true if I'm only interested in the C code.)

I suggest using .hpp like Boost, but really I'm fine with anything as long as we can differentiate between the C-compatible and C++-only headers.

Sporadic Deadlock When Using Async and Yield

I am running into a possible deadlock scenario when running the attached code without HCLIB_LOCALITY_FILE and with two workers.

The promises and waits were executed just fine, but it seems like the tasks themselves never finish for some reason. I am able to reproduce this issue in about 7 out of 10 runs on my machine. @srirajpaul was also able to reproduce this.

#include <chrono>
#include <cstdint>
#include <iostream>
#include <stdlib.h>
#include <thread>

#include "hclib_cpp.h"

int main(int argc, char** argv) {
    const char* deps[] = { "system" };
    hclib::launch(deps, 1, []() {
        hclib::finish([]() {
            hclib::async([]() {
                hclib::promise_t<int>* x = new hclib::promise_t<int>();
                hclib::promise_t<int>* y = new hclib::promise_t<int>();

                hclib::async([y] {
                    hclib::yield();
                    y->put(1);
                });

                hclib::async([x] {
                    x->get_future()->wait();
                });

                y->get_future()->wait();
                x->put(10);
            });
        });
        std::cout << "all finished " << std::endl;
    });
    return 0;
}

ESCAPING_ASYNC property not supported

We currently support "escaping" asyncs internally (they're used for our fiber-based finish-scope continuations), but the ESCAPING_ASYNC property does not seem to be supported through the user-facing API.

We should add support for "escaping" asyncs and/or remove the ESCAPING_ASYNC definition. (We don't have to implement this via a property.)

I recommend supporting "escaping" asyncs. Currently, apps that synchronize via DDTs rather than nested finish scopes see a significant amount of contention on the implicit top-level finish scope because every async has to check in and out of that one finish scope. Using "escaping" asyncs removes this point of contention.

On a side note, I think "escape" might already have a set meaning in the literature, and I'm not sure if this usage conflicts (i.e., I may not have picked the best name for this, and we may want to revisit that topic).

Use __atomic builtins, remove volatile fields

With proper use of __atomic builtins, we should be able to introduce the needed memory barriers, and remove the volatile qualifier from a few places in our code. It's not clear that volatile is actually doing anything useful right now anyway (our usage of the legacy __sync builtins might be sufficient).
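As a rough illustration of the direction proposed here, the sketch below writes a compare-and-swap loop with the GCC/Clang __atomic builtins on a plain int, with no volatile qualifier. The counter and function names are hypothetical and do not come from the HClib sources; the memory orders shown are one reasonable choice, not a statement of what the runtime needs.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical shared counter, declared without volatile.
static int counter = 0;

// CAS loop using the __atomic builtins. The legacy equivalent would be
// __sync_bool_compare_and_swap, which always implies a full barrier.
void atomic_increment(int *addr) {
    int expected = __atomic_load_n(addr, __ATOMIC_RELAXED);
    // On failure the builtin refreshes `expected` with the current value,
    // so the loop simply retries with the new expected + 1.
    while (!__atomic_compare_exchange_n(addr, &expected, expected + 1,
                                        /*weak=*/true,
                                        __ATOMIC_ACQ_REL, __ATOMIC_RELAXED)) {
    }
}

// Increments the counter from several threads and returns the final value.
int count_with_threads(int n_threads, int per_thread) {
    __atomic_store_n(&counter, 0, __ATOMIC_RELAXED);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                atomic_increment(&counter);
            }
        });
    }
    for (auto &w : workers) {
        w.join();
    }
    int total = __atomic_load_n(&counter, __ATOMIC_ACQUIRE);
    assert(total == n_threads * per_thread);
    return total;
}
```

Because every access goes through a builtin with an explicit memory order, the compiler cannot cache or reorder the accesses in the ways volatile was presumably meant to prevent.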

memcpy non-trivially-copyable function objects

Using memcpy to clone C++ objects that are not trivially copyable can cause serious problems.

We're currently using memcpy to clone lambdas for async (see inc/hclib-async.h:124).

Here's a simple program demonstrating the problem:

#include <memory>
#include <iostream>
#include <unistd.h>
#include <cassert>

#include "hclib_cpp.h"

struct SimpleObject {
    int value;
    SimpleObject(): value(1) { }
    ~SimpleObject() { value = 0; }
};

int main (int argc, char ** argv) {
    hclib::launch([]() {
        hclib::finish([]() {{
            auto p = std::make_shared<SimpleObject>();
            std::cout << "Value starts as " << p->value << std::endl;
            hclib::async([=](){
                usleep(100);
                std::cout << "Value in async is " << p->value << std::endl;
                assert(p->value == 1);
            });
            // p is dead
        }});
    });
    std::cout << "Exiting..." << std::endl;
    return 0;
}

Expected output:

Value starts as 1
Value in async is 1
Exiting...

Current output:

Value starts as 1
Value in async is 189792480
Assertion failed: (p->value == 1), function operator(), file capture0.cpp, line 56.
Abort trap: 6
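One way to avoid the problem is to copy- or move-construct the callable into its heap storage instead of cloning its bytes. The sketch below is a minimal stand-alone illustration under that assumption; Task and make_task are hypothetical names and do not reflect HClib's actual task layout.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical task record; not HClib's real structure.
struct Task {
    void (*run)(void *);      // invokes the stored callable
    void (*destroy)(void *);  // runs its destructor and frees the storage
    void *closure;
};

template <typename F>
Task make_task(F &&f) {
    using Fn = typename std::decay<F>::type;
    Task t;
    // Construct the callable properly, so captured members such as a
    // shared_ptr update their own bookkeeping (here, the reference count).
    t.closure = new Fn(std::forward<F>(f));
    t.run = [](void *p) { (*static_cast<Fn *>(p))(); };
    t.destroy = [](void *p) { delete static_cast<Fn *>(p); };
    return t;
}

// Creates a task whose capture outlives the original owner, then runs it.
int run_after_owner_dies() {
    int seen = 0;
    Task t;
    {
        auto p = std::make_shared<int>(1);
        t = make_task([p, &seen]() { seen = *p; });
        assert(p.use_count() == 2);  // the task's closure holds a reference
    }  // p is dead here; the closure keeps the int alive
    t.run(t.closure);
    t.destroy(t.closure);
    return seen;
}
```

With memcpy, the reference count would stay at 1 and the captured object would be destroyed when p goes out of scope, which matches the garbage value in the output above.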

C++ tests for promise+scalar type fail with the -O1, -O2, -O3 options

Hello,

I've noticed that C++ test programs for promise+scalar type fail when optimization options are specified.

The problem

Consider the following program (test/cpp/future1.cpp):

  constexpr int SIGNAL_VALUE = 42;
  hclib::promise_t<int> *event = new hclib::promise_t<int>();
  hclib::async([=]() {
    // T1
    int signal = event->get_future()->wait();
    assert(signal == SIGNAL_VALUE);
    printf("signal = %d\n", signal);
  });
  hclib::async([=]() {
    // T2
    sleep(1);
    event->put(SIGNAL_VALUE);
  });

Task T1 waits on a promise with scalar type (promise_t<int>) which is put by Task T2 (event->put(SIGNAL_VALUE)) and we expect that Task T1 receives 42. However, if this program is compiled with optimization options (-O1, -O2, -O3), the assertion fails (or a wrong value is printed).

Steps to reproduce the problem

I confirmed I can reproduce the problem on my laptop (Apple LLVM 8.0.0) and davinci (g++ 6.2.0). Also, it's worth noting that HClib itself is built with the -O3 option (by install.sh). The problem is more about the compiler option you give when compiling the test programs.

$ git clone git@github.com:habanero-rice/hclib.git
$ cd hclib
$ ./install.sh
$ source ./hclib-install/bin/hclib_setup_env.sh
$ cd test/cpp
$ emacs Makefile # add the -O3 option
$ ./test_all.sh # or make -j
$ ./future1

The Cause

The root cause is the current implementation of future_t::wait() (get() has the same issue). Here is the implementation (hclib-future.hpp):

 T &&wait() {
  _ValUnion tmp;
  tmp.vp = hclib_future_wait(this);
  return std::move(tmp.val);
}

The problem is that the variable tmp is allocated on the stack, so the rvalue reference returned by wait refers to storage that is gone once wait returns, and the value can easily be lost.

Possible Solutions

Here are possible solutions. I'd vote for 1 because 2 would prevent compiler optimizations. However, if returning an rvalue reference is important and/or there are any other solutions, please let me know. Also, if performance is a concern (e.g., a scenario where T is not a primitive type and the overhead of the copy constructor would not be small), I'd be happy to try to come up with a better solution and/or create synthetic benchmarks to discuss the performance.

1. Return by value

 T wait() {
  _ValUnion tmp;
  tmp.vp = hclib_future_wait(this);
  return tmp.val;
}

2. The use of volatile

volatile T&& wait() {
  volatile _ValUnion tmp;
   tmp.vp = hclib_future_wait(this);
   return std::move(tmp.val);
}

Thanks,

Akihiro
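The return-by-value option can be tried in isolation with a small stand-alone sketch. It assumes, as the snippet above suggests, that a scalar payload travels through a void*. The names raw_wait and wait_by_value are hypothetical, and memcpy is used here in place of the _ValUnion so the example does not depend on union type punning.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the C-level wait: hands back a scalar payload
// packed into the bytes of a void*.
void *raw_wait(int stored) {
    void *vp = nullptr;
    std::memcpy(&vp, &stored, sizeof stored);
    return vp;
}

// Unpacks the payload and returns it by value, so nothing refers to the
// local once the function has returned.
template <typename T>
T wait_by_value(void *vp) {
    static_assert(sizeof(T) <= sizeof(void *),
                  "payload must fit in a pointer");
    T val;
    std::memcpy(&val, &vp, sizeof val);
    return val;  // copied out before `val` is destroyed
}

int wait_example() {
    return wait_by_value<int>(raw_wait(42));
}
```

For scalar T the copy costs no more than returning a reference would, which is why this variant behaves the same at every optimization level.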

Repo takes too long to clone

Based on the size of the repo (~100MB), there are obviously some large files checked in somewhere. We should figure out where these are, and whether or not they're really necessary.

Compilation error: missing `xlocale.h`

Hi, everyone,

I'm having problems compiling hclib in Ubuntu 17.10. According to ubuntu-packages it seemed I had to install libc++-dev. The file is indeed installed in my system but hclib's build script is not able to find it:

$ find /usr/include/ -iname 'xlocale.h'
/usr/include/c++/v1/support/solaris/xlocale.h
/usr/include/c++/v1/support/newlib/xlocale.h
/usr/include/c++/v1/support/ibm/xlocale.h
/usr/include/c++/v1/support/musl/xlocale.h
/usr/include/c++/v1/support/xlocale/xlocale.h

Any workarounds? I was expecting configure.sh to either find the right header or at least consider the missing header as an error.


Compilation error:

$ ./install.sh
[...]
/bin/bash ../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I../../src -I../inc    -Wall -g -O3 -std=c11 -I../../inc -I../../src/inc -I../../src/fcontext  -DHC_ASSERTION_CHECK -DHC_COMM_WORKER_STATS -I/usr/include/libxml2 -Wall -g -O3 -std=c11 -MT libhclib_la-hclib-thread-bind.lo -MD -MP -MF .deps/libhclib_la-hclib-thread-bind.Tpo -c -o libhclib_la-hclib-thread-bind.lo `test -f 'hclib-thread-bind.c' || echo '../../src/'`hclib-thread-bind.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../src -I../inc -Wall -g -O3 -std=c11 -I../../inc -I../../src/inc -I../../src/fcontext -DHC_ASSERTION_CHECK -DHC_COMM_WORKER_STATS -I/usr/include/libxml2 -Wall -g -O3 -std=c11 -MT libhclib_la-hclib-thread-bind.lo -MD -MP -MF .deps/libhclib_la-hclib-thread-bind.Tpo -c ../../src/hclib-thread-bind.c  -fPIC -DPIC -o .libs/libhclib_la-hclib-thread-bind.o
../../src/hclib-thread-bind.c:20:10: fatal error: xlocale.h: No such file or directory
 #include <xlocale.h>
          ^~~~~~~~~~~
compilation terminated.
Makefile:695: recipe for target 'libhclib_la-hclib-thread-bind.lo' failed
make[2]: *** [libhclib_la-hclib-thread-bind.lo] Error 1
make[2]: Leaving directory '/tmp/hclib/compileTree/src'
Makefile:770: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/tmp/hclib/compileTree/src'
Makefile:478: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1

Add support for Honeycomb simulator

Adding support for the Honeycomb simulator requires adding assembly support for context switching, plus conditional compilation for a few unsupported features.

Excessive copies of lambdas

We need to refactor the C++ API to avoid passing function objects by value.

Currently, functions like async accept their function object arguments by value, which results in a lot of extra copies. For a typical async (see test copies0), there are two or three extra copies. When using the forasync family of functions, the issue is much worse, with thousands of unneeded copies for a single 3D loop in one of our tests (see test copies1).
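The effect is easy to measure outside the runtime. The sketch below counts copy constructions through a two-layer call chain, once passing by value and once with forwarding references. Counted and the four wrapper functions are hypothetical and only mimic the shape of an API layered like async.

```cpp
#include <cassert>
#include <utility>

// Callable that counts how many times it is copy-constructed.
struct Counted {
    static int copies;
    Counted() = default;
    Counted(const Counted &) { ++copies; }
    Counted(Counted &&) noexcept {}
    void operator()() const {}
};
int Counted::copies = 0;

// By-value chain: every layer makes its own copy of the callable.
template <typename F> void inner_by_value(F f) { f(); }
template <typename F> void outer_by_value(F f) { inner_by_value(f); }

// Forwarding chain: the callable is passed along by reference.
template <typename F> void inner_forward(F &&f) { std::forward<F>(f)(); }
template <typename F> void outer_forward(F &&f) {
    inner_forward(std::forward<F>(f));
}

int copies_by_value() {
    Counted c;
    Counted::copies = 0;
    outer_by_value(c);
    return Counted::copies;
}

int copies_forwarded() {
    Counted c;
    Counted::copies = 0;
    outer_forward(c);
    return Counted::copies;
}
```

A real async still has to store one copy of the callable for deferred execution, so the goal of such a refactoring would be one copy (or move) per task rather than zero.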

build broken on mac os and linux

With one of the recent merges, I'm getting the following error message on both Mac OS and Linux machines:

libtool: compile: g++ -DPACKAGE_NAME="hclib" -DPACKAGE_TARNAME="hclib" -DPACKAGE_VERSION="0.1" "-DPACKAGE_STRING="hclib 0.1"" -DPACKAGE_BUGREPORT="[email protected]" -DPACKAGE_URL="" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=".libs/" -I. -I../../src -Wall -g -O3 -std=c++11 -I../../inc -I../../src/inc -I../../src/fcontext -DHC_ASSERTION_CHECK -DHC_COMM_WORKER_STATS -I/home/jmg3/libxml2/include/libxml2 -Wall -g -O3 -std=c++11 -MT libhclib_la-hclib_cpp.lo -MD -MP -MF .deps/libhclib_la-hclib_cpp.Tpo -c ../../src/hclib_cpp.cpp -fPIC -DPIC -o .libs/libhclib_la-hclib_cpp.o
In file included from ../../inc/hclib_promise.h:5:0,
from ../../inc/hclib-async.h:42,
from ../../inc/hclib_cpp.h:6,
from ../../src/hclib_cpp.cpp:1:
../../inc/hclib_future.h:14:20: error: ‘is_trivially_copyable’ is not a member of ‘std’
HASSERT_STATIC(std::is_trivially_copyable::value,

with the GNU compilers, versions 4.8.5 and 4.9.2. This should be easily reproducible by anyone building on DAVINCI.

Nick, which version of which compilers are you testing with? While this is clearly in the C++11 spec, it appears to be a low-priority feature for many compilers [1]. I would suggest adding an ifdef that checks very specifically for whatever compiler you're testing with.

I'd also recommend at least making sure your changes build on some of our shared Linux systems (e.g. congaree, DAVINCI, STIC, etc) before pushing. My personal policy for pushing to master is to first run all c/ and cpp/ tests on my macbook and on DAVINCI, and ensure there are no compiler errors or warnings with a reasonably modern GNU compiler (e.g. 4.8 or 4.9 versions).

Please look at this ASAP, as I am unable to build and test the master branch on any systems at the moment.

[1] https://gcc.gnu.org/onlinedocs/gcc-4.8.3/libstdc++/manual/manual/status.html#status.iso.2011
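A guard along the lines suggested above might look like the following sketch. It assumes GCC 5 is the first GNU release whose standard library provides std::is_trivially_copyable, and it falls back to the compiler builtin __has_trivial_copy on older versions. That builtin only checks for a trivial copy constructor, so it is an approximation. The macro name HYP_TRIVIALLY_COPYABLE is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ < 5)
// Older libstdc++ lacks std::is_trivially_copyable; use the builtin,
// which approximates the trait by checking the copy constructor only.
#define HYP_TRIVIALLY_COPYABLE(T) __has_trivial_copy(T)
#else
#define HYP_TRIVIALLY_COPYABLE(T) std::is_trivially_copyable<T>::value
#endif

// Example payload type that is safe to clone byte-wise.
struct Pod { int a; double b; };

static_assert(HYP_TRIVIALLY_COPYABLE(Pod),
              "Pod should be trivially copyable");
static_assert(!HYP_TRIVIALLY_COPYABLE(std::string),
              "std::string should not be trivially copyable");

bool pod_ok() { return HYP_TRIVIALLY_COPYABLE(Pod); }
bool string_ok() { return HYP_TRIVIALLY_COPYABLE(std::string); }
```

Keeping the check behind a single macro means the static assertion stays in place on newer compilers while older ones still build.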

Missing HC_CUDA macro

The function:
place_t **get_nvgpu_places(int *n_nvgpu_places);

is not guarded by an #ifdef for HC_CUDA, so it is not possible to use the HabaneroUPC++ library without the CUDA feature.
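The guard being requested could be as small as the sketch below. The declaration is the one quoted above. place_t is only forward-declared here as a stand-in for the library's type, and count_nvgpu_places is a hypothetical helper added to show how calling code can compile with or without HC_CUDA.

```cpp
#include <cassert>

struct place_t;  // stand-in forward declaration; the real type is the library's

#ifdef HC_CUDA
// Only visible, and only needs a definition, when CUDA support is enabled.
place_t **get_nvgpu_places(int *n_nvgpu_places);
#endif

// Hypothetical helper: number of GPU places, or zero when CUDA support
// is compiled out.
int count_nvgpu_places() {
#ifdef HC_CUDA
    int n = 0;
    (void)get_nvgpu_places(&n);
    return n;
#else
    return 0;
#endif
}
```

Built without -DHC_CUDA, this translation unit never mentions the CUDA-only symbol, so nothing is left unresolved at link time.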

Refinement of atomic operations in HClib runtime

HClib uses the __sync builtins through wrappers. GCC documents these as legacy builtins.

I think C11 is enforced for HClib, which means we can use C11 atomics for atomic operations instead.

Relaxed consistency through 'acquire' and 'release' ordering provides more efficient synchronization of memory writes and reads. It is easy to find parts of the HClib runtime code that are not synchronized correctly with a 'release' memory barrier. This can cause hangs on weakly ordered machines such as ARM and Power. It might be fine on TSO machines such as x86, but I think it's better to use one consistent relaxed memory model across all of the runtime code for compatibility across common architectures.
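The release/acquire pairing described here can be sketched as a single-slot handoff. The example uses C++11 std::atomic, whose memory orders mirror the C11 ones the issue proposes. Task, slot, publish, and take are hypothetical names, not part of the HClib runtime.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical task record and single-slot mailbox.
struct Task { int arg; };
static std::atomic<Task *> slot{nullptr};

// Producer side: fill in the task with plain writes, then publish the
// pointer with a release store so those writes become visible first.
void publish(Task *t, int arg) {
    t->arg = arg;
    slot.store(t, std::memory_order_release);
}

// Consumer side: the acquire load pairs with the release store, so a
// non-null pointer guarantees the task's fields are fully written.
Task *take() {
    Task *t;
    while ((t = slot.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    return t;
}

// Hands one task from the calling thread to a consumer thread and
// returns the argument the consumer observed.
int handoff(int arg) {
    slot.store(nullptr, std::memory_order_relaxed);
    Task task{0};
    int seen = -1;
    std::thread consumer([&seen] { seen = take()->arg; });
    publish(&task, arg);
    consumer.join();
    assert(seen == arg);
    return seen;
}
```

If the store were relaxed instead, a weakly ordered machine could make the pointer visible before the write to arg, which is the kind of missing 'release' barrier the issue warns about.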
