
raftlib / raftlib

The RaftLib C++ library, streaming/dataflow concurrency via C++ iostream-like operators

Home Page: http://raftlib.io

License: Apache License 2.0

C++ 96.78% CMake 2.86% Shell 0.36%
parallel dataflow-programming raftlib c-plus-plus hpc runtime thread-library qthreads pthreads ipc


raftlib's Issues

Memory leaks

It seems there are some memory leaks in the library.

valgrind --leak-check=full ./sumapp

Some macro definitions cause problems

The header file /usr/local/include/raftinc/kernelpreempt.hpp

defines the following macro: #define restore( k ) longjmp( k->running_state, 1 )

"restore" is a common identifier that anybody could use in their own code.
Defining it as a function-like macro can cause problems later on.

For example, when including some Boost header files I get the following error message:

/usr/include/boost/io/ios_state.hpp:47:11: error: expected identifier before '->' token
void restore()

The code at that line gets mangled by the macro defined above:
void restore() { s_save_.flags( a_save_ ); }

So maybe RaftLib could define its macros in a more unique way, e.g. with a project prefix? Maybe as
#define RAFTrestore( k ) longjmp( k->running_state, 1 )

add leaky buffer raftmanip stream modifier

So networking pipes have the concept of dropping packets when buffers are full. It seems like this could be useful in many streaming data applications. It looks like it would be fairly easy to implement with the machinery I already built for the parallel process forcing (in one of the branches).

Does the peek() function provide any timeout?

Imagine the following scenario:

m += a1 >> c["in1"];
m += a2 >> c["in2"];

c loops over ports in1 and in2 to consume the corresponding messages from a1 and a2. Now imagine producer a1 does not produce anything. Even though a1 returns raft::proceed, c gets stuck waiting for a message from it.
I searched the documentation but could not find any timeout flag for peek().
How would you recommend proceeding?

Here is a little proof of concept:

/**
 *
 * Proof of concept Raftlib
 *
 * Want to have 1 consumer and 2 producers. The consumer should time out and continue with the
 * next producer if the first producer does not produce.
 *
 */
#include <raft>
#include <raftio>
#include <cstdlib>   /** EXIT_SUCCESS **/
#include <iostream>  /** std::cout **/
#include <unistd.h>  /** sleep() **/


struct big_t
{
    int i;
    std::uintptr_t start;
#ifdef ZEROCPY
    char padding[ 100 ];
#else
    char padding[ 32 ];
#endif
};



/**
 * Producer: sends down the stream numbers from 1 to 10
 */
class A1 : public raft::kernel
{
private:
    int i   = 0;
    int cnt = 0;

public:
    A1() : raft::kernel()
    {
        output.addPort< big_t >("out");
    }

    virtual raft::kstatus run()
    {
        i++;

        if ( i <= 10 ) 
        {
            auto &c( output["out"].allocate< big_t >() );
            c.i = i;
            c.start = reinterpret_cast< std::uintptr_t >( &(c.i) );
            output["out"].send();
        }
        else 
        {
            return (raft::stop);
        }

        return (raft::proceed);
    };
};

class A2 : public raft::kernel
{
private:
    int i   = 0;
    int cnt = 0;

public:
    A2() : raft::kernel()
    {
        output.addPort< big_t >("out");
    }

    virtual raft::kstatus run()
    {
        i++;

        if ( i <= 10 ) 
        {
            /** intentionally produce nothing this iteration **/
            sleep(1);
        }
        else 
        {
            return (raft::stop);
        }

        return (raft::proceed);
    };
};

/**
 * Consumer: takes the number from input and dumps it to the console
 */
class C : public raft::kernel
{
private:
    int cnt = 0;
public:
    C() : raft::kernel()
    {
        input.addPort< big_t >("in1");
        input.addPort< big_t >("in2");
    }

    virtual raft::kstatus run()
    {
        if (cnt % 2 == 0)
        {
            try {
                std::cout << "in1: " << cnt << "\n";
                auto &a( input[ "in1" ].peek< big_t >() );
                std::cout << std::dec << a.i << " - " << std::hex << a.start << " - " << std::hex <<  
                    reinterpret_cast< std::uintptr_t >( &a.i ) << "\n";
                input[ "in1" ].recycle(1);
            }
            catch(ClosedPortAccessException& cpae)
            {
                // port closed; nothing to do, continue
            }
        }
        else
        {
            try {
                std::cout << "in2: " << cnt << "\n";
                auto &a( input[ "in2" ].peek< big_t >() );
                std::cout << "yeap\n";
                std::cout << std::dec << a.i << " - " << std::hex << a.start << " - " << std::hex <<  
                    reinterpret_cast< std::uintptr_t >( &a.i ) << "\n";
                input[ "in2" ].recycle(1);
            }
            catch(ClosedPortAccessException& cpae)
            {
                // port closed; nothing to do, continue
            }
        }
        ++cnt;
        return (raft::proceed);
    }
};

int main()
{
    A1 a1;
    A2 a2;
    C c;

    raft::map m;
    
    m += a1 >> c["in1"];
    m += a2 >> c["in2"];

    m.exe();

    return( EXIT_SUCCESS );
}

Nested parallelism

I wonder if this is possible:

A <= (B <= C >> D >= E)

E.g. this could work in this way:
A: reads filenames from a file system and sends each to B
B: reads file and sends every record to C
C modifies record and sends to D
D modifies record and sends to E
E uploads records to DB and when all records belonging to a certain file are processed, marks file as done.

So the parallelism is on file level and on record level..

I know the documentation says "not yet implemented nested split/joins", but I'm just double checking.

data drop / fifo overrun

Are there any plans to offer anything like latching in the last value on the ports, or, somewhat equivalently, a FIFO that replaces the oldest value when it overflows?

This would be useful when you are processing data live. The specific use case I'm looking at is prototyping OpenCV filters.

More memory leaks

Thanks for fixing the previous leak, now it works fine.
I have found a couple more:

valgrind --leak-check=full ./readtest cmake_install.cmake
valgrind --leak-check=full ./rbzip2 -i alice.txt -o alice.bz

How to add a spy to port ?

Hi, I have the following scenario.
I have 4 kernels that do processing of some sort. Each of them has one input and one output (the first and last are single-output/single-input).

 m += a >> b >> c >> d;

I am designing a 'Spy' type of kernel that I want to attach to specific ports to monitor the content that passes through. It will not modify anything, just read the content. Imagine this being used for visualization purposes, testing, logging or data dumps.

One solution I found is to create a spy kernel and include it in the production chain.

 m += a >> spy >> b >> c >> d;

However, this has the disadvantage that I have to edit the production chain code.

I would much rather write something like:

spy.attach( a, b);

Is there any way I can do this with RaftLib? Could you provide an example? Thanks in advance.

Compiling RaftLib for Windows

Currently RaftLib is using the Makefile system, which works well for Linux and OS X. However, I also need to have RaftLib available in Windows, and available for Visual Studio projects.

At the moment, how can I compile RaftLib in such a way that it can be readily used by Visual-Studio-based projects? If I compile using the Cygwin Makefile system, can the library still be used by Visual Studio?

In the long run, would it be a good idea to migrate RaftLib to CMake, so that it can automatically generate build scripts for any system?
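
A CMake migration would cover the Visual Studio case directly, since CMake can emit .sln/.vcxproj files. A minimal, hypothetical top-level CMakeLists.txt sketch (target name and source layout are illustrative, not RaftLib's actual build files):

```cmake
# Hypothetical sketch of a CMake migration; names are illustrative.
cmake_minimum_required( VERSION 3.10 )
project( RaftLib CXX )

file( GLOB RAFT_SOURCES src/*.cpp )
add_library( raft STATIC ${RAFT_SOURCES} )
target_include_directories( raft PUBLIC ${CMAKE_CURRENT_SOURCE_DIR} )
target_compile_features( raft PUBLIC cxx_std_14 )

# On Windows, run from a build directory with a VS generator, e.g.:
#   cmake -G "Visual Studio 15 2017 Win64" ..
# to produce a solution usable from the IDE.
```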

commit / cleanup / test networking code

Self-explanatory. There's a prototype that was built in the course of my thesis, but I need to make sure that it is extensible. Far too often I find unreadable code and messes of spaghetti; I don't want that in this project.

Suggestion : Dynamic Pipelines

Hello,
I have a pipeline defined in a text file like OperationA para1 para2|OperationB para1 para2 para3|OperationC para1, where OperationA takes 2 input parameters and OperationB takes 3 parameters.

(I have already defined these operations as functions; all of them have the same return type.)

Can someone provide me pointers on how to read this from the text file and create the pipeline dynamically in code?

(The operation names can vary, depending on what's defined in the text file.)

force thread/process/pool by kernel

In the RaftLib C++Now tutorial (full slide deck here: http://www.jonathanbeard.io/pdf/cppnow2016.pdf) I proposed this syntax for partitioning a VM space:
[screenshot of the proposed VM-partitioning syntax, omitted]

Seems intuitive at first; however, when you have multiple ports on a single compute kernel you begin to have issues (i.e. you need to specify the same thing on multiple stream-graph edges). So I'm looking at something that works more like this:

raft::map m;

raft::kernel a,b,c;
/** all of b forced to a single process, call acts as a decorator **/
auto b_proc( raft::parallel::process( b ) );
m += a >> b_proc >> c;
m.exe();

The end goal is of course to be able to force a few kernels to a specific type of resource (i.e., isolated process, thread, etc.). This isn't normally necessary; however, it sometimes comes up when building applications that use special IO devices or have strange VM behavior.

Any thoughts?

stream persistency, logging/restart

Given that many "big-data" frameworks use some sort of RDD, I think it's time, once we get the beta release out, to add in selectable persistence and restart zones (they won't be as naive as most of the Apache Spark/Storm resilience methods) that are similar to HPC fast checkpoint/restart mechanisms. Currently most frameworks use methods that unnecessarily stress the IO system. The exact implementation is TBD; however, I'd like input on how the programmer should specify critical persistency points within the application.

For the example below we have critical specified; however, it could be some other descriptor. I'm thinking three levels would do: low, moderate and critical restart points. That way the runtime can use the data rates on the edges to dynamically adjust the interval time based on the desired checkpointing criticality.

raft::kernel a,b;
raft::map m;

m += a >> raft::checkpoint::critical >> b;
m.exe();

Would love some input though.

Missing include

On the dev branch, raftinc/defs.hpp is missing an #include <memory> for std::unique_ptr. This causes compilation to fail with g++ 6.4 on Linux (Debian sid).

Need a more real-life-like example

All examples provided include a kernel that is part of the library distribution, e.g. raft::random_variate(), raft::print(), raft::filereader(), etc...

Having to check the source code to learn how to build a producer shouldn't really be the way to go.

It would be good to have one or (preferably) more examples showing how to create all kinds of kernels.

E.g. like the attached file:

g++ -g -o poc poc.cpp -lraft -pthread

poc.zip

  • edited to fix link - jcb 3 July 2017

Proper behavior for output port being sent to two input ports

I'm not really sure from the documentation what the expected behavior is for this, but it's crashing right now so that likely isn't it.

If you have one output node and you connect it to two different kernels; something like:

m += a >> b; m += a >> c;

what should the behavior be for this construct? My thinking on this was that it should duplicate the stream; it seemed like splitting (send some to B, some to C) had a more explicit syntax.

I have a test case written up for this here: master...jdavidberger:duplicateTest. It also has an assert set up at the undefined behavior that causes the crash.

I suspect that this is already somewhat on the radar; the thing I really wanted to know is whether the behavior I expected is the actual target behavior. If it is, I can also PR the test-suite addition.

For now I'm just using an explicit block that duplicates all inputs to n, which works as expected.

One input split in several outputs

Hey Jonathan, hope you're doing well..

Say we have A >> B >> C
A sends one message to B.
B then splits this message into several hundred other messages that need to be sent to C.

The problem I face is the following:

If I return raft::proceed after B sends a message to C, the framework does not schedule B again (because B has already consumed the input message from A).

If B sends all messages to C in a loop without returning raft::proceed, the downstream queue may get saturated with so many messages.

So what is the best way to proceed in this scenario ?

Thanks in advance..

add diff type stream

Transferring large items is quite inefficient. A simple way around this is to pass a pointer, but then each worker kernel can modify that location. It's also not immediately obvious to the runtime that it can't send that pointer to another networked location (which won't work). A solution would be to add a non-pointer type that saves only the diff of the changes worker kernels make to the local "copy," then transmits the "patch" of the diff along with the copy only when going over network links. You could also pre-transmit the bulk of the data and then transmit the "patch" as it's ready, minimizing latency over networks.

OpenCV / Surveyor pattern / Nanomsg

Hi guys,

Hope you are all well !

I was checking out your cool library for a project I am working on based on OpenCV and Dlib.

In a nutshell, I would like to create a graph of distributed pre- and post-processing tasks on a streamed web-camera image on iOS. I would like to use RaftLib as in your OpenCV example, and add a layer for the surveyor protocol from the nanomsg library, to execute a flow like the following:

Events flow example:

  1. Skip frame, execute chain of processes every 3 frames
  2. Resize frame
  3. Filter frame, e.g. stop the chain if the image is blurry or not of valid quality
  4. Rotate frame
  5. Mirror frame
  6. Distributed Processing frame
    • dlib's facial-landmark detection process (timeout 1 s)
    • inverted visual dictionary (simple bag of words based on Features2d module) (timeout 1s)
  7. Processing locks
    • if a face is detected, stop the pre-processing steps and block requests for marker detection.
    • if a sub-process returns no results, return to the default processing flow.

Questions:

  1. Would it be possible to do such processing flow with RaftLib ?
  2. Is it possible to couple Raftlib with puffin-buffer and puffin-stream to aggregate results ?
  3. As I saw that RaftLib is really performance oriented, what would be the performance bottlenecks in such a flow-based processing setup?

Refs:

Have a great day !

Cheers,
Richard

qthreads

Convert qthreads to CMake, add in the pool scheduler. The basic idea is to use qthreads for user-space threading instead of re-coding assembly every time we move to a new platform. Multiple qthreads can run inside a single kernel thread. We will need to re-examine the exception/(OS) signal mechanism once this happens and evaluate whether changes are needed.

This might also open up another avenue of research/development, as it creates another degree of freedom when scheduling. Do we put a kernel inside a kernel thread, or are there advantages to leaving it as a heavyweight thread? Right now the idea is to keep heavily communicating kernels highly local so they share as much data via cache as possible.

Need to support non-POD pointers such as smart pointers

RaftLib is an actor-oriented framework for stream processing, and one of its most important applications is video stream processing, which uses data structures such as cv::Mat from OpenCV: structures that contain non-POD pointers such as smart pointers.

However, as illustrated in issue #4, non-POD pointers are currently not supported, and the presence of such pointers in queued objects will cause a segfault.

This makes RaftLib unusable in any substantial application that involves complex data structures, and is a showstopper. Therefore we need to add support for non-POD pointers to RaftLib as a top priority.

What's the difference between returning raft::proceed and not returning?

Imagine 2 scenarios:

Scenario 1:
a. Producer sends an object down the stream
b. Sleeps 10 seconds
c. Returns raft::proceed
d. The library calls my producer again
e. Back to (a)

Scenario 2:
a. Producer loops while(!terminate)
b. Sends an object down the stream and sleeps 10 seconds
c. Back to (a)

Is there any difference between 1 and 2?

make link syntax less clunky, more c++ like

Issue

I've had several requests to make the syntax for linking compute kernels less clunky and more C++ like. That means keeping the map::link syntax but also supporting something with a bit more polish.

Potential Solution

int
main( int argc, char **argv )
{  
   //generic random kernel instantiations
   kernel f;
   kernel g;
   //explicit declaration of map for example, potentially keep hidden one for compatibility
   map m;
   //in order version
   m >> f >> "a"_o >> "b"_i >> g;
   //out order version
   m >> f >> "a"_o >> raft::ooo >> "b"_i >> g;
   m.exe();
   return( EXIT_SUCCESS );
}

Exception when unused output port but segmentation fault when unused input port

Test done on sumapp.cpp example:

if I add one output port
output.addPort< T >( "sum", "test" );

as expected I get an exception:

terminate called after throwing an instance of 'AmbiguousPortAssignmentException'
what(): One port expected, more than one found!
Output port from source kernel (Sum) has more than a single port.

but if I add another input port:
input.addPort< T > ( "input_a", "input_b", "test" );

I get a segmentation fault.

Not sure if that is the right behavior?

Segmentation fault when no input port available

On the example you sent me, if we comment out the input port on the consumer we get a segmentation fault. It would be nice to have an exception instead.

/**
 *
 * Proof of concept Raftlib
 *
 * Want to have a 3-kernel stream which produces and sends 10 numbers down the stream.
 * The 10 numbers should be created by the first kernel and destroyed by the last.
 *
 */
#include <raft>
#include <raftio>
#include <cstdlib>   /** EXIT_SUCCESS **/
#include <iostream>  /** std::cout **/


struct big_t
{
    int i;
    std::uintptr_t start;
#ifdef ZEROCPY
    char padding[ 100 ];
#else
    char padding[ 32 ];
#endif
};



/**
 * Producer: sends down the stream numbers from 1 to 10
 */
class A : public raft::kernel
{
private:
    int i   = 0;
    int cnt = 0;

public:
    A() : raft::kernel()
    {
        output.addPort< big_t >("out");
    }

    virtual raft::kstatus run()
    {
        i++;

        if ( i <= 10 ) 
        {
            auto &c( output["out"].allocate< big_t >() );
            c.i = i;
            c.start = reinterpret_cast< std::uintptr_t >( &(c.i) );
            output["out"].send();
        }
        else
        {   
            return (raft::stop);
        }

        return (raft::proceed);
    };
};

/**
 * Consumer: takes the number from input and dumps it to the console
 */
class C : public raft::kernel
{
private:
    int cnt = 0;
public:
    C() : raft::kernel()
    {
        //input.addPort< big_t >("in");
    }

    virtual raft::kstatus run()
    {
        auto &a( input[ "in" ].peek< big_t >() );
        std::cout << std::dec << a.i << " - " << std::hex << a.start << " - " << std::hex <<  
            reinterpret_cast< std::uintptr_t >( &a.i ) << "\n";
        input[ "in" ].recycle(1);
        return (raft::proceed);
    }
};

int main()
{
    A a;
    C c;

    raft::map m;
    
    m += a >> c;

    m.exe();

    return( EXIT_SUCCESS );
}

Question about the map of kernels

Hi Jonathan, hope you're doing fine.

Having following random map:

a --> b --> c    // I.e.: a >> b >> c
        +-> d    // I.e.: b >> d
  +-> e --> f    // I.e.: a >> e >> f

a is the unique source and c, d and f are three different destinations that receive messages from a.

Does the library provide any way for a to know all its destinations? I.e.: c, d and f.

What I want to do is: whenever c, d and f are done with certain messages (e.g. they signal "commit"), a can start sending the next batch of messages.

Thanks in advance...

stream consistency modifiers

We really don't always need perfect FIFO behavior. Performance-wise we can improve quite a bit if we simply add a stream modifier that enables specification of ordering, especially for the MPMC-type FIFOs. I've talked in the past about having a "fuzzy" or "approximate" stream type.

I'd like to come up with a range for this that spans everything from normal FIFO behavior, to something that looks like an unordered list, all the way to a FIFO that could potentially do unsafe things like overwrite valid values if the application can tolerate it (the trade-off being better performance for lower data integrity).

namespace raft
{
namespace consistency
{
    enum type : manip_vec_t { 
      seq_cst /** sequentially consistent, default **/,
      relaxed /** no ordering constraint, only atomicity and validity of data from kernel a -> b **/,
      drop_mrt /** drop most recently transmitted, use it to overwrite if queue full **/,
      drop_random /** randomly overwrite elements in queue if full **/,
      approximate  /** this could mean approximate floating points, compression, etc., **/
    };
} /** end namespace consistency **/
} /** end namespace raft **/

raft::kernel a,b;
raft::map m;
/** don't care about ordering between a->b only that the messages get there **/
m += a >> raft::manip< raft::consistency::relaxed >::value >> b;
m.exe();

One design point for things like approximate is the best interface for returning the epsilon of approximation for the FIFO. One thought is to create a std::numeric_limits-like interface to extract it from the port. The other is an equally okay port function that returns the epsilon as a floating-point number, and zero if the type is not approximate.

Mistakenly reports "C++11 alignas unsupported" with g++ 6.3

It seems that __alignof_is_defined isn't defined despite alignof being supported. This causes a truckload of warnings.

Since the ALIGNOF macro is never used, is there any reason not to remove it?

I'm using g++ 6.3 on Debian unstable, if that's helpful. No messages are emitted when using clang++ 4.0.

Question: Supporting dynamic changes to the topology once the runtime is started?

The use case I am thinking of relates to demultiplexing media transport streams (MPEG Transport Streams).

In this scenario an MPEG TS multiplexed stream can contain multiple elementary video and audio streams (utilising various codecs) that can appear or disappear as the multiplexed stream progresses.

As an example, in case a new audio stream is discovered it would be desirable to add an output on the demultiplexer kernel and connect a new decoder kernel to this output. Similarly to remove kernels when an elementary stream disappears.

break into multiple packages

So I had a great suggestion over the weekend to break the RaftLib framework into multiple packages that can be downloaded, compiled, and integrated as needed. This could take the form of a CMake build that selectively compiles pieces, a bootstrap-like solution that builds pieces, or even a "package manager" of sorts that builds and installs the matching dependencies. If there are any thoughts, feel free to comment.

seg fault -> bug report via e-mail

abc@abc-virtual-machine:/mnt/hgfs/svn/visionframework/examples$ gdb ./testraft10
GNU gdb (Ubuntu 7.10-0ubuntu1) 7.10
Reading symbols from ./testraft10...done.
(gdb) run
Starting program: /mnt/hgfs/svn/visionframework/examples/testraft10
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffdfa4e700 (LWP 9071)]
[New Thread 0x7fffdf24d700 (LWP 9072)]
[New Thread 0x7fffde6c8700 (LWP 9073)]
[New Thread 0x7fffddec7700 (LWP 9074)]
[New Thread 0x7fffdd6c6700 (LWP 9075)]
[New Thread 0x7fffdcec5700 (LWP 9076)]
[New Thread 0x7fffcffff700 (LWP 9077)]
[New Thread 0x7fffcf7fe700 (LWP 9078)]
[New Thread 0x7fffceffd700 (LWP 9079)]
[New Thread 0x7fffce7fc700 (LWP 9080)]
[New Thread 0x7fffcdffb700 (LWP 9081)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffceffd700 (LWP 9079)]
0x00000000004053ae in __gnu_cxx::__exchange_and_add (__mem=0x81c, __val=-1)
at /usr/include/c++/5/ext/atomicity.h:49
49 { return __atomic_fetch_add(__mem, __val, __ATOMIC_ACQ_REL); }
(gdb)
(gdb) backtrace
#0  0x00000000004053ae in __gnu_cxx::__exchange_and_add (__mem=0x81c, __val=-1)
    at /usr/include/c++/5/ext/atomicity.h:49
#1  0x0000000000405445 in __gnu_cxx::__exchange_and_add_dispatch (__mem=0x81c, __val=-1)
    at /usr/include/c++/5/ext/atomicity.h:82
#2  0x000000000040a395 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
    (this=0x814) at /usr/include/c++/5/bits/shared_ptr_base.h:147
#3  0x000000000042c042 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::operator=
    (this=0x7fffd00054d8, __r=...) at /usr/include/c++/5/bits/shared_ptr_base.h:678
#4  0x000000000042800d in std::__shared_ptr<cv::Mat, (__gnu_cxx::_Lock_policy)2>::operator=
    (this=0x7fffd00054d0) at /usr/include/c++/5/bits/shared_ptr_base.h:867
#5  0x0000000000428037 in std::shared_ptr<cv::Mat>::operator= (this=0x7fffd00054d0)
    at /usr/include/c++/5/bits/shared_ptr.h:93
#6  0x0000000000428061 in Foo<cv::Mat>::operator= (this=0x7fffd00054d0)
    at testraft10.cpp:33
#7  0x0000000000429c1e in RingBufferBase<Foo<cv::Mat>, (Type::RingBufferType)0>::local_push
    (this=0x7fffd0001640, ptr=0x7fffceffca30, signal=@0x7fffceffc9dc: raft::none)
    at /usr/local/include/raft_dir/ringbufferheap.tcc:576
#8  0x000000000040876c in FIFO::push<Foo<cv::Mat>&> (this=0x7fffd0001640,
    item=..., signal=raft::none) at /usr/local/include/raft_dir/fifo.hpp:202
#9  0x000000000042b31a in OneToMany<cv::Mat, 2ul>::run (this=0x6c77d0)
    at testraft10.cpp:91
#10 0x0000000000439382 in Schedule::kernelRun(raft::kernel*, bool volatile&,
    __jmp_buf_tag (*) [1], __jmp_buf_tag (*) [1]) ()
#11 0x000000000043981c in simple_schedule::simple_run(void*) ()
#12 0x00007ffff6b386aa in start_thread (arg=0x7fffceffd700) at pthread_create.c:333
#13 0x00007ffff6656eed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb)

0.pdf

Only first of many parallel kernel stages is executed.

I have a pipeline laid out something like this:

map += videoSource <= (convertColor >> resize) >= videoSink;

Where videoSource is a multi-port output, convertColor and resize are both 1-1 input-to-output setups that are clonable, and videoSink is a parallel_k fan-in stage.

For some reason, when I run my application in this configuration only videoSource, convertColor and videoSink are run. If I use only convertColor or resize individually, each is run; but if I try to use any two together in the joined parallel section, only the first is run.

Am I using the incorrect syntax or missing something else?

Addressing Feedback from CppCon presentation proposal

I'm looking for some good feedback on perhaps the hardest part of any open-source software project: documentation.

One review in particular is a bit worrisome for me, since the library is intended to be "intuitive" and easy to use. It was also the only non-accept rating, being "borderline." From the RaftLib org members (and the web in general) are there any improvements we can make to the documentation (either the raftlib.io front page, the wiki, or through more blogposts perhaps) to make RaftLib easier to use and understand from the start? The review in question is below and please respond directly via comments (as always, be respectful and constructive). Thanks!!

I was disappointed in the level of description, examples, etc in the Raftlib information I found. Given that I'm at least somewhat familiar with the topic I expected that I would "get" it right away. It was harder than I thought it should be. Also there is no mention of other approaches - HPX for example. I think I know what he means by "Streams" but maybe I don't. Of course this is partly due to the usage of "stream" in the standard library which has a clear meaning - but I'm hoping the authors "Streams" aren't related. There are a number of data flow implementation methods that I'm thinking are more promising - in particular the Standard library ranges proposal in development. I have the feeling that this is a work in progress rather than something that programmers are expected to be able to use right now.

what is `#include <cmd>`?

source : examples/general/pi/pisim.cpp

error detail:
RaftLib-RaftLib-d6e6fa3/examples/general/pi/pisim.cpp:29:15: fatal error: cmd: No such file or directory
#include <cmd>

I can't find any info about cmd.

Parallelization with more than one port

Hey Jonathan,
hope your 2018 is going well.

I have a quick question about parallel streams. Say I want to have following configuration:

 m += a <= b ["processOne"]=> c
 m +=      b ["processTwo"]=> d

Then the port processOne is multiplied (e.g. processOne0, processOne1, etc.) as many times as there are threads of c, and the same for processTwo and d.

As I understand it, the current framework does not support this. Is there any specific reason why? And what do you think the effort would be to introduce something like this (i.e., do you think I could implement it myself)?

Thanks a lot for your help..

A bit more detailed documentation needed

I know you guys are focused on bringing more features to the lib, but it would be good to provide better documentation and examples (I sent another post about this), so that those willing to use the current version can fully understand what they can do with it.

E.g.

  • I am not sure how to use signals; I can see raft::filereader uses raft::eof, but I'm not sure what is done with it.
  • I don't see any documentation for the method send()
  • not sure how to use allocate() with e.g. an int
  • How do I know what the built-in out-of-the-box kernels are? e.g. raft::filereader, raft::print, etc.
  • ...

integrate hwloc

Need to build the hwloc -> SCOTCH architecture topology within the src/partition_scotch.cpp file to get real topology info.

The exact point where hwloc needs to be integrated is here (starting at line 188):

#ifndef USE_HWLOC      
   //TODO add hwloc topology call
   //add version of call that uses a tree of the hardware from hwloc
   //might need format conversion
   if( SCOTCH_archCmplt( &archdat, cores /** num cores **/) != 0 )
   {
      /** TODO, add RaftLib Exception **/
      std::cerr << "Failed to create architecture file\n";
      exit( EXIT_FAILURE );
   }
#else

How to "ZeroCopy" from source to destination ?

Hi there,

Great job with the lib.
Let's say I have A >> B >> C. Is it possible to allocate data in A and just release it in C?
I don't seem to understand how to do it.

Thanks.
PS: in a different post I provided a "poc.cpp" where I show (more or less) what I want to do.

raft::kset syntax

I want to make building complex topologies simpler than it currently is. Specifically, I want to be able to do something like the following:

raft::map m;
raft::kernel src,dst;
raft::kernel a, b, c, d;
m += src <= raft::kset( a, b, c, d ) >= dst;

This means that there are links as follows:

src -> a
src -> b
src -> c
src -> d
a -> dst
b -> dst
c -> dst
d -> dst

This syntax seems fairly natural to me, but I wanted to get input from a larger audience before building it into the run-time. I should also note that this syntax addition does not modify any of the current semantics.

-Jonathan

Remove dependency on boost

Currently the entire project depends on the boost library. It looks like the only functionality used from boost is

    boost::core::demangle

The boost demangle is just a thin wrapper around <cxxabi.h>. Removing the dependency on boost would make RaftLib easier to integrate. Currently, if boost 1.59 is not installed (for example, I have 1.58 on Ubuntu 16.04), the boost submodules are initialized. In our company we have a strict 'no submodule in submodule' policy. So in order to remove the boost dependency, I propose one of the following:

  • Replace boost demangle with a RaftLib demangle
  • Instead of reporting the demangled name in exceptions, report the mangled name.

I know that on most systems installing boost is not a big deal, but imagine I have an embedded computer like a Raspberry Pi with only 4 GB of disk space.

about running the examples

Hi, I cloned the whole project from GitHub and compiled it, but I still can't build the examples. I want to know whether the examples no longer match the current API?

std::bad_alloc thrown instead of a library exception when output ports are missing

E.g. in sumapp.cpp:

input.addPort< T > ( "input_a", "input_b");
// output.addPort< T >( "sum" );

The error message:

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

The expected behavior is an exception from the library itself, pointing out that the problem is the missing output port.

prefetch buffer control

Is there any way to control the port buffer size? For example, I have one kernel that downloads data and another kernel that processes it. Downloading (prefetching) is much faster than processing, so the backlog uses a lot of disk space. Is there a way to control the prefetch (port) buffer size and halt the downloading kernel when the port buffer is full?
