
substantic / rain

740.0 44.0 52.0 11.9 MB

Framework for large distributed pipelines

Home Page: https://substantic.github.io/rain/docs/

License: MIT License

Rust 57.48% Cap'n Proto 1.48% Shell 0.16% HTML 0.23% CSS 0.11% JavaScript 0.95% Python 30.73% CMake 0.25% C++ 3.25% Dockerfile 0.12% TypeScript 5.25%
pipelines distributed-computing python-api rust workflows

rain's People

Contributors

aimof, aschampion, gavento, geordi, hdhoang, kobzol, spirali, theironborn, vojtechcima


rain's Issues

CI fails because of root access

The recent version added "safer" read-only mapping of data objects to files via Unix file permissions. However, it seems that our CI runs as root, which can modify files even when it does not have write permission. This leads to failing tests on the CI server. I propose creating a normal user in the container and switching to this user before running the tests.
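The failure mode can be reproduced with a short sketch (assumes a Unix system; the file path is created ad hoc and is not part of Rain): root bypasses the permission bits that the read-only mapping relies on.

```python
import os
import stat
import tempfile

# Sketch of the CI failure mode: a file made read-only via permission
# bits is still writable when the process runs as root (uid 0), because
# root bypasses ordinary permission checks.
path = tempfile.mktemp()
with open(path, "w") as f:
    f.write("data")
os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)  # 0o444, read-only

if os.geteuid() == 0:
    # As root, opening for write succeeds despite 0o444 -- this is what
    # breaks the read-only mapping tests on the CI server.
    with open(path, "a") as f:
        f.write("more")
    print("running as root: write succeeded")
else:
    try:
        open(path, "a")
        print("unexpected: write succeeded")
    except PermissionError:
        print("non-root: write denied as expected")

os.chmod(path, 0o644)
os.remove(path)
```

Running this as the proposed normal container user takes the `PermissionError` branch, which is the behaviour the tests expect.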

Improved live and post-mortem monitoring

Improve the online monitoring to:

  • Show session list summary with more info
  • Show more statistics and graphs, e.g. aggregation by task type or group (proposed in #32)
  • Show a timeline by groups or with significant (all or e.g. named?) events.
  • Only display the dependency graph optionally and for reasonably small graphs
    • As a future improvement, we could display the dependency graph of groups, clusters, task types etc.)
  • See the full details of every task and object (on demand)
  • Examine history of the events in some way (a timeline may be enough)
  • Do all of this for finished and closed sessions via re-running the server with the sa

This depends on a streaming API (#11) and may require a complete UI framework.

Capnp message size caps limits one submit size

Capnp limits builders to 8M words (64 MB), which bounds the number of submitted nodes to circa 200,000. This limit is fixed in message.h, since Builder does not accept any options and uses the default limit.
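A back-of-the-envelope check of that bound (the ~320 bytes per serialized node is an assumption for illustration, not a measured figure):

```python
# Cap'n Proto's default limit is 8M words, with 8-byte words.
WORD_BYTES = 8
LIMIT_WORDS = 8 * 1024 * 1024
limit_bytes = LIMIT_WORDS * WORD_BYTES
assert limit_bytes == 64 * 1024 * 1024  # 64 MB, as stated above

# If one submitted node serializes to roughly 320 bytes (an assumed
# figure), the limit allows on the order of 200 000 nodes per submit:
approx_node_bytes = 320
print(limit_bytes // approx_node_bytes)  # prints 209715, i.e. ~200 000
```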

🌧  INFO 2018-03-17T14:53:11Z Starting local server (0.0.0.0:7210)
🌧  INFO 2018-03-17T14:53:11Z Dashboard: http://lomikamen:8080/
🌧  INFO 2018-03-17T14:53:11Z Server pid = 95230
🌧 ERROR 2018-03-17T14:53:11Z Process 'server' terminated with exit code exit code: 101; process outputs can be found in server.{out/err}
🌧  INFO 2018-03-17T14:53:11Z Error occurs; clean up started processes ...
Expect 16384 objects of size 32768, total 512.000 MB
Traceback (most recent call last):
  File "scalebench.py", line 61, in <module>
    session.submit()
  File "/aux/gavento/rtest/rain/python/rain/client/session.py", line 164, in submit
    self.client._submit(self._tasks, self._dataobjs)
  File "/aux/gavento/rtest/rain/python/rain/client/client.py", line 82, in _submit
    req.send().wait()
  File "capnp/lib/capnp.pyx", line 1932, in capnp.lib.capnp._RemotePromise.wait (capnp/lib/capnp.cpp:41109)
capnp.lib.capnp.KjException: src/capnp/rpc.c++:1527: context: sending RPC call; callBuilder.getInterfaceId() = 14056942509132787212; callBuilder.getMethodId() = 3
capnp/rpc-twoparty.c++:92: failed: expected size < ReaderOptions().traversalLimitInWords; size = 12047798; Trying to send Cap'n Proto message larger than the single-message size limit. The other side probably won't accept it and would abort the connection, so I won't send it.
stack: 0x7ffff66aacd6 0x7ffff66a88c0 0x7ffff66a7580 0x7ffff663d0e9 0x7ffff661e9f3 0x7ffff661eb1d 0x7ffff661ef4b 0x5555556d4411 0x5555556d493f 0x5555556d493f 0x5555556d9286 0x5555556d9f9f 0x5555557a78f2 0x5555557a9e1d 0x5555557aa5be 0x5555557d84d7 0x555555668c01 0x7ffff6cee2b1 0x5555557721ba

error in when installing rain_server with cargo

@ALL

hi all:

I tried to install rain_server with cargo install -v rain_server, but the installation fails.

The error is shown below:

error: failed to run custom build command for `rain_core v0.3.0`
process didn't exit successfully: `/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/cargo-install.LarbQVDOQLHm/release/build/rain_core-211500adc0948f2e/build-script-build` (exit code: 101)
--- stderr
thread 'main' panicked at 'schema compiler command: Error { kind: Failed, description: "Error while trying to execute `capnp compile`: Failed: No such file or directory (os error 2).  Please verify that version 0.5.2 or higher of the capnp executable is installed on your system. See https://capnproto.org/install.html" }', src/libcore/result.rs:906:4
note: Run with `RUST_BACKTRACE=1` for a backtrace.

warning: build failed, waiting for other jobs to finish...
error: failed to run custom build command for `capnp-rpc v0.8.3`
process didn't exit successfully: `/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/cargo-install.LarbQVDOQLHm/release/build/capnp-rpc-9631959463f9bb8f/build-script-build` (exit code: 101)
--- stderr
thread 'main' panicked at 'capnp compile: Error { kind: Failed, description: "Error while trying to execute `capnp compile`: Failed: No such file or directory (os error 2).  Please verify that version 0.5.2 or higher of the capnp executable is installed on your system. See https://capnproto.org/install.html" }', src/libcore/result.rs:906:4
note: Run with `RUST_BACKTRACE=1` for a backtrace.

warning: build failed, waiting for other jobs to finish...
error: failed to compile `rain_server v0.3.0`, intermediate artifacts can be found at `/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/cargo-install.LarbQVDOQLHm`

Caused by:
  build failed

cargo version :

cargo -V
cargo 0.23.0 (61fa02415 2017-11-22)

Is there something wrong with using cargo install rain_server?

Robust handling of worker and subworker crashes

Currently a crash of a subworker may crash a worker, and a crash of a worker may crash the server. We need to improve this. However, we are not aiming for infrastructure resiliency now. A subworker crash may still fail the task (and thus the session), and a worker crash may still lose all its objects and fail all involved sessions. The main goal is to keep the server running and deliver a graceful error.

A robust failure handling will open up the road to retrying tasks (possibly on different workers) and later to worker crash resiliency.

Add stop command to server

The start command conveniently launches a cluster (server + workers), but it makes it a bit cumbersome to stop the cluster.

As a short-term, hacky solution, a simple killall on Linux could work, or the start command could run the server in-process, so that it would block while the server lives.
But in general I think the server should have a more general and graceful option to stop itself, defined in its API.

Proposal:
The stop event would stop all the connected workers and then shut down the server.
The command could look like this:
rain stop server:1234
It would probably make sense to stop individual workers too, possibly with the same command and a --worker flag.

The ability of clients to shut down the server is a non-issue from a security standpoint, since there is already no client authentication and the cluster is assumed to run in a trusted environment.

There could also be an option for a "soft stop", that would cause the server to stop scheduling tasks, but it would first wait for the current in-progress tasks to finish (and maybe checkpoint them to disk) before stopping the cluster.

rain start --simple

Process 'server' terminated with exit code exit code: 101; process outputs can be found in server.{out/err}

Add task/data object groups and names, clarify names

Introduce optional task/data object groups and names for better monitoring and possibly debugging. Currently, data objects have a label indicating the role they play in the producer task (e.g. log, score, ...).

Introduce:

  • task and data object name - an arbitrary, optional string, to be set for any tasks the user wants to distinguish later by name. It is not required to be unique within the session, but is intended to be. Always set by the user.

    • Goal: be able to distinguish special tasks in the graph and to query them.
    • Importance: low
    • Perhaps we need to think about possible uses, and about name vs. label on objects (to avoid confusion)
  • task and data object groups - a textual label splitting the nodes into disjoint groups (including the None group).

    • Goal: to be able to monitor statistics of different task/DO groups, meaningful progress reporting and stats aggregation
    • Advantage: easy to report compact per-group aggregates, even store them in the log.
    • Variant: add arbitrary tags and aggregate by tag combinations. Drawbacks are added complexity (too many tag combinations) and less meaningful monitoring (the stats no longer add up across all nodes).
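The per-group aggregation idea can be sketched in a few lines of Python. The `Task` class, the `group=`/`name=` keywords, and the sample group labels here are hypothetical stand-ins, not the actual rain API:

```python
from collections import Counter

# Hypothetical sketch: tasks carry an optional group (a disjoint label,
# None allowed) and an optional free-form name.
class Task:
    def __init__(self, group=None, name=None):
        self.group = group
        self.name = name

tasks = [Task(group="preprocess"), Task(group="preprocess"),
         Task(group="train", name="final-model"), Task()]

# Per-group aggregate as proposed: counts include the None group, so the
# numbers always add up to the total number of nodes.
counts = Counter(t.group for t in tasks)
print(dict(counts))  # prints {'preprocess': 2, 'train': 1, None: 1}
```

Because the groups are disjoint (unlike the tag-combination variant), such compact aggregates are cheap to report and to store in the log.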

Python API for directories

Directories - Python API

This is a proposal for a directory manipulation API. A directory can be considered a tarball blob with content type "dir". The only special "magic" happens when such a data object is mapped to a file system: then it is unpacked as a directory. It also works in the opposite direction: when a data object of content type "dir" is collected from the file system, it has to be a directory (in contrast to a file in the case of a non-dir content type). Of course, the usual optimizations for moving data objects inside a worker apply, hence there is no superfluous packing/unpacking to/from tar when it is not needed.

Constructor of constant data

A wrapper over a blob (such as 'pickled') that creates a constant data object:

directory(path="/abc")  # Creates a data object from client's local directory
directory(dict={"fileX": b"xxxx", "fileY": b"yyyy" })  # Creates a directory dataobject from "dict" description.

Getting subparts

Now let "d" be a directory data object; then the following tasks create a new data object from a file/directory inside "d".

tasks.get_path(d, "subdir/file1.txt", content_type="json")
tasks.get_path(d, "subdir", content_type="dir")

I propose to define a new term "fragment" that is a pointer inside a directory. The following code creates a fragment:

x = d.get_path("subdir/file1.txt", content_type="json")

A fragment can be used as a data object, e.g.:

t = tasks.concat((x, x))

But creating a fragment does not generate a task; the task consuming the fragment directly gets a subpart of the original object. (It makes use of the ability to name a subdirectory in the inputs of a task.)

Write method

Part of the proposed API is also a "write" method on DataInstance that writes an object to the local file system. If the data object is a normal blob, it creates a file; if it is a directory, it is extracted as a directory to the filesystem.

d.fetch().write("/path")

Make directory

A task that creates a directory from given data objects:

tasks.make_directory({"subdir": {"file1": a }})

Crash if empty stdout captured

The following code crashes rain on Ubuntu 17.10:

# crash.py
from rain.client import Client, tasks, blob

client = Client("localhost", 7210)

with client.new_session() as session:
    task = tasks.execute(["echo", "-n"], stdout=True)
    task.output.keep()

    session.submit()
    print(task.output.fetch().get_bytes())  # Should print b""
$ ./rain start --simple && python crash.py
$ ./rain start --simple && python crash.py
🌧  INFO 2018-04-09T09:51:14Z Starting Rain 0.2.0
🌧  INFO 2018-04-09T09:51:14Z Log directory: /tmp/rain-logs/worker-cmp-12356
🌧  INFO 2018-04-09T09:51:14Z Starting local server (0.0.0.0:7210)
🌧  INFO 2018-04-09T09:51:14Z Dashboard: http://cmp:8080/
🌧  INFO 2018-04-09T09:51:14Z Server pid = 12357
🌧  INFO 2018-04-09T09:51:14Z Process 'server' is ready
🌧  INFO 2018-04-09T09:51:14Z Starting 1 local worker(s)
🌧  INFO 2018-04-09T09:51:14Z Process 'worker-0' is ready
🌧  INFO 2018-04-09T09:51:14Z Rain started. 🌧
Traceback (most recent call last):
  File "crash.py", line 10, in <module>
    print(task.output.fetch().get_bytes())  # Should print b""
  File "/home/user/anaconda3/lib/python3.6/site-packages/rain/client/data.py", line 86, in fetch
    return self.session.fetch(self)
  File "/home/user/anaconda3/lib/python3.6/site-packages/rain/client/session.py", line 232, in fetch
    return self.client._fetch(dataobject)
  File "/home/user/anaconda3/lib/python3.6/site-packages/rain/client/client.py", line 125, in _fetch
    result = req.send().wait()
  File "capnp/lib/capnp.pyx", line 1932, in capnp.lib.capnp._RemotePromise.wait (capnp/lib/capnp.cpp:41109)
capnp.lib.capnp.KjException: capnp/rpc.c++:2092: disconnected: Peer disconnected.
stack: 0x7f6787fa9840 0x7f6787fa8140 0x7f6787f39db7 0x7f6787f23f03 0x7f6787f24027 0x7f6787f2440b 0x55751bc9501a 0x55751bd22bec 0x55751bd4719a 0x55751bd1c7db 0x55751bd22cc5 0x55751bd4719a 0x55751bd1c7db 0x55751bd22cc5 0x55751bd4719a 0x55751bd1c7db 0x55751bd22cc5 0x55751bd4719a 0x55751bd1d529 0x55751bd1e2cc 0x55751bd9aaf4 0x55751bd9aef1 0x55751bd9b0f4 0x55751bd9ec28 0x55751bc6671e 0x7f67893411c1 0x55751bd4dc98

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "crash.py", line 10, in <module>
    print(task.output.fetch().get_bytes())  # Should print b""
  File "/home/user/anaconda3/lib/python3.6/site-packages/rain/client/session.py", line 108, in __exit__
    self.close()
  File "/home/user/anaconda3/lib/python3.6/site-packages/rain/client/session.py", line 116, in close
    self.client._close_session(self)
  File "/home/user/anaconda3/lib/python3.6/site-packages/rain/client/client.py", line 163, in _close_session
    self._service.closeSession(session.session_id).wait()
  File "capnp/lib/capnp.pyx", line 1932, in capnp.lib.capnp._RemotePromise.wait (capnp/lib/capnp.cpp:41109)
capnp.lib.capnp.KjException: capnp/rpc.c++:2092: disconnected: Peer disconnected.
stack: 0x7f6787fa8140 0x7f6787f39db7 0x7f6787f23f03 0x7f6787f24027 0x7f6787f2440b 0x55751bc9501a 0x55751bd22bec 0x55751bd4719a 0x55751bd1c7db 0x55751bd22cc5 0x55751bd4719a 0x55751bd1c7db 0x55751bd22cc5 0x55751bd4719a 0x55751bd1ce4b 0x55751bc9539f 0x55751bc99ff3 0x55751bc951bb 0x55751bcb340d 0x55751bd48d22 0x55751bd1d529 0x55751bd1e2cc 0x55751bd9aaf4 0x55751bd9aef1 0x55751bd9b0f4 0x55751bd9ec28 0x55751bc6671e 0x7f67893411c1 0x55751bd4dc98

Replace capnp with another protocol

While it currently serves reasonably well, Capnp has several drawbacks for Rain:

  • The interface and support for some languages is limited (e.g. no RPC for some; the Python API is very low-level).
    • No support in browser means we need REST(-like) API for monitoring anyway.
  • Compiling in some HPC environments may be an additional problem (may not improve with other solutions).
  • The message size is limited to 64MB (see #8) which makes large submits (and possibly other information calls) complicated.
  • We do not really need the advanced features of Capnp such as remote objects (these also need an additional packet to free even if they have no methods, see capnproto/capnproto#534).
    • The current capnp-rust sends every message in 2 packets - a further minor slowdown
    • However we do need bi-directional RPC calls, so no simple server/client RPC would do.

Therefore it might be better to use different protocols for different interfaces:

  • For client, low latency is not critical while portability and ease of use are, so it may make sense to have a REST API (or gRPC). The monitoring javascript app already uses REST for information.
    • It would be nice to leverage serde-json for the messages
    • API can be specified with a tool to generate code stubs and verifiers (like Apiary?)
  • For server-worker communication, it would be ergonomic to use serde, tarpc or possibly even abomonation with some simple framing.
  • For worker-subworker communication, it would be best to have some multi-platform, simple, yet fast protocol. Here the solution is less clear, but the communication schema is simpler (even if ideally with bi-directional messages).

This issue is a basis for discussion of requirements and options. The transition from Capnp is not critical.
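The "serde-json with some simple framing" option mentioned above can be sketched quickly. This is only an illustration of one possible framing (4-byte big-endian length prefix followed by UTF-8 JSON), not any protocol Rain actually implements, and the message fields are made up:

```python
import io
import json
import struct

def write_msg(stream, obj):
    # Frame = 4-byte big-endian payload length, then UTF-8 JSON payload.
    payload = json.dumps(obj).encode("utf-8")
    stream.write(struct.pack(">I", len(payload)) + payload)

def read_msg(stream):
    # Read the length prefix, then exactly that many payload bytes.
    (length,) = struct.unpack(">I", stream.read(4))
    return json.loads(stream.read(length).decode("utf-8"))

buf = io.BytesIO()  # stand-in for a socket
write_msg(buf, {"type": "task_finished", "id": 42})
buf.seek(0)
print(read_msg(buf))  # prints {'type': 'task_finished', 'id': 42}
```

Such a scheme avoids Capnp's fixed message-size limit and is trivially portable, at the cost of JSON's larger wire size; each direction of a bi-directional channel can use the same framing independently.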

Create a Rust task library

Implement a library to make rust subworker implementation easier. One possible inspiration is Rocket (as in the example below).

use rain_task::{Context, DataInstance, Output, Result, Subworker};

// One possibility: macro magic
#[rain_task("sometask")]
pub fn sometask(ctx: &mut Context, in1: &DataInstance, in2: &DataInstance, out1: &mut Output) -> Result<()> {
    write!(out1, "Length of in1 is {}, type of in2 is {}", in1.len(), in2.content_type())?;
    out1.set_content_type("text")?;
    // Set user attributes to anything serializable
    ctx.set_attribute("my-attr", [42, 43])?;
    Ok(())
}

// Variable inputs / outputs
pub fn othertask(ctx: &mut Context, ins: &[DataInstance], outs: &mut [Output]) -> Result<()> {
  // ...
}

pub fn main() {
    // annotated functions are added automatically here
    let s = Subworker::with_annotated();
    // macro to add fixed arity function
    add_task!(s, 2, 1, sometask).unwrap();
    // add variadic task
    s.add_task("othertask", othertask).unwrap();
    s.run().unwrap();
}

Write/Link methods

Write/Link methods

The purpose of this proposal is to make two functions already implemented internally in the worker (under different names) more public and more explicit. They serve to map data instances to the file system. The proposed concepts are:

  • linking - Linking a data object to the file system creates a read-only representation on the filesystem. If the data instance already lies on the file system in the workdir, it actually creates a symlink to that data.
  • writing - Creates a stand-alone copy of the data on the file system that has no further connection with the data instance and can be freely modified.

(1) Note that linking may silently fall back to writing, for example when we have only an in-memory representation of a data object.
(2) Note that with the upcoming directory support, it is expected that linking/writing a "dir" data object creates a directory representation on the filesystem.

Write/Link in @Remote

I propose to implement two new methods on data instance: link and write. They should behave as described above.

@remote()
def my_remote(ctx, data):
    data.write("mydata")  # creates a stand-alone copy of the data instance in the file "mydata"
    data.link("mydata2")  # creates a read-only copy of the data instance (may possibly create a symlink)

Write/Link in tasks.execute()

I propose to add a new parameter write for the class Input. When set to True, it forces the write operation when the data object is mapped to the file system.

tasks.execute(["/some/binary", Input("mydata", write=True, dataobj=x)])

When write is False, the link operation is used (this is the behaviour in the current version).

Python client segfaults with server running under WSL

I tried executing the example from the README using Windows Subsystem for Linux. The server starts up with no apparent errors, but executing the Python client script results in the client crashing with a SIGSEGV address boundary error. I tried running it with strace but there was nothing obvious in the output. I can include the strace output if you want, but I'm not sure what other information I can gather to help diagnose.

I'm using Windows 10 Enterprise 10.0.16299 with Ubuntu 16.04 in WSL. Something to note is that I had to install the cython system package before running pip3 install rain-python. I used virtualenv to create a virtualized Python install into which I installed rain-python.

Testing the same, but without the virtualized Python, in a Ubuntu 16.04 Docker container works so it's something to do with WSL.

Is it possible to get the server working on Windows directly? I tried compiling it but it makes use of crates that provide Unix-only functions like gethostname. Is there no cross-platform equivalent in Rust std or another crate?

Python API for data type in data objects


In the current version, data instances have a data type with possible values blob and directory. After some discussions, it seems that it makes sense to put data types also into data objects, i.e. the task graph already contains information about the data type.

This is a proposal how to handle this in Python API. This is relevant for Python tasks and tasks.execute + Program class.

Now, the user indicates that an output is a directory by setting content_type to 'dir', e.g.:

Output("mydata", content_type="dir")

Here we propose to introduce an OutputDir class to indicate directory output, and reserve Output for the blob data type. Both can be used in remote Python tasks, tasks.execute, and Program.

Output("mydata")  # blob
OutputDir("mydata")  # directory

To make it symmetric, we can also introduce Input/InputDir for blob/directory inputs. Strictly speaking this is not necessary, as the data type may be obtained from the provided object (and in the case of Program, the decision on the data type may be postponed). However, the idea is to provide an additional level of "type" check:

Input("mydata", dataobj=d)  # Fail if 'd' is data object of directory type
InputDir("mydata", dataobj=d)  # Fail if 'd' is data object of blob type

In the case of an implicit input, the right data type is derived from the provided data object:

tasks.execute(["du", "-h", d])  # This will work for 'd' being directory or blob

Alternatives

  • Provide extra argument for Input and Output:
Input("mydata", data_type=DataType.Blob)
Output("mydata", data_type=DataType.Directory)
  • Do not put data type into data objects (the current state)

error in pip3 install rain-python

@ALL

hi all:

When installing rain-python using pip3 install rain-python, I get the error below:

Installing collected packages: pycapnp, rain-python
  Running setup.py install for pycapnp ... error
    Complete output from command /Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/private/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/pip-install-pdz0rc5o/pycapnp/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/pip-record-5zxre8wk/install-record.txt --single-version-externally-managed --compile:
    Compiling capnp/lib/capnp.pyx because it changed.
    [1/1] Cythonizing capnp/lib/capnp.pyx
    running install
    running build
    running build_py
    creating build/lib.macosx-10.6-intel-3.6
    creating build/lib.macosx-10.6-intel-3.6/capnp
    copying capnp/version.py -> build/lib.macosx-10.6-intel-3.6/capnp
    copying capnp/__init__.py -> build/lib.macosx-10.6-intel-3.6/capnp
    copying capnp/_gen.py -> build/lib.macosx-10.6-intel-3.6/capnp
    copying capnp/__init__.pxd -> build/lib.macosx-10.6-intel-3.6/capnp
    copying capnp/schema.capnp -> build/lib.macosx-10.6-intel-3.6/capnp
    copying capnp/c++.capnp -> build/lib.macosx-10.6-intel-3.6/capnp
    creating build/lib.macosx-10.6-intel-3.6/capnp/helpers
    copying capnp/helpers/helpers.pxd -> build/lib.macosx-10.6-intel-3.6/capnp/helpers
    copying capnp/helpers/non_circular.pxd -> build/lib.macosx-10.6-intel-3.6/capnp/helpers
    copying capnp/helpers/__init__.pxd -> build/lib.macosx-10.6-intel-3.6/capnp/helpers
    copying capnp/helpers/rpcHelper.h -> build/lib.macosx-10.6-intel-3.6/capnp/helpers
    copying capnp/helpers/fixMaybe.h -> build/lib.macosx-10.6-intel-3.6/capnp/helpers
    copying capnp/helpers/capabilityHelper.h -> build/lib.macosx-10.6-intel-3.6/capnp/helpers
    copying capnp/helpers/asyncHelper.h -> build/lib.macosx-10.6-intel-3.6/capnp/helpers
    copying capnp/helpers/checkCompiler.h -> build/lib.macosx-10.6-intel-3.6/capnp/helpers
    copying capnp/helpers/serialize.h -> build/lib.macosx-10.6-intel-3.6/capnp/helpers
    creating build/lib.macosx-10.6-intel-3.6/capnp/includes
    copying capnp/includes/types.pxd -> build/lib.macosx-10.6-intel-3.6/capnp/includes
    copying capnp/includes/__init__.pxd -> build/lib.macosx-10.6-intel-3.6/capnp/includes
    copying capnp/includes/schema_cpp.pxd -> build/lib.macosx-10.6-intel-3.6/capnp/includes
    copying capnp/includes/capnp_cpp.pxd -> build/lib.macosx-10.6-intel-3.6/capnp/includes
    creating build/lib.macosx-10.6-intel-3.6/capnp/lib
    copying capnp/lib/capnp.pxd -> build/lib.macosx-10.6-intel-3.6/capnp/lib
    copying capnp/lib/__init__.pxd -> build/lib.macosx-10.6-intel-3.6/capnp/lib
    copying capnp/lib/__init__.py -> build/lib.macosx-10.6-intel-3.6/capnp/lib
    copying capnp/lib/pickle_helper.py -> build/lib.macosx-10.6-intel-3.6/capnp/lib
    copying capnp/lib/capnp.pyx -> build/lib.macosx-10.6-intel-3.6/capnp/lib
    creating build/lib.macosx-10.6-intel-3.6/capnp/templates
    copying capnp/templates/setup.py.tmpl -> build/lib.macosx-10.6-intel-3.6/capnp/templates
    copying capnp/templates/module.pyx -> build/lib.macosx-10.6-intel-3.6/capnp/templates
    running build_ext
    creating var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/tmpl5n886va
    /usr/bin/clang -fno-strict-aliasing -Wsign-compare -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -arch i386 -arch x86_64 -g -c /var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/tmpl5n886va/vers.cpp -o var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/tmpl5n886va/vers.o --std=c++11
    /var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/tmpl5n886va/vers.cpp:4:10: fatal error: 'capnp/common.h' file not found
    #include "capnp/common.h"
             ^~~~~~~~~~~~~~~~
    1 error generated.
    *WARNING* no libcapnp detected or rebuild forced. Will download and build it from source now. If you have C++ Cap'n Proto installed, it may be out of date or is not being detected. Downloading and building libcapnp may take a while.
    already have /private/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/pip-install-pdz0rc5o/pycapnp/bundled/capnproto-c++
    /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: /private/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/pip-install-pdz0rc5o/pycapnp/build/lib/libkj-async.a(async-win32.o) has no symbols
    /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: /private/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/pip-install-pdz0rc5o/pycapnp/build/lib/libkj-async.a(async-io-win32.o) has no symbols
    /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: /private/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/pip-install-pdz0rc5o/pycapnp/build/lib/libcapnp.a(list.o) has no symbols
    building 'capnp.lib.capnp' extension
    creating build/temp.macosx-10.6-intel-3.6
    creating build/temp.macosx-10.6-intel-3.6/capnp
    creating build/temp.macosx-10.6-intel-3.6/capnp/lib
    /usr/bin/clang -fno-strict-aliasing -Wsign-compare -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -arch i386 -arch x86_64 -g -I. -I/Library/Frameworks/Python.framework/Versions/3.6/include/python3.6m -I/private/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/pip-install-pdz0rc5o/pycapnp/build/include -c capnp/lib/capnp.cpp -o build/temp.macosx-10.6-intel-3.6/capnp/lib/capnp.o --std=c++11
    In file included from capnp/lib/capnp.cpp:520:
    ./capnp/helpers/checkCompiler.h:4:8: warning: "Your compiler supports C++11 but your C++ standard library does not.  If your system has libc++ installed (as should be the case on e.g. Mac OSX), try adding -stdlib=libc++ to your CFLAGS (ignore the other warning that says to use CXXFLAGS)." [-W#warnings]
          #warning "Your compiler supports C++11 but your C++ standard library does not.  If your system has libc++ installed (as should be the case on e.g. Mac OSX), try adding -stdlib=libc++ to your CFLAGS (ignore the other warning that says to use CXXFLAGS)."
           ^
    In file included from capnp/lib/capnp.cpp:520:
    In file included from ./capnp/helpers/checkCompiler.h:9:
    In file included from /private/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/pip-install-pdz0rc5o/pycapnp/build/include/capnp/dynamic.h:40:
    In file included from /private/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/pip-install-pdz0rc5o/pycapnp/build/include/capnp/schema.h:33:
    In file included from /private/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/pip-install-pdz0rc5o/pycapnp/build/include/capnp/schema.capnp.h:7:
    In file included from /private/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/pip-install-pdz0rc5o/pycapnp/build/include/capnp/generated-header-support.h:31:
    In file included from /private/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/pip-install-pdz0rc5o/pycapnp/build/include/capnp/raw-schema.h:29:
    In file included from /private/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/pip-install-pdz0rc5o/pycapnp/build/include/capnp/common.h:34:
    /private/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/pip-install-pdz0rc5o/pycapnp/build/include/kj/string.h:29:10: fatal error: 'initializer_list' file not found
    #include <initializer_list>
             ^~~~~~~~~~~~~~~~~~
    1 warning and 1 error generated.
    error: command '/usr/bin/clang' failed with exit status 1
    
    ----------------------------------------
Command "/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/private/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/pip-install-pdz0rc5o/pycapnp/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/pip-record-5zxre8wk/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/y3/xb_8fx894lq_89d7kv0k3gq00000gp/T/pip-install-pdz0rc5o/pycapnp/

clang version

/usr/bin/clang --version
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Can someone give advice on how to fix this error?

Create a C++ subworker library

Implement a library to make C++ subworker implementation easier. In spirit, this should consist of I/O and Context types, a way to declare and register task functions, and a simple mainloop function, similarly to #34. Would you sketch the API, @spirali?

Worker crashes with 'DataObject is not finished'

The Rain worker crashes with thread 'main' panicked at 'DataObject is not finished', src/worker/graph/dataobj.rs:81:18.

The deployment consisted of 1 local and 1 remote worker running on Exoscale infrastructure connecting from a remote client.

Worker log including stack backtrace:

ubuntu@test-0:~$ rain worker <server-ip>:7210 --listen=7211
🌧  INFO 2018-03-26T10:22:40Z Starting Rain 0.1.1 as worker
🌧  INFO 2018-03-26T10:22:40Z Resources: 1 cpus
🌧  INFO 2018-03-26T10:22:40Z Working directory: "/tmp/rain-work/worker-test-0-1261"
🌧  INFO 2018-03-26T10:22:40Z Server address <server-ip>:7210 was resolved as <server-ip>:7210
🌧  INFO 2018-03-26T10:22:40Z Start listening on port=7211
🌧  INFO 2018-03-26T10:22:40Z Connecting to server addr=<server-ip>:7210
🌧  INFO 2018-03-26T10:22:40Z Connected to server; registering as worker
🌧  INFO 2018-03-26T10:22:43Z New connection from <server-ip>:40118
🌧  INFO 2018-03-26T10:23:33Z New connection from 127.0.0.1:54122
thread 'main' panicked at 'DataObject is not finished', src/worker/graph/dataobj.rs:81:18
note: Run with `RUST_BACKTRACE=1` for a backtrace.
ubuntu@test-0:~$ RUST_BACKTRACE=1 rain worker 185.150.9.32:7210 --listen=7211
🌧  INFO 2018-03-26T10:24:19Z Starting Rain 0.1.1 as worker
🌧  INFO 2018-03-26T10:24:19Z Resources: 1 cpus
🌧  INFO 2018-03-26T10:24:19Z Working directory: "/tmp/rain-work/worker-test-0-1317"
🌧  INFO 2018-03-26T10:24:19Z Server address 185.150.9.32:7210 was resolved as 185.150.9.32:7210
🌧  INFO 2018-03-26T10:24:19Z Start listening on port=7211
🌧  INFO 2018-03-26T10:24:19Z Connecting to server addr=185.150.9.32:7210
🌧  INFO 2018-03-26T10:24:19Z Connected to server; registering as worker
🌧  INFO 2018-03-26T10:24:23Z New connection from 185.150.9.32:40124
🌧  INFO 2018-03-26T10:24:24Z New connection from 127.0.0.1:54126
thread 'main' panicked at 'DataObject is not finished', src/worker/graph/dataobj.rs:81:18
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::print
             at /checkout/src/libstd/sys_common/backtrace.rs:68
             at /checkout/src/libstd/sys_common/backtrace.rs:57
   2: std::panicking::default_hook::{{closure}}
             at /checkout/src/libstd/panicking.rs:381
   3: std::panicking::default_hook
             at /checkout/src/libstd/panicking.rs:397
   4: std::panicking::rust_panic_with_hook
             at /checkout/src/libstd/panicking.rs:577
   5: std::panicking::begin_panic
   6: <librain::worker::rpc::datastore::DataStoreImpl as librain::datastore_capnp::data_store::Server>::create_reader
   7: <librain::datastore_capnp::data_store::ServerDispatch<_T> as capnp::capability::Server>::dispatch_call
   8: <futures::future::lazy::Lazy<F, R> as futures::future::Future>::poll
   9: <capnp_rpc::attach::AttachFuture<F, T> as futures::future::Future>::poll
  10: <futures::future::chain::Chain<A, B, C>>::poll
  11: <futures::future::chain::Chain<A, B, C>>::poll
  12: futures::task_impl::std::set
  13: <capnp_rpc::forked_promise::ForkedPromise<F> as futures::future::Future>::poll
  14: <futures::future::chain::Chain<A, B, C>>::poll
  15: <futures::future::select::Select<A, B> as futures::future::Future>::poll
  16: <futures::future::map::Map<A, F> as futures::future::Future>::poll
  17: <futures::future::map_err::MapErr<A, F> as futures::future::Future>::poll
  18: <futures::future::chain::Chain<A, B, C>>::poll
  19: futures::task_impl::std::set
  20: <futures::stream::futures_unordered::FuturesUnordered<T> as futures::stream::Stream>::poll
  21: <capnp_rpc::task_set::TaskSet<T, E> as futures::future::Future>::poll
  22: <futures::future::chain::Chain<A, B, C>>::poll
  23: futures::task_impl::std::set
  24: <futures::stream::futures_unordered::FuturesUnordered<T> as futures::stream::Stream>::poll
  25: <capnp_rpc::task_set::TaskSet<T, E> as futures::future::Future>::poll
  26: <futures::future::map_err::MapErr<A, F> as futures::future::Future>::poll
  27: <futures::task_impl::Spawn<T>>::poll_future_notify
  28: <std::thread::local::LocalKey<T>>::with
  29: <tokio::executor::current_thread::scheduler::Scheduler<U>>::tick
  30: <scoped_tls::ScopedKey<T>>::set
  31: <std::thread::local::LocalKey<T>>::with
  32: <std::thread::local::LocalKey<T>>::with
  33: tokio_core::reactor::Core::poll
  34: tokio_core::reactor::Core::turn
  35: rain::main
  36: std::rt::lang_start::{{closure}}
  37: std::panicking::try::do_call
             at /checkout/src/libstd/rt.rs:59
             at /checkout/src/libstd/panicking.rs:480
  38: __rust_maybe_catch_panic
             at /checkout/src/libpanic_unwind/lib.rs:101
  39: std::rt::lang_start_internal
             at /checkout/src/libstd/panicking.rs:459
             at /checkout/src/libstd/panic.rs:365
             at /checkout/src/libstd/rt.rs:58
  40: main
  41: __libc_start_main
  42: _start

Server log:

🌧  INFO 2018-03-26T10:24:05Z Starting Rain 0.1.1 server at port 0.0.0.0:7210
🌧  INFO 2018-03-26T10:24:05Z Start listening on address=0.0.0.0:7210
🌧  INFO 2018-03-26T10:24:05Z Dashboard is running at http://test-1:8080/
🌧  INFO 2018-03-26T10:24:05Z Lite dashboard is running at http://test-1:8080/lite/
🌧  INFO 2018-03-26T10:24:16Z New connection from 127.0.0.1:54004
🌧  INFO 2018-03-26T10:24:16Z Connection 127.0.0.1:54004 registered as worker 127.0.0.1:7211 with Resources { cpus: 1 }
🌧  INFO 2018-03-26T10:24:20Z New connection from 185.150.8.101:42994
🌧  INFO 2018-03-26T10:24:20Z Connection 185.150.8.101:42994 registered as worker 185.150.8.101:7211 with Resources { cpus: 1 }
🌧  INFO 2018-03-26T10:24:24Z New connection from 195.113.113.226:55798
🌧  INFO 2018-03-26T10:24:24Z Connection 195.113.113.226:55798 registered as client
🌧  INFO 2018-03-26T10:24:24Z New task submission (11 tasks, 11 data objects) from client 195.113.113.226:55798
🌧  INFO 2018-03-26T10:24:24Z New get_state request (0 tasks, 1 data objects) from client
🌧  INFO 2018-03-26T10:24:24Z Client 195.113.113.226:55798 disconnected
🌧  INFO 2018-03-26T10:24:25Z New connection from 195.113.113.226:55800
🌧  INFO 2018-03-26T10:24:25Z Connection 195.113.113.226:55800 registered as client
🌧  INFO 2018-03-26T10:24:25Z New task submission (11 tasks, 11 data objects) from client 195.113.113.226:55800
🌧 ERROR 2018-03-26T10:24:26Z Connection to worker 185.150.8.101:7211 lost
thread 'main' panicked at 'not yet implemented', src/server/state.rs:94:9
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Fix cargo install command in README.md

README.md:

Installation via cargo

If you have installed Rust, you can install and start Rain as follows:

$ cargo install rust_server

$ pip3 install rain-python

$ rain start --simple

Shouldn't it be:
cargo install rain_server
?

Task pinning to workers

Add a task attribute that restricts a task to a subset of workers; an empty attribute means no restriction.

Usage:

  • Working on files local to a worker (either reading or writing).
    • Useful for open and store tasks (but not limited to them).
  • Selecting a suitable worker subset for a task when (virtual) resources are not appropriate.
    • However, for GPUs and other features, we should use resources.
  • Testing.

Constant data objects already have a similar feature for placement, but this API is not planned to be published (and it is not really relevant for computed data objects).

Alternative: Every worker could be a resource, required by the task. However, this is not ergonomic and we do not have a clear picture of how to do (virtual) resources in the right way.
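To make the proposal concrete, here is a purely illustrative client-side sketch; the attribute name "workers" and this minimal Task class are assumptions, not the actual Rain API.

```python
# Purely illustrative sketch of worker pinning via a task attribute.
# The attribute key "workers" and this Task class are assumed, not real API.

class Task:
    def __init__(self, name):
        self.name = name
        self.attributes = {}

    def pin_to(self, workers):
        # An empty list would mean "no restriction" (any worker may run it).
        self.attributes["workers"] = list(workers)

task = Task("open")
task.pin_to(["worker-1:7211", "worker-2:7211"])
```

The scheduler would then only consider workers listed in the attribute when placing the task.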

Roadmap after v0.3

A document to track the directions from 0.3, replacing #26. Our mid- and long-term goals, their [priority], (assignee) and any sub-tasks.

Any help is welcome with mentoring available for most tasks!

Remaining enhancements from v0.3

Will be updated after prioritization discussion.

Client-side protocols

Replace capnp RPC and the current monitoring dashboard HTTP API with a common protocol.
Part of #11 (more discussion there), but specific to the public API.

Improve the dashboard with more information and post-mortem analysis

  • Design and revamp the dashboard. Depends on the client API development (@gavento) [medium/low] (#38)
  • Include stats for task/object groups and possibly names/labels from #32 [low] (#38)

Fix current bugs

  • #7 (occurs under heavy load only) [medium]
  • #13 (seems to be bound to Exoscale deployment) [high]

Custom tasks (subworkers) in more languages

  • Python subworker as a library [low] (run standalone scripts as opposed to defining them in the client only)

Easier deployment in the cloud

Packaging for easier deployment

Multiple options, priorities may vary. (@spirali)

  • AppImage/Snap packages [low] (we already have static binaries)
  • Deb/other distro packages [low]

Improve Python API

Pythonize the client API.

  • Draft content-type loaders/extensions (@gavento) [low]
  • Task/object groups and names/labels (#32) [low]

Improve testing infrastructure

  • Scripts/containers/... to test deployment and running in a network. (@vojtechcima) [medium]
    • Test rain start and running on OpenStack, Exoscale, AWS. Does not have to be a part of CI (even for running locally). Depends on / part of #37.

More real-world code examples

Lower priority; best based on real use-cases. Ideas: numpy subtasks, C++/Rust subworkers.

Enhancements to revisit in the (not so distant) future

  • Integration with some popular libraries
    • Apache Arrow content-type
      • Basic type and loading is implemented. We could add more operations (filter, split, merge, ...)
    • XGBoost tasks, etc ...
    • Why not now: Not clear what the demand would be
  • Worker configuration files (needed for common (CPU) and special resources (GPU), different subworker locations and configurations, ...)
    • Partially done
    • Why not now: Needs to be thought-through (esp. w.r.t. resources), not needed now
  • Separate session construction and running (save/load session)
    • Why not now: Not clear what would be the use-cases, not difficult when API stabilized
  • Clients in other languages: Rust, C++, Java, ...
    • Why not now: Not clear what the demand would be. Easier after the protocol/Python API stabilization.
  • Scale the scheduler, benchmarks
    • There is a benchmark in utils/bench/simple_task_scaling.py. The results as of 0.2 are here.
    • Why not now: While eventually crucial, the scheduler is sufficient when there are <1000 tasks to be scheduled at once.

Cannot create log file error

$ ./rain start --worker-host-file=hosts.txt
🌧  INFO 2018-04-11T08:15:45Z Starting Rain 0.2.0
🌧  INFO 2018-04-11T08:15:45Z Log directory: /tmp/rain-logs/worker-p1-16085
🌧  INFO 2018-04-11T08:15:45Z Starting local server (0.0.0.0:7210)
🌧  INFO 2018-04-11T08:15:45Z Dashboard: http://p1:8080/
🌧  INFO 2018-04-11T08:15:45Z Server pid = 16086
🌧  INFO 2018-04-11T08:15:45Z Process 'server' is ready
🌧  INFO 2018-04-11T08:15:45Z Starting 1 remote worker(s)
🌧  INFO 2018-04-11T08:15:45Z Connecting to jupiter (remote log dir: "/tmp/rain-logs/worker-p1-16085")
🌧 ERROR 2018-04-11T08:15:45Z Remote process at jupiter, the following error occurs: Cannot create log file

🌧  INFO 2018-04-11T08:15:45Z Error occurs; clean up started processes ...

I believe the log directory should be created first by calling mkdir -p {log_dir} before trying to create the log file at

touch {log_out:?} || (echo \"Error: Cannot create log file\"; exit 1)\n

touch does not create parent directories on my Debian server.

Change task types to class hierarchy

In the Python API, all tasks are currently instances of the Task class, with different types indicated by task_type. It would be more pythonic to implement them as a hierarchy of classes, one for each task_type (or even one for every Program or Python remote function; TBD). This would have the advantage of allowing isinstance checks, better introspection (__repr__), access to parameters specific to the task type, and subclassing.

Independently of this change, we can keep the task-creation operations as functions (and lower-cased), e.g. keep tasks.concat in addition to having tasks.ConcatTask.

This should be a mostly non-disruptive change, but it needs a more detailed spec of the Task interface.
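A hypothetical sketch of the proposed hierarchy follows; the names Task, ConcatTask and the task_type string are illustrative, not the real Rain classes.

```python
# Hypothetical sketch of the proposed class hierarchy; the class names and
# the task_type value are assumptions for illustration, not the actual API.

class Task:
    task_type = None

    def __init__(self, inputs):
        self.inputs = list(inputs)

    def __repr__(self):
        return "<{} inputs={}>".format(type(self).__name__, len(self.inputs))

class ConcatTask(Task):
    task_type = "buildin/concat"

def concat(inputs):
    # The lower-cased factory function kept alongside the class, as proposed.
    return ConcatTask(inputs)
```

With this shape, isinstance(concat([...]), Task) holds, __repr__ reports the concrete class, and users can subclass ConcatTask to customize behavior.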

Attributes refactor and specification

The attributes are currently not well-specified and there seem to be ways to implement them better before we release the (much more stable) v0.3.

Goals:

  • Rust structures (with serde) with typed values for known attributes (and a map for user/unknown)
  • Split Task and DataObject attributes (required by the above, will allow checking in Python and other clients)
  • Specify a fixed encoding for rich and user-specified values (current proposal: JSON with base64 binary data) (note that non-trivial binary data in attributes are generally discouraged)
  • Create a rich Attributes class in Python (underway in #60)
  • Fix and document the new structure semantics
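The proposed encoding (JSON with base64 binary data) could look roughly like the sketch below; the {"__bytes": ...} wrapper is an assumption for illustration, not a settled format.

```python
import base64
import json

# Sketch of the proposed attribute encoding: JSON for rich values, with
# binary data wrapped in base64. The "__bytes" wrapper key is an assumption.

def encode_attr(value):
    if isinstance(value, bytes):
        value = {"__bytes": base64.b64encode(value).decode("ascii")}
    return json.dumps(value)

def decode_attr(text):
    value = json.loads(text)
    if isinstance(value, dict) and set(value) == {"__bytes"}:
        return base64.b64decode(value["__bytes"])
    return value
```

Round-tripping both plain JSON values and byte strings through one pair of functions keeps the wire format uniform while still discouraging large binary payloads in attributes.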

Is async/await syntax support on the roadmap?

@ALL

Hi all,
I have read the docs at https://substantic.github.io/rain/docs/index.html.

The example given in the getting-started guide is below:

from rain.client import Client, tasks, blob

# Connect to server
client = Client("localhost", 7210)

# Create a new session
with client.new_session() as session:

    # Create task (and two data objects)
    task = tasks.Concat((blob("Hello "), blob("world!"),))

    # Mark that the output should be kept after submit
    task.output.keep()

    # Submit all created tasks to the server
    session.submit()

    # Wait for completion of task and fetch results and get it as bytes
    result = task.output.fetch().get_bytes()

    # Prints 'Hello world!'
    print(result)

This is really neat.

But Python 3.5+ supports async/await syntax; does this project support it?

In the roadmap I found the task "Update in the Python API (using aiohttp for async API) (@gavento) [medium]". Is it intended to support `async/await` syntax?

With async/await support, the example might look like the following:

from rain.client import Client, tasks, blob

# Connect to server
client = Client("localhost", 7210)

# Create a new session
async with client.new_session() as session:

    # Create task (and two data objects)
    task = tasks.Concat((blob("Hello "), blob("world!"),))

    # Mark that the output should be kept after submit
    task.output.keep()

    # Submit all created tasks to the server
    session.submit()

    # Wait for completion of task and fetch results and get it as bytes
    result = await task.output.fetch().get_bytes()

    # Prints 'Hello world!'
    print(result)

Startup and control scripts for CloudStack and others

While Rain startup on PBS already has basic support, it would be great to have scripts to set up and control (at least tear down) deployments on some of: the CloudStack API (Exoscale), AWS, Google Cloud, ...

@vojtechcima is already working on that - Vojta, would you please take over and fill some plans and status?

Roadmap after v0.2

We have received a lot of feedback from our reddit post (https://www.reddit.com/r/rust/comments/89yppv/rain_rust_based_computational_framework/). I think now is the time to recap our plans and maybe reconsider priorities to reflect the real needs of users. This is meant as a kind of brainstorming whiteboard; each individual task should get its own issue in the end. Also, I would like to focus on a relatively short-term roadmap (say, things that could be finished within 3 months); maybe we can create a similar post for our long-term plans.

EDIT (gavento): Moved @spirali's TODO to a post below.

Rare server crash (parallel inter-task dependencies + other conditions)

Rain server panics while a task becomes ready here. The relevant part of the log seems to be the following:

...
DEBUG 2018-03-17T15:31:49Z: librain::server::scheduler: Scheduler: New ready task (1,23092)
... [many New ready task info lines, various IDs]
DEBUG 2018-03-17T15:31:49Z: librain::server::scheduler: Scheduler: New ready task (1,23092)
thread 'main' panicked at 'assertion failed: r', src/server/scheduler.rs:148:17
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
DEBUG 2018-03-17T15:31:49Z: tokio_reactor: loop process - 1 events, 0.000s
             at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::_print
             at libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at libstd/sys_common/backtrace.rs:59
             at libstd/panicking.rs:207
   3: std::panicking::default_hook
             at libstd/panicking.rs:223
   4: std::panicking::rust_panic_with_hook
             at libstd/panicking.rs:402
   5: std::panicking::begin_panic
   6: librain::server::scheduler::ReactiveScheduler::schedule
   7: librain::server::state::State::run_scheduler
   8: librain::server::state::<impl librain::common::wrapped::WrappedRcRefCell<librain::server::state::State>>::turn
   9: rain::main
  10: std::rt::lang_start::{{closure}}
  11: std::panicking::try::do_call
             at libstd/rt.rs:59
             at libstd/panicking.rs:306
  12: __rust_maybe_catch_panic
             at libpanic_unwind/lib.rs:102
  13: std::rt::lang_start_internal
             at libstd/panicking.rs:285
             at libstd/panic.rs:361
             at libstd/rt.rs:58
  14: main
  15: __libc_start_main
  16: _start
DEBUG 2018-03-17T15:31:49Z: tokio_reactor: loop process - 1 events, 0.000s
DEBUG 2018-03-17T15:31:49Z: tokio_reactor::background: shutting background reactor down NOW
...

However, a small test for multiple identical inputs passes, even with subsequent submits. The benchmark only fails with >500 tasks per layer. See the attached benchmark. It was run as python3 scalebench.py net -l 256 -w 1024 -s 0; the error happens around layer 10.
The debug checks with RAIN_DEBUG_MODE=1 do not find any consistency problems.

scalebench.py.txt

dependency problem of crate log 0.3.x

When Rain is built or installed with log 0.3.x, it fails:

error[E0432]: unresolved import `log::Level`
   --> src/bin.rs:349:21
    |
349 |                 use log::Level;
    |                     ^^^^^^^^^^ no `Level` in the root

error: aborting due to previous error

It seems to be caused by the change rust-lang-nursery/log#162; with log 0.4.x, the build and install succeed.

So we should change "*" in Cargo.toml to ">=0.4".
I'll make a PR.
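Concretely, the dependency line in Cargo.toml would change from a wildcard to the 0.4 requirement suggested above:

```toml
[dependencies]
# was: log = "*"
log = ">=0.4"
```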
