rayon-rs / rayon
Rayon: A data parallelism library for Rust
License: Apache License 2.0
Per this comment, we should deprecate the existing weight APIs. I think we probably still want some room for manual tuning, but it should be simpler by default -- and if we are going to let people give fine-grained numbers, they should be tied to the length (like a sequential cutoff) rather than some abstract notion of weight. Ideally, though, we'd add these APIs along with representative benchmarks where they are needed.
Commit e98c137 broke the quicksort demo.
jobsteal seems to be a nice library. It might make sense to switch the underlying parallel code here to use it, and just concentrate on the higher-level abstractions in Rayon. Got to do more work on those benchmarks. :)
The docs recommend use rayon::par_iter::*; -- but this includes every pub mod, and those modules have very generic names. So something simple like this fails:

use rayon::par_iter::*;
use std::slice;

error: a module named `slice` has already been imported in this module

If those submodules really need to be public, perhaps there should be a prelude to glob-import instead?
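A prelude along the lines of std::prelude could avoid the collision while keeping the submodules public. A minimal self-contained sketch (the module layout and names here are hypothetical, not Rayon's actual source):

```rust
// Sketch of the prelude pattern: submodules stay public, but users are
// told to glob-import only a curated `prelude` module, so generic names
// like `slice` never enter their namespace.
mod par_iter {
    pub mod slice {
        pub fn marker() -> &'static str { "slice" }
    }

    pub mod prelude {
        // Re-export only the intended user-facing items, under
        // non-colliding names.
        pub use super::slice::marker as slice_marker;
    }
}

use par_iter::prelude::*;
use std::slice; // no longer conflicts with a glob-imported `slice` module

fn main() {
    let v = [1, 2, 3];
    let _it: slice::Iter<i32> = v.iter(); // std::slice is usable as normal
    println!("{}", slice_marker());
}
```

This mirrors what std does with std::prelude, and is essentially the shape Rayon eventually shipped as rayon::prelude.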
I see that only ranges and slices can be converted into parallel iterators.
I wonder how to best utilize Rayon while reading from a file.
At the moment I read files into buffers, then parse them using an iterator. I imagine there's a better way to use Rayon than:

let mut buffer = String::new();
File::open(some_path)?.read_to_string(&mut buffer)?; // iterate unindexed data once
let mut records: Vec<_> = ParseIter::new(buffer).collect(); // iterate unindexed data again
records.par_iter_mut().for_each(do_stuff); // iterate indexed data in parallel
Full disclaimer, this is likely my fault in some way :)
I'm using Rayon master as the dependency, since I wanted some of the 0.3 features.
My code uses par_iter_mut to operate on a vector of "Pages" in parallel. There are 3969 Pages in total, and each page has a variable amount of work. Total runtime is about 60 seconds on my MacBook Air, and it appears to be using at least some of my cores (two cores are fairly heavily utilized, the two virtual hyperthreaded cores less so).
However, I tried to run this on my linux "heavy testing" server (hexacore Xeon E5-1650V2, 12 cores if you count hyperthreading) and found that only a single core is used. Worse, it appears only a single thread is being spawned with the default Rayon settings (i.e., no custom Configuration):
$ top -H -p 11163
Threads: 2 total, 1 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 13200303+total, 4213284 used, 12778974+free, 95156 buffers
KiB Swap: 4190204 total, 0 used, 4190204 free. 2135916 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11163 tongz 20 0 1442108 1.296g 2640 R 99.8 1.0 0:39.55 bench_grow
11164 tongz 20 0 1442108 1.296g 2640 S 0.0 1.0 0:00.00 log4rs config r
If I manually specify how many threads should be used, Rayon spawns the full set of threads... but only one thread is actually used:
rayon::initialize(rayon::Configuration::new().set_num_threads(12));
$ top -H -p 10495
Threads: 14 total, 1 running, 13 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 13200303+total, 4212232 used, 12779080+free, 95128 buffers
KiB Swap: 4190204 total, 0 used, 4190204 free. 2135916 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10495 tongz 20 0 2253264 1.296g 2680 R 99.7 1.0 0:28.41 bench_grow
10496 tongz 20 0 2253264 1.296g 2680 S 0.0 1.0 0:00.00 log4rs config r
10497 tongz 20 0 2253264 1.296g 2680 S 0.0 1.0 0:00.00 bench_grow
10498 tongz 20 0 2253264 1.296g 2680 S 0.0 1.0 0:00.00 bench_grow
10499 tongz 20 0 2253264 1.296g 2680 S 0.0 1.0 0:00.00 bench_grow
10500 tongz 20 0 2253264 1.296g 2680 S 0.0 1.0 0:00.00 bench_grow
10501 tongz 20 0 2253264 1.296g 2680 S 0.0 1.0 0:00.00 bench_grow
10502 tongz 20 0 2253264 1.296g 2680 S 0.0 1.0 0:00.00 bench_grow
10503 tongz 20 0 2253264 1.296g 2680 S 0.0 1.0 0:00.00 bench_grow
10504 tongz 20 0 2253264 1.296g 2680 S 0.0 1.0 0:00.00 bench_grow
10505 tongz 20 0 2253264 1.296g 2680 S 0.0 1.0 0:00.00 bench_grow
10506 tongz 20 0 2253264 1.296g 2680 S 0.0 1.0 0:00.00 bench_grow
10507 tongz 20 0 2253264 1.296g 2680 S 0.0 1.0 0:00.00 bench_grow
10508 tongz 20 0 2253264 1.296g 2680 S 0.0 1.0 0:00.00 bench_grow
Is it possible the compiler is optimizing away something? After the processing is done, I'm printing out some values from the computation to make sure the result is used. And processing still takes ~40s on my server, which makes me think it isn't just optimized away.
I thought the work chunks might be too small, so I tried variously sized work units and it seems to display the same behavior.
I'm not sure what would be helpful to debug. The code itself is here (and is mostly an unmitigated disaster atm; I haven't had a chance to clean it up). I'm using the bench_grow example as a simple test: cargo run --release --example bench_grow
The salient code looks like this:
fn main() {
    rayon::initialize(rayon::Configuration::new().set_num_threads(12));

    let num_pages = 63u32;
    let dimension = num_pages * PAGE_WIDTH;
    let mut cajal = Cajal::new(num_pages, 0.01, &[1, 2, 3, 7]);

    let start = SteadyTime::now();
    cajal.grow();
    let elapsed = SteadyTime::now() - start;

    let signal = (*cajal.get_cell(10, 10)).get_signal();
    println!("Elapsed time: {:?} (signal: {})", elapsed, signal);
}

fn grow() {
    self.pages.par_iter_mut()
        .for_each(|page| page.grow());

    // some sequential processing

    self.pages.par_iter_mut()
        .for_each(|page| page.update());
}
I noticed that the kernel scheduler was showing up a lot in my perf report, and I tracked this down to having so many sched_yield syscalls in waiting threads. Such busy work is wasted CPU time, not only for your own process but also if anything else is trying to run on the system.
I tried a crude patch below to use Condvar for the blocking latch wait, and it already shows a clear benefit. The other yield loop is in steal_until, which is harder because that loop waits for either latch completion or available work -- suggestions welcome.
diff --git a/src/latch.rs b/src/latch.rs
index a81831d1cc7a..fbeff95acbd4 100644
--- a/src/latch.rs
+++ b/src/latch.rs
@@ -1,9 +1,11 @@
+use std::sync::{Mutex, Condvar};
 use std::sync::atomic::{AtomicBool, Ordering};
-use std::thread;
 
 /// A Latch starts as false and eventually becomes true. You can block
 /// until it becomes true.
 pub struct Latch {
+    m: Mutex<()>,
+    c: Condvar,
     b: AtomicBool
 }
@@ -11,19 +13,24 @@ impl Latch {
     #[inline]
     pub fn new() -> Latch {
         Latch {
+            m: Mutex::new(()),
+            c: Condvar::new(),
             b: AtomicBool::new(false)
         }
     }
 
     /// Set the latch to true, releasing all threads who are waiting.
     pub fn set(&self) {
+        let _guard = self.m.lock().unwrap();
         self.b.store(true, Ordering::SeqCst);
+        self.c.notify_all();
     }
 
     /// Spin until latch is set. Use with caution.
     pub fn wait(&self) {
+        let mut guard = self.m.lock().unwrap();
         while !self.probe() {
-            thread::yield_now();
+            guard = self.c.wait(guard).unwrap();
         }
     }
Before, running on Fedora 23 x86_64 with an i7-2600:
Running `target/release/quicksort 1000`
Array length : 1000K 32-bit integers
Repetitions : 10
Parallel time : 281422159
Sequential time : 848279007
Parallel speedup: 3.014258045685734
After, with Condvar wait:
Running `target/release/quicksort 1000`
Array length : 1000K 32-bit integers
Repetitions : 10
Parallel time : 225591277
Sequential time : 873530292
Parallel speedup: 3.872181157075502
We are currently using the deque from the crates.io deque package -- however, @aturon informs me that over long uses that deque can leak memory, since it lacks any form of GC to reclaim nodes. The crossbeam deque is a port that uses crossbeam's memory reclamation infrastructure, which should keep memory use in check. We should probably switch; however, it would be interesting to first have some Rayon benchmarks in place.
I just converted a for loop in cage to a parallel iteration. The original loop used try!, giving me a closure of the form .map(|pod| -> Result<()> { ... }).
In a perfect world, I might like to collect all the errors in some order, but that would require messing around with reducing a "vector of errors" type, which is going to be allocation intensive. We can do almost as well by just choosing one of many errors, and showing that one to the user.
@cuviper helped me figure out the following solution:
// If more than one parallel branch fails, just return one error.
.reduce_with(|result1, result2| result1.and(result2).and(Ok(())))
.unwrap_or(Ok(()))
Having this as a standard, built-in reduction would make it much easier to parallelize loops using try!.
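The same error-combining idea can be sketched with a plain sequential iterator (a stand-in for reduce_with on a parallel iterator; the combine helper name is made up here):

```rust
// Combine many Result<(), E> values into one, keeping at most one error
// (here, the first; a parallel reduce would keep an arbitrary one).
fn combine<E>(results: impl IntoIterator<Item = Result<(), E>>) -> Result<(), E> {
    results
        .into_iter()
        .fold(Ok(()), |acc, r| acc.and(r)) // Ok only if every branch is Ok
}

fn main() {
    let ok: Vec<Result<(), String>> = vec![Ok(()), Ok(())];
    assert!(combine(ok).is_ok());

    let mixed: Vec<Result<(), String>> =
        vec![Ok(()), Err("disk full".into()), Err("denied".into())];
    assert_eq!(combine(mixed), Err("disk full".to_string()));
    println!("combined results as expected");
}
```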
We need to have a start at a standard Rayon benchmark suite. @cuviper got us started with https://github.com/nikomatsakis/rayon/pull/32. But we need a bit more.
More benchmarks using par_iter, perhaps? More suggestions or pointers welcome.
We also need some kind of script to compile and run the demos and collect the data. Ideally, we'd run it regularly on some server so we can track performance automatically.
Is it possible to tell Rayon to use a user-provided thread pool (e.g. in my crate's main function)?
For example, if I had a crate using tokio where I also want to use Rayon, I would like to have a single thread pool/event loop/task manager that is used as the backend for both (and that does work-stealing for both), instead of two competing ones.
Even auto-generated API docs without additional effort would be nice.
https://nikomatsakis.github.io/rayon would be an obvious place.
As for normal iterators, but look for any matching item in parallel. (position wouldn't make much sense.)
This warning probably ought to be updated now that #10 is closed:
WARNING: Rayon is still in experimental status. It is probably not very robust and not (yet) suitable for production applications. In particular, the handling of panics is known to be terrible. See nikomatsakis/rayon#10 for more details.
Currently, the question is: do we want a different name than threadpool::install()? Something that makes it clearer that we will execute the closure in the thread pool?
Original post:
Currently, threadpool::install injects its closure argument into one of the worker threads for the thread pool. Since we are then executing in the worker threads, calls to join or scope::spawn naturally execute in that thread pool. I chose this design because it's zero overhead -- the logic for join and scope::spawn is the same ("if in worker thread, push, else inject into global pool").
However, I have been contemplating a change where we instead set a thread-local variable to point at the thread pool, and then consult this thread-local variable when injecting jobs. The reason is that I want people to be able to call install at a very high point in their callstack without disturbing thread-local data they may need to use before they spawn tasks (once they spawn, there is not much I can do about that, of course).
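The thread-local design being contemplated can be sketched with std alone. Everything here is hypothetical (a pool is reduced to a plain id, and a real implementation would restore the previous value with a drop guard); it just shows the set/consult/restore dance:

```rust
// Sketch of the proposed design: a thread-local "current pool" pointer
// that `install` sets for the duration of a closure, and that spawning
// functions consult. Names here are illustrative, not Rayon's.
use std::cell::Cell;

thread_local! {
    static CURRENT_POOL: Cell<Option<usize>> = Cell::new(None); // pool id
}

fn install<R>(pool_id: usize, f: impl FnOnce() -> R) -> R {
    let prev = CURRENT_POOL.with(|c| c.replace(Some(pool_id)));
    let result = f();
    CURRENT_POOL.with(|c| c.set(prev)); // restore on the way out
    result
}

fn current_pool() -> Option<usize> {
    CURRENT_POOL.with(|c| c.get())
}

fn main() {
    assert_eq!(current_pool(), None);
    let answer = install(7, || {
        // Inside install, spawns would be injected into pool 7.
        assert_eq!(current_pool(), Some(7));
        42
    });
    assert_eq!(answer, 42);
    assert_eq!(current_pool(), None); // restored afterwards
    println!("ok");
}
```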
I was trying some implementations for #88 std collections (WIP), but I hit a stumbling block on IntoParallelRefIterator and IntoParallelRefMutIterator. These both require their Item to be the type behind the reference, so the ParallelIterator's Item is thus &Item and &mut Item respectively. That's OK for most collections, but the map iterators use (&K, &V) and (&K, &mut V).
It would be nicer if the Item represented the same as the parallel iterator's, so both the typical &T and map's (&K, &V) can work. Perhaps some future collection would even like something that only emulates a reference, like a cell Ref or a MutexGuard.
With this made more general, it can also have a blanket implementation, like:
pub trait IntoParallelRefIterator<'data> {
    type Iter: ParallelIterator<Item = Self::Item>;
    type Item: Send + 'data;

    fn par_iter(&'data self) -> Self::Iter;
}

impl<'data, T: 'data + ?Sized> IntoParallelRefIterator<'data> for T
    where &'data T: IntoParallelIterator
{
    type Iter = <&'data T as IntoParallelIterator>::Iter;
    type Item = <&'data T as IntoParallelIterator>::Item;

    fn par_iter(&'data self) -> Self::Iter {
        self.into_par_iter()
    }
}
Users of this trait would now lose the ability to assume that par_iter() yields simple references, but I'm not sure that's actually needed or valuable.
Thoughts?
Have you considered using Crossbeam ( https://aturon.github.io/blog/2015/08/27/epoch/ ) for managing the thread queues?
If a worker thread panics, I would expect rayon to either return an error result or simply propagate that panic. Instead, it just hangs waiting for a thread that will never finish. Example:
extern crate rayon;

fn main() {
    let s = std::time::Duration::new(1, 0);
    rayon::join(|| std::thread::sleep(s),
                || unimplemented!());
}
The unimplemented panic will be reported, but then the process just spins forever waiting for it.
In #9, I toyed with a Mutex and Condvar. If that mutex were held throughout the stolen job execution, then a panic would poison it. I think this will wake the condvar wait with the poisoned result, but I haven't tried it. We'd also need the steal_until loop to detect panics.
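The poisoning behavior this relies on can be checked with std alone -- a panic while holding a Mutex poisons it, and later lock() calls see an Err instead of blocking:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Returns true if a panic in a thread holding the mutex both surfaces
// through join() and leaves the mutex poisoned for other threads.
fn panic_poisons_mutex() -> bool {
    let m = Arc::new(Mutex::new(()));
    let m2 = Arc::clone(&m);

    let handle = thread::spawn(move || {
        let _guard = m2.lock().unwrap();
        panic!("job panicked"); // guard dropped during unwind => poison
    });
    let join_saw_panic = handle.join().is_err(); // panic reported via join

    // The mutex is now poisoned; lock() returns Err rather than hanging.
    join_saw_panic && m.lock().is_err()
}

fn main() {
    assert!(panic_poisons_mutex());
    println!("panic was observed and mutex was poisoned");
}
```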
The panic handling tests do not test the scope API!
We need tests to check that if a job in a scope panics:
#36 added an nbody demo. It would be nice to use glium instead of SDL, though.
master doesn't work anymore with 2016-05-12 nightly.
So I was attempting to use into_par_iter() on something that needed mutability, and ran into what seems like a limitation.
My setup is a cellular automata grid. A single Grid holds a vector of Pages, and each Page holds a vector of Cells. The goal was to into_par_iter() over the pages, since they can be processed independently. However, the grow() method on each Page requires mutability because it modifies state internally... which seems to be a problem.
A rough outline:
pub struct Grid {
    pages: Vec<Page>
}

impl Grid {
    fn start(&mut self) {
        loop {
            let active_cells = (&mut self.pages).into_par_iter()
                .map(|page| page.grow(GrowthPhase::Axon))
                .sum();
            //error: cannot borrow immutable borrowed content `*page` as mutable
            //    .map(|page| page.grow(GrowthPhase::Axon))
            //         ^~~~
            if active_cells == 0 {
                break;
            }
        }
    }
}

pub struct Page {
    cells: Vec<Cell>
}

impl Page {
    pub fn grow(&mut self, phase: GrowthPhase) -> u32 {
        // modifies state internally here
    }
}
Would it be possible for Rayon to support this, or is there something subtle I'm missing? Thanks!
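For what it's worth, the shape of the fix can be sketched sequentially: iterate the pages by mutable reference so each closure receives &mut Page (with Rayon, iter_mut() becomes par_iter_mut(), which is sound because the pages are disjoint). The Page fields below are simplified stand-ins, not the code above:

```rust
// Simplified stand-in for the Grid/Page setup in the question.
pub struct Page { active: u32 }

impl Page {
    pub fn grow(&mut self) -> u32 {
        self.active += 1; // modifies state internally
        self.active
    }
}

// With Rayon this `iter_mut()` would be `par_iter_mut()`: each closure
// call gets an exclusive &mut Page, so no two threads share a page.
fn grow_all(pages: &mut [Page]) -> u32 {
    pages.iter_mut().map(|page| page.grow()).sum()
}

fn main() {
    let mut pages = vec![Page { active: 0 }, Page { active: 3 }];
    let active_cells = grow_all(&mut pages);
    assert_eq!(active_cells, 1 + 4);
    println!("active cells: {}", active_cells);
}
```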
In general, it'd be nice if collect_into() were more flexible and convenient. We may need multiple APIs in the end. Some things I would like:
- a collect() that allocates a new collection, as with sequential iterators;
- turning collect_into() into extend() and having it work like extend()?
- RwLock<&mut HashMap>
Well, ok, that's not very organized. But I feel like we can put a bit more effort into making this ergonomic and "bridging" between various things. I'm torn. =)
I suspect we'll wind up with more than just collect_into
, in any case, though specialization might help here too.
As https://github.com/nikomatsakis/rayon/pull/16 indicated, incorrectly implementing ParallelIterator
can lead to crashes. We ought to just make the trait unsafe, I imagine (i.e., unsafe to implement, not to use).
I think I am just missing something in my haste to get something very small working…
From what I can tell:
thing.par_iter().map(…).sum()
does not do anything in parallel; it computes the maps sequentially.
I find that (0..n).into_par_iter… fails if n is u64, but if I do (0..n as u32).into_par_iter… everything works fine. Well, mostly, but that is a different issue.
It would be nice to have one to allow uses like:
paths
    .map(File::open)
    .flat_map(Result::ok)
    // ...
Currently, when a scope
is created, it does not shift into a worker thread. This complicates the logic mildly. In particular, then when the scope blocks waiting for its children, we don't know if that code is on a worker thread or not. Right now it just blocks -- but if it is on a worker thread, it should help.
My opinion is that we should (a) shift to a worker thread when a scope is created to execute the body of the scope and (b) help out when blocking.
I was thinking that if you have an &str, you can split it into words/lines in parallel in a pretty simple way. Let's just think about making a par_split_whitespace that acts like split_whitespace:
(Working with as_bytes here, I don't think we have a helper for this -- but UTF-8 is designed to make such "alignment" easy; you just have to check for bytes with the 0x80 bit set, I think.) This will give you two strings of roughly even length that we can further subdivide.
The same principle can be applied to make a par_lines that acts like lines.
I don't think you can do this in the general case for split(), since it takes arbitrary functions, but you could probably do it for many cases (e.g., an arbitrary split-string).
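The boundary-adjustment step described above can be sketched in a few lines (an illustrative helper, not Rayon's implementation): pick the byte midpoint, then scan forward until str::is_char_boundary says it is safe to split.

```rust
// Find a byte index near the midpoint of `s` that falls on a UTF-8
// character boundary, so both halves remain valid &str slices.
fn midpoint_on_char_boundary(s: &str) -> usize {
    let mut mid = s.len() / 2;
    // UTF-8 continuation bytes look like 0b10xxxxxx; step forward until
    // we land on a byte that starts a new character (s.len() always is).
    while !s.is_char_boundary(mid) {
        mid += 1;
    }
    mid
}

fn main() {
    let s = "héllo wörld";
    let mid = midpoint_on_char_boundary(s);
    let (left, right) = s.split_at(mid); // safe: `mid` is a char boundary
    assert!(!left.is_empty() && !right.is_empty());
    println!("{:?} / {:?}", left, right);
}
```

Each half can then be split again recursively until the pieces are small enough to process sequentially.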
This seems like a fun starter task for hacking on Rayon. If you're interested in taking it up, please reach out through the gitter channel.
UPDATE: par_lines() has been implemented, but not par_split_whitespace(); see this comment for details.
Currently we have a lot of methods for doing "fold-like" things:

fn reduce_with<OP>(self, op: OP) -> Option<Self::Item>
    where OP: Fn(Self::Item, Self::Item) -> Self::Item + Sync;

fn reduce_with_identity<OP>(self, identity: Self::Item, op: OP) -> Self::Item
    where OP: Fn(Self::Item, Self::Item) -> Self::Item + Sync,
          Self::Item: Clone + Sync;

fn reduce<REDUCE_OP>(self, reduce_op: &REDUCE_OP) -> Self::Item
    where REDUCE_OP: ReduceOp<Self::Item>;
And of course there is also the (experimental, but faster in some cases) fold operation:

fn fold<I, FOLD_OP, REDUCE_OP>(self,
        identity: I,
        fold_op: FOLD_OP,
        reduce_op: REDUCE_OP)
        -> I
    where FOLD_OP: Fn(I, Self::Item) -> I + Sync,
          REDUCE_OP: Fn(I, I) -> I + Sync,
          I: Clone + Sync + Send;
Part of the challenge here is that parallel reduce is different from fold: you can't just start from the left and combine elements until the end. Instead, you wind up dividing the list into a kind of tree, folding each of those segments from left-to-right, and then reducing the results as you walk up the tree.
Each of the variations we currently have makes sense in its own way:
- reduce_with is simplest, but it's the most inefficient. Because it requires two values from the iterator before we can invoke the closure, it has to introduce options under the hood to track things. Maybe there is a nicer way to implement it?
- reduce_with_identity avoids the problem of introducing Option by requiring an initial element to use as the "accumulator" value. However, unlike in sequential fold, this value must be cloneable, which is unfortunate sometimes. (It has to be cloneable because we will need to use it for each of the parallel segments we wind up with.)
- reduce was intended for letting people write custom operations that configure each detail. It's pretty silly for this to have the best name, but it might make sense to keep it around.
- The fold operation exposes this "tree-and-then-walk-back-up" structure directly. I added it to try and make a benchmark faster. It worked, but it's again somewhat complicated. I think I've decided that it does make sense to keep it around, but I'm not 100% sure.

My feeling is that we should probably have a variation on reduce_with_identity as the only reduce operation. It would instead take a closure that produces the identity, so that sum looks like this:
iterator.reduce(|| 0, |a, b| a+b)
At that point, there is no need for the current reduce, since the Reduce trait just provides two methods that are basically equivalent to these two closures. Moreover, we can lift the Clone bounds.
If you need to artificially inject an identity value, as the current reduce_with does, it can be expressed like so:
iterator.reduce(|| None, |a, b| match (a, b) {
    (Some(a), Some(b)) => Some(a + b),
    (s @ Some(_), None) | (None, s @ Some(_)) => s,
    (None, None) => None,
})
But I think nobody ever will.
So that just leaves the question of whether to keep fold around. My feeling now is that we should, but modify it to take a closure. I can easily imagine cases where it's useful to be able to retain state across the sequential folds, which fold makes possible. fold also basically corresponds to the map-reduce problem -- you can express this as .map().reduce(), but sometimes it's more efficient to convert N items into the new type than to convert each item individually, which is basically the point.
As an example, the nbody demo uses fold like this:
iterator.fold(
    || (0, 0), // represents the total "difference" in current position of this item
    |(diff1, diff2), item| ..., // adjust diff1, diff2 based on item
    |(diff1, diff2), (diff1b, diff2b)| (diff1 + diff1b, diff2 + diff2b));
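The "divide into a tree, fold the leaves, reduce on the way back up" structure can be modeled sequentially in a few lines (a toy model only; in Rayon the two recursive calls would run on separate threads via join):

```rust
// Toy model of parallel fold/reduce: recursively split the input, fold
// each leaf left-to-right with `fold_op`, then combine partial results
// with `reduce_op` while walking back up the tree.
fn tree_fold_reduce<T, A>(
    items: &[T],
    identity: &dyn Fn() -> A,
    fold_op: &dyn Fn(A, &T) -> A,
    reduce_op: &dyn Fn(A, A) -> A,
) -> A {
    if items.len() <= 2 {
        // Leaf: a plain sequential left-to-right fold.
        items.iter().fold(identity(), fold_op)
    } else {
        let (left, right) = items.split_at(items.len() / 2);
        let a = tree_fold_reduce(left, identity, fold_op, reduce_op);
        let b = tree_fold_reduce(right, identity, fold_op, reduce_op);
        reduce_op(a, b)
    }
}

fn main() {
    let v = [1, 2, 3, 4, 5];
    let sum = tree_fold_reduce(&v, &|| 0, &|acc, x| acc + x, &|a, b| a + b);
    assert_eq!(sum, 15);
    println!("sum = {}", sum);
}
```

Note how the identity closure is invoked once per leaf, which is exactly why the proposal above prefers a closure over a Clone bound.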
Sound good?
cc @jorendorff, @cuviper -- with whom I talked about this
There is no indication how or whether rayon is open source.
Currently there is no way to force par_iter to run in parallel and not fall back to sequential execution (except setting weight to an arbitrarily high value).
I think a simple method that guarantees the par_iter will run in parallel would suffice.
Currently, if you want to parallelize iterating through anything that can't easily be turned into a Vec<T> or &[T], you have to perform a collect, which can be quite slow on large collections.
Hi, I'd like to put things coming out of a ParallelIterator into a HashMap.
Currently I don't see a way to do it, so I'd either like to see a for_each_mut that takes a FnMut, or something else.
The scope API introduced in https://github.com/nikomatsakis/rayon/pull/71 heap-allocates jobs. The initial version leaked all of those heap-allocated boxes. While that has been fixed (I think), there is no automatic detection of that in the travis testing.
The ideal would be to use valgrind or some other such tool, but that's a bit tricky (perhaps with a custom allocation crate or alloc_system
? haven't tried). I experimented with custom counters and some other approaches but they did not work:
The threshold and/or the cost function is highly dependent on the workload.
For maximum performance on different workloads, the user should be able to modify them.
crates.io has 0.1.0 and 0.2.0.
I want to see the 0.2.0 code on GitHub, but can't because the versions aren't tagged.
Currently parallel iterators assume cheap operations. I am thinking that should be changed to assume expensive operations (i.e., fine-grained parallel splits), and have people opt-in to the current behavior by manually adjusting the weights or calling weight_min
. My reasoning is:
Thoughts?
We recently introduced InitError for "configuration gone wrong". This enum should implement the Error trait.
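A minimal sketch of what such an impl could look like (the variant names here are invented for illustration, not Rayon's actual ones):

```rust
use std::error::Error;
use std::fmt;

// Hypothetical stand-in for Rayon's InitError; the variants are
// illustrative, not the library's real ones.
#[derive(Debug)]
pub enum InitError {
    NumberOfThreadsZero,
    GlobalPoolAlreadyInitialized,
}

// Error requires Debug + Display, so a Display impl comes first.
impl fmt::Display for InitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match *self {
            InitError::NumberOfThreadsZero =>
                write!(f, "number of threads must be greater than zero"),
            InitError::GlobalPoolAlreadyInitialized =>
                write!(f, "the global thread pool has already been initialized"),
        }
    }
}

impl Error for InitError {}

fn main() {
    let e = InitError::NumberOfThreadsZero;
    println!("{}", e); // uses the Display impl
}
```

With this in place, InitError composes with Box<dyn Error> and the ? operator like any other error type.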
At the moment we only have an impl on &[T]. This makes it expensive to do anything on large objects -- you need to clone, especially if you want to reduce.
A possible way could be to pull apart a Vec by way of unsafe code. We could use split_off, but that would be expensive for large n-sizes.
Is there any way to add parallelism with rayons to loops that look like this:
for x in 1..g.width() - 1 {
    for y in 1..g.height() - 1 {
        // ...processing...
    }
}
I do not even know if this would be possible in theory, but very helpful for some problems.
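It is possible in principle whenever the iterations are independent: flatten the two loop variables into one linear range and recover x and y with division and remainder. A sketch of the index math with plain std (with Rayon, the range would become (0..len).into_par_iter()):

```rust
// Enumerate the interior cells of a width x height grid (x in
// 1..width-1, y in 1..height-1) through a single flat index, the shape
// a parallel iterator over one range needs.
fn interior_cells(width: usize, height: usize) -> Vec<(usize, usize)> {
    let inner_w = width - 2;  // x runs over 1..width-1
    let inner_h = height - 2; // y runs over 1..height-1
    (0..inner_w * inner_h)
        .map(|i| (1 + i % inner_w, 1 + i / inner_w)) // flat index -> (x, y)
        .collect()
}

fn main() {
    let cells = interior_cells(6, 5);
    assert_eq!(cells.len(), 4 * 3); // every interior cell exactly once
    assert!(cells.contains(&(1, 1)) && cells.contains(&(4, 3)));
    println!("visited {} cells", cells.len());
}
```

The caveat is mutation: each flat index must touch only its own data, or the body needs a reduction instead of shared writes.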
Hi:
https://crates.io/crates/rayon claims rayon = "0.4.0", but RELEASES.md claims "Release 0.4 (not yet released)".
What is new in 0.4.0?
To test unsafe consumers like collect, we need some negative tests of evil producers that fail to uphold their end of the bargain in various ways; we would want to be sure that the collect consumer doesn't fail under those cases.
Here are some ideas:
complete
Since we are guaranteed to execute closure A on the main thread, it occurs to me that it does not really need to be Send. Lifting this restriction would add a bit of flexibility. For example, one could thread some non-thread-safe data through the first closure, spawning tasks with the right closure. (Though you'd still have to write tasks in a kind of recursive style.)
UPDATE: This doesn't quite work, but see this comment.
→ rustc -V
rustc 1.9.0 (e4e8b6668 2016-05-18)
→ cargo -V
cargo 0.10.0-nightly (10ddd7d 2016-04-08)
Compiling rayon v0.4.3
Running rustc /Users/daniel/.cargo/registry/src/github.com-88ac128001ac3a9a/rayon-0.4.3/src/lib.rs --crate-name rayon --crate-type lib -g -C metadata=5441ea551292b9c3 -C extra-filename=-5441ea551292b9c3 --out-dir /Users/daniel/game/target/debug/deps --emit=dep-info,link -L dependency=/Users/daniel/game/target/debug/deps -L dependency=/Users/daniel/game/target/debug/deps --extern rand=/Users/daniel/game/target/debug/deps/librand-c724acb3942597d1.rlib --extern num_cpus=/Users/daniel/game/target/debug/deps/libnum_cpus-0d0496d141db0602.rlib --extern libc=/Users/daniel/game/target/debug/deps/liblibc-56a988296c48c60c.rlib --extern deque=/Users/daniel/game/target/debug/deps/libdeque-9a8e24ef39a44da6.rlib --cap-lints allow
Fresh flate2 v0.2.14
Fresh core-foundation-sys v0.2.2
Fresh cgl v0.1.5
Fresh png v0.5.2
Fresh core-foundation v0.2.2
Fresh time v0.1.35
Fresh piston-float v0.2.0
Fresh core-graphics v0.3.2
Fresh vecmath v0.2.0
Fresh piston-viewport v0.2.0
Fresh cocoa v0.3.3
Fresh pistoncore-input v0.14.0
Fresh piston2d-graphics v0.18.0
Fresh glutin v0.6.2
Fresh pistoncore-window v0.23.0
Fresh pistoncore-event_loop v0.26.0
Fresh pistoncore-glutin_window v0.31.0
Fresh piston v0.26.0
/Users/daniel/.cargo/registry/src/github.com-88ac128001ac3a9a/rayon-0.4.3/src/par_iter/internal.rs:96:47: 96:48 error: no rules expected the token ;
/Users/daniel/.cargo/registry/src/github.com-88ac128001ac3a9a/rayon-0.4.3/src/par_iter/internal.rs:96 thread_local!{ static ID: bool = false; }
^
error: Could not compile `rayon`.
Caused by:
Process didn't exit successfully: rustc /Users/daniel/.cargo/registry/src/github.com-88ac128001ac3a9a/rayon-0.4.3/src/lib.rs --crate-name rayon --crate-type lib -g -C metadata=5441ea551292b9c3 -C extra-filename=-5441ea551292b9c3 --out-dir /Users/daniel/game/target/debug/deps --emit=dep-info,link -L dependency=/Users/daniel/game/target/debug/deps -L dependency=/Users/daniel/game/target/debug/deps --extern rand=/Users/daniel/game/target/debug/deps/librand-c724acb3942597d1.rlib --extern num_cpus=/Users/daniel/game/target/debug/deps/libnum_cpus-0d0496d141db0602.rlib --extern libc=/Users/daniel/game/target/debug/deps/liblibc-56a988296c48c60c.rlib --extern deque=/Users/daniel/game/target/debug/deps/libdeque-9a8e24ef39a44da6.rlib --cap-lints allow
(exit code: 101)
https://github.com/nikomatsakis/rayon/blob/c91ea503ba0dee8dd1cf3575e8a4adf86e2d33d8/src/par_iter/internal.rs#L96
Any ideas? (Mac OSX)
Currently Rayon uses all the CPUs / cores available (determined by num_cpus::get()). It would be nice if the user can specify a maximum limit of how many threads Rayon is allowed to use in parallel.
Not sure how possible this is, but I'd like to be able to use parallel iterators with non-commutative reduction functions (e.g., collecting error messages in a deterministic order).
Maybe it's best to use .enumerate() and then .sort() a Vec populated by the parallel iterator?
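The enumerate-then-sort approach can be sketched sequentially (with Rayon, the tagged pairs would come out of a parallel iterator in arbitrary order):

```rust
// Restore a deterministic order after unordered (e.g. parallel)
// processing: each item carries its original index, so sorting by index
// recovers the input order regardless of completion order.
fn restore_order(mut tagged: Vec<(usize, String)>) -> Vec<String> {
    tagged.sort_by_key(|&(idx, _)| idx);
    tagged.into_iter().map(|(_, msg)| msg).collect()
}

fn main() {
    // Simulate results arriving out of order from parallel workers.
    let tagged = vec![
        (2, "error in c".to_string()),
        (0, "error in a".to_string()),
        (1, "error in b".to_string()),
    ];
    let ordered = restore_order(tagged);
    assert_eq!(ordered, vec!["error in a", "error in b", "error in c"]);
    println!("{:?}", ordered);
}
```

This trades the cost of a final sort for not having to constrain the reduction function itself.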
I have been seeing travis builds fail unreliably in the negative panic tests. I'll try to pin this down at some point.
cc @Amanieu