Comments (8)
Hi @HenrikBengtsson!
First, thank you very much for taking the time to inspect and open this issue! You raise a good point about silent behaviour issues that I hadn't fully thought through. That said, I find the pbapply approach to be jarringly atypical of the future style, which is why I avoided such argument declarations in the current approach (e.g., early on I contemplated adding a future = TRUE argument, similar to overloading the cl input, but decided against it as it felt awkward).
I agree that checking whether the package is attached is not ideal, but I would still prefer a standard future-type specification. I wonder if it's possible to check whether the default plan() has been overwritten, in which case the use of future.apply() would be less ambiguous (e.g., if the plan is anything other than the stock-standard sequential default with ncpus=1, then the front-end user clearly meant to use a different computational plan, regardless of whether this is set in a main.R file or source()d at some point). That's slightly better than checking whether future was simply attached, though of course it doesn't fix the issue of sourcing in files without knowing about package or function masking consequences.
Any thoughts you have in this area are appreciated, as balancing the global specification approach used by future is a little tricky to navigate. Cheers.
Phil
from simdesign.
I see that one can use
if (is(future::plan(), "sequential")) { ... }
which appears to be a reasonable solution; however, it doesn't appear correct in the situation where multiple plan()s are defined.
library(future)
plan(list('multisession', 'sequential'))
is(plan(), 'multisession') # TRUE
is(plan(), 'sequential') # FALSE
plan(list('sequential', 'multisession'))
is(plan(), 'multisession') # FALSE
is(plan(), 'sequential') # TRUE
from simdesign.
The non-documented future::plan("next") returns the next future strategy on the stack. There's also nbrOfWorkers(), which returns 1L for sequential. On the other hand, it can also return 1L for other backends.
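A minimal sketch of inspecting the whole plan stack rather than only its first level, which is the gap shown by the multi-plan example earlier in the thread. It assumes that plan("list"), like plan("next"), is a supported if lightly documented query of the future package; any_parallel_level() is a hypothetical helper name, not part of future or SimDesign.

```r
library(future)

# Two-level plan: sequential at the outer level, multisession nested inside.
plan(list(sequential, multisession))

# Assumption: plan("list") returns the full stack of strategies.
stack <- plan("list")
n_levels <- length(stack)

# Hypothetical helper: TRUE if any level of the stack is non-sequential.
any_parallel_level <- function() {
  is_seq <- vapply(plan("list"), inherits, logical(1), what = "sequential")
  any(!is_seq)
}
has_parallel <- any_parallel_level()

# nbrOfWorkers() only describes the current (first) level of the stack.
n_workers <- nbrOfWorkers()

# Restore the default plan.
plan(sequential)
```

Even with such a check, the caveat above still applies: a worker count of one does not by itself prove the user left the default plan in place.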
from simdesign.
For the bigger question: As you probably understand by now, I'm trying to avoid function arguments that control how and if a specific function runs in parallel. With futureverse, I'm trying to push toward that goal as far as I possibly can. I consider that too low-level a detail for an API that does analysis.
Looking at your Analysis(), you have several different options for parallelization:
if("future" %in% (.packages())){
    ...
} else if(is.null(cl)){
    ...
    results <- if(progress){
        try(pbapply::pblapply(1L:replications, mainsim, condition=condition,
                              ...)
    } else {
        try(lapply(1L:replications, mainsim, condition=condition,
                   ...)
    }
} else {
    if(MPI){
        ...
        results <- try(foreach(i=1L:replications, .export=export_funs, .packages=packages,
                               .options.mpi=.options.mpi) %dopar%
                           ...)
    } else {
        ...
        results <- if(progress){
            try(pbapply::pblapply(1L:replications, mainsim,
                                  ...)
        } else {
            try(parallel::parLapply(cl, 1L:replications, mainsim,
                                    ...)
        }
    }
}
I think you can replace all special cases with that single future_lapply() version. Then you can remove the arguments cl and MPI, and let the current plan() control how parallelization is done; if no plan is specified, it defaults to sequential processing.
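As a rough sketch of what that single code path could look like (not SimDesign's actual implementation): mainsim() below is a hypothetical stand-in for the package's per-replication function, and run_replications() is an illustrative wrapper name. The only futureverse calls used are future.apply::future_lapply() and future::plan().

```r
library(future.apply)  # also attaches future, which provides plan()

# Hypothetical stand-in for the per-replication simulation function.
mainsim <- function(index, condition) {
  mean(rnorm(condition$N))
}

# One code path: no cl or MPI arguments, the active plan() decides
# whether the replications run sequentially or in parallel.
run_replications <- function(replications, condition) {
  future_lapply(seq_len(replications), mainsim, condition = condition,
                future.seed = TRUE)  # parallel-safe random numbers
}

plan(sequential)  # the default; e.g. plan(multisession) would parallelize
results <- run_replications(5L, condition = list(N = 100))
```

The end user switches backends by calling plan() before running the simulation, so the function itself needs no backend-specific branches.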
Another advantage of this approach is that you no longer have to write separate package tests for each of those cases to make sure you have a high test code coverage. Testing and validation toward different backends is done by the futureverse framework, so you don't have to worry about it (https://www.futureverse.org/quality.html).
So that's my view and take on it. That said, I don't want to "force" futureverse on anyone, and I understand there are other reasons for using alternatives.
from simdesign.
> plan is other than the stock-standard sequential default with ncpus=1 then the front-end user clearly meant to use a different computational plan
Note that your function might be called in a parallel worker by some other code. Then, at least in the future framework, the default is to run with plan(sequential) to avoid CPU overuse from nested parallelization.
Point is, it's really hard to predict how, where, and in what context one's functions will be used. We also don't control what happens in the future, so an update to another package might change this all of a sudden.
from simdesign.
Thanks so much for the detailed replies; they have given me a lot to think about. I've decided to use a parallel = 'future' approach instead of the previous behaviour, though while doing so I've uncovered somewhat of a snag using the new tests (e.g., some internal objects must be exported as they are not visible when using a plan other than plan(sequential)).
For now I'll roll back the current version on CRAN until a suitable future option is available and well tested in this package. Thanks again for all your help and for pointing me in better directions moving forward.
P.S. While I have your attention, if you're able to direct me to some future equivalent of parallel::clusterExport(cl, ..., envir), that would be very helpful, since in the current setup I export objects from different environment locations at runtime. I can see this is possible with future::plan(...), though in my early attempts this has failed quite miserably and doesn't feel kosher.
from simdesign.
> While I have your attention, if you're able to direct me to some future equivalent of parallel::clusterExport(cl, ..., envir) that would be very helpful since in the current setup I export objects for different environment locations at runtime.
There is no counterpart - by design. The way to think about futures is that they may end up running anywhere, and we should assume each one runs in a fresh environment. Futureverse tries to identify all the global variables needed automatically, but it doesn't catch 100% of them. When that happens, you can guide Futureverse in the right direction. See https://future.futureverse.org/articles/future-4-issues.html#missing-globals-false-negatives for common solutions. If a function/method is missing, then it could be that Futureverse fails to detect the required packages. See the same vignette for how to specify which packages should be added.
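One simple pattern that avoids relying on automatic globals detection altogether, offered here as an assumption-laden sketch rather than the vignette's recommended fix: look the object up in its environment on the calling side and pass it to the worker function as an explicit argument. The names sim_env, offset, and add_offset() are hypothetical; future.packages is an existing argument of future_lapply() for naming packages to attach on the workers.

```r
library(future.apply)

# Hypothetical object that lives in a non-global environment.
sim_env <- new.env()
assign("offset", 10, envir = sim_env)

# Hypothetical worker function: everything it needs arrives as an argument,
# so nothing has to be discovered or exported as a global.
add_offset <- function(i, offset) i + offset

plan(sequential)
res <- future_lapply(1:3, add_offset,
                     offset = get("offset", envir = sim_env),
                     future.packages = "stats")  # packages to attach on workers
unlist(res)  # 11 12 13
```

Because the value is resolved before the futures are created, the same call should behave identically whichever plan() is active.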
from simdesign.
Looks to be patched now. Thanks again for all your helpful comments and references to solutions! The current specification now works well with the following structure:
plan(multisession)
results <- runSimulation(..., parallel = 'future', ...)
from simdesign.