Comments (4)
This is a good question, and something I initially struggled with as well. I'm hesitant to add something like a complete seed specification because it really won't generalize across methods (i.e., the seeds will all do different things depending on whether you are using MPI, network-parallel, local-parallel, or single-core set-ups). I don't think there is an easy, transparent, or even safe way to do this, so I won't be adding support for it any time soon. IMHO seeds usually give a false sense of security in MCSs, so I tend not to recommend their use.
That being said, I don't actually think you need to set the seeds to do what you want. For example, if you passed `save_generate_data = TRUE` while performing your pilot studies, then all the datasets will be saved in your working directory as suitable objects (I only recommend doing this for debugging/pilot studies; a full-blown MCS could very well fill your hard-drive!). Then you can just pass an `edit = 'analyse'` call and read in the dataset of interest manually with `readRDS('/path/to/file.rds')`. This would skip all the mess of setting seeds and put you in R's debugging mode with the data of interest already active.
This approach could also be done in parallel, so you could still pass the `parallel=TRUE` flag at any point while saving the datasets in the pilot runs (which is a huge bonus IMO, because the faster all computations are performed, the faster all the bugs can be chased down). Conceptually I find this a lot nicer than dealing with seeds, and I think it will be easier to track things down. Let me know if this is a suitable solution for you.
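The save-then-reload idea can be sketched in plain base R. This is only an illustration of the workflow, not SimDesign's own implementation: `generate()`, `analyse()`, the `pilot_data` directory, and the `rep-*.rds` file names are hypothetical stand-ins.

```r
# Hypothetical stand-ins for a simulation's generate/analyse steps;
# these are not SimDesign functions.
generate <- function(n) data.frame(x = rnorm(n), y = rnorm(n))
analyse  <- function(dat) coef(lm(y ~ x, data = dat))

# Assumed output location for the pilot datasets.
out_dir <- file.path(tempdir(), "pilot_data")
dir.create(out_dir, showWarnings = FALSE)

# Pilot run: save every generated dataset before analysing it.
results <- vector("list", 5)
for (r in seq_len(5)) {
  dat <- generate(50)
  saveRDS(dat, file.path(out_dir, sprintf("rep-%d.rds", r)))
  results[[r]] <- analyse(dat)
}

# Later: reload the replication of interest and re-run the analysis on it.
dat3 <- readRDS(file.path(out_dir, "rep-3.rds"))
redo <- analyse(dat3)
```

Because the dataset itself is stored, re-running the analysis does not depend on the RNG state at all, which is why this route sidesteps seeds.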
from simdesign.
Actually, now that I think about it there may be a better way to do this without completely saving all the datasets and wasting a good deal of hard-drive space. Let me chew on this for a few days and I'll see what I can come up with.
from simdesign.
Thanks for your quick and thoughtful reply!
I was aware of the `save_generate_data = TRUE` possibility. The reason I'm not much of a fan of it is precisely the one you gave: it may very well fill the hard-drive (or your assigned disk quota on a server) for a large MCS. Going for an approach of first having a "pilot study and debug" stage, as you described, plus a "large study" is nice conceptually, and it works fine if you are able to catch all the bugs at the first stage. But sometimes you will find bugs that only show up in very rare situations (or at least that happens to me), hence the need to be able to replicate a particular run of the large study to see what the problem was. While saving all datasets is not an option, saving the seed is a succinct way of allowing replication.
But I see that the main problem is having a coherent way of fixing the seed across the different methods when you go parallel. It would be of little use to have a complete seed specification that you can only replicate under the particular method in which it was used.
I will keep an eye on the other approaches. An "if-unexpected-result-then-save-dataset" option might partially do the job for the runs with errors, although it would only be valid for replicating those.
Thanks again!
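For what it's worth, the "if-unexpected-result-then-save-dataset" idea could look roughly like the base R sketch below. Everything in it is hypothetical (the `generate()`/`analyse()` functions, the artificial failure condition, and the `failed_data` directory); it is not an existing SimDesign option.

```r
# Hypothetical data generator and an analysis that fails in rare cases.
generate <- function(n) data.frame(x = rnorm(n))
analyse <- function(dat) {
  if (max(dat$x) > 2.5) stop("rare failure")
  mean(dat$x)
}

# Assumed location for datasets that triggered an error.
fail_dir <- file.path(tempdir(), "failed_data")
dir.create(fail_dir, showWarnings = FALSE)

run_one <- function(r, n = 50) {
  dat <- generate(n)
  tryCatch(
    analyse(dat),
    error = function(e) {
      # Keep only the datasets whose analysis failed.
      saveRDS(dat, file.path(fail_dir, sprintf("rep-%d.rds", r)))
      NA_real_
    }
  )
}

res <- vapply(seq_len(200), run_one, numeric(1))
failed <- list.files(fail_dir)
```

Disk usage then scales with the number of failures rather than the number of replications, at the cost of not being able to replay the runs that succeeded.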
from simdesign.
Okay, so I've provided a pretty decent and robust solution to the problem. It's now possible to save and read in the `.Random.seed` state just prior to simulating each replication, so if something goes wrong you have a complete history of how to generate each specific cell. These are exported as simple text files, so they shouldn't take up much room on your hard-drive, even for very large simulations. You can check the `test` folder to see how this works for now (I'll probably make a wiki example sometime, but it's pretty straightforward). HTH.
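As a rough illustration of the general technique (not the package's actual code), saving and restoring `.Random.seed` around each replication can be done in base R as follows. The `one_rep()` function, the `seeds` directory, and the `seed-*.txt` file names are assumptions made for this sketch.

```r
# Assumed location for the saved RNG states.
seed_dir <- file.path(tempdir(), "seeds")
dir.create(seed_dir, showWarnings = FALSE)

# Hypothetical single replication: generate data and summarise it.
one_rep <- function() mean(rnorm(20))

runif(1)  # make sure .Random.seed exists in the global environment

results <- numeric(5)
for (r in seq_len(5)) {
  # Record the RNG state just before this replication draws anything.
  seed <- get(".Random.seed", envir = globalenv())
  write(seed, file.path(seed_dir, sprintf("seed-%d.txt", r)), ncolumns = 1)
  results[r] <- one_rep()
}

# Later: restore the state saved for replication 4 and re-run only that one.
seed4 <- scan(file.path(seed_dir, "seed-4.txt"), what = integer(), quiet = TRUE)
assign(".Random.seed", seed4, envir = globalenv())
redo <- one_rep()
```

Each saved state is a short integer vector, so the files stay small however large the generated datasets are. Note that a restored state only reproduces the run under the same RNG kind and the same sequence of draws.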
from simdesign.