Hi Phil, First of all, thanks for your wonderful package! I find it

Seeds for each replication inside a condition? (simdesign, closed)

Comments (4)

philchalmers commented on May 26, 2024

This is a good question, and it's something I initially struggled with as well. I'm hesitant to add something like a complete seed specification because it won't generalize across computing set-ups: the same seeds will do different things under MPI, network-parallel, local-parallel, and single-core runs. I don't see an easy, transparent, or even safe way to do this, so I don't think I'll be adding support for it any time soon. IMHO seeds usually give a false sense of security in Monte Carlo simulations (MCSs), so I tend not to recommend their use.
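
To make the portability concern concrete, here is a minimal base R sketch (not SimDesign code). It assumes R 3.6 or later and compares the default Mersenne-Twister generator with the L'Ecuyer-CMRG generator that the parallel package uses for its streams; the variable names are only illustrative.

```r
# Remember the current RNG settings so they can be restored afterwards.
old_kind <- RNGkind()

# Same seed with the default generator of a single-core R session.
set.seed(1234, kind = "Mersenne-Twister")
draws_mt <- runif(3)

# Same seed with the generator used for parallel RNG streams.
set.seed(1234, kind = "L'Ecuyer-CMRG")
draws_lecuyer <- runif(3)

# Compare the two sets of draws: a shared seed does not imply a shared stream.
same_stream <- identical(draws_mt, draws_lecuyer)
print(same_stream)

# Restore the original generator settings.
do.call(RNGkind, as.list(old_kind))
```

So a seed recorded under one back-end is not, on its own, enough to reproduce a replication under another.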

That being said, I don't think you actually need to set seeds to do what you want. For example, if you pass save_generate_data = TRUE while performing your pilot studies, then all the datasets will be saved in your working directory as suitable objects (I only recommend doing this for debugging/pilot studies; a full-blown MCS could very well fill your hard-drive!). Then you can just pass edit = 'analyse' and read in the dataset of interest manually with readRDS('/path/to/file.rds'). This skips all the mess of setting seeds and puts you straight into R's debugging mode with the data of interest already loaded.

This approach also works in parallel, so you can still pass parallel=TRUE at any point while saving the datasets in the pilot runs (a huge bonus IMO, because the faster the computations run, the faster the bugs can be chased down). Conceptually I find this a lot nicer than dealing with seeds, and I think it will make things easier to track down. Let me know if this is a suitable solution for you.
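
The save-then-reload idea can be sketched in plain base R, independent of how SimDesign names or lays out its files. The function generate_data(), the directory pilot_data, and the rep_###.rds naming are all hypothetical and chosen only for this illustration.

```r
# Hypothetical stand-in for a user-written data generation step.
generate_data <- function(n) data.frame(x = rnorm(n), y = rnorm(n))

# Keep the pilot datasets in a scratch directory.
out_dir <- file.path(tempdir(), "pilot_data")
dir.create(out_dir, showWarnings = FALSE)

# Pilot run: write every generated dataset to its own .rds file.
for (r in 1:5) {
  dat <- generate_data(n = 20)
  saveRDS(dat, file.path(out_dir, sprintf("rep_%03d.rds", r)))
}

# Later: load the replication of interest and inspect it directly.
dat3 <- readRDS(file.path(out_dir, "rep_003.rds"))
str(dat3)
```

The disk cost grows with the number of replications times the dataset size, which is why this only suits pilot runs.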

philchalmers commented on May 26, 2024

Actually, now that I think about it there may be a better way to do this without completely saving all the datasets and wasting a good deal of hard-drive space. Let me chew on this for a few days and I'll see what I can come up with.

egarpor commented on May 26, 2024

Thanks for your quick and thoughtful reply!

I was aware of the save_generate_data = TRUE possibility. The reason I'm not such a fan of it is precisely the one you gave: for a large MCS it can easily fill the hard-drive (or your assigned disk quota on a server). Splitting the work into a "pilot study and debug" stage, as you described, followed by a "large study" is nice conceptually, and it works fine if you are able to catch all the bugs in the first stage. But sometimes a bug only shows up in very rare situations (or at least that happens to me), hence the need to be able to replicate one particular run of the large study to see what the problem was. Saving all the datasets is not an option there, whereas saving the seed is a succinct way of allowing replication.

But I see that the main problem is having a coherent way of fixing the seed across the different methods once you go parallel. A complete seed specification is of little use if it can only be replicated under the particular method with which it was generated.

I will keep an eye out for the other approaches. An "if-unexpected-result-then-save-dataset" option might partially do the job for the runs that hit an error, although it would only be valid for replicating those ones.
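
As a rough illustration of that "if-unexpected-result-then-save-dataset" idea, here is a base R sketch using tryCatch(). It is not an existing SimDesign option; generate_data(), analyse_data(), the failure rule, and the failed_reps directory are all hypothetical.

```r
# Hypothetical data generation and analysis steps.
generate_data <- function(n) rnorm(n)
analyse_data <- function(x) {
  # Simulate a bug that only shows up in rare datasets.
  if (max(abs(x)) > 2.5) stop("rare failure: extreme observation")
  mean(x)
}

fail_dir <- file.path(tempdir(), "failed_reps")
dir.create(fail_dir, showWarnings = FALSE)

results <- rep(NA_real_, 200)
for (r in seq_along(results)) {
  dat <- generate_data(10)
  results[r] <- tryCatch(
    analyse_data(dat),
    error = function(e) {
      # Only the datasets that triggered an error are written to disk.
      saveRDS(dat, file.path(fail_dir, sprintf("rep_%03d.rds", r)))
      NA_real_
    }
  )
}

# Replications whose datasets were kept for later debugging.
failed <- which(is.na(results))
```

Disk use then scales with the number of failures rather than the number of replications, at the cost of not being able to replay the runs that succeeded.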

Thanks again!

philchalmers commented on May 26, 2024

Okay, so I've provided a pretty decent and robust solution to the problem. It's now possible to save and read in the .Random.seed state just prior to simulating each replication, so if something goes wrong you have a complete history of how to regenerate each specific cell. The states are exported as simple text files, so they shouldn't take up much room on your hard-drive, even for very large simulations. You can check the test folder to see how this works for now (I'll probably make a wiki example sometime, but it's pretty straightforward). HTH.
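
The underlying mechanism can be sketched in base R as follows. This is only an illustration of saving and restoring .Random.seed around each replication, not the package's implementation; the one-integer-per-line file format, the seeds directory, and the file names are assumptions made for the example.

```r
set.seed(2024)  # also guarantees that .Random.seed exists

n_reps <- 4
seed_dir <- file.path(tempdir(), "seeds")
dir.create(seed_dir, showWarnings = FALSE)
draws <- vector("list", n_reps)

for (r in seq_len(n_reps)) {
  # Store the RNG state as plain text just before the replication starts.
  state <- get(".Random.seed", envir = globalenv())
  writeLines(as.character(state), file.path(seed_dir, sprintf("seed_%d.txt", r)))
  draws[[r]] <- rnorm(5)
}

# Replay replication 3 only: read its state back and reinstate it.
state3 <- as.integer(readLines(file.path(seed_dir, "seed_3.txt")))
assign(".Random.seed", state3, envir = globalenv())
replay3 <- rnorm(5)

# Check whether the replayed draws match the original replication.
matches <- identical(replay3, draws[[3]])
print(matches)
```

Each saved state is a few hundred integers, so the files stay small regardless of how large the generated datasets are. The replay still has to use the same generator and the same execution set-up as the original run.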
