
Comments (4)

philchalmers commented on May 26, 2024

This is a good question, and something I initially struggled with as well. I'm hesitant to add something like a complete seed specification because it really won't generalize across methods (if you are using MPI/network-parallel/local parallel/single core set-ups the seeds will all do different things). I don't think there is an easy, transparent, or even safe way to do this, so I don't think I'll be adding support for it any time soon. IMHO seeds usually give a false sense of security in MCSs, so I tend not to recommend their use.

That being said, I don't actually think you need to set the seeds to do what you want. For example, if you pass save_generate_data = TRUE while performing your pilot studies then all the datasets will be saved in your working directory as suitable objects (I only recommend doing this for debugging/pilot studies; a full-blown MCS could very well fill your hard-drive!). Then you can just pass edit = 'analyse' and read in the dataset of interest manually with readRDS('/path/to/file.rds'). This would skip all the mess of setting seeds, and put you in R's debugging mode with the data of interest already active.

This approach also works in parallel, so you could still pass parallel=TRUE at any point while saving the datasets in the pilot runs (which is a huge bonus IMO, because the faster all computations are performed the faster all the bugs can be chased down anyway). Conceptually I find this a lot nicer than dealing with seeds, and I think it will be easier to track things down. Let me know if this is a suitable solution for you.
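For readers who want to see the general idea outside the package, here is a minimal base-R sketch of "save every generated dataset, then reload the one you care about". It does not use SimDesign at all: generate(), analyse(), out_dir and the rep-N.rds file names are hypothetical stand-ins, and the files that save_generate_data = TRUE actually writes may be organised differently.

```r
# Hypothetical stand-ins for a simulation's generate and analyse steps;
# these are NOT SimDesign functions.
generate <- function(n) data.frame(x = rnorm(n), y = rnorm(n))
analyse  <- function(dat) coef(lm(y ~ x, data = dat))

# Assumed output location; a temporary directory keeps the sketch tidy.
out_dir <- file.path(tempdir(), "pilot_datasets")
dir.create(out_dir, showWarnings = FALSE)

# Pilot run: write each generated dataset to disk before analysing it.
results <- lapply(1:5, function(i) {
  dat <- generate(n = 50)
  saveRDS(dat, file.path(out_dir, sprintf("rep-%d.rds", i)))
  analyse(dat)
})

# Later: reload the dataset of interest and re-run the analysis on it,
# with no seed handling involved.
dat3  <- readRDS(file.path(out_dir, "rep-3.rds"))
rerun <- analyse(dat3)
```

Because the data themselves are stored, the re-run does not depend on how the random number generator was set up, which is why this also works when the pilot is run in parallel. The cost is disk space, one file per replication.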

from simdesign.

philchalmers commented on May 26, 2024

Actually, now that I think about it there may be a better way to do this without completely saving all the datasets and wasting a good deal of hard-drive space. Let me chew on this for a few days and I'll see what I can come up with.


egarpor commented on May 26, 2024

Thanks for your quick and thoughtful reply!

I was aware of the save_generate_data = TRUE possibility. The reason I am not such a fan of it is precisely the one you gave: it may well fill the hard-drive (or your assigned disk quota on a server) for a large MCS. Splitting the work into a "pilot study and debug" stage, as you described, plus a "large study" is nice conceptually, and it works fine if you are able to catch all the bugs at the first stage. But sometimes you will find bugs that only show up in very rare situations (or at least that happens to me), hence the need to be able to replicate a particular run of the large study to see what the problem was. While saving all datasets is not an option, saving the seed is a succinct way of allowing replication.

But I see that the main problem is having a coherent way of fixing the seed across the different methods when you go parallel. It would be of little use to have a complete seed specification that can only be replicated under the particular method in which it was used.

I will keep an eye on the other approaches. An "if-unexpected-result-then-save-dataset" option might partially do the job for the runs with errors, although it would only be valid for replicating those.

Thanks again!
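The "if-unexpected-result-then-save-dataset" idea mentioned above could look roughly like the following base-R sketch. It is only an illustration: generate(), analyse(), run_one(), fail_dir and the artificial failure condition are all hypothetical, and nothing here reflects an option that SimDesign is known to provide.

```r
# Hypothetical generate step (not a SimDesign function).
generate <- function(n) data.frame(x = rnorm(n), y = rnorm(n))

# Hypothetical analyse step that fails on rare datasets; the threshold is
# an arbitrary stand-in for a bug that only shows up occasionally.
analyse <- function(dat) {
  if (max(abs(dat$x)) > 3.5) stop("rare failure: extreme x value")
  coef(lm(y ~ x, data = dat))
}

# Assumed location for the datasets that triggered an error.
fail_dir <- file.path(tempdir(), "failed_datasets")
dir.create(fail_dir, showWarnings = FALSE)

# Run one replication; keep the dataset on disk only if the analysis errors.
run_one <- function(i, n = 50) {
  dat <- generate(n)
  tryCatch(
    analyse(dat),
    error = function(e) {
      saveRDS(dat, file.path(fail_dir, sprintf("rep-%d.rds", i)))
      NULL  # mark this replication as failed
    }
  )
}

results <- lapply(1:200, run_one)
failed  <- list.files(fail_dir, pattern = "\\.rds$")
```

Each file in fail_dir can then be read back with readRDS() and passed to analyse() under a debugger. As noted in the comment, this only helps with replications that raise an error; a run that finishes with a suspicious but valid-looking result leaves nothing behind.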


philchalmers commented on May 26, 2024

Okay, so I've provided a pretty decent and robust solution to the problem. It's now possible to save and read in the .Random.seed terms just prior to simulating each replication, so if something goes wrong you have a complete history of how to generate each specific cell. These are exported as simple text files, so they shouldn't take up much room on your hard-drive, even for very large simulations. You can check the test folder to see how this works for now (I'll probably make a wiki example sometime, but it's pretty straightforward). HTH.
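The mechanism described here, recording .Random.seed just before each replication and restoring it later, can be sketched in base R as follows. This is not the package's implementation, and its arguments and file layout are not shown in this thread: generate(), seed_dir and the seed-N.txt file names are assumptions made purely for illustration.

```r
# Make sure .Random.seed exists in the global environment. A real run need
# not fix a seed; any earlier use of the RNG would also create it.
set.seed(1)

# Assumed location for the per-replication seed files.
seed_dir <- file.path(tempdir(), "seeds")
dir.create(seed_dir, showWarnings = FALSE)

# Hypothetical generate step (not a SimDesign function).
generate <- function(n) rnorm(n)

results <- lapply(1:5, function(i) {
  # Record the RNG state just before this replication draws any numbers.
  seed <- get(".Random.seed", envir = globalenv())
  writeLines(as.character(seed),
             file.path(seed_dir, sprintf("seed-%d.txt", i)))
  generate(10)
})

# Later: restore the state recorded for replication 4 and regenerate
# exactly the data that replication saw.
seed4 <- as.integer(readLines(file.path(seed_dir, "seed-4.txt")))
assign(".Random.seed", seed4, envir = globalenv())
redo4 <- generate(10)

identical(redo4, results[[4]])
```

With R's default generator each seed file holds a few hundred integers as plain text, so the storage cost per replication is small compared with saving the dataset itself. The restored state reproduces the data only when the same generator settings are in use, which is the caveat raised earlier in the thread about different parallel set-ups.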

