
Question / Issue about simdesign HOT 5 CLOSED

jgman86 commented on June 6, 2024
Question / Issue

from simdesign.

Comments (5)

philchalmers commented on June 6, 2024

Hi Jan,

Could this be attributed to a version-specific change? It's worth noting that everything operates as expected on my home setup.

That is my first suspicion, as I have had to modify a few of the internal objects to allow more flexible storage properties. Apologies if this is what is breaking your code, though I believe the changes are the best option moving forward.

On further exploration, I discerned that it's feasible to store the intermediate results from the analyse function directly, eliminating the need for an additional saving routine:

I can see this as being a problem for sure, particularly because the files are not checked for their uniqueness before saving. That's something I'll add to the next version so that such an issue is avoidable and valuable computing time is not lost.

In your case, for now, it may be better to supply unique dirnames that are associated with the Design row identifiers (even just the row number). Alternatively, if your HPC RAM is not a huge issue or your analysis results are not overly large, you can use the store_results = TRUE flag to store the results internally rather than writing them to the drive. They could then be extracted after the fact on an object-by-object basis using SimExtract(). HTH.
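For reference, a minimal sketch of the store_results = TRUE route mentioned above. The design, the three simulation functions, and the replication count are hypothetical placeholders rather than anything from the original report; only store_results and SimExtract() come from this thread.

```r
library(SimDesign)

# Hypothetical two-condition design (factor name and levels are illustrative)
Design <- createDesign(N = c(30, 60))

# Placeholder generate/analyse/summarise functions
Generate <- function(condition, fixed_objects) {
    rnorm(condition$N)
}

Analyse <- function(condition, dat, fixed_objects) {
    c(mean = mean(dat), sd = sd(dat))
}

Summarise <- function(condition, results, fixed_objects) {
    colMeans(results)
}

# store_results = TRUE keeps the replication-level Analyse() output in the
# returned object instead of writing it to the drive
res <- runSimulation(design = Design, replications = 20,
                     generate = Generate, analyse = Analyse,
                     summarise = Summarise, store_results = TRUE,
                     verbose = FALSE)

# Extract the stored replication-level results after the run
stored <- SimExtract(res, what = "results")
```

Because everything is held in RAM, this is likely only practical when the per-replication objects are small; large MCMC draws would probably still need to go to disk.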


philchalmers commented on June 6, 2024

I've allowed for a new save_results_filename to be supplied, which disables the save checks for directory storage uniqueness, thereby allowing all files to be stored to the same specified directory asynchronously. That should solve your issue so long as you pass a unique save_results_filename argument for each row condition. Note that file names are not checked with this approach, so ensuring unique names falls to the user. Feel free to reopen if there are issues, and thanks for the report!
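A rough sketch of how that might look for a one-row-per-job setup. It assumes save_results_filename is passed through the save_details list next to save_results_dirname, and that row_id is provided by the job scheduler; the directory name, file-name pattern, and simulation functions are all hypothetical.

```r
library(SimDesign)

# Hypothetical design and placeholder simulation functions
Design <- createDesign(N = c(30, 60))
Generate <- function(condition, fixed_objects) rnorm(condition$N)
Analyse <- function(condition, dat, fixed_objects) c(mean = mean(dat))
Summarise <- function(condition, results, fixed_objects) colMeans(results)

# Assumption: each HPC job receives its own Design row index
# (e.g. from an array-job environment variable); hard-coded here
row_id <- 1L

res <- runSimulation(
    design = Design[row_id, ], replications = 20,
    generate = Generate, analyse = Analyse, summarise = Summarise,
    save_results = TRUE,
    save_details = list(
        # one shared directory for all jobs (hypothetical name)
        save_results_dirname = "sim_results",
        # uniqueness is the user's responsibility, so embed the row index
        save_results_filename = paste0("results-row-", row_id)
    ),
    verbose = FALSE
)

# Inspect what was written to the shared directory
list.files("sim_results")
```

Building the file name from the row index keeps independent jobs from overwriting each other, since the names are not checked.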


jgman86 commented on June 6, 2024

Hi Phil,

Yes, I did this and supplied the unique dirnames, which works as expected. However, because it now generates 10k to 60k folders with one file each, the saving feature would be a big win for larger simulations that are heavily parallelized to one row per job.

Another feature I was thinking about was support for the h5 file format in the saving routine. In my case, I'm saving all information in an .RDS file at the end of the analyse function, including the draws from the MCMC fitting (10 to 30 MB per file). For such a large number of small files, the I/O process takes a lot of time and resources (RAM is insufficient on normal machines), because an .RDS file can only be read in as a whole, including the large objects within the list (e.g. the draws).

A big advantage of the h5 file format is that the desired data can be read directly from the file, without further processing operations (e.g. looping over all files, reading each one in, and saving the desired objects from the list on each iteration of the loop). It has a hierarchical (file-system-like) architecture. For SimDesign this would allow all information from a simulation to be stored in a single file with a hierarchical structure like this:

##           group             name       otype   dclass      dim
## 0             /      SimGenerate   H5I_GROUP
## 1  /SimGenerate           df_gen H5I_DATASET COMPOUND        5
## 2             /       SimAnalyse   H5I_GROUP
## 3   /SimAnalyse      df_ResultsA H5I_DATASET    FLOAT 100 x 20
## 4   /SimAnalyse      df_ResultsB H5I_DATASET    FLOAT 100 x 10
## 5             /     simSummarise   H5I_GROUP
## 6 /simSummarise SummariseObjects H5I_DATASET    FLOAT  10 x 10
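To make the partial-read idea concrete, here is a minimal sketch using the Bioconductor package rhdf5. Nothing in it is part of SimDesign: the file location, the group and dataset names (borrowed from the listing above), and the random matrix standing in for real results are all assumptions for illustration.

```r
library(rhdf5)  # Bioconductor package, installed separately from SimDesign

# Create an empty HDF5 file with one group (temporary file for illustration)
h5file <- tempfile(fileext = ".h5")
h5createFile(h5file)
h5createGroup(h5file, "SimAnalyse")

# Hypothetical 100 x 20 matrix standing in for df_ResultsA
results_a <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
h5write(results_a, h5file, "SimAnalyse/df_ResultsA")

# Read only rows 1-5 of columns 1-3; the rest of the dataset stays on disk
subset_a <- h5read(h5file, "SimAnalyse/df_ResultsA",
                   index = list(1:5, 1:3))

# List the file's hierarchy (groups, datasets, dimensions)
h5ls(h5file)

h5closeAll()
```

The index argument is what avoids loading the whole object, in contrast to readRDS(), which always deserializes the full list.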

In the context of simulation, this seems to be a pretty nice feature! Thanks for your support!


philchalmers commented on June 6, 2024

The h5 approach seems interesting, but I worry it may be overkill for most users. Of course, if you are looping over each simulation condition independently then nothing really stops you from implementing your own h5 structure to store the object information directly (the store_results = TRUE approach).

With respect to this issue, it might be easiest to allow the filenames to be better exposed to the user so that at least you have control over the naming structures, as I agree that so many folders containing one file each is completely unnecessary.


jgman86 commented on June 6, 2024

That's awesome! I think this is a real win!

