Comments (5)
Hi Jan,
Could this be attributed to a version-specific change? It's worth noting that everything operates as expected on my home setup.
That is my first suspicion, as I have had to modify a few of the internal objects to allow more flexible storage properties. Apologies if this is what is breaking your code, though I believe the changes are the better option moving forward.
On further exploration, I discerned that it's feasible to store the intermediate results from the analyse function directly, eliminating the need for an additional saving routine:
I can see this as being a problem for sure, particularly because the files are not checked for their uniqueness before saving. That's something I'll add to the next version so that such an issue is avoidable and valuable computing time is not lost.
In your case for now, it may be better to supply unique dirnames that are associated with the Design row identifiers (even just the row number). Alternatively, if your HPC RAM is not a huge issue or your analysis results are not overly large, you can use the store_results = TRUE flag to store the results internally rather than writing them to the drive. They could then be extracted after the fact on an object-by-object basis using SimExtract(). HTH.
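For anyone following along, here is a minimal sketch of the store_results route, run on a single Design row as in a one-row-per-job setup. The Design, the Generate/Analyse/Summarise functions, and the row index are toy placeholders for illustration, not anything taken from the report:

```r
library(SimDesign)

# Toy design and functions; placeholders for illustration only
Design <- createDesign(N = c(10, 20))

Generate <- function(condition, fixed_objects = NULL) {
    rnorm(condition$N)
}

Analyse <- function(condition, dat, fixed_objects = NULL) {
    c(mean = mean(dat), median = median(dat))
}

Summarise <- function(condition, results, fixed_objects = NULL) {
    colMeans(results)
}

# Hypothetical job index; on a cluster this would come from the scheduler
row <- 1

# Keep the per-replication Analyse() output inside the returned object
res <- runSimulation(Design[row, ], replications = 5,
                     generate = Generate, analyse = Analyse,
                     summarise = Summarise,
                     store_results = TRUE, verbose = FALSE)

# Pull the stored results back out after the fact
stored <- SimExtract(res, what = "results")
```

The exact shape of the extracted object may differ between package versions, so it is worth inspecting it with str() before building anything on top of it.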
from simdesign.
I've allowed for a new save_results_filename to be supplied which disables the save checks for directory storage uniqueness, thereby allowing all files to be stored to the same specified directory asynchronously. That should solve your issue so long as you pass a unique save_results_filename argument for each row condition. Note that file names are not checked with this approach, so ensuring unique names falls to the user. Feel free to reopen if there are issues, and thanks for the report!
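Roughly, the call would look like the sketch below, assuming the new argument is passed through the save_details list alongside save_results_dirname. The directory name, the file-name pattern, and the toy functions are placeholders, and the row index is assumed to come from your job scheduler:

```r
library(SimDesign)

# Toy design and functions; placeholders for illustration only
Design <- createDesign(N = c(10, 20))
Generate <- function(condition, fixed_objects = NULL) rnorm(condition$N)
Analyse <- function(condition, dat, fixed_objects = NULL) c(mean = mean(dat))
Summarise <- function(condition, results, fixed_objects = NULL) colMeans(results)

# Hypothetical job index, e.g. read from the scheduler's array ID
row <- 2

# Every job writes into the same directory, so the file name itself
# has to be unique per row condition
res <- runSimulation(Design[row, ], replications = 5,
                     generate = Generate, analyse = Analyse,
                     summarise = Summarise,
                     save_results = TRUE,
                     save_details = list(
                         save_results_dirname = "sim_results",
                         save_results_filename = paste0("results_row", row)),
                     verbose = FALSE)
```

Since each job only sees a one-row Design, the name you pass in has to carry the uniqueness on its own; building it from the row number, as here, is one simple way to do that.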
from simdesign.
Hi Phil,
Yes, I did this and supplied the unique dirnames, which works as expected. However, because it now generates 10k - 60k folders with one file each, a saving feature that avoids this would be a big win for larger simulations that are heavily parallelized to one row per job.
Another feature I was thinking about is support for the h5 file format in the saving routine. In my case, I'm saving all information in an .RDS file at the end of the analyse function, including the draws of the MCMC fitting (10 - 30 MB per file). For such a huge number of files, the I/O process takes a lot of time and resources (RAM is insufficient on normal machines), because it is only possible to read in the whole file, including the large objects within the list (e.g. the draws).
A big advantage of the h5 file format is that it is possible to read the desired data directly from the file, without further processing operations (e.g. looping over all the data to read it in and saving the desired objects from the list on each iteration of the loop). It has a hierarchical (file-system-like) architecture. For SimDesign this would make it possible to store all the information of a simulation in a single file with a hierarchical structure like this:
##   group         name             otype       dclass   dim
## 0 /             SimGenerate      H5I_GROUP
## 1 /SimGenerate  df_gen           H5I_DATASET COMPOUND 5
## 2 /             SimAnalyse       H5I_GROUP
## 3 /SimAnalyse   df_ResultsA      H5I_DATASET FLOAT    100 x 20
## 4 /SimAnalyse   df_ResultsB      H5I_DATASET FLOAT    100 x 10
## 5 /             simSummarise     H5I_GROUP
## 6 /simSummarise SummariseObjects H5I_DATASET FLOAT    10 x 10
In the context of simulation, this seems to be a pretty nice feature! Thanks for your support!
from simdesign.
The h5 approach seems interesting, but I worry it may be overkill for most users. Of course, if you are looping over each simulation condition independently then nothing really stops you from implementing your own h5 structure to store the object information directly (the store_results = TRUE approach).
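A rough sketch of what that do-it-yourself route could look like, using the rhdf5 package from Bioconductor (picked here purely for illustration; any h5 interface would do). The file name and the random matrix standing in for the analysis results are hypothetical, and the group and dataset names simply mirror the structure proposed above:

```r
library(rhdf5)  # Bioconductor package; assumed to be installed

# Hypothetical output file for one simulation
h5_file <- file.path(tempdir(), "simulation.h5")
if (file.exists(h5_file)) file.remove(h5_file)

# Stand-in for per-replication results (100 replications x 20 statistics)
results_A <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)

# Create the file and a group for the analysis output, then write the matrix
h5createFile(h5_file)
h5createGroup(h5_file, "SimAnalyse")
h5write(results_A, h5_file, "SimAnalyse/df_ResultsA")

# Read only the first five rows instead of loading the whole dataset
first_rows <- h5read(h5_file, "SimAnalyse/df_ResultsA",
                     index = list(1:5, NULL))

# Release any open HDF5 handles
h5closeAll()
```

The index argument is what gives the selective read: only the requested slice is pulled into memory, which is the main appeal over reading a whole .RDS list just to reach one element.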
W.r.t. this issue, it might be easiest to expose the filenames to the user so that at least you have control over the naming structure, as I agree that so many folders containing one file each is completely unnecessary.
from simdesign.
That's awesome! I think this is a real win!
from simdesign.