Comments (15)

jvegreg commented on August 18, 2024

I see now that my suggestion was not correctly written in my comment, for some reason.

Anyway, I still think that we should split the namelist and date parts of the path and reverse their order:

<output_dir>/<namelist>/YYYYMMDD_HHMMSS/preproc/

This will make it a lot easier to search for outputs if you run multiple namelists multiple times.

If we don't want to add another nesting level, I think we should at least switch the order.

<output_dir>/<namelist>_YYYYMMDD_HHMMSS/preproc/

from esmvaltool.

mattiarighi commented on August 18, 2024

Ready to merge (PR167)

The output of the perfmetrics namelist now looks like this:

<output_dir>/tmp
<output_dir>/tmp/ANNUAL_CYCLE_ta
<output_dir>/preproc
<output_dir>/preproc/CMIP5
<output_dir>/preproc/OBS
<output_dir>/work
<output_dir>/work/perfmetrics_main
<output_dir>/plots
<output_dir>/plots/perfmetrics_main

Don't forget to update your config-user.yml file!

nielsdrost commented on August 18, 2024

New plan:

Have two folders: one for truly temporary files and one for output.

Temporary files default to /tmp.

The output folder will have further subfolders generated automatically for plots, netCDF output, preprocessed files, etc.

bouweandela commented on August 18, 2024

To create a temporary dir, it is possibly best to use the standard library function tempfile.mkdtemp
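
A minimal sketch of that suggestion follows. The helper name `make_temp_dir` and the `esmvaltool_` prefix are assumptions made for illustration, not names from the project; only `tempfile.mkdtemp` and `shutil.rmtree` are standard library calls.

```python
import os
import shutil
import tempfile


def make_temp_dir(prefix="esmvaltool_"):
    """Create a private temporary directory and return its absolute path.

    The prefix is a hypothetical choice for this example.
    """
    # mkdtemp creates the directory so that only the creating user can
    # access it; removing it afterwards is the caller's responsibility.
    return tempfile.mkdtemp(prefix=prefix)


tmp_dir = make_temp_dir()
try:
    # Write an intermediate file into the temporary directory.
    scratch_file = os.path.join(tmp_dir, "scratch.txt")
    with open(scratch_file, "w") as handle:
        handle.write("intermediate data\n")
finally:
    # Remove the directory and everything in it once the run is done.
    shutil.rmtree(tmp_dir)
```

Because `mkdtemp` picks a unique name on every call, two runs started at the same moment would not share a temporary directory.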

bouweandela commented on August 18, 2024

At the moment, if the output directory already exists, it is moved to some other name and a new directory is created. This can cause nasty surprises for users, who suddenly find their files moved. Of course this can be remedied by generating a new config(-user).yml file for every run, but that is not very convenient.

A better setup would be the following:
Inside the run_dir defined in config.yml, create a directory with the name of the namelist and current datetime, e.g. for a run starting on 2017-09-25 15:48:03 UTC, create /path/to/run_dir/20170925_154803_namelist_MyVar and put all output that goes to run_dir in there.

For the other paths specified in config.yml:

  • preproc_dir
  • work_dir
  • plot_dir

If they are relative paths, put them inside the directory mentioned above, e.g.

/path/to/run_dir/20170925_154803_namelist_MyVar/preproc_dir
/path/to/run_dir/20170925_154803_namelist_MyVar/work_dir
/path/to/run_dir/20170925_154803_namelist_MyVar/plot_dir

If they are absolute paths, create a namelist + current datetime subfolder inside them too, like so:

/path/to/preproc_dir/20170925_154803_namelist_MyVar
/path/to/work_dir/20170925_154803_namelist_MyVar
/path/to/plot_dir/20170925_154803_namelist_MyVar

This is convenient and minimizes the risk of overwriting files.
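
The relative/absolute rule described above could be sketched roughly as follows. The function names `run_name` and `resolve_dirs` and the dictionary layout are assumptions for illustration only; `posixpath` is used so that the example paths behave the same on every platform.

```python
import posixpath
from datetime import datetime, timezone


def run_name(namelist, now):
    """Return the per-run folder name, e.g. '20170925_154803_namelist_MyVar'."""
    return "{}_{}".format(now.strftime("%Y%m%d_%H%M%S"), namelist)


def resolve_dirs(run_dir, configured, namelist, now):
    """Map each configured path to its per-run location (hypothetical helper)."""
    name = run_name(namelist, now)
    session_dir = posixpath.join(run_dir, name)
    resolved = {"run_dir": session_dir}
    for key, path in configured.items():
        if posixpath.isabs(path):
            # Absolute path: add the run-specific subfolder inside it.
            resolved[key] = posixpath.join(path, name)
        else:
            # Relative path: nest it inside the run-specific run_dir folder.
            resolved[key] = posixpath.join(session_dir, path)
    return resolved


start = datetime(2017, 9, 25, 15, 48, 3, tzinfo=timezone.utc)
dirs = resolve_dirs(
    "/path/to/run_dir",
    {"preproc_dir": "preproc_dir", "work_dir": "/path/to/work_dir"},
    "namelist_MyVar",
    start,
)
for key, path in sorted(dirs.items()):
    print(key, path)
```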

Edge case: it seems very unlikely that two namelists are started at the exact same second, but if this happens, we can add _1, _2, etc to the name, or raise an exception, TBD.
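
One possible way to handle this edge case with the suffix option is sketched below. The helper `create_unique_dir` is hypothetical; it relies on `os.makedirs` raising `FileExistsError` when the target directory already exists, so creating the directory also serves as the existence check.

```python
import os
import tempfile


def create_unique_dir(path):
    """Create `path`; if it is taken, try `path_1`, `path_2`, ... instead.

    Returns the directory that was actually created.
    """
    candidate = path
    suffix = 0
    while True:
        try:
            # Fails with FileExistsError if another run already owns the name.
            os.makedirs(candidate)
            return candidate
        except FileExistsError:
            suffix += 1
            candidate = "{}_{}".format(path, suffix)


# Demonstration inside a throwaway directory: the second call collides.
base = tempfile.mkdtemp()
first = create_unique_dir(os.path.join(base, "20170925_154803_namelist_MyVar"))
second = create_unique_dir(os.path.join(base, "20170925_154803_namelist_MyVar"))
print(os.path.basename(first))
print(os.path.basename(second))
```

Raising an exception instead would amount to calling `os.makedirs` once and letting the `FileExistsError` propagate.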

valeriupredoi commented on August 18, 2024

mattiarighi commented on August 18, 2024

I find the structure of the output directory a bit confusing. An inexperienced user may have problems finding the output.

I think we should have only 2 output paths in config (as originally suggested): one for the output (output_dir), containing the subdirs preproc, work and plots, and one for the temporary files (run_dir or tmp_dir), containing a subdir with the diagnostic id (I don't think we need to have another subdir interface_data here).

The directory structure should look like

<output_dir>/YYYYMMDD_HHMMSS_<namelist>/preproc/
<output_dir>/YYYYMMDD_HHMMSS_<namelist>/work/
<output_dir>/YYYYMMDD_HHMMSS_<namelist>/plots/

<tmp_dir>/YYYYMMDD_HHMMSS_<namelist>/<diag_id>/

There is no risk of overwriting files in work and plots, as every diagnostic in the namelist produces different ones. The advantage would be that the files in preproc could be recycled, for example when 2 variables in the same namelist are processed using the same preproc_id.

One problem I can foresee arises if we allow parallel execution of the diagnostics from the same namelist. But in that case we would need anyway to sort out in advance the diagnostics using the same preproc_id, to allow for recycling the preprocessed files, and also to manage the dependencies (i.e., diagnostics which need other diagnostics to be run first).
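
As a rough illustration of the dependency management mentioned above, the standard library module `graphlib` (Python 3.9 and later) can group diagnostics into batches that could run in parallel. The diagnostic names and the `dependencies` mapping are invented for this sketch and do not come from any namelist.

```python
from graphlib import TopologicalSorter

# Hypothetical diagnostics: each key maps to the diagnostics that must run first.
dependencies = {
    "diag_a": set(),
    "diag_b": set(),
    "diag_c": {"diag_a", "diag_b"},
    "diag_d": {"diag_c"},
}


def execution_batches(deps):
    """Return a list of batches; diagnostics in one batch do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        # Everything returned by get_ready() has all its prerequisites done.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(dependencies)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

The same grouping step would be a natural place to collect diagnostics that share a preproc_id, so that the preprocessed file is produced once before the batch starts.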

jvegreg commented on August 18, 2024

We also have the issue that we cannot use previously preprocessed files with any of these structures...
... but anyway, I also think that the preproc dir should be in the output folder.

One small thing: I think we should also change the folder to the following structure
<output_dir>/<namelist>/YYYYMMDD_HHMMSS/

My output folder is accumulating too many folders, and it is becoming difficult to find which ones correspond to a given namelist.

mattiarighi commented on August 18, 2024

We also have the issue that we cannot use previously preprocessed files with any of these structures...

Right, only within the same namelist. But I think this is OK for the moment: recycling of preproc files across different namelists shouldn't be a very common case (@axel-lauer ?).

One small thing: I think we should also change the folder to the following structure
<output_dir>/<namelist>/YYYYMMDD_HHMMSS/

Fine with me.

valeriupredoi commented on August 18, 2024

mattiarighi commented on August 18, 2024

As discussed in the telecon, we now go for 1 user-specified output path (output_dir), containing the following:

<output_dir>/YYYYMMDD_HHMMSS_<namelist>/preproc/
<output_dir>/YYYYMMDD_HHMMSS_<namelist>/work/
<output_dir>/YYYYMMDD_HHMMSS_<namelist>/plots/
<output_dir>/YYYYMMDD_HHMMSS_<namelist>/tmp/<diag_id>/

The path is mandatory.
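
A minimal sketch of how this layout could be created is shown below. The function `create_output_dirs`, its arguments and the example names are assumptions for illustration; only the folder structure itself is taken from the list above.

```python
import os
import tempfile
from datetime import datetime


def create_output_dirs(output_dir, namelist, diag_ids, now=None):
    """Create the per-run folders and return their paths (hypothetical helper)."""
    if not output_dir:
        # The output path is mandatory, so refuse to guess a default.
        raise ValueError("output_dir must be specified")
    now = now or datetime.now()
    session = os.path.join(
        output_dir, "{}_{}".format(now.strftime("%Y%m%d_%H%M%S"), namelist)
    )
    dirs = {name: os.path.join(session, name) for name in ("preproc", "work", "plots")}
    for path in dirs.values():
        os.makedirs(path)
    # One temporary subfolder per diagnostic id.
    dirs["tmp"] = {}
    for diag_id in diag_ids:
        tmp_path = os.path.join(session, "tmp", diag_id)
        os.makedirs(tmp_path)
        dirs["tmp"][diag_id] = tmp_path
    return dirs


# Demonstration in a throwaway location standing in for <output_dir>.
example_output_dir = tempfile.mkdtemp()
example_dirs = create_output_dirs(
    example_output_dir,
    "namelist_example",
    ["perfmetrics_main"],
    now=datetime(2017, 9, 25, 15, 48, 3),
)
print(example_dirs["plots"])
```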

valeriupredoi commented on August 18, 2024

nielsdrost commented on August 18, 2024

Sounds good. I have a slight preference for the "combined" format, but not by much.

<output_dir>/<namelist>_YYYYMMDD_HHMMSS/

mattiarighi commented on August 18, 2024

I'd also like to avoid too much nesting.

valeriupredoi commented on August 18, 2024
