Comments (10)

ThomasMBury commented on August 20, 2024

Hi Dr. Boettiger,
Thank you for taking the time to go through our research and leave these constructive comments! Reproducibility is important to us, so I will make time to address your comments as best I can and report back.
Best,
Tom

from deep-early-warnings-pnas.

cboettig commented on August 20, 2024

Thanks! And sorry for the long list; it's mostly things I could figure out anyway, but clarifying them could help. The main place I'm stuck is how to load the best_model saved by the dl_train script and run it on some example time-series data. I couldn't see where that step happens.

cboettig commented on August 20, 2024

@ThomasMBury Great to see your paper out in PNAS today! 🎉 Congrats again on a really nice analysis.

Also, any updates on this thread?

ThomasMBury commented on August 20, 2024

Thank you @cboettig! I'm still working on making this repo as intuitive as possible for viewers to run on their own systems. So far I've made edits to the code in /training_data such that:

  1. There are scripts to generate the full set of time series (e.g. submit_multi_batch_500.py), though these scripts are specific to the Slurm workload manager that we use to submit to CPU clusters at UWaterloo. Do you have any advice on how to share code that has been written for a specific cluster system? Is it best practice to share a script that would run all the batches on a personal computer (even though this would take ages!)?
  2. Time series length is now a command line parameter to run_single_batch.sh
  3. I've stored the training data on Zenodo (see Readme).

I've collected the code from my collaborator who trained the deep learning model, including the pkl files that contain the trained classifiers. It's in the repo under dl_train/. Combining the code to work with the same directory names etc. and writing up the workflow is something I haven't got around to yet, but it's high on my list of things to do (busy start to term!).

One other question regarding intermediate files. I uploaded intermediate files so viewers can run certain scripts, such as the ROC computations from predictions made by the DL and EWS, without having to run the longer scripts that actually generate the training data / DL model. Is this appropriate, or should intermediate files be removed altogether?

Cheers,
Tom

cboettig commented on August 20, 2024

Hi @ThomasMBury , thanks for the update! Looking forward to trying it out, but in brief this sounds very good. A few replies:

In principle the cluster details can be abstracted away (e.g. in the #rstats world, drake and now targets do a reasonable job of handling execution over Slurm or other queues, as well as other parallelization). But I'm not too worried about this (it worked out just fine for me after a few tweaks at least). The Slurm script helps document more precisely what you ran, and having the simulated training data on Zenodo means that many users won't have to actually run it.
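On the Python side, a minimal version of that abstraction might just build the same batch command either way and use sbatch only when it's available. A sketch, with the run_single_batch.sh arguments assumed rather than taken from the repo:

```python
import shutil

def submit_batch(batch_id: int, ts_len: int, use_slurm: bool) -> list:
    """Build the command for one batch: sbatch on a cluster, plain bash locally."""
    args = ["run_single_batch.sh", str(batch_id), str(ts_len)]
    return (["sbatch"] if use_slurm else ["bash"]) + args

# Fall back to serial local execution when sbatch is not on the PATH.
use_slurm = shutil.which("sbatch") is not None
commands = [submit_batch(b, 500, use_slurm) for b in range(3)]
print(commands[0])
```

That way the same launcher documents both the cluster run and the (slow) laptop fallback.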

Having the pkl files is great. Definitely looking forward to trying the trained classifier out on some other time series. If you're serious about this line of work, I think it wouldn't be a bad idea to consider a little Python packaging infrastructure so we can just install the trained classifier from PyPI (or at least from GitHub) as a standard Python module and call it on a given time series, but of course all software engineering takes time. Definitely looking forward to seeing the workflow code, though; I think that will be a big help, since that's where I got stuck on my first pass.
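In the meantime, the basic pattern I'd expect for applying one of the stored classifiers is just pickle load-and-predict. A self-contained sketch with a dummy model standing in for the real pkl files (the file name and the predict() interface are assumptions, not the repo's actual API):

```python
import os
import pickle
import tempfile

class DummyClassifier:
    """Stand-in for a trained model with a scikit-learn-style predict()."""
    def predict(self, series):
        return ["bifurcation" if max(series) > 1 else "null"]

# Mimic the repo's stored pkl files: dump a model, then reload and apply it.
path = os.path.join(tempfile.mkdtemp(), "best_model.pkl")
with open(path, "wb") as f:
    pickle.dump(DummyClassifier(), f)

with open(path, "rb") as f:
    model = pickle.load(f)

print(model.predict([0.2, 0.9, 1.4]))  # → ['bifurcation']
```

A packaged version would mostly be this plus the preprocessing the real classifier expects on its input time series.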

Personally I'm all for including intermediate files! I think best practice in the field is to include them, along with some make-like workflow setup so that the end products can be quickly re-made from these stored intermediate products, while the ambitious user also has the option to run something like make clean to purge the intermediate objects and then see if they can reproduce things from scratch. (Again, there's various prior art on this and some literature, but I know it mostly on the R side. I'm going to take the liberty of tagging @wlandau, who is an expert in these things.)
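The essence of that make-like pattern, sketched in Python with illustrative names (the cached ROC object here is a placeholder, not the repo's actual intermediate):

```python
import os
import pickle

CACHE = "roc_intermediate.pkl"  # illustrative intermediate-file name

def get_roc_data():
    """Load the intermediate file if it exists; otherwise recompute and cache it."""
    if os.path.exists(CACHE):
        with open(CACHE, "rb") as f:
            return pickle.load(f)
    result = {"auc": 0.9}  # placeholder for the expensive ROC computation
    with open(CACHE, "wb") as f:
        pickle.dump(result, f)
    return result

def clean():
    """Analogue of `make clean`: purge intermediates to force a full rerun."""
    if os.path.exists(CACHE):
        os.remove(CACHE)

print(get_roc_data())  # → {'auc': 0.9}
clean()  # a subsequent call would now rebuild from scratch
```

Tools like Make, drake, and targets do exactly this, with dependency tracking on top.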

Cheers,
Carl

wlandau commented on August 20, 2024

Thanks, Carl. Anything specific I can help with?

Yes, intermediate files help break the workflow down into manageable pieces. targets does this by abstracting each target as an R object and saving it in a data store to be retrieved later. The intermediate file is there, but to the user it behaves like an R variable. IIRC there are Python pipeline toolkits that do this as well. Maybe Prefect? Definitely Metaflow (although both are more Airflow-like than Make-like).

wlandau commented on August 20, 2024

https://github.com/pditommaso/awesome-pipeline has a bunch of other pipeline tools, mostly language-agnostic or Python-focused. I'm trying to remember the other one I saw with file/object abstraction, but it's not coming to me.

ThomasMBury commented on August 20, 2024

Thanks @wlandau, I'll check those out.

ThomasMBury commented on August 20, 2024

Hi @cboettig, I've spent some time improving the workflow for training the DL classifiers (DL_training.py) and applying them to other time series data (DL_apply.py). They are now also connected via relative file paths. Let me know if you have any other comments/issues with running the code. Thanks again for your feedback.

cboettig commented on August 20, 2024

Thanks for the heads up! We'll give it a spin.
