
Comments (4)

xiki-tempula commented on September 2, 2024

@jthorton Thanks for the explanation. I have been using TorsionDrive in my research and have done some dihedral parameterisation myself.
TorsionDrive uses Work Queue from cctools for parameterisation. I set up the worker with
work_queue_worker --cores=96
and modified the torsiondrive source code a bit so that it submits an 8-core job whenever a job is ready.
With this setup, for a molecule with 6 dihedrals, I could spawn 12 jobs * 8 cores at the beginning and then dynamically occupy all 96 cores. I wonder if bespokefit could have a similar setup? We would set the maximum number of cores and bespokefit would dynamically fill all of them.
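The arithmetic behind this setup can be sketched as below. This is only an illustration of how fixed-size jobs fill a pool of cores, not TorsionDrive or Work Queue code; the function name `fill_cores` is hypothetical.

```python
def fill_cores(total_cores: int, cores_per_job: int, ready_jobs: int) -> tuple[int, int]:
    """Return (jobs started now, cores left idle) for a single pool of cores."""
    if cores_per_job <= 0:
        raise ValueError("cores_per_job must be positive")
    slots = total_cores // cores_per_job      # how many jobs fit at once
    started = min(slots, ready_jobs)          # cannot start more jobs than are ready
    idle = total_cores - started * cores_per_job
    return started, idle


print(fill_cores(96, 8, 12))  # (12, 0): 12 ready jobs occupy all 96 cores
print(fill_cores(96, 8, 5))   # (5, 56): only 5 ready jobs leave 56 cores idle
```

A scheduler that "dynamically fills" the cores would simply call something like this again each time a job finishes or a new one becomes ready.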

With regard to BEFLOW_QC_COMPUTE_WORKER_N_TASKS:
I have set export BEFLOW_QC_COMPUTE_WORKER_N_TASKS=2, but it seems that still only five 8-core jobs run at the same time, which is the same as with export BEFLOW_QC_COMPUTE_WORKER_N_TASKS=1.

from openff-bespokefit.

jthorton commented on September 2, 2024

I like that idea for the openff-bespoke executor run entry point: users could supply a total number of cores and it would spin up N workers with X tasks per worker to best get through the jobs. I'll definitely look into adding this feature!

With regard to BEFLOW_QC_COMPUTE_WORKER_N_TASKS.

I think you might be running into some settings choices we hardcoded here. We found that using 8 cores with Psi4 gave good performance, so we only divide the tasks if each one can have at least 8 cores; you would need to give each worker 16 cores to have both tasks running. Maybe we could remove this hard-set limit, though, and let users decide. I also think I am using the wrong torsiondrive procedure here and need to change this to our custom parallel version. I'll make a PR to fix these two issues!
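To make the rule concrete, here is a minimal sketch of the behaviour as described in this comment. It is an assumption about that behaviour, not the actual bespokefit source; `effective_tasks` and `MIN_CORES_PER_TASK` are hypothetical names, and the fallback to a single task is a guess.

```python
MIN_CORES_PER_TASK = 8  # the per-task minimum quoted in the comment above


def effective_tasks(worker_cores: int, requested_tasks: int,
                    min_cores_per_task: int = MIN_CORES_PER_TASK) -> int:
    """Assumed rule: only split a worker into parallel tasks if every task
    can still get at least `min_cores_per_task` cores; never fewer than one."""
    affordable = worker_cores // min_cores_per_task
    return max(1, min(requested_tasks, affordable))


print(effective_tasks(8, 2))   # 1: an 8-core worker cannot host two 8-core tasks
print(effective_tasks(16, 2))  # 2: 16 cores give each of the two tasks 8 cores
```

Under this reading, setting BEFLOW_QC_COMPUTE_WORKER_N_TASKS=2 on 8-core workers changes nothing, which matches what was observed.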


jthorton commented on September 2, 2024

Hi @xiki-tempula, good question! This is a tricky one, and it's hard to know in advance how best to split the resources, as this depends on the number of torsiondrive tasks produced for the molecule. Currently, each worker can consume 1 torsiondrive task at a time, so the fact that you have 5 active tasks probably means the molecule makes 5 torsiondrives. It would therefore be better to decrease the number of workers but give each of them more cores.
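As a rough sketch of that advice (plain arithmetic, not a bespokefit API; `split_cores` is a hypothetical helper and the core counts are example values), one worker per torsiondrive with the cores shared evenly would look like this:

```python
def split_cores(total_cores: int, n_torsiondrives: int) -> tuple[int, int]:
    """Return (number of workers, cores per worker), using one worker per
    torsiondrive and sharing the available cores evenly between them."""
    if total_cores < 1 or n_torsiondrives < 1:
        raise ValueError("both arguments must be at least 1")
    n_workers = min(n_torsiondrives, total_cores)  # extra workers would sit idle
    cores_per_worker = total_cores // n_workers
    return n_workers, cores_per_worker


print(split_cores(96, 5))   # (5, 19): five workers with 19 cores each, 1 core unused
print(split_cores(96, 12))  # (12, 8)
```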

We can also add some parallelisation to the torsiondrive tasks by performing multiple constrained optimisations simultaneously. This is set through an environment variable (note that there are a lot of variables in bespokefit, see here); the important one is BEFLOW_QC_COMPUTE_WORKER_N_TASKS, which controls how many parallel optimisations each worker can do in a torsiondrive. So running export BEFLOW_QC_COMPUTE_WORKER_N_TASKS=2 before a run (you can also set this in your bashrc) would allow each worker to run up to 2 optimisations at a time.
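If the run is launched from a Python script rather than a shell, the same variable can be set in the child process environment. The sketch below only builds that environment with the standard library; `env_with_n_tasks` is a hypothetical helper, and it does not call bespokefit itself.

```python
import os


def env_with_n_tasks(n_tasks: int, base_env=None) -> dict:
    """Return a copy of the environment with BEFLOW_QC_COMPUTE_WORKER_N_TASKS set."""
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    env = dict(os.environ if base_env is None else base_env)  # copy, do not mutate
    env["BEFLOW_QC_COMPUTE_WORKER_N_TASKS"] = str(n_tasks)    # env values are strings
    return env


env = env_with_n_tasks(2)
print(env["BEFLOW_QC_COMPUTE_WORKER_N_TASKS"])  # 2
```

The resulting dictionary can be passed as the `env=` argument of `subprocess.run` together with your usual run command.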

I would also look at adding --qc-compute-max-mem to your run command. This controls how much memory per core the workers can use; by default it will try to give every worker access to all of the memory, which can lead to segfaults.


xiki-tempula commented on September 2, 2024

Hi, with regard to the parallelisation: I wonder if this is currently done at the fragment level, where each fragment occupies a worker? Or at the torsion level, where each torsion scan occupies a worker? Or at the TorsionDrive level, where TorsionDrive attempts a forward and a backward drive, so each torsion will spawn at least two workers?

