Parallelisation with --n-qc-compute-workers about openff-bespokefit HOT 4 OPEN

openforcefield commented on September 2, 2024

Parallelisation with --n-qc-compute-workers

from openff-bespokefit.

Comments (4)

xiki-tempula commented on September 2, 2024 1

@jthorton Thanks for the explanation. I have been using TorsionDrive in my research and have done some dihedral parameterisation myself.
TorsionDrive uses Work Queue from cctools for parameterisation. If I run
work_queue_worker --cores=96 to set up the worker,
and modify the torsiondrive source code a bit so that it submits an 8-core job whenever a job is ready,
then for a molecule with 6 dihedrals I can spawn 12 jobs * 8 cores at the beginning and then dynamically keep all 96 cores occupied. I wonder if bespokefit could have a similar setup? We would set the maximum number of cores and bespokefit would dynamically fill all of them.

With regard to BEFLOW_QC_COMPUTE_WORKER_N_TASKS.
I have set export BEFLOW_QC_COMPUTE_WORKER_N_TASKS=2, but it seems that still only five 8-core jobs were running at the same time, which is the same as with export BEFLOW_QC_COMPUTE_WORKER_N_TASKS=1.


jthorton commented on September 2, 2024 1

I like that idea for the openff-bespoke executor run entry point: users could supply a total number of cores and it would spin up N workers with X tasks per worker to best get through the jobs. I'll definitely look into adding this feature!
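To make the proposal concrete, here is a rough sketch of how a total core budget might be split into N workers with X tasks each. This is not bespokefit code: the function name plan_workers, its arguments, and the choice of 8 cores per task are all assumptions for illustration, and a real implementation may well divide things differently.

```python
from typing import NamedTuple


class WorkerPlan(NamedTuple):
    n_workers: int          # number of QC compute workers to start
    tasks_per_worker: int   # parallel optimisations per worker
    cores_per_worker: int   # cores handed to each worker


def plan_workers(total_cores: int, n_torsiondrives: int, cores_per_task: int = 8) -> WorkerPlan:
    """Hypothetical planner: one worker per torsiondrive, as many 8-core tasks as fit."""
    if n_torsiondrives < 1:
        raise ValueError("need at least one torsiondrive")
    slots = total_cores // cores_per_task  # how many tasks fit in the budget
    if slots == 0:
        raise ValueError("total_cores is smaller than cores_per_task")
    # Each worker handles one torsiondrive at a time, so extra workers would idle.
    n_workers = min(n_torsiondrives, slots)
    tasks_per_worker = slots // n_workers
    return WorkerPlan(n_workers, tasks_per_worker, tasks_per_worker * cores_per_task)


# 96 cores and 6 torsiondrives: 6 workers, 2 tasks each, 16 cores each
print(plan_workers(96, 6))
```

With 96 cores and only 5 torsiondrives this sketch leaves 16 cores idle, which is the kind of gap a dynamic scheduler like the Work Queue setup would fill.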

With regard to BEFLOW_QC_COMPUTE_WORKER_N_TASKS.

I think you might be running into some settings choices we hardcoded here. We found that using 8 cores with Psi4 gave good performance, so we only divide the tasks if each one can have at least 8 cores; you would need to give each worker 16 cores to have both tasks running. Maybe we could remove this hard-set limit and let users decide. I also think I am using the wrong torsiondrive procedure here and need to change this to our custom parallel version. I'll make a PR to fix these two issues!
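The rule described above can be sketched roughly as follows. This is only an illustration of the behaviour as described in this comment, not the actual bespokefit source; the function name and the fall-back to a single task are assumptions.

```python
def parallel_tasks_per_worker(worker_cores: int, requested_tasks: int,
                              min_cores_per_task: int = 8) -> int:
    """Illustrative only: how many optimisations a worker would run at once.

    requested_tasks stands in for BEFLOW_QC_COMPUTE_WORKER_N_TASKS.
    The work is split only if every task can still get min_cores_per_task cores.
    """
    if worker_cores < 1 or requested_tasks < 1:
        raise ValueError("worker_cores and requested_tasks must be positive")
    if worker_cores // requested_tasks >= min_cores_per_task:
        return requested_tasks
    # Assumed fall-back: run one optimisation at a time with all the cores.
    return 1


# An 8-core worker with N_TASKS=2 behaves like N_TASKS=1 ...
print(parallel_tasks_per_worker(8, 2))
# ... while a 16-core worker can run both tasks.
print(parallel_tasks_per_worker(16, 2))
```

This would explain why setting the variable to 2 made no visible difference with 8-core workers.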


jthorton commented on September 2, 2024

Hi @xiki-tempula, good question! This is a tricky one, and it's hard to know in advance how best to split the resources, as this depends on the number of torsiondrive tasks produced for the molecule. Currently, each worker can consume one torsiondrive task at a time, so the fact that you have 5 active tasks probably means the molecule produces 5 torsiondrives. It would therefore be better to decrease the number of workers but give each of them more cores.

We can also add some parallelisation to the torsiondrive tasks by performing multiple constrained optimisations simultaneously. This is done by setting an environment variable (note there are a lot of variables in bespokefit, see here); the important one is BEFLOW_QC_COMPUTE_WORKER_N_TASKS, which controls how many parallel optimisations each worker can do in a torsiondrive. So running export BEFLOW_QC_COMPUTE_WORKER_N_TASKS=2 before a run (you can also set this in your bashrc) would allow each worker to run up to 2 optimisations at a time.

I would also look at adding --qc-compute-max-mem to your run command. This controls how much memory per core the workers can use; by default it will try to give every worker access to all of the memory, which can lead to segfaults.
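As a rough guide for picking a value, the arithmetic below divides the node's memory evenly over all cores used by the QC compute workers. The helper name is made up for this example, and the unit (GB per core) is an assumption; check the CLI help for the unit --qc-compute-max-mem actually expects.

```python
def max_mem_per_core(total_mem_gb: float, n_workers: int, cores_per_worker: int) -> float:
    """Even share of memory per core across all QC compute workers (assumed GB)."""
    total_cores = n_workers * cores_per_worker
    if total_cores < 1:
        raise ValueError("need at least one worker with at least one core")
    return total_mem_gb / total_cores


# e.g. 192 GB shared by 6 workers with 16 cores each
print(max_mem_per_core(192, 6, 16))
```

In practice you would probably want to leave some headroom for the other bespokefit workers and the operating system rather than handing out every last GB.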


xiki-tempula commented on September 2, 2024

Hi, with regard to the parallelisation: I wonder if this is currently done at the fragment level, where each fragment occupies a worker? Or at the torsion level, where each torsion scan occupies a worker? Or at the TorsionDrive level, where TorsionDrive will attempt a forward and a backward drive, so each torsion will spawn at least two workers?

