
All about the -j option. about sos HOT 19 CLOSED

vatlab commented on August 30, 2024
All about the -j option.

from sos.

Comments (19)

gaow commented on August 30, 2024

It is extremely slow on my end to browse either site, but from reading the home pages they both look promising for the job! As long as installation can be done via pip, setting them up should not be a problem.

BoPeng commented on August 30, 2024

I have implemented a simple version of parallel execution that allows step processes within the same step to be executed in parallel. It is very crude, but to do it I already had to change SoS syntax so that step processes are executed in separate processes, independent of SoS itself. This is certainly necessary; otherwise no step process could be executed safely outside of SoS (e.g. submitted to Celery as an independent task).

I am using a multiprocessing pool, but we can switch to Python 3's async libraries or to Celery as long as we get the DAG right. I suspect we can swap the backend executor for different running environments.
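To make the backend swappable, step execution could go through a small executor interface. Here is a hypothetical sketch; the class names and the thread-backed pool are illustrative, not actual SoS or Celery API:

```python
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed stand-in


class PoolExecutor:
    """Run zero-argument step callables through a worker pool.

    A Celery- or asyncio-backed executor could expose the same run_all()
    interface, so the DAG logic never needs to know which backend is used.
    """

    def __init__(self, jobs):
        self._pool = ThreadPool(jobs)

    def run_all(self, funcs):
        # submit everything, then collect results in submission order
        results = [self._pool.apply_async(f) for f in funcs]
        self._pool.close()
        self._pool.join()
        return [r.get() for r in results]


def square(x):
    return x * x


executor = PoolExecutor(jobs=4)
print(executor.run_all([lambda v=v: square(v) for v in range(5)]))  # [0, 1, 4, 9, 16]
```

The point of the indirection is that switching the running environment then only means swapping which executor class is constructed.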

gaow commented on August 30, 2024

Great, and I am looking forward to testing this feature in the next couple of days!

BoPeng commented on August 30, 2024

Celery is now officially used in SoS, because a multiprocessing bug (feature) prevents us from spawning a new workflow from a workflow step (nested workflows). Celery.billiard is used and works OK now.

Note that we do not need a full DAG to execute steps in parallel. All we need to do is:

  1. Get the dependencies of all steps.
  2. Put all steps in a pool and execute any step (up to -j) whose dependencies are met.
  3. Whenever a step completes, update the pool and execute any further steps whose dependencies are met.

SoS is now Celery-ready (step processes are submitted as separate processes), so we can easily extend the code to clusters once the above is done.
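The scheduling loop above can be sketched without any DAG library. This is a hypothetical illustration (run_dag and the step table are made-up names, not SoS code):

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_dag(steps, jobs=2):
    """Execute steps whose dependencies are met, up to `jobs` at a time.

    `steps` maps a step name to (set_of_dependency_names, zero-arg callable).
    """
    done, running = set(), {}          # completed names, future -> name
    pending = dict(steps)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        while pending or running:
            # 2. submit every pending step whose dependencies are all met
            ready = [n for n, (deps, _) in pending.items() if deps <= done]
            for name in ready:
                _, func = pending.pop(name)
                running[pool.submit(func)] = name
            if not running:            # nothing runnable and nothing running
                raise RuntimeError("unsatisfiable or cyclic dependencies")
            # 3. wait for a completion, then loop to release newly ready steps
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                fut.result()           # propagate any step failure
                done.add(running.pop(fut))
    return done
```

For example, with a diamond a -> {b, c} -> d and jobs=2, b and c run concurrently once a finishes, and d runs last.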

gaow commented on August 30, 2024

Great! But I noticed a big performance issue after I upgraded. For example, with this one:
test.sos.txt

The command sos run test.sos.txt DSC -d takes 12 seconds, which I believe was 1 second before the update. Do you see the same problem on your end, or is something wrong with my packages?

BoPeng commented on August 30, 2024

Confirmed. The script took about 4s without the patch and 12s now. So the Celery machinery is quite costly? This is certainly unexpected, although I tend to think 8s is negligible for real workflows if that is the cost of creating processes.

gaow commented on August 30, 2024

I just worry that the slowness is proportional to the number of commands implied by the script (e.g. concurrent for loops) ... in that case parallelization with Celery would actually hurt performance ...

BoPeng commented on August 30, 2024

The problem is with pool.close() and pool.join(), which wait about 1s. So there is roughly a 1-second wait for the completion of each workflow, and your example is the worst case because it has 9 workflows. I am still investigating, because some pool.close() calls are fast.

ERROR: start waiting
ERROR: Step completed 0.006140947341918945
ERROR: Step completed 0.0060999393463134766
ERROR: results returned 0.007298946380615234
ERROR: wait join 1.0097160339355469
ERROR: Step completed 3.094782829284668
ERROR: start waiting
ERROR: results returned 1.7881393432617188e-05
ERROR: wait join 9.083747863769531e-05

gaow commented on August 30, 2024

If the slowness is at the workflow level then I agree we can live with it. There must be a good reason Celery decides to wait, but it is interesting that the wait times differ by several orders of magnitude!

gaow commented on August 30, 2024

Not sure how easy it would be, but if we make -j1 not use Celery at all, then at least dryrun will not be that frustrating, e.g. faking a "null" interface behind the Celery one and using that for -j1.

BoPeng commented on August 30, 2024

I have investigated this issue further, and it turns out that pool.join() is the slow part; the fast joins happen when there is only one process. The current behavior is wrong anyway, because -j1 should not trigger the pool even when concurrent=True. I have fixed this, so your example should now run in sequential mode and be fast, without all the process-creation overhead.

Overall I do not think this is an issue, because step processes are supposed to run much longer than 1s and can benefit from multiprocessing. Fast statements should be placed outside of, or before, the step process so they are always executed sequentially. SoS certainly allows such flexibility.
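The -j1 fix can be pictured as a "null" pool that satisfies the pool interface but runs everything in the current process, so the per-workflow close()/join() pause disappears. This is a hypothetical sketch, not the actual SoS patch (NullPool and make_pool are illustrative names):

```python
from multiprocessing.dummy import Pool  # thread-backed pool for the sketch


class NullPool:
    """Pool-compatible object that runs everything sequentially."""

    def map(self, func, tasks):
        return [func(t) for t in tasks]

    def close(self):
        pass                    # nothing to shut down

    def join(self):
        pass                    # nothing to wait for, so no ~1s pause


def make_pool(jobs):
    # -j1: bypass worker pools entirely, avoiding close()/join() overhead
    return NullPool() if jobs <= 1 else Pool(jobs)


def run_step(task):
    return task * 2


pool = make_pool(jobs=1)
results = pool.map(run_step, [1, 2, 3])
pool.close()
pool.join()
print(results)  # [2, 4, 6]
```

Because both objects expose the same map/close/join surface, the calling code does not need to branch on the number of jobs anywhere else.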

gaow commented on August 30, 2024

Good! I confirm -j1 works, and I agree the overhead is acceptable.

BoPeng commented on August 30, 2024

If I understand correctly, a so-called Celery cluster requires us to:

  1. Start a Celery worker process on a few computing nodes.
  2. Set up message passing between the nodes.
  3. Distribute tasks to the workers.

I like this approach because the current VPT approach requires us to ssh to the head node and qsub jobs to submit big work; we have no control over the submitted jobs and can only quit or wait for the tasks to complete.

Also, celery.group etc. and the flower monitoring system might be helpful for us.

BoPeng commented on August 30, 2024

http://dask.pydata.org/en/latest/ also looks promising.

For the record:

http://distributed.readthedocs.org/en/latest/related-work.html

dask vs spark: http://dask.pydata.org/en/latest/spark.html

More on tasks and celery https://www.fullstackpython.com/task-queues.html

BoPeng commented on August 30, 2024

You can also have a look at snakemake's dag class. At this point we can actually learn many things from snakemake, such as its DAG handling, cluster support, etc. It is called stealing, though. :-)

gaow commented on August 30, 2024

Yes, that's a 900-line Python script. I was under the impression that it requires the graph structure to be known at the beginning, so I thought it may not be a good option. I will check whether that's the case.

Snakemake cluster support might have problems, though:

https://bitbucket.org/snakemake/snakemake/issues/84/high-virtual-memory-on-cluster-master-and#comment-13180538

BoPeng commented on August 30, 2024

I see from the thread that snakemake runs on each node ... this is not what I have in mind, because I would like to send jobs to the computing nodes. But that approach has the advantage that it might run a whole branch of jobs, instead of a single job, on a node ...

gaow commented on August 30, 2024

To me, if we cannot interact with the cluster directly and have to rely on qsub like snakemake does, it is no better than supporting an --export command that prepares all the resources and exports commands/scripts as "parallelizable" batches, so that users can easily submit the jobs themselves. I think this is also easier to troubleshoot: on a cluster system it helps to be more transparent, and attempts to interact with it may be ill-fated. The question in that thread may well be specific to one cluster environment, but it still ended up becoming snakemake's headache.

BoPeng commented on August 30, 2024

Discussed on more specific threads.
