Comments (33)

justanhduc avatar justanhduc commented on May 27, 2024 9

Sounds like an interesting feature. I will keep this issue open for updates of this feature.

from task-spooler.

kylincaster avatar kylincaster commented on May 27, 2024 1

Hey @kylincaster. You made a PR in your fork. Could you please make the PR again in here?

OK, I have done so, with full details about the features and bugs in my work.

from task-spooler.

wolfram77 avatar wolfram77 commented on May 27, 2024 1

I set TS_SOCKET=/tmp/ts.socket in /etc/environment and chmod 777 "$TS_SOCKET".

from task-spooler.

justanhduc avatar justanhduc commented on May 27, 2024

Hi @shaoyucheng. No, multiple users cannot share a queue. Each user will create their own server based on their UID.

from task-spooler.

shaoyucheng avatar shaoyucheng commented on May 27, 2024

Hi @shaoyucheng. No, multiple users cannot share a queue. Each user will create their own server based on their UID.

Got it. I think it would be a good feature, which would make your project like an enhanced version of the atd service.

from task-spooler.

wolfram77 avatar wolfram77 commented on May 27, 2024

I need this too for our shared Volta GPU server.

from task-spooler.

wolfram77 avatar wolfram77 commented on May 27, 2024

It seems I was able to set up a shared queue with $TS_SOCKET, as mentioned in TRICKS. Thanks for making task-spooler.

from task-spooler.

justanhduc avatar justanhduc commented on May 27, 2024

Hi @wolfram77. Yes, sharing the server file can be a quick and dirty way to share the queue, but be aware that it has a lot of limitations since jobs are user-independent (for example, -C will erase all your colleagues' queues, and -K can be invoked by anyone).

from task-spooler.

wolfram77 avatar wolfram77 commented on May 27, 2024

@justanhduc While trying it out yesterday, I saw that -K deletes the socket file, so I had to chmod it again. It shouldn't be a problem, but I have now put a message about it in the help text on the server.
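A minimal sketch of the re-chmod step described above. The helper name fix_socket_perms is hypothetical and not part of task-spooler, and the /tmp/ts.socket default is an assumption; it only re-applies the permissions once the server has recreated the socket.

```shell
#!/bin/bash
# Hypothetical helper, not part of task-spooler: re-open the shared socket
# to all users after the server has recreated it (for example after ts -K).

# Assumed default path; override by exporting TS_SOCKET.
TS_SOCKET="${TS_SOCKET:-/tmp/ts.socket}"

fix_socket_perms() {
    sock="${1:-$TS_SOCKET}"
    if [ ! -e "$sock" ]; then
        echo "socket not found: $sock" >&2
        return 1
    fi
    # World read/write/execute, matching the chmod 777 used in this thread
    chmod 777 "$sock"
}

# Usage (after the server has started again):
#   fix_socket_perms
```

Note that chmod 777 lets every local user control the queue, which is the trade-off already discussed in this thread.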

from task-spooler.

fearedspark avatar fearedspark commented on May 27, 2024

I too would be interested in the multi-user mode, even if it means all users can kill tasks from anyone.

from task-spooler.

justanhduc avatar justanhduc commented on May 27, 2024

Hey @fearedspark. Thanks for your interest. Indeed, there is a working prototype in the branch global. However, there's an ambiguity in setting the number of slots. Should we use the same or a different number of slots for all users? What is the proper number? Or is it something the users have to compromise on? I am not able to come up with a good solution, so please suggest anything.

from task-spooler.

fearedspark avatar fearedspark commented on May 27, 2024

Well, I will speak about the way I'm managing it on our machine, and maybe it will provide some insight.
I have it configured with as many slots as there are threads on the machine. A user starting a task sets the number of slots it takes based on the number of threads it can use. It would be nice to have a configurable default slot size, so that when a user doesn't give a number of slots, it defaults to the max.
Each user is then free to use as many slots as they want. This works well only if all users behave properly, which is the case for us. It could be a good idea to have a maximum number of slots allowed per user, defaulting to the max number of slots.
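The policy described above can be sketched as a small admission check. This is only an illustration under stated assumptions: can_start and its arguments are hypothetical names, not task-spooler options, and the per-user cap is the proposed feature rather than something ts is known to implement.

```shell
#!/bin/bash
# Illustration only: can_start is a hypothetical name, not a ts feature.

# One slot per hardware thread, as described above. This value could be
# passed as TOTAL below.
total_slots=$(nproc 2>/dev/null || echo 1)

# can_start TOTAL USED USER_USED USER_CAP REQUESTED
# Succeeds when the job fits both the machine-wide and the per-user limit.
can_start() {
    total=$1; used=$2; user_used=$3; user_cap=$4; requested=$5
    [ $((used + requested)) -le "$total" ] &&
        [ $((user_used + requested)) -le "$user_cap" ]
}

# Example: 8 slots, 4 busy, this user holds 2 and may hold at most 4.
if can_start 8 4 2 4 2; then
    echo "job may start"
else
    echo "job must wait"
fi
```

With the cap defaulting to the total, the check reduces to the cooperative scheme where each user is trusted not to take everything.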

from task-spooler.

justanhduc avatar justanhduc commented on May 27, 2024

Hey @fearedspark. Yeah basically we still have to depend on the kindness of other users 😅. Then I will try to look at the prototype again and see whether I can make it stable or not. Thanks a lot for the initiatives!

from task-spooler.

kylincaster avatar kylincaster commented on May 27, 2024

Dear all,
I have already developed a multi-user version, for CPU-only use, at task-spooler.
If you find it interesting or useful, maybe we could try to merge it back.
However, I am not an expert on Linux, so there is still much room for improvement and there are bugs to fix.
Cheers

from task-spooler.

justanhduc avatar justanhduc commented on May 27, 2024

Dear all,

I have already developed a multi-user version, for CPU-only use, at task-spooler.

If you find it interesting or useful, maybe we could try to merge it back.

However, I am not an expert on Linux, so there is still much room for improvement and there are bugs to fix.

Cheers

Hey @kylincaster. Awesome! Would you mind sending a PR? I will try to review it and we can discuss more how to improve from there.

from task-spooler.

kylincaster avatar kylincaster commented on May 27, 2024

Dear all,
I have already developed a multi-user version, for CPU-only use, at task-spooler.
If you find it interesting or useful, maybe we could try to merge it back.
However, I am not an expert on Linux, so there is still much room for improvement and there are bugs to fix.
Cheers

Hey @kylincaster. Awesome! Would you mind sending a PR? I will try to review it and we can discuss more how to improve from there.

I just submitted the PR. You could give it a try, @justanhduc.

from task-spooler.

justanhduc avatar justanhduc commented on May 27, 2024

Hey @kylincaster. You made a PR in your fork. Could you please make the PR again in here?

from task-spooler.

kylincaster avatar kylincaster commented on May 27, 2024

@justanhduc I found that if I wanted to control the task precisely, the PIDs of all subprocesses needed to be known in advance.
So I use a bash script to control the running state of the task.
Porting the bash script to C would be hard work.

from task-spooler.

justanhduc avatar justanhduc commented on May 27, 2024

Hi @kylincaster. Sorry for the late reply. What do you mean by "precise control"? What is your use case for which -p is not enough?

from task-spooler.

kylincaster avatar kylincaster commented on May 27, 2024

Hi @justanhduc, I mean pausing or killing a process through ts. Not only the process itself but also all of its subprocesses should be handled, so recursive code is necessary to find the PIDs of all subprocesses.

from task-spooler.

justanhduc avatar justanhduc commented on May 27, 2024

Hi @kylincaster. To kill or pause a process and its children, can we just simply send the signal to the whole process group like the memo here? Or is there anything I missed?
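A minimal sketch of the process-group approach asked about here, assuming Linux (it reads /proc) and assuming the job's children stay in the process group of the PID reported by ts -p. The helper names pgid_of and signal_group are hypothetical. A negative PID argument makes kill address the whole group.

```shell
#!/bin/bash
# Hypothetical helpers; Linux-only because they read /proc.

# Print the process group ID of PID $1.
pgid_of() {
    # /proc/PID/stat looks like: pid (comm) state ppid pgrp ...
    stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
    # Drop everything through the last ") " so a comm with spaces is harmless
    rest=${stat##*) }
    set -- $rest
    # Remaining fields are: state ppid pgrp ...
    echo "$3"
}

# signal_group SIGNAL PID: send SIGNAL to every process in PID's group.
signal_group() {
    pgid=$(pgid_of "$2") || return 1
    # A negative PID addresses the process group; "--" ends option parsing
    kill -s "$1" -- "-$pgid"
}

# Usage, with <jobid> standing for a real ts job ID:
#   signal_group TSTP "$(ts -p <jobid>)"   # pause
#   signal_group CONT "$(ts -p <jobid>)"   # resume
```

Children that move to their own process group or session will not be reached this way.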

from task-spooler.

kylincaster avatar kylincaster commented on May 27, 2024

Hi @justanhduc. I did try to signal the process directly. Unfortunately, the stop signal does not work for a task with subprocesses. The following example script cannot be paused by the kill -stop -- -XXX command:

#!/bin/bash
#

for i in {2..1000}
do
        dt=`date`
        echo "output: ${dt} $i" >> log.txt
        sleep 1
done

It is run with the ts command ts mpirun -np 1 loop.sh.
Only the parent process mpirun is paused, not the bash subprocess.

from task-spooler.

justanhduc avatar justanhduc commented on May 27, 2024

Hey @kylincaster. According to the documentation of mpirun 2.1.1 on Ubuntu 18.04, mpirun only propagates a selected set of signals. When dealing with a program like mpirun, IMO ts has no authority to manipulate the subprocesses it creates because, well, that would violate the purpose of such a program.

As for your specific problem, be sure to check your Ubuntu and mpirun versions. On 18.04 with mpirun 2.1.1, as in my setup, I can successfully stop and continue the job with the following commands:

ts mpirun --mca orte_forward_job_control 1 -np 1 toy.sh
kill -20 $(ts -p <jobid>)  # stop the mpi process. Note that SIGSTOP does not work per documentation
kill -18 $(ts -p <jobid>)  # continue

Ps: Our discussion about sending signal seems not to be in the scope of this issue, so if you still have any problem it's better to open another ticket and we can continue there.

from task-spooler.

kylincaster avatar kylincaster commented on May 27, 2024

Thanks, @justanhduc, for the comments on the behavior of mpirun.
Unfortunately, it depends on the MPI implementation. Intel MPI processes do not forward such signals.
So my solution to this problem is the following bash script, which is called inside task-spooler.

#!/bin/bash

# getting children generally resolves nicely at some point
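# Print the direct children of the given PIDs (comma-separated list)
get_child() {
    pgrep -P "$1"
}

# Print all descendants of PID $1, followed by $1 itself
get_children() {
    local ret children=""
    ret=$(get_child "$1")
    while [ -n "$ret" ]; do
        children+="$ret "
        # pgrep -P takes a comma-separated list, so query the whole level at once
        ret=$(get_child "$(echo $ret | tr ' ' ',')")
    done

    # Collapse newlines and repeated spaces into single spaces
    children=$(echo "${children}" | xargs)

    echo "${children} $1"
}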

if [ "$#" -lt 1 ]; then
    echo "no input PID" >&2
    exit 1
fi

owner=$(ps -o user= -p "$1")
if [ -z "$owner" ]; then
    # not a valid PID
    exit 1
fi
pids=$(get_children "$1")

user=$(whoami)

# Use sudo when the target process belongs to another user
extra=""
if [[ "$owner" != "$user" ]]; then
    extra="sudo"
fi

for pid in ${pids}; do
    if [ -z "$2" ]; then
        # No signal given: only list the PIDs
        echo "${extra} ${pid}"
    else
        ${extra} kill -s "$2" "${pid}"
    fi
done

from task-spooler.

sadikyalcin avatar sadikyalcin commented on May 27, 2024

It seems I was able to set up a shared queue with $TS_SOCKET, as mentioned in TRICKS. Thanks for making task-spooler.

Can you share details on how you got this set up? I've defined a socket but still can't see anything from other users... @justanhduc would you be able to help with this?

from task-spooler.

sadikyalcin avatar sadikyalcin commented on May 27, 2024

Thanks. I was calling tsp via a bash script - turns out environment variables aren't exposed to bash scripts by default.
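One way to avoid depending on the caller's environment is to set the variable inside the script itself. A minimal sketch, assuming the shared socket is /tmp/ts.socket; the path is an assumption and should match whatever the other users have configured.

```shell
#!/bin/bash
# Keep any TS_SOCKET inherited from the caller; otherwise fall back to the
# shared path (assumed here to be /tmp/ts.socket).
: "${TS_SOCKET:=/tmp/ts.socket}"
export TS_SOCKET

# Commands started from this script now see the same socket, for example:
#   ts -l
```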

What about your logs though? I've got the shared queue working but still can't access logs from tasks queued from other users.

from task-spooler.

wolfram77 avatar wolfram77 commented on May 27, 2024

Is it tsp? I am able to see the tasks queued by other users with ts or ts -l. I store the program output with a pipe like stdbuf --output=L ts -nf -N 32 ./a.out | tee -a "a.log" from a script. Are you interested in the program output of other users?

from task-spooler.

sadikyalcin avatar sadikyalcin commented on May 27, 2024

Is it tsp? I am able to see the tasks queued by other users with ts or ts -l. I store the program output with a pipe like stdbuf --output=L ts -nf -N 32 ./a.out | tee -a "a.log" from a script. Are you interested in the program output of other users?

I run a node process, which can take {x} duration and does print progress / results. For example, the job below has an error and is run by the webserver, but /tmp/ts-out.1LkaYj doesn't exist for me. I run Apache and SSH into the server as the same user (ubuntu).

52 finished /tmp/ts-out.1LkaYj 1 84.95/1.43/0.16 {my_command}

from task-spooler.

wolfram77 avatar wolfram77 commented on May 27, 2024

Could you try redirecting both stdout and stderr to a file? If that does not work for you, @justanhduc may be able to help you.
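A minimal sketch of that suggestion: wrap the command so that both streams land in one file that other users can read. The helper name run_logged and the paths in the usage comment are hypothetical; whether this fits depends on how the job is queued.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command with stdout and stderr appended to one
# log file, then make the log readable by everyone.

# run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    log=$1
    shift
    # 2>&1 after >> sends stderr to the same file as stdout
    "$@" >> "$log" 2>&1
    status=$?
    chmod a+r "$log"
    return "$status"
}

# The same redirection as a one-liner queued through ts (placeholder names):
#   ts sh -c 'node my_script.js >> /tmp/my_job.log 2>&1'
```

The redirection has to happen inside the queued command (hence the sh -c), otherwise it applies to the ts client rather than to the job.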

from task-spooler.

justanhduc avatar justanhduc commented on May 27, 2024

Hi @sadikyalcin @wolfram77. First of all, tsp is the original version, not the one in this fork. Please uninstall it using apt and install the one here using make cpu. If the same problem happens, could you verify that you have permission to write in /tmp? Also, why is the ts.socket file not in /tmp?

from task-spooler.

justanhduc avatar justanhduc commented on May 27, 2024

Also, if you want a proper multi-user task spooler, the fork of @kylincaster is probably a better choice.

from task-spooler.

kylincaster avatar kylincaster commented on May 27, 2024

Dear all,

If anyone is looking for a multi-queue task manager, you are welcome to try my fork at kylincaster/task-spooler-PLUS. It has been enhanced with numerous useful features, including multiple user support, fatal crash recovery, and processor allocation and binding.

Best regards,
Kylin

from task-spooler.
