Comments (33)
Sounds like an interesting feature. I will keep this issue open for updates of this feature.
from task-spooler.
Hey @kylincaster. You made a PR in your fork. Could you please make the PR again in here?
OK, I have done so, with full details about the feature/bug in my work.
from task-spooler.
I set TS_SOCKET=/tmp/ts.socket in /etc/environment and ran chmod 777 "$TS_SOCKET".
from task-spooler.
Hi @shaoyucheng. No, multiple users cannot share a queue. Each user will create their own server based on their UID.
from task-spooler.
Hi @shaoyucheng. No, multiple users cannot share a queue. Each user will create their own server based on their UID.
Got it. I think it would be a good feature, which would make your project like an enhanced version of the atd service.
from task-spooler.
I need this too for our shared Volta GPU server.
from task-spooler.
It seems i was able to setup a shared queue with $TS_SOCKET, as mentioned in TRICKS. Thanks for making task-spooler.
from task-spooler.
Hi @wolfram77. Yes, sharing the server file can be a quick and dirty way to share the queue, but be aware that it has a lot of limitations since jobs are user-independent (for example, -C will erase all your colleagues' queues, and -K can be invoked by anyone).
from task-spooler.
@justanhduc While trying it out yesterday I saw that -K deletes the socket file, so I had to chmod it again. It shouldn't be a problem, but I have now put a note about it in the help text on the server.
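For anyone following along, the shared-socket setup described in this thread could be sketched roughly as below. This is only an illustration under assumptions: the socket path is the one quoted above, `open_socket` is a hypothetical helper name, and the sketch relies on the ts client starting a server when none is running.

```shell
# Sketch only: share one queue through a common socket path.
export TS_SOCKET=/tmp/ts.socket   # path quoted earlier in this thread

# Hypothetical helper: make the socket usable by every user.
# It has to be re-run after `ts -K`, because killing the server
# removes the socket file.
open_socket() {
    if [ -S "$TS_SOCKET" ]; then
        chmod 777 "$TS_SOCKET"
    fi
}

# Only touch the queue when ts is actually installed.
if command -v ts >/dev/null 2>&1; then
    ts >/dev/null    # listing jobs also starts the server if needed
    open_socket
fi
```

Keep in mind the caveat above: with a world-writable socket, every user can run -C or -K on the shared queue.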
from task-spooler.
I too would be interested in the multi-user mode, even if it means all users can kill tasks from anyone.
from task-spooler.
Hey @fearedspark. Thanks for your interest. Indeed, there is a working prototype in the branch global. However, there's an ambiguity in setting the number of slots. Should all users share the same number of slots, or should each user have their own? What is the proper number? Or is it something the users have to agree on among themselves? I have not been able to come up with a good solution, so any suggestions are welcome.
from task-spooler.
Well, I will speak about the way I'm managing it on our machine, and maybe it will provide some insight.
I have it configured with as many slots as there are threads on the machine. A user starting a task defines the number of slots it takes based on the number of threads it can use. It would be nice to have a configurable default slot size so that when a user doesn't give a number of slots, it defaults to the max.
Then each user is free to use as many slots as they desire. This only works well if all users behave properly, which is the case for us. It could be a good idea to have a maximum number of slots allowed per user, defaulting to the max number of slots.
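The scheme described here could look roughly like the sketch below. It assumes `ts -S` sets the number of slots and `ts -N` sets the slots a job requires; `slots_for_job` and `./my_job.sh` are hypothetical names used only for illustration, and the "default to the max" behaviour is emulated in the wrapper, not something ts is known to provide.

```shell
# Sketch only: one slot per hardware thread, jobs default to all slots.

# Hypothetical helper: use the requested slot count, or fall back
# to the number of threads reported by nproc.
slots_for_job() {
    echo "${1:-$(nproc)}"
}

# Intended use (left as comments because ./my_job.sh is hypothetical):
#   ts -S "$(nproc)"                        # as many slots as threads
#   ts -N "$(slots_for_job 4)" ./my_job.sh  # job that uses 4 threads
#   ts -N "$(slots_for_job)"   ./my_job.sh  # no count given: take all slots
```

A per-user cap would still need support inside the server; a wrapper like this only covers the default.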
from task-spooler.
Hey @fearedspark. Yeah basically we still have to depend on the kindness of other users 😅. Then I will try to look at the prototype again and see whether I can make it stable or not. Thanks a lot for the initiatives!
from task-spooler.
Dear all,
I have already developed a multi-user version, currently CPU-only, in my task-spooler fork.
If you find it interesting or useful, maybe we could try to merge it back.
However, I am not an expert on Linux, so there is still much room for improvement and there are bugs to fix.
Cheers
from task-spooler.
Dear all,
I have already developed a multi-user version, currently CPU-only, in my task-spooler fork.
If you find it interesting or useful, maybe we could try to merge it back.
However, I am not an expert on Linux, so there is still much room for improvement and there are bugs to fix.
Cheers
Hey @kylincaster. Awesome! Would you mind sending a PR? I will try to review it and we can discuss more how to improve from there.
from task-spooler.
Dear all,
I have already developed a multi-user version, currently CPU-only, in my task-spooler fork.
If you find it interesting or useful, maybe we could try to merge it back.
However, I am not an expert on Linux, so there is still much room for improvement and there are bugs to fix.
Cheers
Hey @kylincaster. Awesome! Would you mind sending a PR? I will try to review it and we can discuss more how to improve from there.
I just submitted the PR. You could give it a try, @justanhduc.
from task-spooler.
Hey @kylincaster. You made a PR in your fork. Could you please make the PR again in here?
from task-spooler.
@justanhduc I found that if I want to control a task precisely, the PIDs of all its subprocesses need to be known in advance.
So I use a bash script to control the running state of the task.
Porting the bash script to C would be hard work.
from task-spooler.
Hi @kylincaster. Sorry for the late reply. What do you mean by "precise control"? What is your use case for which -p is not enough?
from task-spooler.
Hi @justanhduc, I mean pausing or killing a process through ts. Not only the process itself but also all of its subprocesses should be handled, so recursive code is necessary to find the PIDs of all subprocesses.
from task-spooler.
Hi @kylincaster. To kill or pause a process and its children, can we just simply send the signal to the whole process group like the memo here? Or is there anything I missed?
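As a rough illustration of the process-group approach, a helper along these lines could be used. `signal_group` is a hypothetical name; the sketch assumes the job's processes stay in a single process group and that `ps -o pgid=` is available. It refuses to signal the caller's own group as a safety measure.

```shell
# Sketch only: send a signal to the whole process group of a PID.
signal_group() {
    local target=$1 sig=$2 pgid self
    pgid=$(ps -o pgid= -p "$target" 2>/dev/null | tr -d ' ')
    self=$(ps -o pgid= -p "$$" | tr -d ' ')
    if [ -z "$pgid" ]; then
        echo "no such process: $target" >&2
        return 1
    fi
    if [ "$pgid" = "$self" ]; then
        echo "refusing to signal our own process group" >&2
        return 1
    fi
    # A negative PID addresses every process in that group.
    kill -s "$sig" -- "-$pgid"
}

# Intended use with ts (job id is a placeholder):
#   signal_group "$(ts -p <jobid>)" STOP
#   signal_group "$(ts -p <jobid>)" CONT
```

Programs that move their children into other groups or sessions would not be covered by this.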
from task-spooler.
Hi @justanhduc, I once tried to kill the process directly. Unfortunately, the stop signal does not work for a task with subprocesses. The following is an example script that cannot be paused by the kill -stop -- -XXX command:

    #!/bin/bash
    #
    for i in {2..1000}
    do
        dt=`date`
        echo "output: ${dt} $i" >> log.txt
        sleep 1
    done

When it is queued with the ts command ts mpirun -np 1 loop.sh, only the parent process mpirun is paused, not the bash subprocess.
from task-spooler.
Hey @kylincaster. According to the documentation of mpirun 2.1.1 on Ubuntu 18.04, mpirun only propagates a selected number of signals. When dealing with a program like mpirun, imo, ts has no authority to manipulate the created subprocesses because, well, it would violate the purpose of such a program.
And specifically for your problem, be sure to check the Ubuntu version and mpirun version. Running on 18.04 with mpirun 2.1.1, I can successfully stop/continue with the following commands:

    ts mpirun --mca orte_forward_job_control 1 -np 1 toy.sh
    kill -20 $(ts -p <jobid>) # stop the mpi process. Note that SIGSTOP does not work per documentation
    kill -18 $(ts -p <jobid>) # continue
Ps: Our discussion about sending signal seems not to be in the scope of this issue, so if you still have any problem it's better to open another ticket and we can continue there.
from task-spooler.
Thanks, @justanhduc, for the comments on the behavior of mpirun.
Unfortunately, it depends on the MPI implementation. The Intel MPI processes don't forward such signals.
So my solution to this problem is the following bash script, which is called from inside task-spooler.
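    #!/bin/bash
    # Usage: <script> PID [SIGNAL]
    # Prints PID and all of its descendants; if SIGNAL is given, sends it to each.

    # Print the direct children of one or more PIDs (one argument per PID).
    get_child() {
        local parents
        parents=$(echo "$@" | tr ' ' ',')   # pgrep -P takes a comma-separated list
        pgrep -P "$parents"
    }

    # Print all descendants of $1, level by level, followed by $1 itself.
    get_children() {
        local ret children=""
        ret=$(get_child "$1")
        while [ -n "$ret" ]; do
            children+="$ret "
            ret=$(get_child $ret)           # unquoted on purpose: split into PIDs
        done
        children=$(echo "${children}" | xargs)
        echo "${children} $1"
    }

    if [ "$#" -lt 1 ]; then
        echo "no input PID"
        exit 1
    fi

    owner=$(ps -o user= -p "$1")
    if [ -z "$owner" ]; then
        # not a valid PID
        exit 1
    fi

    pids=$(get_children "$1")
    user=$(whoami)
    extra=""
    if [[ "$owner" != "$user" ]]; then
        extra="sudo"    # another user's processes need elevated rights
    fi

    for pid in ${pids}; do
        if [ -z "$2" ]; then
            echo "${extra} ${pid}"          # no signal given: just list the PIDs
        else
            ${extra} kill -s "$2" "${pid}"
        fi
    done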
from task-spooler.
It seems i was able to setup a shared queue with $TS_SOCKET, as mentioned in TRICKS. Thanks for making task-spooler.
Can you share details on how you got this set up? I've defined a socket but still can't see anything from other users... @justanhduc would you be able to help with this?
from task-spooler.
Thanks. I was calling tsp via a bash script - turns out environment variables aren't exposed to bash scripts by default.
What about your logs though? I've got the shared queue working but still can't access logs from tasks queued from other users.
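Regarding the environment variable issue: a likely explanation, stated here as an assumption, is that /etc/environment is normally read at login, so a script started by a web server or cron may never see TS_SOCKET. A small sketch of a workaround is to set it at the top of the script itself; the path is the one used earlier in this thread.

```shell
# Sketch only: make sure TS_SOCKET is set and exported inside the script.
# Keep an existing value, otherwise fall back to the shared path.
: "${TS_SOCKET:=/tmp/ts.socket}"
export TS_SOCKET

# Every child process of this script now inherits the variable, so a
# later call such as `ts -l` would talk to the shared queue.
```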
from task-spooler.
Is it tsp? I am able to see the tasks queued by other users with ts or ts -l. I store the program output with a pipe like stdbuf --output=L ts -nf -N 32 ./a.out | tee -a "a.log" from a script. Are you interested in the program output of other users?
from task-spooler.
Is it tsp? I am able to see the tasks queued by other users with ts or ts -l. I store the program output with a pipe like stdbuf --output=L ts -nf -N 32 ./a.out | tee -a "a.log" from a script. Are you interested in the program output of other users?
I run a node process, which can take {x} duration and does print progress / res. For example, the job below has an error and is run by the webserver, but /tmp/ts-out.1LkaYj doesn't exist for me. I run apache and ssh into the server as the same user (ubuntu).
52 finished /tmp/ts-out.1LkaYj 1 84.95/1.43/0.16 {my_command}
from task-spooler.
Could you try redirecting both stdout and stderr to a file? If that does not work for you, @justanhduc may be able to help you.
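One way to do that redirection, sketched under assumptions: as far as I know ts executes the command directly rather than through a shell, so the redirection has to live inside a small sh -c wrapper. `run_with_log`, the log path, and `./a.out` are hypothetical names for illustration.

```shell
# Sketch only: run a command with stdout and stderr sent to one file.
# $1 is the log file, the remaining arguments are the command.
LOG_WRAPPER='log=$1; shift; exec "$@" >"$log" 2>&1'

run_with_log() {
    sh -c "$LOG_WRAPPER" sh "$@"
}

# Intended use with ts (paths are placeholders):
#   ts sh -c "$LOG_WRAPPER" sh /path/to/job.log ./a.out
```

Writing the log to a location that all relevant users can read avoids depending on the /tmp/ts-out.* files.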
from task-spooler.
Hi @sadikyalcin @wolfram77. First of all, tsp is the original version, not the one in this fork. Please uninstall it using apt and install the one here using make cpu. If the same problem happens, could you verify that you have write permission in /tmp? Also, why is the ts.socket file not in /tmp?
from task-spooler.
Also, if you want a proper multi-user task spooler, the fork of @kylincaster is probably a better choice.
from task-spooler.
Dear all,
If anyone is looking for a multi-queue task manager, you are welcome to try my fork at kylincaster/task-spooler-PLUS. It has been enhanced with numerous useful features, including multiple user support, fatal crash recovery, and processor allocation and binding.
Best regards,
Kylin
from task-spooler.