runpod / runpodctl
🧰 | RunPod CLI for pod management
Home Page: https://www.runpod.io/
License: GNU General Public License v3.0
When I'm doing runpodctl start pod {podId}, is there any way to pass a command argument to the pod? Like sending the docker command, appending something to the docker command, setting a bash environment variable, or any other way to pass an argument string to the pod from the remote command line where I'm invoking runpodctl. My goal here is to be able to start a pod remotely and point it at a target URL that it should process. I know I can set a startup docker command from within the web interface, but I'm hoping to do something like that from the command line.
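A hedged sketch of one workaround, assuming the pod is created (not just started) from the CLI and that runpodctl create pod exposes an environment-variable flag; the flag names below are assumptions to verify against runpodctl create pod --help:

# Assumed flag names -- verify with: runpodctl create pod --help
runpodctl create pod \
  --name url-worker \
  --imageName myrepo/worker:latest \
  --env TARGET_URL=https://example.com/job-123
# The container's entrypoint can then read $TARGET_URL and process it.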
Failed to deploy project: Your worker concurrency cannot go beyond the maximum limit of (20). Please contact support if you wish to scale past this number.
Perhaps this could be checked up front, before having to wait a few minutes when deploying a new endpoint.
I could see how that experience would frustrate a user.
I ran this command.
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--stage sft \
--model_name_or_path openlm-research/open_llama_7b \
--do_train \
--dataset train \
--template default \
--finetuning_type lora \
--lora_target q_proj,v_proj \
--output_dir checkpoint \
--overwrite_cache \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 2000 \
--learning_rate 5e-5 \
--num_train_epochs 3.0 \
--plot_loss \
--fp16
[INFO|training_args.py:1345] 2023-12-07 06:09:02,164 >> Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by safetensors please feel free to open an issue at https://github.com/huggingface/safetensors!
[INFO|training_args.py:1798] 2023-12-07 06:09:02,164 >> PyTorch: setting up devices
[INFO|trainer.py:1760] 2023-12-07 06:09:03,760 >> ***** Running training *****
[INFO|trainer.py:1761] 2023-12-07 06:09:03,761 >> Num examples = 78,303
[INFO|trainer.py:1762] 2023-12-07 06:09:03,761 >> Num Epochs = 3
[INFO|trainer.py:1763] 2023-12-07 06:09:03,761 >> Instantaneous batch size per device = 4
[INFO|trainer.py:1766] 2023-12-07 06:09:03,761 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:1767] 2023-12-07 06:09:03,761 >> Gradient Accumulation steps = 4
[INFO|trainer.py:1768] 2023-12-07 06:09:03,761 >> Total optimization steps = 14,682
[INFO|trainer.py:1769] 2023-12-07 06:09:03,762 >> Number of trainable parameters = 4,194,304
0%| | 0/14682 [00:00<?, ?it/s][WARNING|logging.py:290] 2023-12-07 06:09:03,766 >> You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
Traceback (most recent call last):
File "/workspace/LLaMA-Factory/src/train_bash.py", line 14, in <module>
main()
File "/workspace/LLaMA-Factory/src/train_bash.py", line 5, in main
run_exp()
File "/workspace/LLaMA-Factory/src/llmtuner/train/tuner.py", line 26, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/workspace/LLaMA-Factory/src/llmtuner/train/sft/workflow.py", line 68, in run_sft
train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1591, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1950, in _inner_training_loop
self.accelerator.clip_grad_norm_(
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2040, in clip_grad_norm_
self.unscale_gradients()
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2003, in unscale_gradients
self.scaler.unscale_(opt)
File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 307, in unscale_
optimizer_state["found_inf_per_device"] = self._unscale_grads_(
File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 229, in _unscale_grads_
raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
It worked fine when I used it yesterday, but after I changed the dataset size today this problem appeared. What could be going on?
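For context, this ValueError usually means the trainable (LoRA) weights are themselves stored in fp16, so the AMP GradScaler refuses to unscale their gradients; the dataset change is most likely incidental. A minimal workaround sketch, assuming an Ampere-or-newer GPU: swap --fp16 for --bf16 (a standard transformers TrainingArguments flag that train_bash.py forwards), which bypasses the fp16 GradScaler entirely.

# Same command as above, with bf16 mixed precision instead of fp16.
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path openlm-research/open_llama_7b \
    --do_train \
    --dataset train \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 2000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --bf16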
Using runpodctl v1.8.0.
I have been trying to send a 172MB file in the last hour without any success. I keep retrying to no avail.
Sometimes when I send, it just stops in the middle of the job and stays like that, like frozen.
$ runpodctl send samples.zip
Sending 'samples.zip' (172.4 MB)
Code is: 1100-yahoo-boat-friend-0
On the other computer run
runpodctl receive 1100-yahoo-boat-friend-0
Sending (->XX.XX.XXX:40806)
samples.zip 90% |██████████████████  | (156/172 MB, 478.021 kB/s) [3m42s:34s]
... so it never finishes downloading at my end.
But then there's another problem: when the send actually completes at 100%, my receiving end (say, my PC) stops receiving in the middle of it. Again, frozen. It's like a communication breakdown; the thing doesn't know what to do next, so it stays in a frozen state.
Being able to send multiple files at once would be helpful. Like:
runpodctl send "t112_38080.safetensors","t112_38080.yaml"
The Windows install URL in the Readme is outdated and no longer works
wget https://github.com/runpod/runpodctl/releases/download/v1.9.0/runpodctl-windows-amd64.exe -O runpodctl.exe
Needs to be updated to
wget https://github.com/runpod/runpodctl/releases/download/v1.14.2/runpodctl-windows-amd64.exe -O runpodctl.exe
Not really a 🐛 bug, but not the expected behavior:
When you run runpodctl project start
you get a "No 'runpod.toml' found in the current directory." error, when it should say something like "unknown command", or act as an alias for create.
rp % runpodctl version
runpodctl v1.14.1
rp % runpodctl project start
No 'runpod.toml' found in the current directory.
Please navigate to your project directory and try again.
rp % runpodctl project h
Develop and deploy projects entirely on RunPod's infrastructure.
Usage:
runpodctl project [command]
Available Commands:
build builds Dockerfile for current project
create Creates a new project
deploy deploys your project as an endpoint
dev Start a development session for the current project
Flags:
-h, --help help for project
Use "runpodctl project [command] --help" for more information about a command.
rp % runpodctl project create
Welcome to the RunPod Project Creator!
--------------------------------------
Provide a name for your project:
>
Hi!
I think it would be much better to add a Homebrew installation option.
What do you think?
Thanks
The RunPod CLI tool to manage resources on runpod.io and develop serverless applications.
Usage:
runpodctl [command]
Aliases:
runpodctl, runpod
Available Commands:
completion Generate the autocompletion script for the specified shell
config Manage CLI configuration
create create a resource
exec Execute commands in a pod
get get resource
help Help about any command
project Manage RunPod projects
receive receive file(s), or folder
remove remove a resource
send send file(s), or folder
ssh SSH keys and commands
start start a resource
stop stop a resource
update update runpodctl
Flags:
-h, --help help for runpodctl
-v, --version Print the version of runpodctl
Some command descriptions start with capital letters and others do not; they should be consistent.
Also, don't use "(s)":
Don't put optional plurals in parentheses. Instead, use either plural or singular constructions and keep things consistent throughout your documentation. Choose what is most appropriate for your documentation and your audience. If it's important in a specific context to indicate both, use one or more.
Can't receive data from runpod (docker image with no scp support)
$ runpodctl receive 1208-goat-boat-screen
panic: runtime error: index out of range [4] with length 4
goroutine 1 [running]:
cli/cmd/croc.glob..func1(0xc4c9e0, {0xc0000f1620, 0x1, 0x1})
/home/runner/work/runpodctl/runpodctl/cmd/croc/receive.go:47 +0x3d3
github.com/spf13/cobra.(*Command).execute(0xc4c9e0, {0xc0000f1600, 0x1, 0x1})
/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:860 +0x5f8
github.com/spf13/cobra.(*Command).ExecuteC(0xc4bae0)
/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:974 +0x3bc
github.com/spf13/cobra.(*Command).Execute(...)
/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:902
cli/cmd.Execute({0x9571a4, 0xc0000001a0})
/home/runner/work/runpodctl/runpodctl/cmd/root.go:26 +0x4a
main.main()
/home/runner/work/runpodctl/runpodctl/main.go:8 +0x27
root@bed533d5304a:/workspace/stable-diffusion-webui# ls -l
total 680
Is it possible to receive a file and change its name upon receiving it?
For instance, say I'm sending samples.zip
but on my receiving end I'd like it to unzip it in a folder named samples-2.
runpodctl receive 5261-goat-module-brasil-8 samples-2
More specifically, say I need to review from my PC a remote folder with changing data in it, such as logs or images created every n minutes, and I'd like to keep track of changes across different folders.
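A workaround sketch given the current behavior, where receive takes only the one-time code: receive into a scratch directory, then extract under the name you want.

mkdir -p incoming && cd incoming
runpodctl receive 5261-goat-module-brasil-8
unzip samples.zip -d ../samples-2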
wget --quiet --show-progress https://github.com/Run-Pod/runpodctl/releases/download/v1.6.1/runpodctl-linux-amd -O runpodctl
chmod +x runpodctl
cp runpodctl /usr/bin/runpodctl
Every single call to both the API and runpodctl ends with errors like:
context deadline exceeded (Client.Timeout exceeded while awaiting headers)
This should be as simple as setting GOARCH=arm/arm64 and GOOS=android/linux, but my fork failed to build for some reason. I'm not familiar with release-please, so that might be part of it?
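For reference, a cross-compilation sketch using only the standard Go toolchain (no repo-specific build steps assumed; CGO disabled to avoid needing an Android NDK):

CGO_ENABLED=0 GOOS=android GOARCH=arm64 go build -o bin/runpodctl-android-arm64 .
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -o bin/runpodctl-linux-arm64 .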
How do we run an existing pod, without creating a new pod, from Google Colab?
Is there a way to watch container logs when starting a pod using this command line tool?
I'm not getting the ID from an active pod.
# runpodctl get pod
Error: data is nil: {"data":{"myself":null}}
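A hedged first check: {"data":{"myself":null}} is what the API returns when it cannot identify the caller, so an unset or stale API key is the usual suspect. The config flag is the one documented in the README.

runpodctl config --apiKey=$RUNPOD_API_KEY
runpodctl get pod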
Hi, is there any way to update the container image for my running pod, just like the Edit Pod option?
It seems it's only possible to create a new pod with a new GPU using the create command, but not with the GPU I already own.
I hope there's a way to update the container image only, without changing the pod ID & GPU.
Looks like the GraphQL spec can return information about a pod's IP.
I want to create a pod with the CLI and then connect to the created pod over SSH.
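A sketch of the GraphQL query in question; the field names follow RunPod's public GraphQL docs but should be treated as assumptions and verified against the current schema:

curl -s "https://api.runpod.io/graphql?api_key=$RUNPOD_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"query":"query { myself { pods { id runtime { ports { ip isIpPublic privatePort publicPort type } } } } }"}'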
I tried to use runpodctl to upload a dataset of around 100 GB to RunPod. To receive the files, I had to start the pod... however, it has taken the whole day, which means I pay for the GPUs for the whole day but get no chance to use them, because runpodctl always fails.
How do I create a pod with an existing network volume attached using runpodctl?
Can't seem to find it in the documentation.
Many thanks in advance.
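A hedged fallback sketch via the GraphQL API, assuming the podFindAndDeployOnDemand input accepts a networkVolumeId field (an assumption; verify against the schema, and note the real input requires more fields than shown):

# networkVolumeId is an assumed input field -- verify before use.
curl -s "https://api.runpod.io/graphql?api_key=$RUNPOD_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"query":"mutation { podFindAndDeployOnDemand(input: { cloudType: SECURE, gpuCount: 1, gpuTypeId: \"NVIDIA GeForce RTX 3090\", imageName: \"runpod/stack\", networkVolumeId: \"YOUR_VOLUME_ID\" }) { id } }"}'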
Hello,
I would like to create a non-GPU pod for quick experimenting before running a GPU pod. I cannot create a CPU pod, because runpodctl requires gpuType.
This was working for me just fine, and then randomly, out of the blue, running runpodctl send <file> just exits without saying anything.
This happens both locally and on the pod itself. Is there any way to get some verbose output / logging info so I can help you troubleshoot?
I'm running it on a MacBook M2; I just installed it today, v1.9.0. Same behavior on the pod itself, so I don't know if it matters.
panic: runtime error: index out of range [4] with length 4
goroutine 1 [running]:
cli/cmd/croc.glob..func1(0xc4c9e0, {0xc000121610, 0x1, 0x1})
/home/runner/work/runpodctl/runpodctl/cmd/croc/receive.go:47 +0x3d3
github.com/spf13/cobra.(*Command).execute(0xc4c9e0, {0xc0001215f0, 0x1, 0x1})
/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:860 +0x5f8
github.com/spf13/cobra.(*Command).ExecuteC(0xc4bae0)
/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:974 +0x3bc
github.com/spf13/cobra.(*Command).Execute(...)
/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:902
cli/cmd.Execute({0x9571a4, 0xc0000001a0})
/home/runner/work/runpodctl/runpodctl/cmd/root.go:26 +0x4a
main.main()
/home/runner/work/runpodctl/runpodctl/main.go:8 +0x27
Hi, is it possible to get balance information via runpodctl, via a direct GraphQL call, or via the SDK?
I need this information for automated monitoring, to send an alert when the balance is low.
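A hedged monitoring sketch over GraphQL; clientBalance is an assumed field name on the myself object, so verify it against the schema before alerting on it:

curl -s "https://api.runpod.io/graphql?api_key=$RUNPOD_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"query":"query { myself { clientBalance } }"}'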
When you install runpodctl, you can't confirm a successful installation by checking the version: you get an error telling you to run runpodctl config. After adding an API key, runpodctl version works as expected.
It would be awesome if macOS Homebrew users could install this from brew:
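A hypothetical tap layout, purely for illustration (the tap name is an assumption; no official formula is implied):

brew tap runpod/runpodctl
brew install runpodctl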
Using the web interface I'm able to Deploy a spot Instance instead of an On Demand instance. It would be nice to be able to do this using the command line tool too.
I tried naively replacing podFindAndDeployOnDemand with podRentInterruptable, but this failed. I have no idea if this was a permission problem, a server problem, or a client problem. (If I could get it to work, I'd provide a pull request.) I can see the current spot price using runpodctl get cloud. Once I create a pod through the web, I am able to see it and stop it using the command line interface.
I found this documentation.
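A hedged sketch of the podRentInterruptable mutation mentioned above, with input fields mirroring RunPod's documented GraphQL examples (bidPerGpu being the spot-specific one; the full required input is longer, so verify before use):

curl -s "https://api.runpod.io/graphql?api_key=$RUNPOD_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"query":"mutation { podRentInterruptable(input: { bidPerGpu: 0.2, cloudType: SECURE, gpuCount: 1, gpuTypeId: \"NVIDIA GeForce RTX 3090\", imageName: \"runpod/stack\" }) { id } }"}'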
Could you please make an AUR package:
I was sending a stable diffusion model, which is 2 gigabytes, but at around 90% the transfer just stopped. This happened the other day too, but at 80%.
After creating a pod with runpodctl, how can I get the same connection information that I get on the console to access the pod? I am talking about the SSH connection info (e.g. ssh [email protected] -i ~/.ssh/id_ed1111111).
Right now I have to log in to the console to get this information. What is the preferred way to get this from the CLI?
Is there a way to obtain the hostname and port of a pod's SSH using runpodctl? I would like to automate benchmarking my models, but I need to automate the SSH connection.
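Building on the ports query sketched earlier, a jq pipeline that assembles the ssh command from the public mapping of port 22 (field names again assumed, not confirmed):

curl -s "https://api.runpod.io/graphql?api_key=$RUNPOD_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"query":"query { myself { pods { runtime { ports { ip isIpPublic privatePort publicPort } } } } }"}' \
  | jq -r '.data.myself.pods[] | .runtime.ports[]? | select(.privatePort == 22 and .isIpPublic) | "ssh root@\(.ip) -p \(.publicPort)"'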
In order to orchestrate Serverless RunPod.io deployments as part of a continuous deployment workflow, it would be desirable to be able to update the Serverless template using runpodctl; specifically, to change the Container image setting on the template to point to a new version of the image.
Pointing the template at the :latest tag runs the risk of docker pull caches being out of sync and running an old version of the image. And it makes rollback difficult too.
Ideally I'd like it to be possible to execute a runpodctl command that points an existing Serverless template at a new image URL.
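A CD-oriented sketch; the saveTemplate mutation and its input fields are assumptions drawn from RunPod's GraphQL docs (the real input likely requires more fields), so verify before wiring this into a pipeline:

# saveTemplate and its fields are assumed -- verify against the schema.
curl -s "https://api.runpod.io/graphql?api_key=$RUNPOD_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"query":"mutation { saveTemplate(input: { id: \"YOUR_TEMPLATE_ID\", imageName: \"myrepo/worker:v1.2.3\" }) { id imageName } }"}'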